Add SH4 support:

* config/sh/lib1funcs.asm (___movstr_i4_even, ___movstr_i4_odd): Define. (___movstrSI12_i4, ___sdivsi3_i4, ___udivsi3_i4): Define. * sh.c (reg_class_from_letter, regno_reg_class): Add DF_REGS. (fp_reg_names, assembler_dialect): New variables. (print_operand_address): Handle SUBREGs. (print_operand): Added 'o' case. Don't use adj_offsettable_operand on PRE_DEC / POST_INC. Name of FP registers depends on mode. (expand_block_move): Emit different code for SH4 hardware. (prepare_scc_operands): Use emit_sf_insn / emit_df_insn as appropriate. (from_compare): Likewise. (add_constant): New argument last_value. Changed all callers. (find_barrier): Don't try HImode load for FPUL_REG. (machine_dependent_reorg): Likewise. (sfunc_uses_reg): A CLOBBER cannot be the address register use. (gen_far_branch): Emit a barrier after the new jump. (barrier_align): Don't trust instruction lengths before fixing up pcloads. (machine_dependent_reorg): Add support for FIRST_XD_REG .. LAST_XD_REG. Use auto-inc addressing for fp registers if doubles need to be loaded in two steps. Set sh_flag_remove_dead_before_cse. (push): Support for TARGET_FMOVD. Use gen_push_fpul for fpul. (pop): Support for TARGET_FMOVD. Use gen_pop_fpul for fpul. (calc_live_regs): Support for TARGET_FMOVD. Don't save FPSCR. Support for FIRST_XD_REG .. LAST_XD_REG. (sh_expand_prologue): Support for FIRST_XD_REG .. LAST_XD_REG. (sh_expand_epilogue): Likewise. (sh_builtin_saveregs): Use DFmode moves for fp regs on SH4. (initial_elimination_offset): Take TARGET_ALIGN_DOUBLE into account. (arith_reg_operand): FPUL_REG is OK for SH4. (fp_arith_reg_operand, fp_extended_operand) New functions. (tertiary_reload_operand, fpscr_operand): Likewise. (commutative_float_operator, noncommutative_float_operator): Likewise. (binary_float_operator, get_fpscr_rtx, emit_sf_insn): Likewise. (emit_df_insn, expand_sf_unop, expand_sf_binop): Likewise. (expand_df_unop, expand_df_binop, expand_fp_branch): Likewise. 
(emit_fpscr_use, mark_use, remove_dead_before_cse): Likewise. * sh.h (CPP_SPEC): Add support for -m4, m4-single, m4-single-only. (CONDITIONAL_REGISTER_USAGE): Likewise. (HARD_SH4_BIT, FPU_SINGLE_BIT, SH4_BIT, FMOVD_BIT): Define. (TARGET_CACHE32, TARGET_SUPERSCALAR, TARGET_HARWARD): Define. (TARGET_HARD_SH4, TARGET_FPU_SINGLE, TARGET_SH4, TARGET_FMOVD): Define. (target_flag): Add -m4, m4-single, m4-single-only, -mfmovd. (OPTIMIZATION_OPTIONS): If optimizing, set flag_omit_frame_pointer to -1 and sh_flag_remove_dead_before_cse to 1. (ASSEMBLER_DIALECT): Define to assembler_dialect. (assembler_dialect, fp_reg_names): Declare. (OVERRIDE_OPTIONS): Add code for TARGET_SH4. Hide names of registers that are not accessible. (CACHE_LOG): Take TARGET_CACHE32 into account. (LOOP_ALIGN): Take TARGET_HARWARD into account. (FIRST_XD_REG, LAST_XD_REG, FPSCR_REG): Define. (FIRST_PSEUDO_REGISTER: Now 49. (FIXED_REGISTERS, CALL_USED_REGISTERS): Include values for registers. (HARD_REGNO_NREGS): Special treatment of FIRST_XD_REG .. LAST_XD_REG. (HARD_REGNO_MODE_OK): Update. (enum reg_class): Add DF_REGS and FPSCR_REGS. (REG_CLASS_NAMES, REG_CLASS_CONTENTS, REG_ALLOC_ORDER): Likewise. (SECONDARY_OUTPUT_RELOAD_CLASS, SECONDARY_INPUT_RELOAD_CLASS): Update. (CLASS_CANNOT_CHANGE_SIZE, DEBUG_REGISTER_NAMES): Define. (NPARM_REGS): Eight floating point parameter registers on SH4. (BASE_RETURN_VALUE_REG): SH4 also passes double values in floating point registers. (GET_SH_ARG_CLASS) Likewise. Complex float types are also returned in float registers. (BASE_ARG_REG): Complex float types are also passes in float registers. (FUNCTION_VALUE): Change mode like PROMOTE_MODE does. (LIBCALL_VALUE): Remove trailing semicolon. (ROUND_REG): Round when double precision value is passed in floating point register(s). (FUNCTION_ARG_ADVANCE): No change wanted for SH4 when things are passed on the stack. (FUNCTION_ARG): Little endian adjustment for SH4 SFmode. (FUNCTION_ARG_PARTIAL_NREGS): Zero for SH4. 
(TRAMPOLINE_ALIGNMENT): Take TARGET_HARWARD into account. (INITIALIZE_TRAMPOLINE): Emit ic_invalidate_line for TARGET_HARWARD. (MODE_DISP_OK_8): Not for SH4 DFmode. (GO_IF_LEGITIMATE_ADDRESS): No base reg + index reg for SH4 DFmode. Allow indexed addressing for PSImode after reload. (LEGITIMIZE_ADDRESS): Not for SH4 DFmode. (LEGITIMIZE_RELOAD_ADDRESS): Handle SH3E SFmode. Don't change SH4 DFmode nor PSImode RELOAD_FOR_INPUT_ADDRESS. (DOUBLE_TYPE_SIZE): 64 for SH4. (RTX_COSTS): Add PLUS case. Increae cost of ASHIFT, ASHIFTRT, LSHIFTRT case. (REGISTER_MOVE_COST): Add handling of R0_REGS, FPUL_REGS, T_REGS, MAC_REGS, PR_REGS, DF_REGS. (REGISTER_NAMES): Use fp_reg_names. (enum processor_type): Add PROCESSOR_SH4. (sh_flag_remove_dead_before_cse): Declare. (rtx_equal_function_value_matters, fpscr_rtx, get_fpscr_rtx): Declare. (PREDICATE_CODES): Add binary_float_operator, commutative_float_operator, fp_arith_reg_operand, fp_extended_operand, fpscr_operand, noncommutative_float_operator. (ADJUST_COST): Use different scale for TARGET_SUPERSCALAR. (SH_DYNAMIC_SHIFT_COST): Cheaper for SH4. * sh.md (attribute cpu): Add value sh4. (attrbutes fmovd, issues): Define. (attribute type): Add values dfp_arith, dfp_cmp, dfp_conv, dfdiv. (function units memory, int, mpy, fp): Make dependent on issue rate. (function units issue, single_issue, load_si, load): Define. (function units load_store, fdiv, gp_fpul): Define. (attribute hit_stack): Provide proper default. (use_sfunc_addr+1, udivsi3): Predicated on ! TARGET_SH4. (udivsi3_i4, udivsi3_i4_single, divsi3_i4, divsi3_i4_single): New insns. (udivsi3, divsi3): Emit special patterns for SH4 hardware, (mulsi3_call): Now uses match_operand for function address. (mulsi3): Also emit code for SH1 case. Wrap result in REG_LIBCALL / REG_RETVAL notes. (push, pop, push_e, pop_e): Now define_expands. (push_fpul, push_4, pop_fpul, pop_4, ic_invalidate_line): New expanders. (movsi_ie): Added y/i alternative. 
(ic_invalidate_line_i, movdf_i4): New insns. (movdf_i4+[123], reload_outdf+[12345], movsi_y+[12]): New splitters. (reload_indf, reload_outdf, reload_outsf, reload_insi): New expanders. (movdf): Add special code for SH4. (movsf_ie, movsf_ie+1, reload_insf, calli): Make use of fpscr visible. (call_valuei, calli, call_value): Likewise. (movsf): Emit no-op move. (mov_nop, movsi_y): New insns. (blt, sge): generalize to handle DFmode. (return predicate): Call emit_fpscr_use and remove_dead_before_cse. (block_move_real, block_lump_real): Predicate on ! TARGET_HARD_SH4. (block_move_real_i4, block_lump_real_i4, fpu_switch): New insns. (fpu_switch0, fpu_switch1, movpsi): New expanders. (fpu_switch+[12], fix_truncsfsi2_i4_2+1): New splitters. (toggle_sz): New insn. (addsf3, subsf3, mulsf3, divsf3): Now define_expands. (addsf3_i, subsf3_i, mulsf3_i4, mulsf3_ie, divsf3_i): New insns. (macsf3): Make use of fpscr visible. Disable for SH4. (floatsisf2): Make use of fpscr visible. (floatsisf2_i4): New insn. (floatsisf2_ie, fixsfsi, cmpgtsf_t, cmpeqsf_t): Disable for SH4. (ieee_ccmpeqsf_t): Likewise. (fix_truncsfsi2): Emit different code for SH4. (fix_truncsfsi2_i4, fix_truncsfsi2_i4_2, cmpgtsf_t_i4): New insns. (cmpeqsf_t_i4, ieee_ccmpeqsf_t_4): New insns. (negsf2, sqrtsf2, abssf2): Now expanders. (adddf3, subdf3i, muldf2, divdf3, floatsidf2): New expanders. (negsf2_i, sqrtsf2_i, abssf2_i, adddf3_i, subdf3_i): New insns. (muldf3_i, divdf3_i, floatsidf2_i, fix_truncdfsi2_i): New insns. (fix_truncdfsi2, cmpdf, negdf2, sqrtdf2, absdf2): New expanders. (fix_truncdfsi2_i4, cmpgtdf_t, cmpeqdf_t, ieee_ccmpeqdf_t): New insns. (fix_truncdfsi2_i4_2+1): New splitters. (negdf2_i, sqrtdf2_i, absdf2_i, extendsfdf2_i4): New insns. (extendsfdf2, truncdfsf2): New expanders. (truncdfsf2_i4): New insn. * t-sh (LIB1ASMFUNCS): Add _movstr_i4, _sdivsi3_i4, _udivsi3_i4. (MULTILIB_OPTIONS): Add m4-single-only/m4-single/m4. * float-sh.h: When testing for __SH3E__, also test for __SH4_SINGLE_ONLY__ . 
* va-sh.h (__va_freg): Define to float. (__va_greg, __fa_freg, __gnuc_va_list, va_start): Define for __SH4_SINGLE_ONLY__ like for __SH3E__ . (__PASS_AS_FLOAT, __TARGET_SH4_P): Likewise. (__PASS_AS_FLOAT): Use different definition for __SH4__ and __SH4_SINGLE__. (TARGET_SH4_P): Define. (va_arg): Use it. * sh.md (movdf_k, movsf_i): Tweak the condition so that init_expr_once is satisfied about the existence of load / store insns. * sh.md (movsi_i, movsi_ie, movsi_i_lowpart, movsf_i, movsf_ie): change m constraint in source operand to mr / mf . * va-sh.h (__va_arg_sh1): Use __asm instead of asm. * (__VA_REEF): Define. (__va_arg_sh1): Use it. * va-sh.h (va_start, va_arg, va_copy): Add parenteses. From-SVN: r23777

225e4f43 · J"orn Rennecke · Joern Rennecke · 57cfc5dd · 225e4f43 · 225e4f43
Commit 225e4f43 authored Nov 23, 1998 by J"orn Rennecke Committed by Joern Rennecke Nov 23, 1998
Showing with 460 additions and 43 deletions

gcc/ChangeLog
+178 -0

gcc/config/float-sh.h
+1 -1

gcc/config/sh/lib1funcs.asm
+216 -4

gcc/config/sh/sh.c
+0 -0

gcc/config/sh/sh.h
+0 -0

gcc/config/sh/sh.md
+0 -0

gcc/config/sh/t-sh
+2 -2

gcc/ginclude/va-sh.h
+63 -36

--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
+Mon Nov 23 16:46:46 1998  J"orn Rennecke <amylaar@cygnus.co.uk>
+	Add SH4 support:
+	* config/sh/lib1funcs.asm (___movstr_i4_even, ___movstr_i4_odd): Define.
+	(___movstrSI12_i4, ___sdivsi3_i4, ___udivsi3_i4): Define.
+	* sh.c (reg_class_from_letter, regno_reg_class): Add DF_REGS.
+	(fp_reg_names, assembler_dialect): New variables.
+	(print_operand_address): Handle SUBREGs.
+	(print_operand): Added 'o' case.
+	Don't use adj_offsettable_operand on PRE_DEC / POST_INC.
+	Name of FP registers depends on mode.
+	(expand_block_move): Emit different code for SH4 hardware.
+	(prepare_scc_operands): Use emit_sf_insn / emit_df_insn as appropriate.
+	(from_compare): Likewise.
+	(add_constant): New argument last_value.  Changed all callers.
+	(find_barrier): Don't try HImode load for FPUL_REG.
+	(machine_dependent_reorg): Likewise.
+	(sfunc_uses_reg): A CLOBBER cannot be the address register use.
+	(gen_far_branch): Emit a barrier after the new jump.
+	(barrier_align): Don't trust instruction lengths before
+	fixing up pcloads.
+	(machine_dependent_reorg): Add support for FIRST_XD_REG .. LAST_XD_REG.
+	Use auto-inc addressing for fp registers if doubles need to
+	be loaded in two steps.
+	Set sh_flag_remove_dead_before_cse.
+	(push): Support for TARGET_FMOVD.  Use gen_push_fpul for fpul.
+	(pop): Support for TARGET_FMOVD.  Use gen_pop_fpul for fpul.
+	(calc_live_regs): Support for TARGET_FMOVD.  Don't save FPSCR.
+	Support for FIRST_XD_REG .. LAST_XD_REG.
+	(sh_expand_prologue): Support for FIRST_XD_REG .. LAST_XD_REG.
+	(sh_expand_epilogue): Likewise.
+	(sh_builtin_saveregs): Use DFmode moves for fp regs on SH4.
+	(initial_elimination_offset): Take TARGET_ALIGN_DOUBLE into account.
+	(arith_reg_operand): FPUL_REG is OK for SH4.
+	(fp_arith_reg_operand, fp_extended_operand) New functions.
+	(tertiary_reload_operand, fpscr_operand): Likewise.
+	(commutative_float_operator, noncommutative_float_operator): Likewise.
+	(binary_float_operator, get_fpscr_rtx, emit_sf_insn): Likewise.
+	(emit_df_insn, expand_sf_unop, expand_sf_binop): Likewise.
+	(expand_df_unop, expand_df_binop, expand_fp_branch): Likewise.
+	(emit_fpscr_use, mark_use, remove_dead_before_cse): Likewise.
+	* sh.h (CPP_SPEC): Add support for -m4, m4-single, m4-single-only.
+	(CONDITIONAL_REGISTER_USAGE): Likewise.
+	(HARD_SH4_BIT, FPU_SINGLE_BIT, SH4_BIT, FMOVD_BIT): Define.
+	(TARGET_CACHE32, TARGET_SUPERSCALAR, TARGET_HARWARD): Define.
+	(TARGET_HARD_SH4, TARGET_FPU_SINGLE, TARGET_SH4, TARGET_FMOVD): Define.
+	(target_flag): Add -m4, m4-single, m4-single-only, -mfmovd.
+	(OPTIMIZATION_OPTIONS): If optimizing, set flag_omit_frame_pointer
+	to -1 and sh_flag_remove_dead_before_cse to 1.
+	(ASSEMBLER_DIALECT): Define to assembler_dialect.
+	(assembler_dialect, fp_reg_names): Declare.
+	(OVERRIDE_OPTIONS): Add code for TARGET_SH4.
+	Hide names of registers that are not accessible.
+	(CACHE_LOG): Take TARGET_CACHE32 into account.
+	(LOOP_ALIGN): Take TARGET_HARWARD into account.
+	(FIRST_XD_REG, LAST_XD_REG, FPSCR_REG): Define.
+	(FIRST_PSEUDO_REGISTER: Now 49.
+	(FIXED_REGISTERS, CALL_USED_REGISTERS): Include values for registers.
+	(HARD_REGNO_NREGS): Special treatment of FIRST_XD_REG .. LAST_XD_REG.
+	(HARD_REGNO_MODE_OK): Update.
+	(enum reg_class): Add DF_REGS and FPSCR_REGS.
+	(REG_CLASS_NAMES, REG_CLASS_CONTENTS, REG_ALLOC_ORDER): Likewise.
+	(SECONDARY_OUTPUT_RELOAD_CLASS, SECONDARY_INPUT_RELOAD_CLASS): Update.
+	(CLASS_CANNOT_CHANGE_SIZE, DEBUG_REGISTER_NAMES): Define.
+	(NPARM_REGS): Eight floating point parameter registers on SH4.
+	(BASE_RETURN_VALUE_REG): SH4 also passes double values
+	in floating point registers.
+	(GET_SH_ARG_CLASS) Likewise.
+	Complex float types are also returned in float registers.
+	(BASE_ARG_REG): Complex float types are also passes in float registers.
+	(FUNCTION_VALUE): Change mode like PROMOTE_MODE does.
+	(LIBCALL_VALUE): Remove trailing semicolon.
+	(ROUND_REG): Round when double precision value is passed in floating
+	point register(s).
+	(FUNCTION_ARG_ADVANCE): No change wanted for SH4 when things are
+	passed on the stack.
+	(FUNCTION_ARG): Little endian adjustment for SH4 SFmode.
+	(FUNCTION_ARG_PARTIAL_NREGS): Zero for SH4.
+	(TRAMPOLINE_ALIGNMENT): Take TARGET_HARWARD into account.
+	(INITIALIZE_TRAMPOLINE): Emit ic_invalidate_line for TARGET_HARWARD.
+	(MODE_DISP_OK_8): Not for SH4 DFmode.
+	(GO_IF_LEGITIMATE_ADDRESS): No base reg + index reg for SH4 DFmode.
+	Allow indexed addressing for PSImode after reload.
+	(LEGITIMIZE_ADDRESS): Not for SH4 DFmode.
+	(LEGITIMIZE_RELOAD_ADDRESS): Handle SH3E SFmode.
+	Don't change SH4 DFmode nor PSImode RELOAD_FOR_INPUT_ADDRESS.
+	(DOUBLE_TYPE_SIZE): 64 for SH4.
+	(RTX_COSTS): Add PLUS case.
+	Increae cost of ASHIFT, ASHIFTRT, LSHIFTRT case.
+	(REGISTER_MOVE_COST): Add handling of R0_REGS, FPUL_REGS, T_REGS,
+	MAC_REGS, PR_REGS, DF_REGS.
+	(REGISTER_NAMES): Use fp_reg_names.
+	(enum processor_type): Add PROCESSOR_SH4.
+	(sh_flag_remove_dead_before_cse): Declare.
+	(rtx_equal_function_value_matters, fpscr_rtx, get_fpscr_rtx): Declare.
+	(PREDICATE_CODES): Add binary_float_operator,
+	commutative_float_operator, fp_arith_reg_operand, fp_extended_operand,
+	fpscr_operand, noncommutative_float_operator.
+	(ADJUST_COST): Use different scale for TARGET_SUPERSCALAR.
+	(SH_DYNAMIC_SHIFT_COST): Cheaper for SH4.
+	* sh.md (attribute cpu): Add value sh4.
+	(attrbutes fmovd, issues): Define.
+	(attribute type): Add values dfp_arith, dfp_cmp, dfp_conv, dfdiv.
+	(function units memory, int, mpy, fp): Make dependent on issue rate.
+	(function units issue, single_issue, load_si, load): Define.
+	(function units load_store, fdiv, gp_fpul): Define.
+	(attribute hit_stack): Provide proper default.
+	(use_sfunc_addr+1, udivsi3): Predicated on ! TARGET_SH4.
+	(udivsi3_i4, udivsi3_i4_single, divsi3_i4, divsi3_i4_single): New insns.
+	(udivsi3, divsi3): Emit special patterns for SH4 hardware,
+	(mulsi3_call): Now uses match_operand for function address.
+	(mulsi3): Also emit code for SH1 case.  Wrap result in REG_LIBCALL /
+	REG_RETVAL notes.
+	(push, pop, push_e, pop_e): Now define_expands.
+	(push_fpul, push_4, pop_fpul, pop_4, ic_invalidate_line): New expanders.
+	(movsi_ie): Added y/i alternative.
+	(ic_invalidate_line_i, movdf_i4): New insns.
+	(movdf_i4+[123], reload_outdf+[12345], movsi_y+[12]): New splitters.
+	(reload_indf, reload_outdf, reload_outsf, reload_insi): New expanders.
+	(movdf): Add special code for SH4.
+	(movsf_ie, movsf_ie+1, reload_insf, calli): Make use of fpscr visible.
+	(call_valuei, calli, call_value): Likewise.
+	(movsf): Emit no-op move.
+	(mov_nop, movsi_y): New insns.
+	(blt, sge): generalize to handle DFmode.
+	(return predicate): Call emit_fpscr_use and remove_dead_before_cse.
+	(block_move_real, block_lump_real): Predicate on ! TARGET_HARD_SH4.
+	(block_move_real_i4, block_lump_real_i4, fpu_switch): New insns.
+	(fpu_switch0, fpu_switch1, movpsi): New expanders.
+	(fpu_switch+[12], fix_truncsfsi2_i4_2+1): New splitters.
+	(toggle_sz): New insn.
+	(addsf3, subsf3, mulsf3, divsf3): Now define_expands.
+	(addsf3_i, subsf3_i, mulsf3_i4, mulsf3_ie, divsf3_i): New insns.
+	(macsf3): Make use of fpscr visible.  Disable for SH4.
+	(floatsisf2): Make use of fpscr visible.
+	(floatsisf2_i4): New insn.
+	(floatsisf2_ie, fixsfsi, cmpgtsf_t, cmpeqsf_t): Disable for SH4.
+	(ieee_ccmpeqsf_t): Likewise.
+	(fix_truncsfsi2): Emit different code for SH4.
+	(fix_truncsfsi2_i4, fix_truncsfsi2_i4_2, cmpgtsf_t_i4): New insns.
+	(cmpeqsf_t_i4, ieee_ccmpeqsf_t_4): New insns.
+	(negsf2, sqrtsf2, abssf2): Now expanders.
+	(adddf3, subdf3i, muldf2, divdf3, floatsidf2): New expanders.
+	(negsf2_i, sqrtsf2_i, abssf2_i, adddf3_i, subdf3_i): New insns.
+	(muldf3_i, divdf3_i, floatsidf2_i, fix_truncdfsi2_i): New insns.
+	(fix_truncdfsi2, cmpdf, negdf2, sqrtdf2, absdf2): New expanders.
+	(fix_truncdfsi2_i4, cmpgtdf_t, cmpeqdf_t, ieee_ccmpeqdf_t): New insns.
+	(fix_truncdfsi2_i4_2+1): New splitters.
+	(negdf2_i, sqrtdf2_i, absdf2_i, extendsfdf2_i4): New insns.
+	(extendsfdf2, truncdfsf2): New expanders.
+	(truncdfsf2_i4): New insn.
+	* t-sh (LIB1ASMFUNCS): Add _movstr_i4, _sdivsi3_i4, _udivsi3_i4.
+	(MULTILIB_OPTIONS): Add m4-single-only/m4-single/m4.
+	* float-sh.h: When testing for __SH3E__, also test for
+	__SH4_SINGLE_ONLY__ .
+	* va-sh.h (__va_freg): Define to float.
+	(__va_greg, __fa_freg, __gnuc_va_list, va_start):
+        Define for __SH4_SINGLE_ONLY__ like for __SH3E__ .
+        (__PASS_AS_FLOAT, __TARGET_SH4_P): Likewise.
+	(__PASS_AS_FLOAT): Use different definition for __SH4__ and
+	 __SH4_SINGLE__.
+	(TARGET_SH4_P): Define.
+	(va_arg): Use it.
+	* sh.md (movdf_k, movsf_i): Tweak the condition so that
+	init_expr_once is satisfied about the existence of load / store insns.
+	* sh.md (movsi_i, movsi_ie, movsi_i_lowpart, movsf_i, movsf_ie):
+        change m constraint in source operand to mr / mf .
+	* va-sh.h (__va_arg_sh1): Use __asm instead of asm.
+	* (__VA_REEF): Define.
+	(__va_arg_sh1): Use it.
+	* va-sh.h (va_start, va_arg, va_copy): Add parenteses.
 Sun Nov 22 21:34:02 1998  Jeffrey A Law  (law@cygnus.com)
 	* i386/dgux.c (struct option): Add new "description field".

--- a/gcc/config/float-sh.h
+++ b/gcc/config/float-sh.h
@@ -37,7 +37,7 @@
 #undef FLT_MAX_10_EXP
 #define FLT_MAX_10_EXP 38
-#ifdef __SH3E__
+#if defined (__SH3E__) || defined (__SH4_SINGLE_ONLY__)
   /* Number of base-FLT_RADIX digits in the significand of a double */
 #undef DBL_MANT_DIG

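The float-sh.h hunk above extends the `__SH3E__` guard to `__SH4_SINGLE_ONLY__`, presumably because `double` is only as wide as `float` in that mode (the ChangeLog entry "DOUBLE_TYPE_SIZE: 64 for SH4" suggests the 64-bit format is not the default elsewhere). A minimal C sketch of how user code could detect that situation through `<float.h>`; the helper name `double_is_single_precision` is hypothetical and not part of the commit:

```c
#include <float.h>

/* Hypothetical helper, not from the commit: returns 1 when the target's
   `double` has the same size and significand width as `float`, which is
   what the guarded DBL_* definitions in float-sh.h appear to describe for
   -m3e and -m4-single-only. Returns 0 on targets with a wider double. */
static int double_is_single_precision(void)
{
    return sizeof(double) == sizeof(float)
           && DBL_MANT_DIG == FLT_MANT_DIG;
}
```

On a typical host with a 64-bit IEEE `double` the helper returns 0.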
--- a/gcc/config/sh/lib1funcs.asm
+++ b/gcc/config/sh/lib1funcs.asm
@@ -770,6 +770,64 @@ ___movstr:
 	add	#64,r4
 #endif
+#ifdef L_movstr_i4
+#if defined(__SH4__) || defined(__SH4_SINGLE__) || defined(__SH4_SINGLE_ONLY__)
+	.text
+	.global	___movstr_i4_even
+	.global	___movstr_i4_odd
+	.global	___movstrSI12_i4
+	.p2align	5
+L_movstr_2mod4_end:
+	mov.l	r0,@(16,r4)
+	rts
+	mov.l	r1,@(20,r4)
+	.p2align	2
+___movstr_i4_odd:
+	mov.l	@r5+,r1
+	add	#-4,r4
+	mov.l	@r5+,r2
+	mov.l	@r5+,r3
+	mov.l	r1,@(4,r4)
+	mov.l	r2,@(8,r4)
+L_movstr_loop:
+	mov.l	r3,@(12,r4)
+	dt	r6
+	mov.l	@r5+,r0
+	bt/s	L_movstr_2mod4_end
+	mov.l	@r5+,r1
+	add	#16,r4
+L_movstr_start_even:
+	mov.l	@r5+,r2
+	mov.l	@r5+,r3
+	mov.l	r0,@r4
+	dt	r6
+	mov.l	r1,@(4,r4)
+	bf/s	L_movstr_loop
+	mov.l	r2,@(8,r4)
+	rts
+	mov.l	r3,@(12,r4)
+___movstr_i4_even:
+	mov.l	@r5+,r0
+	bra	L_movstr_start_even
+	mov.l	@r5+,r1
+	.p2align	4
+___movstrSI12_i4:
+	mov.l	@r5,r0
+	mov.l	@(4,r5),r1
+	mov.l	@(8,r5),r2
+	mov.l	r0,@r4
+	mov.l	r1,@(4,r4)
+	rts
+	mov.l	r2,@(8,r4)
+#endif /* ! __SH4__ */
+#endif
 #ifdef L_mulsi3
@@ -808,9 +866,47 @@ hiset:	sts	macl,r0		! r0 = bb*dd
 #endif
-#ifdef L_sdivsi3
+#ifdef L_sdivsi3_i4
 	.title "SH DIVIDE"
 !! 4 byte integer Divide code for the Hitachi SH
+#ifdef __SH4__
+!! args in r4 and r5, result in fpul, clobber dr0, dr2
+	.global	___sdivsi3_i4
+___sdivsi3_i4:
+	lds r4,fpul
+	float fpul,dr0
+	lds r5,fpul
+	float fpul,dr2
+	fdiv dr2,dr0
+	rts
+	ftrc dr0,fpul
+#elif defined(__SH4_SINGLE__) || defined(__SH4_SINGLE_ONLY__)
+!! args in r4 and r5, result in fpul, clobber r2, dr0, dr2
+	.global	___sdivsi3_i4
+___sdivsi3_i4:
+	sts.l fpscr,@-r15
+	mov #8,r2
+	swap.w r2,r2
+	lds r2,fpscr
+	lds r4,fpul
+	float fpul,dr0
+	lds r5,fpul
+	float fpul,dr2
+	fdiv dr2,dr0
+	ftrc dr0,fpul
+	rts
+	lds.l @r15+,fpscr
+#endif /* ! __SH4__ */
+#endif
+#ifdef L_sdivsi3
+/* __SH4_SINGLE_ONLY__ keeps this part for link compatibility with
+   sh3e code.  */
+#if ! defined(__SH4__) && ! defined (__SH4_SINGLE__)
 !!
 !! Steve Chamberlain
 !! sac@cygnus.com
@@ -904,11 +1000,109 @@ ___sdivsi3:
 div0:	rts
 	mov	#0,r0
+#endif /* ! __SH4__ */
 #endif
-#ifdef L_udivsi3
+#ifdef L_udivsi3_i4
 	.title "SH DIVIDE"
 !! 4 byte integer Divide code for the Hitachi SH
+#ifdef __SH4__
+!! args in r4 and r5, result in fpul, clobber r0, r1, r4, r5, dr0, dr2, dr4
+	.global	___udivsi3_i4
+___udivsi3_i4:
+	mov #1,r1
+	cmp/hi r1,r5
+	bf trivial
+	rotr r1
+	xor r1,r4
+	lds r4,fpul
+	mova L1,r0
+#ifdef FMOVD_WORKS
+	fmov.d @r0+,dr4
+#else
+#ifdef __LITTLE_ENDIAN__
+	fmov.s @r0+,fr5
+	fmov.s @r0,fr4
+#else
+	fmov.s @r0+,fr4
+	fmov.s @r0,fr5
+#endif
+#endif
+	float fpul,dr0
+	xor r1,r5
+	lds r5,fpul
+	float fpul,dr2
+	fadd dr4,dr0
+	fadd dr4,dr2
+	fdiv dr2,dr0
+	rts
+	ftrc dr0,fpul
+trivial:
+	rts
+	lds r4,fpul
+	.align 2
+L1:
+	.double 2147483648
+#elif defined(__SH4_SINGLE__) || defined(__SH4_SINGLE_ONLY__)
+!! args in r4 and r5, result in fpul, clobber r0, r1, r4, r5, dr0, dr2, dr4
+	.global	___udivsi3_i4
+___udivsi3_i4:
+	mov #1,r1
+	cmp/hi r1,r5
+	bf trivial
+	sts.l fpscr,@-r15
+	mova L1,r0
+	lds.l @r0+,fpscr
+	rotr r1
+	xor r1,r4
+	lds r4,fpul
+#ifdef FMOVD_WORKS
+	fmov.d @r0+,dr4
+#else
+#ifdef __LITTLE_ENDIAN__
+	fmov.s @r0+,fr5
+	fmov.s @r0,fr4
+#else
+	fmov.s @r0+,fr4
+	fmov.s @r0,fr5
+#endif
+#endif
+	float fpul,dr0
+	xor r1,r5
+	lds r5,fpul
+	float fpul,dr2
+	fadd dr4,dr0
+	fadd dr4,dr2
+	fdiv dr2,dr0
+	ftrc dr0,fpul
+	rts
+	lds.l @r15+,fpscr
+trivial:
+	rts
+	lds r4,fpul
+	.align 2
+L1:
+#ifdef __LITTLE_ENDIAN__
+	.long 0x80000
+#else
+	.long 0x180000
+#endif
+	.double 2147483648
+#endif /* ! __SH4__ */
+#endif
+#ifdef L_udivsi3
+/* __SH4_SINGLE_ONLY__ keeps this part for link compatibility with
+   sh3e code.  */
+#if ! defined(__SH4__) && ! defined (__SH4_SINGLE__)
 !!
 !! Steve Chamberlain
 !! sac@cygnus.com
@@ -966,22 +1160,40 @@ vshortway:
 ret:	rts
 	mov	r4,r0
+#endif /* __SH4__ */
 #endif
 #ifdef L_set_fpscr
-#if defined (__SH3E__)
+#if defined (__SH3E__) || defined(__SH4_SINGLE__) || defined(__SH4__) || defined(__SH4_SINGLE_ONLY__)
 	.global ___set_fpscr
 ___set_fpscr:
 	lds r4,fpscr
 	mov.l ___set_fpscr_L1,r1
 	swap.w r4,r0
 	or #24,r0
+#ifndef FMOVD_WORKS
 	xor #16,r0
+#endif
+#if defined(__SH4__)
+	swap.w r0,r3
+	mov.l r3,@(4,r1)
+#else /* defined(__SH3E__) || defined(__SH4_SINGLE*__) */
 	swap.w r0,r2
 	mov.l r2,@r1
+#endif
+#ifndef FMOVD_WORKS
 	xor #8,r0
+#else
+	xor #24,r0
+#endif
+#if defined(__SH4__)
+	swap.w r0,r2
+	rts
+	mov.l r2,@r1
+#else /* defined(__SH3E__) || defined(__SH4_SINGLE*__) */
 	swap.w r0,r3
 	rts
 	mov.l r3,@(4,r1)
+#endif
 	.align 2
 ___set_fpscr_L1:
 	.long ___fpscr_values
@@ -990,5 +1202,5 @@ ___set_fpscr_L1:
 #else
        .comm   ___fpscr_values,8
 #endif /* ELF */
-#endif /* SH3E */
+#endif /* SH3E / SH4 */
 #endif /* L_set_fpscr */
--- a/gcc/config/sh/sh.c
+++ b/gcc/config/sh/sh.c
--- a/gcc/config/sh/sh.h
+++ b/gcc/config/sh/sh.h
--- a/gcc/config/sh/sh.md
+++ b/gcc/config/sh/sh.md
--- a/gcc/config/sh/t-sh
+++ b/gcc/config/sh/t-sh
 CROSS_LIBGCC1 = libgcc1-asm.a
 LIB1ASMSRC = sh/lib1funcs.asm
 LIB1ASMFUNCS = _ashiftrt _ashiftrt_n _ashiftlt _lshiftrt _movstr \
-  _mulsi3 _sdivsi3 _udivsi3 _set_fpscr
+  _movstr_i4 _mulsi3 _sdivsi3 _sdivsi3_i4 _udivsi3 _udivsi3_i4 _set_fpscr
 # These are really part of libgcc1, but this will cause them to be
 # built correctly, so...
@@ -21,7 +21,7 @@ fp-bit.c: $(srcdir)/config/fp-bit.c
 	echo '#endif' 		>> fp-bit.c
 	cat $(srcdir)/config/fp-bit.c >> fp-bit.c
-MULTILIB_OPTIONS= ml m2/m3e
+MULTILIB_OPTIONS= ml m2/m3e/m4-single-only/m4-single/m4
 MULTILIB_DIRNAMES= 
 MULTILIB_MATCHES = m2=m3

--- a/gcc/ginclude/va-sh.h
+++ b/gcc/ginclude/va-sh.h
@@ -6,10 +6,10 @@
 #ifndef __GNUC_VA_LIST
 #define __GNUC_VA_LIST
-#ifdef __SH3E__
+#if defined (__SH3E__) || defined (__SH4_SINGLE__) || defined (__SH4__) || defined (__SH4_SINGLE_ONLY__)
 typedef long __va_greg;
-typedef double __va_freg;
+typedef float __va_freg;
 typedef struct {
  __va_greg * __va_next_o;		/* next available register */
@@ -33,24 +33,24 @@ typedef void *__gnuc_va_list;
 #ifdef _STDARG_H
-#ifdef __SH3E__
+#if defined (__SH3E__) || defined (__SH4_SINGLE__) || defined (__SH4__) || defined (__SH4_SINGLE_ONLY__)
 #define va_start(AP, LASTARG) \
 __extension__ \
  ({ \
-     AP.__va_next_fp = (__va_freg *) __builtin_saveregs (); \
+     (AP).__va_next_fp = (__va_freg *) __builtin_saveregs (); \
-     AP.__va_next_fp_limit = (AP.__va_next_fp + \
+     (AP).__va_next_fp_limit = ((AP).__va_next_fp + \
 			      (__builtin_args_info (1) < 8 ? 8 - __builtin_args_info (1) : 0)); \
-     AP.__va_next_o = (__va_greg *) AP.__va_next_fp_limit; \
+     (AP).__va_next_o = (__va_greg *) (AP).__va_next_fp_limit; \
-     AP.__va_next_o_limit = (AP.__va_next_o + \
+     (AP).__va_next_o_limit = ((AP).__va_next_o + \
 			     (__builtin_args_info (0) < 4 ? 4 - __builtin_args_info (0) : 0)); \
-     AP.__va_next_stack = (__va_greg *) __builtin_next_arg (LASTARG); \
+     (AP).__va_next_stack = (__va_greg *) __builtin_next_arg (LASTARG); \
  })
 #else /* ! SH3E */
 #define va_start(AP, LASTARG) 						\
- (AP = ((__gnuc_va_list) __builtin_next_arg (LASTARG)))
+ ((AP) = ((__gnuc_va_list) __builtin_next_arg (LASTARG)))
 #endif /* ! SH3E */
@@ -59,24 +59,26 @@ __extension__ \
 #define va_alist  __builtin_va_alist
 #define va_dcl    int __builtin_va_alist;...
-#ifdef __SH3E__
+#if defined (__SH3E__) || defined (__SH4_SINGLE__) || defined (__SH4__) || defined (__SH4_SINGLE_ONLY__)
 #define va_start(AP) \
 __extension__ \
  ({ \
-     AP.__va_next_fp = (__va_freg *) __builtin_saveregs (); \
+     (AP).__va_next_fp = (__va_freg *) __builtin_saveregs (); \
-     AP.__va_next_fp_limit = (AP.__va_next_fp + \
+     (AP).__va_next_fp_limit = ((AP).__va_next_fp + \
 			      (__builtin_args_info (1) < 8 ? 8 - __builtin_args_info (1) : 0)); \
-     AP.__va_next_o = (__va_greg *) AP.__va_next_fp_limit; \
+     (AP).__va_next_o = (__va_greg *) (AP).__va_next_fp_limit; \
-     AP.__va_next_o_limit = (AP.__va_next_o + \
+     (AP).__va_next_o_limit = ((AP).__va_next_o + \
 			     (__builtin_args_info (0) < 4 ? 4 - __builtin_args_info (0) : 0)); \
-     AP.__va_next_stack = (__va_greg *) __builtin_next_arg (__builtin_va_alist) \
+     (AP).__va_next_stack \
-       - (__builtin_args_info (0) >= 4 || __builtin_args_info (1) >= 8 ? 1 : 0); \
+       = ((__va_greg *) __builtin_next_arg (__builtin_va_alist) \
+	  - (__builtin_args_info (0) >= 4 || __builtin_args_info (1) >= 8 \
+	     ? 1 : 0)); \
  })
 #else /* ! SH3E */
-#define va_start(AP)  AP=(char *) &__builtin_va_alist
+#define va_start(AP)  ((AP) = (char *) &__builtin_va_alist)
 #endif /* ! SH3E */
@@ -136,53 +138,78 @@ enum __va_type_classes {
     We want the MEM_IN_STRUCT_P bit set in the emitted RTL, therefore we
     use unions even when it would otherwise be unnecessary.  */
+/* gcc has an extension that allows a cast lvalue to be used as an lvalue,
+   but it doesn't work in C++ with -pedantic, even in the presence of
+   __extension__.  We work around this problem by using a reference type.  */
+#ifdef __cplusplus
+#define __VA_REF &
+#else
+#define __VA_REF
+#endif
 #define __va_arg_sh1(AP, TYPE) __extension__ 				\
-__extension__								\
 ({(sizeof (TYPE) == 1							\
   ? ({union {TYPE t; char c;} __t;					\
-       asm(""								\
+       __asm(""								\
-	   : "=r" (__t.c)						\
+	     : "=r" (__t.c)						\
-	   : "0" ((((union { int i, j; } *) (AP))++)->i));		\
+	     : "0" ((((union { int i, j; } *__VA_REF) (AP))++)->i));	\
       __t.t;})								\
   : sizeof (TYPE) == 2							\
   ? ({union {TYPE t; short s;} __t;					\
-       asm(""								\
+       __asm(""								\
-	   : "=r" (__t.s)						\
+	     : "=r" (__t.s)						\
-	   : "0" ((((union { int i, j; } *) (AP))++)->i));		\
+	     : "0" ((((union { int i, j; } *__VA_REF) (AP))++)->i));	\
       __t.t;})								\
   : sizeof (TYPE) >= 4 || __LITTLE_ENDIAN_P				\
-   ? (((union { TYPE t; int i;} *) (AP))++)->t				\
+   ? (((union { TYPE t; int i;} *__VA_REF) (AP))++)->t			\
-   : ((union {TYPE t;TYPE u;}*) ((char *)++(int *)(AP) - sizeof (TYPE)))->t);})
+   : ((union {TYPE t;TYPE u;}*) ((char *)++(int *__VA_REF)(AP) - sizeof (TYPE)))->t);})
-#ifdef __SH3E__
+#if defined (__SH3E__) || defined (__SH4_SINGLE__) || defined (__SH4__) || defined (__SH4_SINGLE_ONLY__)
 #define __PASS_AS_FLOAT(TYPE_CLASS,SIZE) \
  (TYPE_CLASS == __real_type_class && SIZE == 4)
+#define __TARGET_SH4_P 0
+#if defined(__SH4__) || defined(__SH4_SINGLE__)
+#undef __PASS_AS_FLOAT
+#define __PASS_AS_FLOAT(TYPE_CLASS,SIZE) \
+  (TYPE_CLASS == __real_type_class && SIZE <= 8 \
+   || TYPE_CLASS == __complex_type_class && SIZE <= 16)
+#undef __TARGET_SH4_P
+#define __TARGET_SH4_P 1
+#endif
 #define va_arg(pvar,TYPE)					\
 __extension__							\
 ({int __type = __builtin_classify_type (* (TYPE *) 0);		\
  void * __result_p;						\
  if (__PASS_AS_FLOAT (__type, sizeof(TYPE)))			\
    {								\
-      if (pvar.__va_next_fp < pvar.__va_next_fp_limit)		\
+      if ((pvar).__va_next_fp < (pvar).__va_next_fp_limit)	\
 	{							\
-	  __result_p = &pvar.__va_next_fp;			\
+	  if (((__type == __real_type_class && sizeof (TYPE) > 4)\
+	       || sizeof (TYPE) > 8)				\
+	      && (((int) (pvar).__va_next_fp ^ (int) (pvar).__va_next_fp_limit)\
+		  & 4))						\
+	    (pvar).__va_next_fp++;				\
+	  __result_p = &(pvar).__va_next_fp;			\
 	}							\
      else							\
-	__result_p = &pvar.__va_next_stack;			\
+	__result_p = &(pvar).__va_next_stack;			\
    }								\
  else								\
    {								\
-      if (pvar.__va_next_o + ((sizeof (TYPE) + 3) / 4)		\
+      if ((pvar).__va_next_o + ((sizeof (TYPE) + 3) / 4)	\
-	  <= pvar.__va_next_o_limit) 				\
+	  <= (pvar).__va_next_o_limit) 				\
-	__result_p = &pvar.__va_next_o;				\
+	__result_p = &(pvar).__va_next_o;			\
      else							\
 	{							\
 	  if (sizeof (TYPE) > 4)				\
-	    pvar.__va_next_o = pvar.__va_next_o_limit;		\
+	   if (! __TARGET_SH4_P)				\
+	    (pvar).__va_next_o = (pvar).__va_next_o_limit;	\
 								\
-	  __result_p = &pvar.__va_next_stack;			\
+	  __result_p = &(pvar).__va_next_stack;			\
 	}							\
    } 								\
  __va_arg_sh1(*(void **)__result_p, TYPE);})
@@ -194,6 +221,6 @@ __extension__							\
 #endif /* SH3E */
 /* Copy __gnuc_va_list into another variable of this type.  */
-#define __va_copy(dest, src) (dest) = (src)
+#define __va_copy(dest, src) ((dest) = (src))
 #endif /* defined (_STDARG_H) || defined (_VARARGS_H) */
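Note on the SH4 va_arg change above: before fetching a double (or any floating-point argument wider than 8 bytes) from the saved FP register area, the macro bumps __va_next_fp by one slot when it is an odd number of 4-byte slots away from __va_next_fp_limit. The following is a minimal sketch of that check in plain C, not the header's code. The helper name fp_slot_for_arg and its parameters are hypothetical, a 4-byte float is assumed, and uintptr_t stands in for the macro's (int) casts.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of va-sh.h: returns the slot from which
   a floating-point argument of SIZE bytes would be fetched.  IS_REAL is
   nonzero for a real type (as opposed to a complex one).  */
static float *
fp_slot_for_arg (float *next_fp, float *next_fp_limit, size_t size,
		 int is_real)
{
  /* Same condition as the macro: a real wider than 4 bytes, or anything
     wider than 8 bytes, occupies an aligned pair of slots.  */
  int wants_pair = (is_real && size > 4) || size > 8;

  /* Bit 2 of the XOR is set when the two pointers differ by an odd
     number of 4-byte slots; in that case skip one slot.  */
  if (wants_pair
      && (((uintptr_t) next_fp ^ (uintptr_t) next_fp_limit) & 4))
    next_fp++;

  return next_fp;
}
```

The test is relative to the limit pointer rather than to an absolute 8-byte boundary, so in this sketch the outcome does not depend on how the save area itself happens to be aligned.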