Commit 9b66ebb1, authored and committed by Paul Brook

backport: re PR target/12476 (ARM/THUMB thunk calls broken)

	Merge from csl-arm-branch.

	2004-01-30  Paul Brook  <paul@codesourcery.com>

	* aof.h (REGISTER_NAMES): Add VFP reg names.
	(ADDITIONAL_REGISTER_NAMES): Ditto.
	* aout.h (REGISTER_NAMES): Ditto.
	(ADDITIONAL_REGISTER_NAMES): Ditto.
	* arm-protos.h: Update/add prototypes.
	* arm.c (init_fp_table): Rename from init_fpa_table. Update users.
	Only allow 0.0 for VFP.
	(fp_consts_inited): Rename from fpa_consts_inited.  Update users.
	(values_fp): Rename from values_fpa.  Update users.
	(arm_const_double_rtx): Rename from const_double_rtx_ok_for_fpa.
	Update users.  Only check valid constants for this hardware.
	(arm_float_rhs_operand): Rename from fpa_rhs_operand.  Update users.
	Only allow consts for FPA.
	(arm_float_add_operand): Rename from fpa_add_operand.  Update users.
	Only allow consts for FPA.
	(use_return_insn): Check for saved VFP regs.
	(arm_legitimate_address_p): Handle VFP DFmode addressing.
	(arm_legitimize_address): Ditto.
	(arm_general_register_operand): New function.
	(vfp_mem_operand): New function.
	(vfp_compare_operand): New function.
	(vfp_secondary_reload_class): New function.
	(arm_float_compare_operand): New function.
	(vfp_print_multi): New function.
	(vfp_output_fstmx): New function.
	(vfp_emit_fstm): New function.
	(arm_output_epilogue): Output VFP reg restore code.
	(arm_expand_prologue): Output VFP reg save code.
	(arm_print_operand): Add 'P'.
	(arm_hard_regno_mode_ok): Return modes for VFP regs.
	(arm_regno_class): Return classes for VFP regs.
	(arm_compute_initial_elimination_offset): Include space for VFP regs.
	(arm_get_frame_size): Ditto.
	* arm.h (FIXED_REGISTERS): Add VFP regs.
	(CALL_USED_REGISTERS): Ditto.
	(CONDITIONAL_REGISTER_USAGE): Enable VFP regs.
	(FIRST_VFP_REGNUM): Define.
	(LAST_VFP_REGNUM): Define.
	(IS_VFP_REGNUM): Define.
	(FIRST_PSEUDO_REGISTER): Include VFP regs.
	(HARD_REGNO_NREGS): Handle VFP regs.
	(REG_ALLOC_ORDER): Add VFP regs.
	(enum reg_class): Add VFP_REGS.
	(REG_CLASS_NAMES): Ditto.
	(REG_CLASS_CONTENTS): Ditto.
	(CANNOT_CHANGE_MODE_CLASS): Handle VFP regs.
	(REG_CLASS_FROM_LETTER): Add 'w'.
	(EXTRA_CONSTRAINT_ARM): Add 'U'.
	(EXTRA_MEMORY_CONSTRAINT): Define.
	(SECONDARY_OUTPUT_RELOAD_CLASS): Handle VFP regs.
	(SECONDARY_INPUT_RELOAD_CLASS): Ditto.
	(REGISTER_MOVE_COST): Ditto.
	(PREDICATE_CODES): Add arm_general_register_operand,
	arm_float_compare_operand and vfp_compare_operand.
	* arm.md (various): Rename as above.
	(divsf3): Enable when TARGET_VFP.
	(divdf3): Ditto.
	(movdfcc): Ditto.
	(sqrtsf2): Ditto.
	(sqrtdf2): Ditto.
	(arm_movdi): Disable when TARGET_VFP.
	(arm_movsi_insn): Ditto.
	(movsi): Only split with general regs.
	(cmpsf): Use arm_float_compare_operand.
	(push_fp_multi): Restrict to TARGET_FPA.
	(vfp.md): Include.
	* vfp.md: New file.
	* fpa.md (various): Rename as above.
	* doc/md.texi: Document ARM w and U constraints.
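	For readers new to the constraints named above: 'w' selects a VFP
	floating-point register and 'U' a memory reference that the VFP
	load/store instructions can address (register plus small constant
	offset).  A minimal, purely illustrative inline-asm use (the
	function and variable names are invented here; "fadds" is the
	pre-UAL VFP single-precision add):

	  float
	  vfp_add (float a, float b)
	  {
	    float r;
	    /* "=w"/"w" force the operands into VFP s-registers.  */
	    __asm__ ("fadds %0, %1, %2" : "=w" (r) : "w" (a), "w" (b));
	    return r;
	  }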

	2004-01-15  Paul Brook  <paul@codesourcery.com>

	* config.gcc: Add with_fpu.  Allow with-float=softfp.
	* config/arm/arm.c (arm_override_options): Rename *-s to *s.
	Break out of loop when we find a float-abi.  Fix typo.
	* config/arm/arm.h (OPTION_DEFAULT_SPECS): Add "fpu".
	Set -mfloat-abi=.
	* doc/install.texi: Document --with-fpu.
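	As a configuration sketch only (the target triplet and prefix are
	placeholders, not taken from this patch), the new options combine
	with the existing ones like this:

	  ../gcc/configure --target=arm-none-elf --prefix=/opt/arm \
	    --with-fpu=vfp --with-float=softfp

	the intent being that -mfpu=vfp and the soft-float calling
	convention become the defaults of the built compiler.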

	2004-01-14  Paul Brook  <paul@codesourcery.com>

	* config.gcc (with_arch): Add armv6.
	* config/arm/arm.h: Rename TARGET_CPU_*_s to TARGET_CPU_*s.
	* config/arm/arm.c (arm_override_options): Ditto.

	2004-01-08  Richard Earnshaw  <rearnsha@arm.com>

	* arm.c (FL_ARCH3M): Renamed from FL_FAST_MULT.
	(FL_ARCH6): Renamed from FL_ARCH6J.
	(arm_arch3m): Renamed from arm_fast_multiply.
	(arm_arch6): Renamed from arm_arch6j.
	* arm.h: Update all uses of above.
	* arm-cores.def: Likewise.
	* arm.md: Likewise.

	* arm.h (CPP_CPU_ARCH_SPEC): Emit __ARM_ARCH_6J__ define for armv6j,
	not arm6j.  Add entry for arch armv6.

	2004-01-07  Richard Earnshaw  <rearnsha@arm.com>

	* arm.c (arm_emit_extendsi): Delete.
	* arm-protos.h (arm_emit_extendsi): Delete.
	* arm.md (zero_extendhisi2): Also handle zero-extension of
	non-subregs.
	(zero_extendqisi2, extendhisi2, extendqisi2): Likewise.
	(thumb_zero_extendhisi2): Only match if not v6.
	(arm_zero_extendhisi2, thumb_zero_extendqisi2, arm_zero_extendqisi2)
	(thumb_extendhisi2, arm_extendhisi2, arm_extendqisi)
	(thumb_extendqisi2): Likewise.
	(thumb_zero_extendhisi2_v6, arm_zero_extendhisi2_v6): New patterns.
	(thumb_zero_extendqisi2_v6, arm_zero_extendqisi2_v6): New patterns.
	(thumb_extendhisi2_insn_v6, arm_extendhisi2_v6): New patterns.
	(thumb_extendqisi2_v6, arm_extendqisi_v6): New patterns.
	(arm_zero_extendhisi2_reg, arm_zero_extendqisi2_reg): Delete.
	(arm_extendhisi2_reg, arm_extendqisi2_reg): Delete.
	(arm_zero_extendhisi2addsi): Remove subreg.  Add attributes.
	(arm_zero_extendqisi2addsi, arm_extendhisi2addsi): Likewise.
	(arm_extendqisi2addsi): Likewise.

	2003-12-31  Mark Mitchell  <mark@codesourcery.com>

	Revert this change:
	* config/arm/arm.h (THUMB_LEGITIMIZE_RELOAD_ADDRESS): Reload REG
	+ REG addressing modes.

	* config/arm/arm.h (THUMB_LEGITIMIZE_RELOAD_ADDRESS): Reload REG
	+ REG addressing modes.

	2003-12-30  Mark Mitchell  <mark@codesourcery.com>

	* config/arm/arm.h (THUMB_LEGITIMATE_CONSTANT_P): Accept
	CONSTANT_P_RTX.

	2003-12-30  Paul Brook  <paul@codesourcery.com>

	* longlong.h: Protect ARM inlines with !defined (__thumb__).

	2003-12-30  Paul Brook  <paul@codesourcery.com>

	* config/arm/arm.h (TARGET_CPU_CPP_BUILTINS): Always define __arm__.

	2003-12-30  Nathan Sidwell  <nathan@codesourcery.com>

	* builtins.c (expand_builtin_apply_args_1): Fix typo in previous
	change.

	2003-12-29  Nathan Sidwell  <nathan@codesourcery.com>

	* builtins.c (expand_builtin_apply_args_1): Add pretend args size
	to the virtual incoming args pointer for downward stacks.

	2003-12-29  Paul Brook  <paul@codesourcery.com>

	* config/arm/arm-cores.def: Add cost function.
	* config/arm/arm.c (arm_*_rtx_costs): New functions.
	(arm_rtx_costs): Remove.
	(struct processors): Add rtx_costs field.
	(all_cores, all_architectures): Ditto.
	(arm_override_options): Set targetm.rtx_costs.
	(thumb_rtx_costs): New function.
	(arm_rtx_costs_1): Remove cases handled elsewhere.
	* config/arm/arm.h (processor_type): Add COSTS parameter.
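	A rough sketch (hypothetical, abridged names; the real code lives in
	arm.c) of how the pieces above fit together: each ARM_CORE entry in
	arm-cores.def now carries a COSTS tag, the processors table stores
	the matching cost routine, and arm_override_options installs the
	selected core's routine as the rtx_costs target hook.

	  struct processors
	  {
	    const char *const name;
	    unsigned int flags;
	    bool (*rtx_costs) (rtx, int, int, int *);
	  };

	  static const struct processors all_cores[] =
	  {
	    /* Expand each arm-cores.def entry into a table row whose cost
	       routine is arm_<COSTS>_rtx_costs, e.g. arm_slowmul_rtx_costs.  */
	  #define ARM_CORE(NAME, FLAGS, COSTS) \
	    { #NAME, FLAGS, arm_##COSTS##_rtx_costs },
	  #include "arm-cores.def"
	  #undef ARM_CORE
	    { NULL, 0, NULL }
	  };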

	2003-12-29  Nathan Sidwell  <nathan@codesourcery.com>

	* config/arm/arm.md (generic_sched): arm926 has its own scheduler.
	(arm926ejs.md): Include it.
	* config/arm/arm926ejs.md: New pipeline description.

	2003-12-24  Paul Brook  <paul@codesourcery.com>

	* config/arm/arm.c (arm_arch6j): New variable.
	(arm_override_options): Set it.
	(arm_emit_extendsi): New function.
	* config/arm/arm-protos.h (arm_emit_extendsi): Add prototype.
	* config/arm/arm.h (arm_arch6j): Declare.
	* config/arm/arm.md: Add sign/zero extend insns.

	2003-12-23  Paul Brook  <paul@codesourcery.com>

	* config/arm/arm.c (all_architectures): Add armv6.
	* doc/invoke.texi: Document it.

	2003-12-19  Paul Brook  <paul@codesourcery.com>

	* config/arm/arm.md: Add load1 and load_byte "type" attrs.  Modify
	insn patterns to match.
	* config/arm/arm-generic.md: Ditto.
	* config/arm/cirrus.md: Ditto.
	* config/arm/fpa.md: Ditto.
	* config/arm/iwmmxt.md: Ditto.
	* config/arm/arm1026ejs.md: Ditto.
	* config/arm/arm1136jfs.md: Ditto.  Add insn_reservation and bypasses
	for 11_loadb.

	2003-12-18  Nathan Sidwell  <nathan@codesourcery.com>

	* config/arm/arm-protos.h (arm_no_early_alu_shift_value_dep): Declare.
	* config/arm/arm.c (arm_adjust_cost): Check shift cost for
	TYPE_ALU_SHIFT and TYPE_ALU_SHIFT_REG.
	(arm_no_early_store_addr_dep, arm_no_early_alu_shift_dep,
	arm_no_early_mul_dep): Correctly deal with conditional execution,
	parallels and single shift operations.
	(arm_no_early_alu_shift_value_dep): Define.
	* arm.md (attr type): Replace 'normal' with 'alu',
	'alu_shift' and 'alu_shift_reg'.
	(attr core_cycles): Adjust.
	(*addsi3_carryin_shift, andsi_not_shiftsi_si, *arm_shiftsi3,
	*shiftsi3_compare0, *notsi_shiftsi, *notsi_shiftsi_compare0,
	*not_shiftsi_compare0_scratch, *cmpsi_shiftsi, *cmpsi_shiftsi_swp,
	*cmpsi_neg_shiftsi, *arith_shiftsi, *arith_shiftsi_compare0,
	*arith_shiftsi_compare0_scratch, *sub_shiftsi,
	*sub_shiftsi_compare0, *sub_shiftsi_compare0_scratch,
	*if_shift_move, *if_move_shift, *if_shift_shift): Set type
	attribute appropriately.
	* config/arm/arm1026ejs.md (alu_op): Adjust.
	(alu_shift_op, alu_shift_reg_op): New.
	* config/arm/arm1136jfs.md: Add better bypasses for early
	registers. Remove load[234] and store[234] bypasses.
	(11_alu_op): Adjust.
	(11_alu_shift_op, 11_alu_shift_reg_op): New.

	2003-12-15  Nathan Sidwell  <nathan@codesourcery.com>

	* config/arm/arm-protos.h (arm_no_early_store_addr_dep,
	arm_no_early_alu_shift_dep, arm_no_early_mul_dep): Declare.
	* config/arm/arm.c (arm_no_early_store_addr_dep,
	arm_no_early_alu_shift_dep, arm_no_early_mul_dep): Define.
	* config/arm/arm1026ejs.md: Add load-store bypass.
	* config/arm/arm1136jfs.md (11_alu_op): Take 2 cycles.
	Add bypasses between instructions.

	2003-12-10  Paul Brook  <paul@codesourcery.com>

	* config/arm/arm.c (arm_fpu_model): New variable.
	(arm_float_abi): New variable.
	(target_fpe_name): Rename from target_fp_name.
	(target_fpu_name): New variable.
	(arm_is_cirrus): Remove.
	(fpu_desc): New struct.
	(all_fpus): Define.
	(fp_model_for_fpu): Define.
	(all_float_abis): Define.
	(arm_override_options): Set fp arch flags based on -mfpu=
	and -mfloat-abi=.
	(FIRST_FPA_REGNUM): Rename from FIRST_ARM_FP_REGNUM.
	(LAST_FPA_REGNUM): Rename from LAST_ARM_FP_REGNUM.
	(*): Use new TARGET_* flags.
	* config/arm/arm.h (TARGET_ANY_HARD_FLOAT): Remove.
	(TARGET_HARD_FLOAT): No longer implies TARGET_FPA.
	(TARGET_SOFT_FLOAT): Ditto.
	(TARGET_SOFT_FLOAT_ABI): New.
	(TARGET_MAVERICK): Rename from TARGET_CIRRUS.  No longer implies
	TARGET_HARD_FLOAT.
	(TARGET_VFP): No longer implies TARGET_HARD_FLOAT.
	(TARGET_OPTIONS): Add -mfpu=.
	(FIRST_FPA_REGNUM): Rename from FIRST_ARM_FP_REGNUM.
	(LAST_FPA_REGNUM): Rename from LAST_ARM_FP_REGNUM.
	(arm_fp_model): Define.
	(arm_float_abi_type): Define.
	(fputype): Add FPUTYPE_VFP.  Change SOFT_FPA->NONE.
	* config/arm/arm.md: Use new TARGET_* flags.
	* config/arm/cirrus.md: Ditto.
	* config/arm/fpa.md: Ditto.
	* config/arm/elf.h (ASM_SPEC): Pass -mfloat-abi= and -mfpu=.
	* config/arm/semi.h (ASM_SPEC): Ditto.
	* config/arm/netbsd-elf.h (SUBTARGET_ASM_FLOAT_SPEC): Specify vfp.
	(FPUTYPE_DEFAULT): Set to VFP.
	* doc/invoke.texi: Document -mfpu= and -mfloat-abi=.
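	For illustration only (the driver name and source file are
	placeholders), the two new options documented above are typically
	used together, e.g.:

	  arm-elf-gcc -mfpu=vfp -mfloat-abi=softfp -O2 -c foo.c

	i.e. generate VFP instructions while keeping the soft-float calling
	convention, so the objects stay link-compatible with soft-float code.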

	2003-11-22  Phil Edwards  <phil@codesourcery.com>

	PR target/12476
	* config/arm/arm.c (arm_output_mi_thunk):  In Thumb mode, use
	'bx' instead of 'b' to avoid branch range restrictions.  Output
	the thunk immediately before the thunked-to function.
	* config/arm/arm.h (ARM_DECLARE_FUNCTION_NAME):  Do not emit
	.thumb_func if a thunk is being generated.  Emit .code 16 along
	with .thumb_func if a thunk is not being generated.

	2003-11-15  Nicolas Pitre <nico@cam.org>

	* config/arm/arm.md (ashldi3, arm_ashldi3_1bit, ashrdi3,
	arm_ashrdi3_1bit, lshrdi3, arm_lshrdi3_1bit): New patterns.
	* config/arm/iwmmxt.md (ashrdi3_iwmmxt): Renamed from ashrdi3.
	(lshrdi3_iwmmxt): Renamed from lshrdi3.
	* config/arm/arm.c (IWMMXT_BUILTIN2): Renamed argument accordingly.

	2003-11-12  Steve Woodford  <scw@wasabisystems.com>
	    Ian Lance Taylor  <ian@wasabisystems.com>

	* config/arm/lib1funcs.asm (ARM_DIV_BODY, ARM_MOD_BODY): Add new
	code for __ARM_ARCH__ >= 5 && ! defined (__OPTIMIZE_SIZE__).
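	The idea behind the new fast path, sketched in C for clarity (an
	illustration only, not the code added to lib1funcs.asm; it assumes
	a nonzero dividend and divisor):

	  unsigned
	  udiv_by_clz (unsigned dividend, unsigned divisor)
	  {
	    /* One clz-based alignment step replaces the old normalisation
	       loop; then exactly shift+1 compare/subtract iterations run,
	       which the assembly version unrolls and jumps into.  */
	    int shift = __builtin_clz (divisor) - __builtin_clz (dividend);
	    unsigned result = 0;

	    for (; shift >= 0; shift--)
	      {
	        result <<= 1;
	        if (dividend >= (divisor << shift))
	          {
	            dividend -= divisor << shift;
	            result |= 1;
	          }
	      }
	    return result;
	  }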

	2003-11-05  Phil Edwards  <phil@codesourcery.com>

	* config/arm/arm.md (insn):  Add new V6 instruction names.
	(generic_sched):  New attr.
	* config/arm/arm-generic.md:  Use generic_sched here.
	* config/arm/arm1026ejs.md:  Do not model fetch/issue/decode
	stages of pipeline.  Adjust latency counts accordingly.
	* config/arm/arm1136jfs.md:  New file.

	2003-10-28  Mark Mitchell  <mark@codesourcery.com>

	* config/arm/arm.h (processor_type): New enumeration type.
	(CPP_ARCH_DEFAULT_SPEC): Set appropriately for ARM 926EJ-S,
	ARM1026EJ-S, ARM1136J-S, and ARM1136JF-S processor cores.
	(CPP_CPU_ARCH_SPEC): Likewise.
	* config/arm/arm.c (arm_tune): New variable.
	(all_cores): Use cores.def.
	(all_architectures): Add representative processor.
	(arm_override_options): Restructure way in which tuning
	information is deduced.
	* arm.md: Update "insn" and "type" attributes throughout.
	(insn): New attribute.
	(type): Compute "mult" from "insn" attribute.  Add load2,
	load3, load4 alternatives.
	(arm automaton): Move to arm-generic.md.
	* config/arm/arm-cores.def: New file.
	* config/arm/arm-generic.md: Likewise.
	* config/arm/arm1026ejs.md: Likewise.

From-SVN: r77171
...@@ -2399,7 +2399,7 @@ fi ...@@ -2399,7 +2399,7 @@ fi
;; ;;
arm*-*-*) arm*-*-*)
supported_defaults="arch cpu float tune" supported_defaults="arch cpu float tune fpu"
for which in cpu tune; do for which in cpu tune; do
eval "val=\$with_$which" eval "val=\$with_$which"
case "$val" in case "$val" in
...@@ -2426,7 +2426,7 @@ fi ...@@ -2426,7 +2426,7 @@ fi
case "$with_arch" in case "$with_arch" in
"" \ "" \
| armv[2345] | armv2a | armv3m | armv4t | armv5t \ | armv[23456] | armv2a | armv3m | armv4t | armv5t \
| armv5te | armv6j | ep9312) | armv5te | armv6j | ep9312)
# OK # OK
;; ;;
...@@ -2438,7 +2438,7 @@ fi ...@@ -2438,7 +2438,7 @@ fi
case "$with_float" in case "$with_float" in
"" \ "" \
| soft | hard) | soft | hard | softfp)
# OK # OK
;; ;;
*) *)
...@@ -2447,6 +2447,17 @@ fi ...@@ -2447,6 +2447,17 @@ fi
;; ;;
esac esac
case "$with_fpu" in
"" \
| fpa | fpe2 | fpe3 | maverick | vfp )
# OK
;;
*)
echo "Unknown fpu used in --with-fpu=$fpu" 2>&1
exit 1
;;
esac
if test "x$with_arch" != x && test "x$with_cpu" != x; then if test "x$with_arch" != x && test "x$with_cpu" != x; then
echo "Warning: --with-arch overrides --with-cpu" 1>&2 echo "Warning: --with-arch overrides --with-cpu" 1>&2
fi fi
...@@ -2737,7 +2748,7 @@ fi ...@@ -2737,7 +2748,7 @@ fi
esac esac
t= t=
all_defaults="abi cpu arch tune schedule float mode" all_defaults="abi cpu arch tune schedule float mode fpu"
for option in $all_defaults for option in $all_defaults
do do
eval "val=\$with_$option" eval "val=\$with_$option"
......
...@@ -246,7 +246,12 @@ do { \ ...@@ -246,7 +246,12 @@ do { \
"wr0", "wr1", "wr2", "wr3", \ "wr0", "wr1", "wr2", "wr3", \
"wr4", "wr5", "wr6", "wr7", \ "wr4", "wr5", "wr6", "wr7", \
"wr8", "wr9", "wr10", "wr11", \ "wr8", "wr9", "wr10", "wr11", \
"wr12", "wr13", "wr14", "wr15" \ "wr12", "wr13", "wr14", "wr15", \
"s0", "s1", "s2", "s3", "s4", "s5", "s6", "s7", \
"s8", "s9", "s10", "s11", "s12", "s13", "s14", "s15", \
"s16", "s17", "s18", "s19", "s20", "s21", "s22", "s23", \
"s24", "s25", "s26", "s27", "s28", "s29", "s30", "s31", \
"vfpcc"
} }
#define ADDITIONAL_REGISTER_NAMES \ #define ADDITIONAL_REGISTER_NAMES \
...@@ -267,6 +272,22 @@ do { \ ...@@ -267,6 +272,22 @@ do { \
{"r13", 13}, {"sp", 13}, \ {"r13", 13}, {"sp", 13}, \
{"r14", 14}, {"lr", 14}, \ {"r14", 14}, {"lr", 14}, \
{"r15", 15}, {"pc", 15} \ {"r15", 15}, {"pc", 15} \
{"d0", 63}, \
{"d1", 65}, \
{"d2", 67}, \
{"d3", 69}, \
{"d4", 71}, \
{"d5", 73}, \
{"d6", 75}, \
{"d7", 77}, \
{"d8", 79}, \
{"d9", 81}, \
{"d10", 83}, \
{"d11", 85}, \
{"d12", 87}, \
{"d13", 89}, \
{"d14", 91}, \
{"d15", 93}, \
} }
#define REGISTER_PREFIX "__" #define REGISTER_PREFIX "__"
......
...@@ -49,7 +49,7 @@ ...@@ -49,7 +49,7 @@
/* The assembler's names for the registers. */ /* The assembler's names for the registers. */
#ifndef REGISTER_NAMES #ifndef REGISTER_NAMES
#define REGISTER_NAMES \ #define REGISTER_NAMES \
{ \ { \
"r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7", \ "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7", \
"r8", "r9", "sl", "fp", "ip", "sp", "lr", "pc", \ "r8", "r9", "sl", "fp", "ip", "sp", "lr", "pc", \
...@@ -63,7 +63,12 @@ ...@@ -63,7 +63,12 @@
"wr0", "wr1", "wr2", "wr3", \ "wr0", "wr1", "wr2", "wr3", \
"wr4", "wr5", "wr6", "wr7", \ "wr4", "wr5", "wr6", "wr7", \
"wr8", "wr9", "wr10", "wr11", \ "wr8", "wr9", "wr10", "wr11", \
"wr12", "wr13", "wr14", "wr15" \ "wr12", "wr13", "wr14", "wr15", \
"s0", "s1", "s2", "s3", "s4", "s5", "s6", "s7", \
"s8", "s9", "s10", "s11", "s12", "s13", "s14", "s15", \
"s16", "s17", "s18", "s19", "s20", "s21", "s22", "s23", \
"s24", "s25", "s26", "s27", "s28", "s29", "s30", "s31", \
"vfpcc" \
} }
#endif #endif
...@@ -152,7 +157,23 @@ ...@@ -152,7 +157,23 @@
{"mvdx12", 39}, \ {"mvdx12", 39}, \
{"mvdx13", 40}, \ {"mvdx13", 40}, \
{"mvdx14", 41}, \ {"mvdx14", 41}, \
{"mvdx15", 42} \ {"mvdx15", 42}, \
{"d0", 63}, \
{"d1", 65}, \
{"d2", 67}, \
{"d3", 69}, \
{"d4", 71}, \
{"d5", 73}, \
{"d6", 75}, \
{"d7", 77}, \
{"d8", 79}, \
{"d9", 81}, \
{"d10", 83}, \
{"d11", 85}, \
{"d12", 87}, \
{"d13", 89}, \
{"d14", 91}, \
{"d15", 93}, \
} }
#endif #endif
......
/* ARM CPU Cores
Copyright (C) 2003 Free Software Foundation, Inc.
Written by CodeSourcery, LLC
This file is part of GCC.
GCC is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2, or (at your option)
any later version.
GCC is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
You should have received a copy of the GNU General Public License
along with GCC; see the file COPYING. If not, write to the Free
Software Foundation, 59 Temple Place - Suite 330, Boston, MA
02111-1307, USA. */
/* Before using #include to read this file, define a macro:
   ARM_CORE(CORE_NAME, FLAGS, COSTS)
   The CORE_NAME is the name of the core, represented as an identifier
   rather than a string constant.  The FLAGS are the bitwise-or of the
   traits that apply to that core.  COSTS names the rtx cost routine
   used when tuning for that core.
   If you update this table, you must update the "tune" attribute in
   arm.md. */
ARM_CORE(arm2, FL_CO_PROC | FL_MODE26, slowmul)
ARM_CORE(arm250, FL_CO_PROC | FL_MODE26, slowmul)
ARM_CORE(arm3, FL_CO_PROC | FL_MODE26, slowmul)
ARM_CORE(arm6, FL_CO_PROC | FL_MODE26 | FL_MODE32, slowmul)
ARM_CORE(arm60, FL_CO_PROC | FL_MODE26 | FL_MODE32, slowmul)
ARM_CORE(arm600, FL_CO_PROC | FL_MODE26 | FL_MODE32, slowmul)
ARM_CORE(arm610, FL_MODE26 | FL_MODE32, slowmul)
ARM_CORE(arm620, FL_CO_PROC | FL_MODE26 | FL_MODE32, slowmul)
ARM_CORE(arm7, FL_CO_PROC | FL_MODE26 | FL_MODE32, slowmul)
/* arm7m doesn't exist on its own, but only with D, (and I), but
those don't alter the code, so arm7m is sometimes used. */
ARM_CORE(arm7m, FL_CO_PROC | FL_MODE26 | FL_MODE32 | FL_ARCH3M, fastmul)
ARM_CORE(arm7d, FL_CO_PROC | FL_MODE26 | FL_MODE32, slowmul)
ARM_CORE(arm7dm, FL_CO_PROC | FL_MODE26 | FL_MODE32 | FL_ARCH3M, fastmul)
ARM_CORE(arm7di, FL_CO_PROC | FL_MODE26 | FL_MODE32, slowmul)
ARM_CORE(arm7dmi, FL_CO_PROC | FL_MODE26 | FL_MODE32 | FL_ARCH3M, fastmul)
ARM_CORE(arm70, FL_CO_PROC | FL_MODE26 | FL_MODE32, slowmul)
ARM_CORE(arm700, FL_CO_PROC | FL_MODE26 | FL_MODE32, slowmul)
ARM_CORE(arm700i, FL_CO_PROC | FL_MODE26 | FL_MODE32, slowmul)
ARM_CORE(arm710, FL_MODE26 | FL_MODE32, slowmul)
ARM_CORE(arm720, FL_MODE26 | FL_MODE32, slowmul)
ARM_CORE(arm710c, FL_MODE26 | FL_MODE32, slowmul)
ARM_CORE(arm7100, FL_MODE26 | FL_MODE32, slowmul)
ARM_CORE(arm7500, FL_MODE26 | FL_MODE32, slowmul)
/* Doesn't have an external co-proc, but does have embedded fpa. */
ARM_CORE(arm7500fe, FL_CO_PROC | FL_MODE26 | FL_MODE32, slowmul)
/* V4 Architecture Processors */
ARM_CORE(arm7tdmi, FL_CO_PROC | FL_MODE32 | FL_ARCH3M | FL_ARCH4 | FL_THUMB, fastmul)
ARM_CORE(arm710t, FL_MODE32 | FL_ARCH3M | FL_ARCH4 | FL_THUMB, fastmul)
ARM_CORE(arm720t, FL_MODE32 | FL_ARCH3M | FL_ARCH4 | FL_THUMB, fastmul)
ARM_CORE(arm740t, FL_MODE32 | FL_ARCH3M | FL_ARCH4 | FL_THUMB, fastmul)
ARM_CORE(arm8, FL_MODE26 | FL_MODE32 | FL_ARCH3M | FL_ARCH4 | FL_LDSCHED, fastmul)
ARM_CORE(arm810, FL_MODE26 | FL_MODE32 | FL_ARCH3M | FL_ARCH4 | FL_LDSCHED, fastmul)
ARM_CORE(arm9, FL_MODE32 | FL_ARCH3M | FL_ARCH4 | FL_THUMB | FL_LDSCHED, fastmul)
ARM_CORE(arm920, FL_MODE32 | FL_ARCH3M | FL_ARCH4 | FL_LDSCHED, fastmul)
ARM_CORE(arm920t, FL_MODE32 | FL_ARCH3M | FL_ARCH4 | FL_THUMB | FL_LDSCHED, fastmul)
ARM_CORE(arm940t, FL_MODE32 | FL_ARCH3M | FL_ARCH4 | FL_THUMB | FL_LDSCHED, fastmul)
ARM_CORE(arm9tdmi, FL_MODE32 | FL_ARCH3M | FL_ARCH4 | FL_THUMB | FL_LDSCHED, fastmul)
ARM_CORE(arm9e, FL_MODE32 | FL_ARCH3M | FL_ARCH4 | FL_LDSCHED, 9e)
ARM_CORE(ep9312, FL_MODE32 | FL_ARCH3M | FL_ARCH4 | FL_LDSCHED | FL_CIRRUS, fastmul)
ARM_CORE(strongarm, FL_MODE26 | FL_MODE32 | FL_ARCH3M | FL_ARCH4 | FL_LDSCHED | FL_STRONG, fastmul)
ARM_CORE(strongarm110, FL_MODE26 | FL_MODE32 | FL_ARCH3M | FL_ARCH4 | FL_LDSCHED | FL_STRONG, fastmul)
ARM_CORE(strongarm1100, FL_MODE26 | FL_MODE32 | FL_ARCH3M | FL_ARCH4 | FL_LDSCHED | FL_STRONG, fastmul)
ARM_CORE(strongarm1110, FL_MODE26 | FL_MODE32 | FL_ARCH3M | FL_ARCH4 | FL_LDSCHED | FL_STRONG, fastmul)
/* V5 Architecture Processors */
ARM_CORE(arm10tdmi, FL_MODE32 | FL_ARCH3M | FL_ARCH4 | FL_THUMB | FL_LDSCHED | FL_ARCH5, fastmul)
ARM_CORE(arm1020t, FL_MODE32 | FL_ARCH3M | FL_ARCH4 | FL_THUMB | FL_LDSCHED | FL_ARCH5, fastmul)
ARM_CORE(arm926ejs, FL_MODE32 | FL_ARCH3M | FL_ARCH4 | FL_THUMB | FL_ARCH5 | FL_ARCH5E, 9e)
ARM_CORE(arm1026ejs, FL_MODE32 | FL_ARCH3M | FL_ARCH4 | FL_THUMB | FL_ARCH5 | FL_ARCH5E, 9e)
ARM_CORE(xscale, FL_MODE32 | FL_ARCH3M | FL_ARCH4 | FL_THUMB | FL_LDSCHED | FL_STRONG | FL_ARCH5 | FL_ARCH5E | FL_XSCALE, xscale)
ARM_CORE(iwmmxt, FL_MODE32 | FL_ARCH3M | FL_ARCH4 | FL_THUMB | FL_LDSCHED | FL_STRONG | FL_ARCH5 | FL_ARCH5E | FL_XSCALE | FL_IWMMXT, xscale)
/* V6 Architecture Processors */
ARM_CORE(arm1136js, FL_MODE32 | FL_ARCH3M | FL_ARCH4 | FL_THUMB | FL_ARCH5 | FL_ARCH5E | FL_ARCH6, 9e)
ARM_CORE(arm1136jfs, FL_MODE32 | FL_ARCH3M | FL_ARCH4 | FL_THUMB | FL_ARCH5 | FL_ARCH5E | FL_ARCH6 | FL_VFPV2, 9e)
;; Generic ARM Pipeline Description
;; Copyright (C) 2003 Free Software Foundation, Inc.
;;
;; This file is part of GCC.
;;
;; GCC is free software; you can redistribute it and/or modify it
;; under the terms of the GNU General Public License as published by
;; the Free Software Foundation; either version 2, or (at your option)
;; any later version.
;;
;; GCC is distributed in the hope that it will be useful, but
;; WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
;; General Public License for more details.
;;
;; You should have received a copy of the GNU General Public License
;; along with GCC; see the file COPYING. If not, write to the Free
;; Software Foundation, 59 Temple Place - Suite 330, Boston, MA
;; 02111-1307, USA. */
(define_automaton "arm")
;; Write buffer
;
; Strictly, we should model a 4-deep write buffer for ARM7xx based chips
;
; The write buffer on some of the arm6 processors is hard to model exactly.
; There is room in the buffer for up to two addresses and up to eight words
; of memory, but the two needn't be split evenly. When writing the two
; addresses are fully pipelined. However, a read from memory that is not
; currently in the cache will block until the writes have completed.
; It is normally the case that FCLK and MCLK will be in the ratio 2:1, so
; writes will take 2 FCLK cycles per word, if FCLK and MCLK are asynchronous
; (they aren't allowed to be at present) then there is a startup cost of 1MCLK
; cycle to add as well.
(define_cpu_unit "write_buf" "arm")
;; Write blockage unit
;
; The write_blockage unit models (partially) the fact that reads will stall
; until the write buffer empties.
; The f_mem_r and r_mem_f could also block, but they are to the stack,
; so we don't model them here.
(define_cpu_unit "write_blockage" "arm")
;; Core
;
(define_cpu_unit "core" "arm")
(define_insn_reservation "r_mem_f_wbuf" 5
(and (eq_attr "generic_sched" "yes")
(and (eq_attr "model_wbuf" "yes")
(eq_attr "type" "r_mem_f")))
"core+write_buf*3")
(define_insn_reservation "store_wbuf" 5
(and (eq_attr "generic_sched" "yes")
(and (eq_attr "model_wbuf" "yes")
(eq_attr "type" "store1")))
"core+write_buf*3+write_blockage*5")
(define_insn_reservation "store2_wbuf" 7
(and (eq_attr "generic_sched" "yes")
(and (eq_attr "model_wbuf" "yes")
(eq_attr "type" "store2")))
"core+write_buf*4+write_blockage*7")
(define_insn_reservation "store3_wbuf" 9
(and (eq_attr "generic_sched" "yes")
(and (eq_attr "model_wbuf" "yes")
(eq_attr "type" "store3")))
"core+write_buf*5+write_blockage*9")
(define_insn_reservation "store4_wbuf" 11
(and (eq_attr "generic_sched" "yes")
(and (eq_attr "model_wbuf" "yes")
(eq_attr "type" "store4")))
"core+write_buf*6+write_blockage*11")
(define_insn_reservation "store2" 3
(and (eq_attr "generic_sched" "yes")
(and (eq_attr "model_wbuf" "no")
(eq_attr "type" "store2")))
"core*3")
(define_insn_reservation "store3" 4
(and (eq_attr "generic_sched" "yes")
(and (eq_attr "model_wbuf" "no")
(eq_attr "type" "store3")))
"core*4")
(define_insn_reservation "store4" 5
(and (eq_attr "generic_sched" "yes")
(and (eq_attr "model_wbuf" "no")
(eq_attr "type" "store4")))
"core*5")
(define_insn_reservation "store_ldsched" 1
(and (eq_attr "generic_sched" "yes")
(and (eq_attr "ldsched" "yes")
(eq_attr "type" "store1")))
"core")
(define_insn_reservation "load_ldsched_xscale" 3
(and (eq_attr "generic_sched" "yes")
(and (eq_attr "ldsched" "yes")
(and (eq_attr "type" "load_byte,load1")
(eq_attr "is_xscale" "yes"))))
"core")
(define_insn_reservation "load_ldsched" 2
(and (eq_attr "generic_sched" "yes")
(and (eq_attr "ldsched" "yes")
(and (eq_attr "type" "load_byte,load1")
(eq_attr "is_xscale" "no"))))
"core")
(define_insn_reservation "load_or_store" 2
(and (eq_attr "generic_sched" "yes")
(and (eq_attr "ldsched" "!yes")
(eq_attr "type" "load_byte,load1,load2,load3,load4,store1")))
"core*2")
(define_insn_reservation "mult" 16
(and (eq_attr "generic_sched" "yes")
(and (eq_attr "ldsched" "no") (eq_attr "type" "mult")))
"core*16")
(define_insn_reservation "mult_ldsched_strongarm" 3
(and (eq_attr "generic_sched" "yes")
(and (eq_attr "ldsched" "yes")
(and (eq_attr "is_strongarm" "yes")
(eq_attr "type" "mult"))))
"core*2")
(define_insn_reservation "mult_ldsched" 4
(and (eq_attr "generic_sched" "yes")
(and (eq_attr "ldsched" "yes")
(and (eq_attr "is_strongarm" "no")
(eq_attr "type" "mult"))))
"core*4")
(define_insn_reservation "multi_cycle" 32
(and (eq_attr "generic_sched" "yes")
(and (eq_attr "core_cycles" "multi")
(eq_attr "type" "!mult,load_byte,load1,load2,load3,load4,store1,store2,store3,store4")))
"core*32")
(define_insn_reservation "single_cycle" 1
(and (eq_attr "generic_sched" "yes")
(eq_attr "core_cycles" "single"))
"core")
/* Prototypes for exported functions defined in arm.c and pe.c /* Prototypes for exported functions defined in arm.c and pe.c
Copyright (C) 1999, 2000, 2001, 2002, 2003 Free Software Foundation, Inc. Copyright (C) 1999, 2000, 2001, 2002, 2003, 2004
Free Software Foundation, Inc.
Contributed by Richard Earnshaw (rearnsha@arm.com) Contributed by Richard Earnshaw (rearnsha@arm.com)
Minor hacks by Nick Clifton (nickc@cygnus.com) Minor hacks by Nick Clifton (nickc@cygnus.com)
...@@ -53,12 +54,14 @@ extern int arm_legitimate_address_p (enum machine_mode, rtx, int); ...@@ -53,12 +54,14 @@ extern int arm_legitimate_address_p (enum machine_mode, rtx, int);
extern int thumb_legitimate_address_p (enum machine_mode, rtx, int); extern int thumb_legitimate_address_p (enum machine_mode, rtx, int);
extern int thumb_legitimate_offset_p (enum machine_mode, HOST_WIDE_INT); extern int thumb_legitimate_offset_p (enum machine_mode, HOST_WIDE_INT);
extern rtx arm_legitimize_address (rtx, rtx, enum machine_mode); extern rtx arm_legitimize_address (rtx, rtx, enum machine_mode);
extern int const_double_rtx_ok_for_fpa (rtx); extern int arm_const_double_rtx (rtx);
extern int neg_const_double_rtx_ok_for_fpa (rtx); extern int neg_const_double_rtx_ok_for_fpa (rtx);
extern enum reg_class vfp_secondary_reload_class (enum machine_mode, rtx);
/* Predicates. */ /* Predicates. */
extern int s_register_operand (rtx, enum machine_mode); extern int s_register_operand (rtx, enum machine_mode);
extern int arm_hard_register_operand (rtx, enum machine_mode); extern int arm_hard_register_operand (rtx, enum machine_mode);
extern int arm_general_register_operand (rtx, enum machine_mode);
extern int f_register_operand (rtx, enum machine_mode); extern int f_register_operand (rtx, enum machine_mode);
extern int reg_or_int_operand (rtx, enum machine_mode); extern int reg_or_int_operand (rtx, enum machine_mode);
extern int arm_reload_memory_operand (rtx, enum machine_mode); extern int arm_reload_memory_operand (rtx, enum machine_mode);
...@@ -70,8 +73,8 @@ extern int arm_not_operand (rtx, enum machine_mode); ...@@ -70,8 +73,8 @@ extern int arm_not_operand (rtx, enum machine_mode);
extern int offsettable_memory_operand (rtx, enum machine_mode); extern int offsettable_memory_operand (rtx, enum machine_mode);
extern int alignable_memory_operand (rtx, enum machine_mode); extern int alignable_memory_operand (rtx, enum machine_mode);
extern int bad_signed_byte_operand (rtx, enum machine_mode); extern int bad_signed_byte_operand (rtx, enum machine_mode);
extern int fpa_rhs_operand (rtx, enum machine_mode); extern int arm_float_rhs_operand (rtx, enum machine_mode);
extern int fpa_add_operand (rtx, enum machine_mode); extern int arm_float_add_operand (rtx, enum machine_mode);
extern int power_of_two_operand (rtx, enum machine_mode); extern int power_of_two_operand (rtx, enum machine_mode);
extern int nonimmediate_di_operand (rtx, enum machine_mode); extern int nonimmediate_di_operand (rtx, enum machine_mode);
extern int di_operand (rtx, enum machine_mode); extern int di_operand (rtx, enum machine_mode);
...@@ -95,6 +98,13 @@ extern int cirrus_general_operand (rtx, enum machine_mode); ...@@ -95,6 +98,13 @@ extern int cirrus_general_operand (rtx, enum machine_mode);
extern int cirrus_register_operand (rtx, enum machine_mode); extern int cirrus_register_operand (rtx, enum machine_mode);
extern int cirrus_shift_const (rtx, enum machine_mode); extern int cirrus_shift_const (rtx, enum machine_mode);
extern int cirrus_memory_offset (rtx); extern int cirrus_memory_offset (rtx);
extern int vfp_mem_operand (rtx);
extern int vfp_compare_operand (rtx, enum machine_mode);
extern int arm_float_compare_operand (rtx, enum machine_mode);
extern int arm_no_early_store_addr_dep (rtx, rtx);
extern int arm_no_early_alu_shift_dep (rtx, rtx);
extern int arm_no_early_alu_shift_value_dep (rtx, rtx);
extern int arm_no_early_mul_dep (rtx, rtx);
extern int symbol_mentioned_p (rtx); extern int symbol_mentioned_p (rtx);
extern int label_mentioned_p (rtx); extern int label_mentioned_p (rtx);
...@@ -138,6 +148,7 @@ extern int arm_debugger_arg_offset (int, rtx); ...@@ -138,6 +148,7 @@ extern int arm_debugger_arg_offset (int, rtx);
extern int arm_is_longcall_p (rtx, int, int); extern int arm_is_longcall_p (rtx, int, int);
extern int arm_emit_vector_const (FILE *, rtx); extern int arm_emit_vector_const (FILE *, rtx);
extern const char * arm_output_load_gr (rtx *); extern const char * arm_output_load_gr (rtx *);
extern const char *vfp_output_fstmx (rtx *);
#if defined TREE_CODE #if defined TREE_CODE
extern rtx arm_function_arg (CUMULATIVE_ARGS *, enum machine_mode, tree, int); extern rtx arm_function_arg (CUMULATIVE_ARGS *, enum machine_mode, tree, int);
......
;; ARM 1026EJ-S Pipeline Description
;; Copyright (C) 2003 Free Software Foundation, Inc.
;; Written by CodeSourcery, LLC.
;;
;; This file is part of GCC.
;;
;; GCC is free software; you can redistribute it and/or modify it
;; under the terms of the GNU General Public License as published by
;; the Free Software Foundation; either version 2, or (at your option)
;; any later version.
;;
;; GCC is distributed in the hope that it will be useful, but
;; WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
;; General Public License for more details.
;;
;; You should have received a copy of the GNU General Public License
;; along with GCC; see the file COPYING. If not, write to the Free
;; Software Foundation, 59 Temple Place - Suite 330, Boston, MA
;; 02111-1307, USA. */
;; These descriptions are based on the information contained in the
;; ARM1026EJ-S Technical Reference Manual, Copyright (c) 2003 ARM
;; Limited.
;;
;; This automaton provides a pipeline description for the ARM
;; 1026EJ-S core.
;;
;; The model given here assumes that the condition for all conditional
;; instructions is "true", i.e., that all of the instructions are
;; actually executed.
(define_automaton "arm1026ejs")
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Pipelines
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; There are two pipelines:
;;
;; - An Arithmetic Logic Unit (ALU) pipeline.
;;
;; The ALU pipeline has fetch, issue, decode, execute, memory, and
;; write stages. We only need to model the execute, memory and write
;; stages.
;;
;; - A Load-Store Unit (LSU) pipeline.
;;
;; The LSU pipeline has decode, execute, memory, and write stages.
;; We only model the execute, memory and write stages.
(define_cpu_unit "a_e,a_m,a_w" "arm1026ejs")
(define_cpu_unit "l_e,l_m,l_w" "arm1026ejs")
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; ALU Instructions
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; ALU instructions require three cycles to execute, and use the ALU
;; pipeline in each of the three stages. The results are available
;; after the execute stage has finished.
;;
;; If the destination register is the PC, the pipelines are stalled
;; for several cycles. That case is not modeled here.
;; ALU operations with no shifted operand
(define_insn_reservation "alu_op" 1
(and (eq_attr "tune" "arm1026ejs")
(eq_attr "type" "alu"))
"a_e,a_m,a_w")
;; ALU operations with a shift-by-constant operand
(define_insn_reservation "alu_shift_op" 1
(and (eq_attr "tune" "arm1026ejs")
(eq_attr "type" "alu_shift"))
"a_e,a_m,a_w")
;; ALU operations with a shift-by-register operand
;; These really stall in the decoder, in order to read
;; the shift value in a second cycle. Pretend we take two cycles in
;; the execute stage.
(define_insn_reservation "alu_shift_reg_op" 2
(and (eq_attr "tune" "arm1026ejs")
(eq_attr "type" "alu_shift_reg"))
"a_e*2,a_m,a_w")
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Multiplication Instructions
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Multiplication instructions loop in the execute stage until the
;; instruction has been passed through the multiplier array enough
;; times.
;; The result of the "smul" and "smulw" instructions is not available
;; until after the memory stage.
(define_insn_reservation "mult1" 2
(and (eq_attr "tune" "arm1026ejs")
(eq_attr "insn" "smulxy,smulwy"))
"a_e,a_m,a_w")
;; The "smlaxy" and "smlawx" instructions require two iterations through
;; the execute stage; the result is available immediately following
;; the execute stage.
(define_insn_reservation "mult2" 2
(and (eq_attr "tune" "arm1026ejs")
(eq_attr "insn" "smlaxy,smlalxy,smlawx"))
"a_e*2,a_m,a_w")
;; The "smlalxy", "mul", and "mla" instructions require two iterations
;; through the execute stage; the result is not available until after
;; the memory stage.
(define_insn_reservation "mult3" 3
(and (eq_attr "tune" "arm1026ejs")
(eq_attr "insn" "smlalxy,mul,mla"))
"a_e*2,a_m,a_w")
;; The "muls" and "mlas" instructions loop in the execute stage for
;; four iterations in order to set the flags. The value result is
;; available after three iterations.
(define_insn_reservation "mult4" 3
(and (eq_attr "tune" "arm1026ejs")
(eq_attr "insn" "muls,mlas"))
"a_e*4,a_m,a_w")
;; Long multiply instructions that produce two registers of
;; output (such as umull) make their results available in two cycles;
;; the least significant word is available before the most significant
;; word. That fact is not modeled; instead, the instructions are
;; described as if the entire result was available at the end of the
;; cycle in which both words are available.
;; The "umull", "umlal", "smull", and "smlal" instructions all take
;; three iterations through the execute cycle, and make their results
;; available after the memory cycle.
(define_insn_reservation "mult5" 4
(and (eq_attr "tune" "arm1026ejs")
(eq_attr "insn" "umull,umlal,smull,smlal"))
"a_e*3,a_m,a_w")
;; The "umulls", "umlals", "smulls", and "smlals" instructions loop in
;; the execute stage for five iterations in order to set the flags.
;; The value result is available after four iterations.
(define_insn_reservation "mult6" 4
(and (eq_attr "tune" "arm1026ejs")
(eq_attr "insn" "umulls,umlals,smulls,smlals"))
"a_e*5,a_m,a_w")
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Load/Store Instructions
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; The models for load/store instructions do not accurately describe
;; the difference between operations with a base register writeback
;; (such as "ldm!") and those without.  These models assume that all
;; memory references hit in dcache.
;; LSU instructions require six cycles to execute. They use the ALU
;; pipeline in all but the 5th cycle, and the LSU pipeline in cycles
;; three through six.
;; Loads and stores which use a scaled register offset or scaled
;; register pre-indexed addressing mode take three cycles EXCEPT for
;; those that are base + offset with LSL of 0 or 2, or base - offset
;; with LSL of zero. The remainder take 1 cycle to execute.
;; For 4byte loads there is a bypass from the load stage
(define_insn_reservation "load1_op" 2
(and (eq_attr "tune" "arm1026ejs")
(eq_attr "type" "load_byte,load1"))
"a_e+l_e,l_m,a_w+l_w")
(define_insn_reservation "store1_op" 0
(and (eq_attr "tune" "arm1026ejs")
(eq_attr "type" "store1"))
"a_e+l_e,l_m,a_w+l_w")
;; A load's result can be stored by an immediately following store
(define_bypass 1 "load1_op" "store1_op" "arm_no_early_store_addr_dep")
;; On a LDM/STM operation, the LSU pipeline iterates until all of the
;; registers have been processed.
;;
;; The time it takes to load the data depends on whether or not the
;; base address is 64-bit aligned; if it is not, an additional cycle
;; is required. This model assumes that the address is always 64-bit
;; aligned. Because the processor can load two registers per cycle,
;; that assumption means that we use the same instruction reservations
;; for loading 2k and 2k - 1 registers.
;;
;; The ALU pipeline is stalled until the completion of the last memory
;; stage in the LSU pipeline. That is modeled by keeping the ALU
;; execute stage busy until that point.
;;
;; As with ALU operations, if one of the destination registers is the
;; PC, there are additional stalls; that is not modeled.
(define_insn_reservation "load2_op" 2
(and (eq_attr "tune" "arm1026ejs")
(eq_attr "type" "load2"))
"a_e+l_e,l_m,a_w+l_w")
(define_insn_reservation "store2_op" 0
(and (eq_attr "tune" "arm1026ejs")
(eq_attr "type" "store2"))
"a_e+l_e,l_m,a_w+l_w")
(define_insn_reservation "load34_op" 3
(and (eq_attr "tune" "arm1026ejs")
(eq_attr "type" "load3,load4"))
"a_e+l_e,a_e+l_e+l_m,a_e+l_m,a_w+l_w")
(define_insn_reservation "store34_op" 0
(and (eq_attr "tune" "arm1026ejs")
(eq_attr "type" "store3,store4"))
"a_e+l_e,a_e+l_e+l_m,a_e+l_m,a_w+l_w")
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Branch and Call Instructions
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Branch instructions are difficult to model accurately. The ARM
;; core can predict most branches. If the branch is predicted
;; correctly, and predicted early enough, the branch can be completely
;; eliminated from the instruction stream. Some branches can
;; therefore appear to require zero cycles to execute. We assume that
;; all branches are predicted correctly, and that the latency is
;; therefore the minimum value.
(define_insn_reservation "branch_op" 0
(and (eq_attr "tune" "arm1026ejs")
(eq_attr "type" "branch"))
"nothing")
;; The latency for a call is not predictable. Therefore, we use 32 as
;; roughly equivalent to positive infinity.
(define_insn_reservation "call_op" 32
(and (eq_attr "tune" "arm1026ejs")
(eq_attr "type" "call"))
"nothing")
;; ARM 926EJ-S Pipeline Description
;; Copyright (C) 2003 Free Software Foundation, Inc.
;; Written by CodeSourcery, LLC.
;;
;; This file is part of GCC.
;;
;; GCC is free software; you can redistribute it and/or modify it
;; under the terms of the GNU General Public License as published by
;; the Free Software Foundation; either version 2, or (at your option)
;; any later version.
;;
;; GCC is distributed in the hope that it will be useful, but
;; WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
;; General Public License for more details.
;;
;; You should have received a copy of the GNU General Public License
;; along with GCC; see the file COPYING. If not, write to the Free
;; Software Foundation, 59 Temple Place - Suite 330, Boston, MA
;; 02111-1307, USA. */
;; These descriptions are based on the information contained in the
;; ARM926EJ-S Technical Reference Manual, Copyright (c) 2002 ARM
;; Limited.
;;
;; This automaton provides a pipeline description for the ARM
;; 926EJ-S core.
;;
;; The model given here assumes that the condition for all conditional
;; instructions is "true", i.e., that all of the instructions are
;; actually executed.
(define_automaton "arm926ejs")
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Pipelines
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; There is a single pipeline
;;
;; The ALU pipeline has fetch, decode, execute, memory, and
;; write stages. We only need to model the execute, memory and write
;; stages.
(define_cpu_unit "e,m,w" "arm926ejs")
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; ALU Instructions
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; ALU instructions require three cycles to execute, and use the ALU
;; pipeline in each of the three stages. The results are available
;; after the execute stage has finished.
;;
;; If the destination register is the PC, the pipelines are stalled
;; for several cycles. That case is not modeled here.
;; ALU operations with no shifted operand
(define_insn_reservation "9_alu_op" 1
(and (eq_attr "tune" "arm926ejs")
(eq_attr "type" "alu,alu_shift"))
"e,m,w")
;; ALU operations with a shift-by-register operand
;; These really stall in the decoder, in order to read
;; the shift value in a second cycle. Pretend we take two cycles in
;; the execute stage.
(define_insn_reservation "9_alu_shift_reg_op" 2
(and (eq_attr "tune" "arm926ejs")
(eq_attr "type" "alu_shift_reg"))
"e*2,m,w")
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Multiplication Instructions
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Multiplication instructions loop in the execute stage until the
;; instruction has been passed through the multiplier array enough
;; times. Multiply operations occur in both the execute and memory
;; stages of the pipeline
(define_insn_reservation "9_mult1" 3
(and (eq_attr "tune" "arm926ejs")
(eq_attr "insn" "smlalxy,mul,mla"))
"e*2,m,w")
(define_insn_reservation "9_mult2" 4
(and (eq_attr "tune" "arm926ejs")
(eq_attr "insn" "muls,mlas"))
"e*3,m,w")
(define_insn_reservation "9_mult3" 4
(and (eq_attr "tune" "arm926ejs")
(eq_attr "insn" "umull,umlal,smull,smlal"))
"e*3,m,w")
(define_insn_reservation "9_mult4" 5
(and (eq_attr "tune" "arm926ejs")
(eq_attr "insn" "umulls,umlals,smulls,smlals"))
"e*4,m,w")
(define_insn_reservation "9_mult5" 2
(and (eq_attr "tune" "arm926ejs")
(eq_attr "insn" "smulxy,smlaxy,smlawx"))
"e,m,w")
(define_insn_reservation "9_mult6" 3
(and (eq_attr "tune" "arm926ejs")
(eq_attr "insn" "smlalxy"))
"e*2,m,w")
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Load/Store Instructions
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; The models for load/store instructions do not accurately describe
;; the difference between operations with a base register writeback
;; (such as "ldm!") and those without.  These models assume that all
;; memory references hit in dcache.
;; Loads with a shifted offset take 3 cycles; they are (a) probably the
;; most common case, and (b) the pessimistic assumption will lead to fewer stalls.
(define_insn_reservation "9_load1_op" 3
(and (eq_attr "tune" "arm926ejs")
(eq_attr "type" "load1,load_byte"))
"e*2,m,w")
(define_insn_reservation "9_store1_op" 0
(and (eq_attr "tune" "arm926ejs")
(eq_attr "type" "store1"))
"e,m,w")
;; multiple word loads and stores
(define_insn_reservation "9_load2_op" 3
(and (eq_attr "tune" "arm926ejs")
(eq_attr "type" "load2"))
"e,m*2,w")
(define_insn_reservation "9_load3_op" 4
(and (eq_attr "tune" "arm926ejs")
(eq_attr "type" "load3"))
"e,m*3,w")
(define_insn_reservation "9_load4_op" 5
(and (eq_attr "tune" "arm926ejs")
(eq_attr "type" "load4"))
"e,m*4,w")
(define_insn_reservation "9_store2_op" 0
(and (eq_attr "tune" "arm926ejs")
(eq_attr "type" "store2"))
"e,m*2,w")
(define_insn_reservation "9_store3_op" 0
(and (eq_attr "tune" "arm926ejs")
(eq_attr "type" "store3"))
"e,m*3,w")
(define_insn_reservation "9_store4_op" 0
(and (eq_attr "tune" "arm926ejs")
(eq_attr "type" "store4"))
"e,m*4,w")
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Branch and Call Instructions
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Branch instructions are difficult to model accurately. The ARM
;; core can predict most branches. If the branch is predicted
;; correctly, and predicted early enough, the branch can be completely
;; eliminated from the instruction stream. Some branches can
;; therefore appear to require zero cycles to execute. We assume that
;; all branches are predicted correctly, and that the latency is
;; therefore the minimum value.
(define_insn_reservation "9_branch_op" 0
(and (eq_attr "tune" "arm926ejs")
(eq_attr "type" "branch"))
"nothing")
;; The latency for a call is not predictable. Therefore, we use 32 as
;; roughly equivalent to positive infinity.
(define_insn_reservation "9_call_op" 32
(and (eq_attr "tune" "arm926ejs")
(eq_attr "type" "call"))
"nothing")
...@@ -46,7 +46,7 @@ ...@@ -46,7 +46,7 @@
#ifndef SUBTARGET_ASM_FLOAT_SPEC #ifndef SUBTARGET_ASM_FLOAT_SPEC
#define SUBTARGET_ASM_FLOAT_SPEC "\ #define SUBTARGET_ASM_FLOAT_SPEC "\
%{mapcs-float:-mfloat} %{msoft-float:-mfpu=softfpa}" %{mapcs-float:-mfloat}"
#endif #endif
#ifndef ASM_SPEC #ifndef ASM_SPEC
...@@ -58,6 +58,8 @@ ...@@ -58,6 +58,8 @@
%{mapcs-*:-mapcs-%*} \ %{mapcs-*:-mapcs-%*} \
%(subtarget_asm_float_spec) \ %(subtarget_asm_float_spec) \
%{mthumb-interwork:-mthumb-interwork} \ %{mthumb-interwork:-mthumb-interwork} \
%{msoft-float:-mfloat-abi=soft} %{mhard-float:-mfloat-abi=hard} \
%{mfloat-abi=*} %{mfpu=*} \
%(subtarget_extra_asm_spec)" %(subtarget_extra_asm_spec)"
#endif #endif
......
...@@ -86,7 +86,7 @@ ...@@ -86,7 +86,7 @@
} }
}" }"
[(set_attr "length" "8,8,8,4,4,4,4,4") [(set_attr "length" "8,8,8,4,4,4,4,4")
(set_attr "type" "*,load,store2,*,*,*,*,*") (set_attr "type" "*,load1,store2,*,*,*,*,*")
(set_attr "pool_range" "*,1020,*,*,*,*,*,*") (set_attr "pool_range" "*,1020,*,*,*,*,*,*")
(set_attr "neg_pool_range" "*,1012,*,*,*,*,*,*")] (set_attr "neg_pool_range" "*,1012,*,*,*,*,*,*")]
) )
...@@ -110,7 +110,7 @@ ...@@ -110,7 +110,7 @@
case 7: return \"wstrw\\t%1, %0\"; case 7: return \"wstrw\\t%1, %0\";
default:return \"wstrw\\t%1, [sp, #-4]!\;wldrw\\t%0, [sp], #4\\t@move CG reg\"; default:return \"wstrw\\t%1, [sp, #-4]!\;wldrw\\t%0, [sp], #4\\t@move CG reg\";
}" }"
[(set_attr "type" "*,*,load,store1,*,*,load,store1,*") [(set_attr "type" "*,*,load1,store1,*,*,load1,store1,*")
(set_attr "length" "*,*,*, *,*,*, 16, *,8") (set_attr "length" "*,*,*, *,*,*, 16, *,8")
(set_attr "pool_range" "*,*,4096, *,*,*,1024, *,*") (set_attr "pool_range" "*,*,4096, *,*,*,1024, *,*")
(set_attr "neg_pool_range" "*,*,4084, *,*,*, *, 1012,*") (set_attr "neg_pool_range" "*,*,4084, *,*,*, *, 1012,*")
...@@ -148,7 +148,7 @@ ...@@ -148,7 +148,7 @@
case 4: return \"tmcr%?\\t%0, %1\"; case 4: return \"tmcr%?\\t%0, %1\";
default: return \"tmrc%?\\t%0, %1\"; default: return \"tmrc%?\\t%0, %1\";
}" }"
[(set_attr "type" "*,*,load,store1,*,*") [(set_attr "type" "*,*,load1,store1,*,*")
(set_attr "pool_range" "*,*,4096, *,*,*") (set_attr "pool_range" "*,*,4096, *,*,*")
(set_attr "neg_pool_range" "*,*,4084, *,*,*")] (set_attr "neg_pool_range" "*,*,4084, *,*,*")]
) )
...@@ -169,7 +169,7 @@ ...@@ -169,7 +169,7 @@
}" }"
[(set_attr "predicable" "yes") [(set_attr "predicable" "yes")
(set_attr "length" "4, 4, 4,4,4, 8") (set_attr "length" "4, 4, 4,4,4, 8")
(set_attr "type" "*,store1,load,*,*,load") (set_attr "type" "*,store1,load1,*,*,load1")
(set_attr "pool_range" "*, *, 256,*,*, 256") (set_attr "pool_range" "*, *, 256,*,*, 256")
(set_attr "neg_pool_range" "*, *, 244,*,*, 244")]) (set_attr "neg_pool_range" "*, *, 244,*,*, 244")])
...@@ -189,7 +189,7 @@ ...@@ -189,7 +189,7 @@
}" }"
[(set_attr "predicable" "yes") [(set_attr "predicable" "yes")
(set_attr "length" "4, 4, 4,4,4, 8") (set_attr "length" "4, 4, 4,4,4, 8")
(set_attr "type" "*,store1,load,*,*,load") (set_attr "type" "*,store1,load1,*,*,load1")
(set_attr "pool_range" "*, *, 256,*,*, 256") (set_attr "pool_range" "*, *, 256,*,*, 256")
(set_attr "neg_pool_range" "*, *, 244,*,*, 244")]) (set_attr "neg_pool_range" "*, *, 244,*,*, 244")])
...@@ -209,7 +209,7 @@ ...@@ -209,7 +209,7 @@
}" }"
[(set_attr "predicable" "yes") [(set_attr "predicable" "yes")
(set_attr "length" "4, 4, 4,4,4, 24") (set_attr "length" "4, 4, 4,4,4, 24")
(set_attr "type" "*,store1,load,*,*,load") (set_attr "type" "*,store1,load1,*,*,load1")
(set_attr "pool_range" "*, *, 256,*,*, 256") (set_attr "pool_range" "*, *, 256,*,*, 256")
(set_attr "neg_pool_range" "*, *, 244,*,*, 244")]) (set_attr "neg_pool_range" "*, *, 244,*,*, 244")])
...@@ -225,7 +225,7 @@ ...@@ -225,7 +225,7 @@
"* return output_move_double (operands);" "* return output_move_double (operands);"
[(set_attr "predicable" "yes") [(set_attr "predicable" "yes")
(set_attr "length" "8") (set_attr "length" "8")
(set_attr "type" "load") (set_attr "type" "load1")
(set_attr "pool_range" "256") (set_attr "pool_range" "256")
(set_attr "neg_pool_range" "244")]) (set_attr "neg_pool_range" "244")])
...@@ -1149,7 +1149,7 @@ ...@@ -1149,7 +1149,7 @@
"wsrawg%?\\t%0, %1, %2" "wsrawg%?\\t%0, %1, %2"
[(set_attr "predicable" "yes")]) [(set_attr "predicable" "yes")])
(define_insn "ashrdi3" (define_insn "ashrdi3_iwmmxt"
[(set (match_operand:DI 0 "register_operand" "=y") [(set (match_operand:DI 0 "register_operand" "=y")
(ashiftrt:DI (match_operand:DI 1 "register_operand" "y") (ashiftrt:DI (match_operand:DI 1 "register_operand" "y")
(match_operand:SI 2 "register_operand" "z")))] (match_operand:SI 2 "register_operand" "z")))]
...@@ -1173,7 +1173,7 @@ ...@@ -1173,7 +1173,7 @@
"wsrlwg%?\\t%0, %1, %2" "wsrlwg%?\\t%0, %1, %2"
[(set_attr "predicable" "yes")]) [(set_attr "predicable" "yes")])
(define_insn "lshrdi3" (define_insn "lshrdi3_iwmmxt"
[(set (match_operand:DI 0 "register_operand" "=y") [(set (match_operand:DI 0 "register_operand" "=y")
(lshiftrt:DI (match_operand:DI 1 "register_operand" "y") (lshiftrt:DI (match_operand:DI 1 "register_operand" "y")
(match_operand:SI 2 "register_operand" "z")))] (match_operand:SI 2 "register_operand" "z")))]
......
...@@ -243,6 +243,25 @@ pc .req r15 ...@@ -243,6 +243,25 @@ pc .req r15
/* ------------------------------------------------------------------------ */ /* ------------------------------------------------------------------------ */
.macro ARM_DIV_BODY dividend, divisor, result, curbit .macro ARM_DIV_BODY dividend, divisor, result, curbit
#if __ARM_ARCH__ >= 5 && ! defined (__OPTIMIZE_SIZE__)
clz \curbit, \dividend
clz \result, \divisor
sub \curbit, \result, \curbit
rsbs \curbit, \curbit, #31
addne \curbit, \curbit, \curbit, lsl #1
mov \result, #0
addne pc, pc, \curbit, lsl #2
nop
.set shift, 32
.rept 32
.set shift, shift - 1
cmp \dividend, \divisor, lsl #shift
adc \result, \result, \result
subcs \dividend, \dividend, \divisor, lsl #shift
.endr
#else /* __ARM_ARCH__ < 5 || defined (__OPTIMIZE_SIZE__) */
#if __ARM_ARCH__ >= 5 #if __ARM_ARCH__ >= 5
clz \curbit, \divisor clz \curbit, \divisor
...@@ -253,7 +272,7 @@ pc .req r15 ...@@ -253,7 +272,7 @@ pc .req r15
mov \curbit, \curbit, lsl \result mov \curbit, \curbit, lsl \result
mov \result, #0 mov \result, #0
#else #else /* __ARM_ARCH__ < 5 */
@ Initially shift the divisor left 3 bits if possible, @ Initially shift the divisor left 3 bits if possible,
@ set curbit accordingly. This allows for curbit to be located @ set curbit accordingly. This allows for curbit to be located
...@@ -284,7 +303,7 @@ pc .req r15 ...@@ -284,7 +303,7 @@ pc .req r15
mov \result, #0 mov \result, #0
#endif #endif /* __ARM_ARCH__ < 5 */
@ Division loop @ Division loop
1: cmp \dividend, \divisor 1: cmp \dividend, \divisor
...@@ -304,6 +323,8 @@ pc .req r15 ...@@ -304,6 +323,8 @@ pc .req r15
movne \divisor, \divisor, lsr #4 movne \divisor, \divisor, lsr #4
bne 1b bne 1b
#endif /* __ARM_ARCH__ < 5 || defined (__OPTIMIZE_SIZE__) */
.endm .endm
/* ------------------------------------------------------------------------ */ /* ------------------------------------------------------------------------ */
.macro ARM_DIV2_ORDER divisor, order .macro ARM_DIV2_ORDER divisor, order
...@@ -338,6 +359,22 @@ pc .req r15 ...@@ -338,6 +359,22 @@ pc .req r15
/* ------------------------------------------------------------------------ */ /* ------------------------------------------------------------------------ */
.macro ARM_MOD_BODY dividend, divisor, order, spare .macro ARM_MOD_BODY dividend, divisor, order, spare
#if __ARM_ARCH__ >= 5 && ! defined (__OPTIMIZE_SIZE__)
clz \order, \divisor
clz \spare, \dividend
sub \order, \order, \spare
rsbs \order, \order, #31
addne pc, pc, \order, lsl #3
nop
.set shift, 32
.rept 32
.set shift, shift - 1
cmp \dividend, \divisor, lsl #shift
subcs \dividend, \dividend, \divisor, lsl #shift
.endr
#else /* __ARM_ARCH__ < 5 || defined (__OPTIMIZE_SIZE__) */
#if __ARM_ARCH__ >= 5 #if __ARM_ARCH__ >= 5
clz \order, \divisor clz \order, \divisor
...@@ -345,7 +382,7 @@ pc .req r15 ...@@ -345,7 +382,7 @@ pc .req r15
sub \order, \order, \spare sub \order, \order, \spare
mov \divisor, \divisor, lsl \order mov \divisor, \divisor, lsl \order
#else #else /* __ARM_ARCH__ < 5 */
mov \order, #0 mov \order, #0
...@@ -367,7 +404,7 @@ pc .req r15 ...@@ -367,7 +404,7 @@ pc .req r15
addlo \order, \order, #1 addlo \order, \order, #1
blo 1b blo 1b
#endif #endif /* __ARM_ARCH__ < 5 */
@ Perform all needed substractions to keep only the reminder. @ Perform all needed substractions to keep only the reminder.
@ Do comparisons in batch of 4 first. @ Do comparisons in batch of 4 first.
...@@ -404,6 +441,9 @@ pc .req r15 ...@@ -404,6 +441,9 @@ pc .req r15
4: cmp \dividend, \divisor 4: cmp \dividend, \divisor
subhs \dividend, \dividend, \divisor subhs \dividend, \dividend, \divisor
5: 5:
#endif /* __ARM_ARCH__ < 5 || defined (__OPTIMIZE_SIZE__) */
.endm .endm
/* ------------------------------------------------------------------------ */ /* ------------------------------------------------------------------------ */
.macro THUMB_DIV_MOD_BODY modulo .macro THUMB_DIV_MOD_BODY modulo
......
...@@ -55,7 +55,7 @@ ...@@ -55,7 +55,7 @@
%{shared:-lc} \ %{shared:-lc} \
%{!shared:%{profile:-lc_p}%{!profile:-lc}}" %{!shared:%{profile:-lc_p}%{!profile:-lc}}"
#define LIBGCC_SPEC "%{msoft-float:-lfloat} -lgcc" #define LIBGCC_SPEC "%{msoft-float:-lfloat} %{mfloat-abi=soft*:-lfloat} -lgcc"
/* Provide a STARTFILE_SPEC appropriate for GNU/Linux. Here we add /* Provide a STARTFILE_SPEC appropriate for GNU/Linux. Here we add
the GNU/Linux magical crtbegin.o file (see crtstuff.c) which the GNU/Linux magical crtbegin.o file (see crtstuff.c) which
......
...@@ -57,14 +57,11 @@ ...@@ -57,14 +57,11 @@
#define SUBTARGET_EXTRA_ASM_SPEC \ #define SUBTARGET_EXTRA_ASM_SPEC \
"-matpcs %{fpic|fpie:-k} %{fPIC|fPIE:-k}" "-matpcs %{fpic|fpie:-k} %{fPIC|fPIE:-k}"
/* Default floating point model is soft-VFP. /* Default to full VFP if -mhard-float is specified. */
FIXME: -mhard-float currently implies FPA. */
#undef SUBTARGET_ASM_FLOAT_SPEC #undef SUBTARGET_ASM_FLOAT_SPEC
#define SUBTARGET_ASM_FLOAT_SPEC \ #define SUBTARGET_ASM_FLOAT_SPEC \
"%{mhard-float:-mfpu=fpa} \ "%{mhard-float:{!mfpu=*:-mfpu=vfp}} \
%{msoft-float:-mfpu=softvfp} \ %{mfloat-abi=hard:{!mfpu=*:-mfpu=vfp}}"
%{!mhard-float: \
%{!msoft-float:-mfpu=softvfp}}"
#undef SUBTARGET_EXTRA_SPECS #undef SUBTARGET_EXTRA_SPECS
#define SUBTARGET_EXTRA_SPECS \ #define SUBTARGET_EXTRA_SPECS \
...@@ -171,3 +168,7 @@ do \ ...@@ -171,3 +168,7 @@ do \
(void) sysarch (0, &s); \ (void) sysarch (0, &s); \
} \ } \
while (0) while (0)
#undef FPUTYPE_DEFAULT
#define FPUTYPE_DEFAULT FPUTYPE_VFP
...@@ -64,7 +64,8 @@ ...@@ -64,7 +64,8 @@
%{mcpu=*:-mcpu=%*} \ %{mcpu=*:-mcpu=%*} \
%{march=*:-march=%*} \ %{march=*:-march=%*} \
%{mapcs-float:-mfloat} \ %{mapcs-float:-mfloat} \
%{msoft-float:-mfpu=softfpa} \ %{msoft-float:-mfloat-abi=soft} %{mhard-float:mfloat-abi=hard} \
%{mfloat-abi=*} %{mfpu=*} \
%{mthumb-interwork:-mthumb-interwork} \ %{mthumb-interwork:-mthumb-interwork} \
%(subtarget_extra_asm_spec)" %(subtarget_extra_asm_spec)"
#endif #endif
......
...@@ -926,12 +926,13 @@ and SPARC@. ...@@ -926,12 +926,13 @@ and SPARC@.
@itemx --with-arch=@var{cpu} @itemx --with-arch=@var{cpu}
@itemx --with-tune=@var{cpu} @itemx --with-tune=@var{cpu}
@itemx --with-abi=@var{abi} @itemx --with-abi=@var{abi}
@itemx --with-fpu=@var{type}
@itemx --with-float=@var{type} @itemx --with-float=@var{type}
These configure options provide default values for the @option{-mschedule=}, These configure options provide default values for the @option{-mschedule=},
@option{-march=}, @option{-mtune=}, and @option{-mabi=} options and for @option{-march=}, @option{-mtune=}, @option{-mabi=}, and @option{-mfpu=}
@option{-mhard-float} or @option{-msoft-float}. As with @option{--with-cpu}, options and for @option{-mhard-float} or @option{-msoft-float}. As with
which switches will be accepted and acceptable values of the arguments depend @option{--with-cpu}, which switches will be accepted and acceptable values
on the target. of the arguments depend on the target.
@item --enable-altivec @item --enable-altivec
Specify that the target supports AltiVec vector enhancements. This Specify that the target supports AltiVec vector enhancements. This
......
...@@ -375,9 +375,9 @@ in the following sections. ...@@ -375,9 +375,9 @@ in the following sections.
-msched-prolog -mno-sched-prolog @gol -msched-prolog -mno-sched-prolog @gol
-mlittle-endian -mbig-endian -mwords-little-endian @gol -mlittle-endian -mbig-endian -mwords-little-endian @gol
-malignment-traps -mno-alignment-traps @gol -malignment-traps -mno-alignment-traps @gol
-msoft-float -mhard-float -mfpe @gol -mfloat-abi=@var{name} soft-float -mhard-float -mfpe @gol
-mthumb-interwork -mno-thumb-interwork @gol -mthumb-interwork -mno-thumb-interwork @gol
-mcpu=@var{name} -march=@var{name} -mfpe=@var{name} @gol -mcpu=@var{name} -march=@var{name} -mfpu=@var{name} @gol
-mstructure-size-boundary=@var{n} @gol -mstructure-size-boundary=@var{n} @gol
-mabort-on-noreturn @gol -mabort-on-noreturn @gol
-mlong-calls -mno-long-calls @gol -mlong-calls -mno-long-calls @gol
...@@ -6497,6 +6497,16 @@ this option. In particular, you need to compile @file{libgcc.a}, the ...@@ -6497,6 +6497,16 @@ this option. In particular, you need to compile @file{libgcc.a}, the
library that comes with GCC, with @option{-msoft-float} in order for library that comes with GCC, with @option{-msoft-float} in order for
this to work. this to work.
@item -mfloat-abi=@var{name}
@opindex mfloat-abi
Specifies which ABI to use for floating point values. Permissible values
are: @samp{soft}, @samp{softfp} and @samp{hard}.
@samp{soft} and @samp{hard} are equivalent to @option{-msoft-float}
and @option{-mhard-float} respectively. @samp{softfp} allows the generation
of floating point instructions, but still uses the soft-float calling
conventions.
@item -mlittle-endian @item -mlittle-endian
@opindex mlittle-endian @opindex mlittle-endian
Generate code for a processor running in little-endian mode. This is Generate code for a processor running in little-endian mode. This is
...@@ -6595,16 +6605,23 @@ name to determine what kind of instructions it can emit when generating ...@@ -6595,16 +6605,23 @@ name to determine what kind of instructions it can emit when generating
assembly code. This option can be used in conjunction with or instead assembly code. This option can be used in conjunction with or instead
of the @option{-mcpu=} option. Permissible names are: @samp{armv2}, of the @option{-mcpu=} option. Permissible names are: @samp{armv2},
@samp{armv2a}, @samp{armv3}, @samp{armv3m}, @samp{armv4}, @samp{armv4t}, @samp{armv2a}, @samp{armv3}, @samp{armv3m}, @samp{armv4}, @samp{armv4t},
@samp{armv5}, @samp{armv5t}, @samp{armv5te}, @samp{armv6j}, @samp{armv5}, @samp{armv5t}, @samp{armv5te}, @samp{armv6}, @samp{armv6j},
@samp{iwmmxt}, @samp{ep9312}. @samp{iwmmxt}, @samp{ep9312}.
@item -mfpe=@var{number} @item -mfpu=@var{name}
@itemx -mfpe=@var{number}
@itemx -mfp=@var{number} @itemx -mfp=@var{number}
@opindex mfpu
@opindex mfpe @opindex mfpe
@opindex mfp @opindex mfp
This specifies the version of the floating point emulation available on This specifies what floating point hardware (or hardware emulation) is
the target. Permissible values are 2 and 3. @option{-mfp=} is a synonym available on the target. Permissible names are: @samp{fpa}, @samp{fpe2},
for @option{-mfpe=}, for compatibility with older versions of GCC@. @samp{fpe3}, @samp{maverick}, @samp{vfp}. @option{-mfp} and @option{-mfpe}
are synonyms for @option{-mpfu}=@samp{fpe}@var{number}, for compatibility
with older versions of GCC@.
If @option{-msoft-float} is specified this specifies the format of
floating point values.
@item -mstructure-size-boundary=@var{n} @item -mstructure-size-boundary=@var{n}
@opindex mstructure-size-boundary @opindex mstructure-size-boundary
......
...@@ -1340,6 +1340,9 @@ available on some particular machines. ...@@ -1340,6 +1340,9 @@ available on some particular machines.
@item f @item f
Floating-point register Floating-point register
@item w
VFP floating-point register
@item F @item F
One of the floating-point constants 0.0, 0.5, 1.0, 2.0, 3.0, 4.0, 5.0 One of the floating-point constants 0.0, 0.5, 1.0, 2.0, 3.0, 4.0, 5.0
or 10.0 or 10.0
...@@ -1376,6 +1379,9 @@ An item in the constant pool ...@@ -1376,6 +1379,9 @@ An item in the constant pool
A symbol in the text segment of the current file A symbol in the text segment of the current file
@end table @end table
@item U
A memory reference suitable for VFP load/store insns (reg+constant offset)
@item AVR family---@file{avr.h} @item AVR family---@file{avr.h}
@table @code @table @code
@item l @item l
......
...@@ -186,7 +186,7 @@ do { \ ...@@ -186,7 +186,7 @@ do { \
UDItype __umulsidi3 (USItype, USItype); UDItype __umulsidi3 (USItype, USItype);
#endif #endif
#if defined (__arm__) && W_TYPE_SIZE == 32 #if defined (__arm__) && !defined (__thumb__) && W_TYPE_SIZE == 32
#define add_ssaaaa(sh, sl, ah, al, bh, bl) \ #define add_ssaaaa(sh, sl, ah, al, bh, bl) \
__asm__ ("adds %1, %4, %5\n\tadc %0, %2, %3" \ __asm__ ("adds %1, %4, %5\n\tadc %0, %2, %3" \
: "=r" ((USItype) (sh)), \ : "=r" ((USItype) (sh)), \
......