Commit dc884a86 by Richard Sandiford Committed by Richard Sandiford

invoke.texi: Document -mvr4130-align.

	* doc/invoke.texi: Document -mvr4130-align.
	* config/mips/mips.h (MASK_VR4130_ALIGN, TARGET_VR4130_ALIGN)
	(TUNE_MIPS4120, TUNE_MIPS4130): New macros.
	(TUNE_MACC_CHAINS): Include TUNE_MIPS4120 and TUNE_MIPS4130.
	(TARGET_SWITCHES): Add -mvr4130-align and -mno-vr4130-align.
	* config/mips/mips.md: Include sched-int.h.
	(USEFUL_INSN_P, SEQ_BEGIN, SEQ_END, FOR_EACH_SUBINSN): New macros.
	(mips_rtx_costs): Set integer multiplication costs for TUNE_MIPS4130.
	(override_options): Enable -mvr4130-align at -O3 and above.
	(mips_sim_insn): New variable.
	(mips_sim): New structure.
	(mips_sim_reset, mips_sim_init, mips_sim_next_cycle, mips_sim_wait_reg)
	(mips_sim_wait_regs_2, mips_sim_wait_regs_1, mips_sim_wait_regs)
	(mips_sim_wait_units, mips_sim_wait_insn, mips_sim_record_set)
	(mips_sim_issue_insn, mips_sim_issue_nop, mips_sim_finish_insn)
	(vr4130_avoid_branch_rt_conflict, vr4130_align_insns): New functions.
	(mips_reorg): Call vr4130_align_insns.
	(vr4130_last_insn): New variable.
	(vr4130_true_reg_dependence_p_1, vr4130_true_reg_dependence_p)
	(vr4130_swap_insns_p, vr4130_reorder): New functions.
	(mips_sched_reorder, mips_variable_issue): Hook in vr4130 code.
	(mips_issue_rate): Return 2 for PROCESSOR_R4130.
	(mips_use_dfa_pipeline_interface): Return true for the same.
	* config/mips/4130.md: New file.
	* config/mips/mips.md: Include it.  Add a peephole2 to convert
	"mult;mflo" into "mtlo;macc".
	(*macc, *umul_acc_di, *smul_acc_di): Use $1 rather than $0 as the
	target of maccs.
	(*msac_using_macc): New pattern.

From-SVN: r81567
parent e51f7aeb
2004-05-06 Richard Sandiford <rsandifo@redhat.com>
* doc/invoke.texi: Document -mvr4130-align.
* config/mips/mips.h (MASK_VR4130_ALIGN, TARGET_VR4130_ALIGN)
(TUNE_MIPS4120, TUNE_MIPS4130): New macros.
(TUNE_MACC_CHAINS): Include TUNE_MIPS4120 and TUNE_MIPS4130.
(TARGET_SWITCHES): Add -mvr4130-align and -mno-vr4130-align.
* config/mips/mips.md: Include sched-int.h.
(USEFUL_INSN_P, SEQ_BEGIN, SEQ_END, FOR_EACH_SUBINSN): New macros.
(mips_rtx_costs): Set integer multiplication costs for TUNE_MIPS4130.
(override_options): Enable -mvr4130-align at -O3 and above.
(mips_sim_insn): New variable.
(mips_sim): New structure.
(mips_sim_reset, mips_sim_init, mips_sim_next_cycle, mips_sim_wait_reg)
(mips_sim_wait_regs_2, mips_sim_wait_regs_1, mips_sim_wait_regs)
(mips_sim_wait_units, mips_sim_wait_insn, mips_sim_record_set)
(mips_sim_issue_insn, mips_sim_issue_nop, mips_sim_finish_insn)
(vr4130_avoid_branch_rt_conflict, vr4130_align_insns): New functions.
(mips_reorg): Call vr4130_align_insns.
(vr4130_last_insn): New variable.
(vr4130_true_reg_dependence_p_1, vr4130_true_reg_dependence_p)
(vr4130_swap_insns_p, vr4130_reorder): New functions.
(mips_sched_reorder, mips_variable_issue): Hook in vr4130 code.
(mips_issue_rate): Return 2 for PROCESSOR_R4130.
(mips_use_dfa_pipeline_interface): Return true for the same.
* config/mips/4130.md: New file.
* config/mips/mips.md: Include it. Add a peephole2 to convert
"mult;mflo" into "mtlo;macc".
(*macc, *umul_acc_di, *smul_acc_di): Use $1 rather than $0 as the
target of maccs.
(*msac_using_macc): New pattern.
2004-05-06 Richard Sandiford <rsandifo@redhat.com>
* config/mips/5500.md (ir_vr55_store): Set latency to 0.
(ir_vr55_hilo): Split into...
(ir_vr55_mfhilo, ir_vr55_mthilo): ...these new reservations.
......
;;
;; Pipeline description for the VR4130 family.
;;
;; The processor issues each 8-byte aligned pair of instructions together,
;; stalling the second instruction if it depends on the first. Thus, if we
;; want two instructions to issue in parallel, we need to make sure that the
;; first one is 8-byte aligned.
;;
;; For the purposes of this pipeline description, we treat the processor
;; like a standard two-way superscalar architecture. If scheduling were
;; the last pass to run, we could use the scheduler hooks to vary the
;; issue rate depending on whether an instruction is at an aligned or
;; unaligned address. Unfortunately, delayed branch scheduling and
;; hazard avoidance are done after the final scheduling pass, and they
;; can change the addresses of many instructions.
;;
;; We get around this in two ways:
;;
;; (1) By running an extra pass at the end of compilation. This pass goes
;; through the function looking for pairs of instructions that could
;; execute in parallel. It makes sure that the first instruction in
;; each pair is suitably aligned, inserting nops if necessary. Doing
;; this gives the same kind of pipeline behavior we would see on a
;; normal superscalar target.
;;
;; This pass is generally a speed improvement, but the extra nops will
;; obviously make the program bigger. It is therefore unsuitable for
;; -Os (at the very least).
;;
;; (2) By modifying the scheduler hooks so that, where possible:
;;
;; (a) dependent instructions are separated by a non-dependent
;; instruction;
;;
;; (b) instructions that use the multiplication unit are separated
;; by non-multiplication instructions; and
;;
;; (c) memory access instructions are separated by non-memory
;; instructions.
;;
;; The idea is to keep conflicting instructions apart wherever possible
;; and thus make the schedule less dependent on alignment.
(define_automaton "vr4130_main, vr4130_muldiv, vr4130_mulpre")
(define_cpu_unit "vr4130_alu1, vr4130_alu2, vr4130_dcache" "vr4130_main")
(define_cpu_unit "vr4130_muldiv" "vr4130_muldiv")
;; This is a fake unit for pre-reload scheduling of multiplications.
;; It enforces the true post-reload repeat rate.
(define_cpu_unit "vr4130_mulpre" "vr4130_mulpre")
;; The scheduling hooks use this attribute for (b) above.
(define_attr "vr4130_class" "mul,mem,alu"
(cond [(eq_attr "type" "load,store")
(const_string "mem")
(eq_attr "type" "mfhilo,mthilo,imul,imadd,idiv")
(const_string "mul")]
(const_string "alu")))
(define_insn_reservation "vr4130_multi" 1
(and (eq_attr "cpu" "r4130")
(eq_attr "type" "multi,unknown"))
"vr4130_alu1 + vr4130_alu2 + vr4130_dcache + vr4130_muldiv")
(define_insn_reservation "vr4130_int" 1
(and (eq_attr "cpu" "r4130")
(eq_attr "type" "const,arith,shift,slt,nop"))
"vr4130_alu1 | vr4130_alu2")
(define_insn_reservation "vr4130_load" 3
(and (eq_attr "cpu" "r4130")
(eq_attr "type" "load"))
"vr4130_dcache")
(define_insn_reservation "vr4130_store" 1
(and (eq_attr "cpu" "r4130")
(eq_attr "type" "store"))
"vr4130_dcache")
(define_insn_reservation "vr4130_mfhilo" 3
(and (eq_attr "cpu" "r4130")
(eq_attr "type" "mfhilo"))
"vr4130_muldiv")
(define_insn_reservation "vr4130_mthilo" 1
(and (eq_attr "cpu" "r4130")
(eq_attr "type" "mthilo"))
"vr4130_muldiv")
;; The product is available in LO & HI after one cycle. Moving the result
;; into an integer register will take an additional three cycles, see mflo
;; & mfhi above. Note that the same latencies and repeat rates apply if we
;; use "mtlo; macc" instead of "mult; mflo".
(define_insn_reservation "vr4130_mulsi" 4
(and (eq_attr "cpu" "r4130")
(and (eq_attr "type" "imul")
(eq_attr "mode" "SI")))
"vr4130_muldiv + (vr4130_mulpre * 2)")
;; As for vr4130_mulsi, but the product is available in LO and HI
;; after 3 cycles.
(define_insn_reservation "vr4130_muldi" 6
(and (eq_attr "cpu" "r4130")
(and (eq_attr "type" "imul")
(eq_attr "mode" "DI")))
"(vr4130_muldiv * 3) + (vr4130_mulpre * 4)")
;; maccs can execute in consecutive cycles without stalling, but it
;; is 3 cycles before the integer destination can be read.
(define_insn_reservation "vr4130_macc" 3
(and (eq_attr "cpu" "r4130")
(eq_attr "type" "imadd"))
"vr4130_muldiv")
(define_bypass 1 "vr4130_mulsi,vr4130_macc" "vr4130_macc" "mips_linked_madd_p")
(define_bypass 1 "vr4130_mulsi,vr4130_macc" "vr4130_mfhilo")
(define_bypass 3 "vr4130_muldi" "vr4130_mfhilo")
(define_insn_reservation "vr4130_divsi" 36
(and (eq_attr "cpu" "r4130")
(and (eq_attr "type" "idiv")
(eq_attr "mode" "SI")))
"vr4130_muldiv * 36")
(define_insn_reservation "vr4130_divdi" 72
(and (eq_attr "cpu" "r4130")
(and (eq_attr "type" "idiv")
(eq_attr "mode" "DI")))
"vr4130_muldiv * 72")
(define_insn_reservation "vr4130_branch" 0
(and (eq_attr "cpu" "r4130")
(eq_attr "type" "branch,jump,call"))
"vr4130_alu1 | vr4130_alu2")
......@@ -171,7 +171,7 @@ extern const struct mips_cpu_info *mips_tune_info;
#define MASK_FIX_R4400 0x01000000 /* Work around R4400 errata. */
#define MASK_FIX_SB1 0x02000000 /* Work around SB-1 errata. */
#define MASK_FIX_VR4120 0x04000000 /* Work around VR4120 errata. */
#define MASK_VR4130_ALIGN 0x08000000 /* Perform VR4130 alignment opts. */
#define MASK_FP_EXCEPTIONS 0x10000000 /* FP exceptions are enabled. */
/* Debug switches, not documented */
......@@ -253,6 +253,7 @@ extern const struct mips_cpu_info *mips_tune_info;
/* Work around R4400 errata. */
#define TARGET_FIX_R4400 (target_flags & MASK_FIX_R4400)
#define TARGET_FIX_VR4120 (target_flags & MASK_FIX_VR4120)
#define TARGET_VR4130_ALIGN (target_flags & MASK_VR4130_ALIGN)
#define TARGET_FP_EXCEPTIONS (target_flags & MASK_FP_EXCEPTIONS)
......@@ -332,6 +333,8 @@ extern const struct mips_cpu_info *mips_tune_info;
#define TUNE_MIPS3000 (mips_tune == PROCESSOR_R3000)
#define TUNE_MIPS3900 (mips_tune == PROCESSOR_R3900)
#define TUNE_MIPS4000 (mips_tune == PROCESSOR_R4000)
#define TUNE_MIPS4120 (mips_tune == PROCESSOR_R4120)
#define TUNE_MIPS4130 (mips_tune == PROCESSOR_R4130)
#define TUNE_MIPS5000 (mips_tune == PROCESSOR_R5000)
#define TUNE_MIPS5400 (mips_tune == PROCESSOR_R5400)
#define TUNE_MIPS5500 (mips_tune == PROCESSOR_R5500)
......@@ -371,7 +374,9 @@ extern const struct mips_cpu_info *mips_tune_info;
Multiply-accumulate instructions are a bigger win for some targets
than others, so this macro is defined on an opt-in basis. */
#define TUNE_MACC_CHAINS TUNE_MIPS5500
#define TUNE_MACC_CHAINS (TUNE_MIPS5500 \
|| TUNE_MIPS4120 \
|| TUNE_MIPS4130)
#define TARGET_OLDABI (mips_abi == ABI_32 || mips_abi == ABI_O64)
#define TARGET_NEWABI (mips_abi == ABI_N32 || mips_abi == ABI_64)
......@@ -619,6 +624,10 @@ extern const struct mips_cpu_info *mips_tune_info;
N_("Don't generate fused multiply/add instructions")}, \
{"fused-madd", -MASK_NO_FUSED_MADD, \
N_("Generate fused multiply/add instructions")}, \
{"vr4130-align", MASK_VR4130_ALIGN, \
N_("Perform VR4130-specific alignment optimizations")}, \
{"no-vr4130-align", -MASK_VR4130_ALIGN, \
N_("Don't perform VR4130-specific alignment optimizations")}, \
{"fix4300", MASK_4300_MUL_FIX, \
N_("Work around early 4300 hardware bug")}, \
{"no-fix4300", -MASK_4300_MUL_FIX, \
......
......@@ -631,6 +631,7 @@
;; Include scheduling descriptions.
(include "4130.md")
(include "5400.md")
(include "5500.md")
(include "7000.md")
......@@ -1584,6 +1585,37 @@
(set_attr "mode" "SI")
(set_attr "length" "8")])
;; On the VR4120 and VR4130, it is better to use "mtlo $0; macc" instead
;; of "mult; mflo". They have the same latency, but the first form gives
;; us an extra cycle to compute the operands.
;; Operand 0: LO
;; Operand 1: GPR (1st multiplication operand)
;; Operand 2: GPR (2nd multiplication operand)
;; Operand 3: HI
;; Operand 4: GPR (destination)
(define_peephole2
[(parallel
[(set (match_operand:SI 0 "register_operand" "")
(mult:SI (match_operand:SI 1 "register_operand" "")
(match_operand:SI 2 "register_operand" "")))
(clobber (match_operand:SI 3 "register_operand" ""))])
(set (match_operand:SI 4 "register_operand" "")
(unspec:SI [(match_dup 0) (match_dup 3)] UNSPEC_MFHILO))]
"ISA_HAS_MACC && !GENERATE_MULT3_SI"
[(set (match_dup 0)
(const_int 0))
(parallel
[(set (match_dup 0)
(plus:SI (mult:SI (match_dup 1)
(match_dup 2))
(match_dup 0)))
(set (match_dup 4)
(plus:SI (mult:SI (match_dup 1)
(match_dup 2))
(match_dup 0)))
(clobber (match_dup 3))])])
;; Multiply-accumulate patterns
;; For processors that can copy the output to a general register:
......@@ -1673,7 +1705,10 @@
else if (TARGET_MIPS5500)
return "madd\t%1,%2";
else
return "macc\t%.,%1,%2";
/* The VR4130 assumes that there is a two-cycle latency between a macc
that "writes" to $0 and an instruction that reads from it. We avoid
this by assigning to $1 instead. */
return "%[macc\t%@,%1,%2%]";
}
[(set_attr "type" "imadd")
(set_attr "mode" "SI")])
......@@ -1697,6 +1732,31 @@
[(set_attr "type" "imadd")
(set_attr "mode" "SI")])
;; An msac-like instruction implemented using negation and a macc.
(define_insn_and_split "*msac_using_macc"
[(set (match_operand:SI 0 "register_operand" "=l,d")
(minus:SI (match_operand:SI 1 "register_operand" "0,l")
(mult:SI (match_operand:SI 2 "register_operand" "d,d")
(match_operand:SI 3 "register_operand" "d,d"))))
(clobber (match_scratch:SI 4 "=h,h"))
(clobber (match_scratch:SI 5 "=X,1"))
(clobber (match_scratch:SI 6 "=d,d"))]
"ISA_HAS_MACC && !ISA_HAS_MSAC"
"#"
"&& reload_completed"
[(set (match_dup 6)
(neg:SI (match_dup 3)))
(parallel
[(set (match_dup 0)
(plus:SI (mult:SI (match_dup 2)
(match_dup 6))
(match_dup 1)))
(clobber (match_dup 4))
(clobber (match_dup 5))])]
""
[(set_attr "type" "imadd")
(set_attr "length" "8")])
;; Patterns generated by the define_peephole2 below.
(define_insn "*macc2"
......@@ -2367,7 +2427,8 @@
else if (TARGET_MIPS5500)
return "maddu\t%1,%2";
else
return "maccu\t%.,%1,%2";
/* See comment in *macc. */
return "%[maccu\t%@,%1,%2%]";
}
[(set_attr "type" "imadd")
(set_attr "mode" "SI")])
......@@ -2387,7 +2448,8 @@
else if (TARGET_MIPS5500)
return "madd\t%1,%2";
else
return "macc\t%.,%1,%2";
/* See comment in *macc. */
return "%[macc\t%@,%1,%2%]";
}
[(set_attr "type" "imadd")
(set_attr "mode" "SI")])
......
......@@ -483,7 +483,8 @@ in the following sections.
-mfix-vr4120 -mno-fix-vr4120 -mfix-sb1 -mno-fix-sb1 @gol
-mflush-func=@var{func} -mno-flush-func @gol
-mbranch-likely -mno-branch-likely @gol
-mfp-exceptions -mno-fp-exceptions}
-mfp-exceptions -mno-fp-exceptions @gol
-mvr4130-align -mno-vr4130-align}
@emph{i386 and x86-64 Options}
@gccoptlist{-mtune=@var{cpu-type} -march=@var{cpu-type} @gol
......@@ -8245,6 +8246,18 @@ enabled.
For instance, on the SB-1, if FP exceptions are disabled, and we are emitting
64-bit code, then we can use both FP pipes. Otherwise, we can only use one
FP pipe.
@item -mvr4130-align
@itemx -mno-vr4130-align
@opindex mvr4130-align
The VR4130 pipeline is two-way superscalar, but can only issue two
instructions together if the first one is 8-byte aligned. When this
option is enabled, GCC will align pairs of instructions that it
thinks should execute in parallel.
This option only has an effect when optimizing for the VR4130.
It normally makes code faster, but at the expense of making it bigger.
It is enabled by default at optimization level @option{-O3}.
@end table
@node i386 and x86-64 Options
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment