invoke.texi: Document -mvr4130-align.

* doc/invoke.texi: Document -mvr4130-align.
* config/mips/mips.h (MASK_VR4130_ALIGN, TARGET_VR4130_ALIGN)
(TUNE_MIPS4120, TUNE_MIPS4130): New macros.
(TUNE_MACC_CHAINS): Include TUNE_MIPS4120 and TUNE_MIPS4130.
(TARGET_SWITCHES): Add -mvr4130-align and -mno-vr4130-align.
* config/mips/mips.c: Include sched-int.h.
(USEFUL_INSN_P, SEQ_BEGIN, SEQ_END, FOR_EACH_SUBINSN): New macros.
(mips_rtx_costs): Set integer multiplication costs for TUNE_MIPS4130.
(override_options): Enable -mvr4130-align at -O3 and above.
(mips_sim_insn): New variable.
(mips_sim): New structure.
(mips_sim_reset, mips_sim_init, mips_sim_next_cycle, mips_sim_wait_reg)
(mips_sim_wait_regs_2, mips_sim_wait_regs_1, mips_sim_wait_regs)
(mips_sim_wait_units, mips_sim_wait_insn, mips_sim_record_set)
(mips_sim_issue_insn, mips_sim_issue_nop, mips_sim_finish_insn)
(vr4130_avoid_branch_rt_conflict, vr4130_align_insns): New functions.
(mips_reorg): Call vr4130_align_insns.
(vr4130_last_insn): New variable.
(vr4130_true_reg_dependence_p_1, vr4130_true_reg_dependence_p)
(vr4130_swap_insns_p, vr4130_reorder): New functions.
(mips_sched_reorder, mips_variable_issue): Hook in vr4130 code.
(mips_issue_rate): Return 2 for PROCESSOR_R4130.
(mips_use_dfa_pipeline_interface): Return true for the same.
* config/mips/4130.md: New file.
* config/mips/mips.md: Include it.  Add a peephole2 to convert
"mult;mflo" into "mtlo;macc".
(*macc, *umul_acc_di, *smul_acc_di): Use $1 rather than $0 as the
target of maccs.
(*msac_using_macc): New pattern.

From-SVN: r81567

Commit dc884a86, authored and committed by Richard Sandiford on May 06, 2004

Showing with 258 additions and 6 deletions

gcc/ChangeLog
+32 -0

gcc/config/mips/4130.md
+136 -0

gcc/config/mips/mips.c
+0 -0

gcc/config/mips/mips.h
+11 -2

gcc/config/mips/mips.md
+65 -3

gcc/doc/invoke.texi
+14 -1

--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
 2004-05-06  Richard Sandiford  <rsandifo@redhat.com>

+	* doc/invoke.texi: Document -mvr4130-align.
+	* config/mips/mips.h (MASK_VR4130_ALIGN, TARGET_VR4130_ALIGN)
+	(TUNE_MIPS4120, TUNE_MIPS4130): New macros.
+	(TUNE_MACC_CHAINS): Include TUNE_MIPS4120 and TUNE_MIPS4130.
+	(TARGET_SWITCHES): Add -mvr4130-align and -mno-vr4130-align.
+	* config/mips/mips.c: Include sched-int.h.
+	(USEFUL_INSN_P, SEQ_BEGIN, SEQ_END, FOR_EACH_SUBINSN): New macros.
+	(mips_rtx_costs): Set integer multiplication costs for TUNE_MIPS4130.
+	(override_options): Enable -mvr4130-align at -O3 and above.
+	(mips_sim_insn): New variable.
+	(mips_sim): New structure.
+	(mips_sim_reset, mips_sim_init, mips_sim_next_cycle, mips_sim_wait_reg)
+	(mips_sim_wait_regs_2, mips_sim_wait_regs_1, mips_sim_wait_regs)
+	(mips_sim_wait_units, mips_sim_wait_insn, mips_sim_record_set)
+	(mips_sim_issue_insn, mips_sim_issue_nop, mips_sim_finish_insn)
+	(vr4130_avoid_branch_rt_conflict, vr4130_align_insns): New functions.
+	(mips_reorg): Call vr4130_align_insns.
+	(vr4130_last_insn): New variable.
+	(vr4130_true_reg_dependence_p_1, vr4130_true_reg_dependence_p)
+	(vr4130_swap_insns_p, vr4130_reorder): New functions.
+	(mips_sched_reorder, mips_variable_issue): Hook in vr4130 code.
+	(mips_issue_rate): Return 2 for PROCESSOR_R4130.
+	(mips_use_dfa_pipeline_interface): Return true for the same.
+	* config/mips/4130.md: New file.
+	* config/mips/mips.md: Include it.  Add a peephole2 to convert
+	"mult;mflo" into "mtlo;macc".
+	(*macc, *umul_acc_di, *smul_acc_di): Use $1 rather than $0 as the
+	target of maccs.
+	(*msac_using_macc): New pattern.
+
+2004-05-06  Richard Sandiford  <rsandifo@redhat.com>
+
 	* config/mips/5500.md (ir_vr55_store): Set latency to 0.
 	(ir_vr55_hilo): Split into...
 	(ir_vr55_mfhilo, ir_vr55_mthilo): ...these new reservations.

--- a/gcc/config/mips/4130.md
+++ b/gcc/config/mips/4130.md
+;;
+;; Pipeline description for the VR4130 family.
+;;
+;; The processor issues each 8-byte aligned pair of instructions together,
+;; stalling the second instruction if it depends on the first.  Thus, if we
+;; want two instructions to issue in parallel, we need to make sure that the
+;; first one is 8-byte aligned.
+;;
+;; For the purposes of this pipeline description, we treat the processor
+;; like a standard two-way superscalar architecture.  If scheduling were
+;; the last pass to run, we could use the scheduler hooks to vary the
+;; issue rate depending on whether an instruction is at an aligned or
+;; unaligned address.  Unfortunately, delayed branch scheduling and
+;; hazard avoidance are done after the final scheduling pass, and they
+;; can change the addresses of many instructions.
+;;
+;; We get around this in two ways:
+;;
+;;   (1) By running an extra pass at the end of compilation.  This pass goes
+;;	 through the function looking for pairs of instructions that could
+;;	 execute in parallel.  It makes sure that the first instruction in
+;;	 each pair is suitably aligned, inserting nops if necessary.  Doing
+;;	 this gives the same kind of pipeline behavior we would see on a
+;;	 normal superscalar target.
+;;
+;;	 This pass is generally a speed improvement, but the extra nops will
+;;	 obviously make the program bigger.  It is therefore unsuitable for
+;;	 -Os (at the very least).
+;;
+;;   (2) By modifying the scheduler hooks so that, where possible:
+;;
+;;	 (a) dependent instructions are separated by a non-dependent
+;;	     instruction;
+;;
+;;	 (b) instructions that use the multiplication unit are separated
+;;	     by non-multiplication instructions; and
+;;
+;;	 (c) memory access instructions are separated by non-memory
+;;	     instructions.
+;;
+;;	 The idea is to keep conflicting instructions apart wherever possible
+;;	 and thus make the schedule less dependent on alignment.
+
+(define_automaton "vr4130_main, vr4130_muldiv, vr4130_mulpre")
+
+(define_cpu_unit "vr4130_alu1, vr4130_alu2, vr4130_dcache" "vr4130_main")
+(define_cpu_unit "vr4130_muldiv" "vr4130_muldiv")
+
+;; This is a fake unit for pre-reload scheduling of multiplications.
+;; It enforces the true post-reload repeat rate.
+(define_cpu_unit "vr4130_mulpre" "vr4130_mulpre")
+
+;; The scheduling hooks use this attribute for (b) above.
+(define_attr "vr4130_class" "mul,mem,alu"
+  (cond [(eq_attr "type" "load,store")
+	 (const_string "mem")
+
+	 (eq_attr "type" "mfhilo,mthilo,imul,imadd,idiv")
+	 (const_string "mul")]
+	(const_string "alu")))
+
+(define_insn_reservation "vr4130_multi" 1
+  (and (eq_attr "cpu" "r4130")
+       (eq_attr "type" "multi,unknown"))
+  "vr4130_alu1 + vr4130_alu2 + vr4130_dcache + vr4130_muldiv")
+
+(define_insn_reservation "vr4130_int" 1
+  (and (eq_attr "cpu" "r4130")
+       (eq_attr "type" "const,arith,shift,slt,nop"))
+  "vr4130_alu1 | vr4130_alu2")
+
+(define_insn_reservation "vr4130_load" 3
+  (and (eq_attr "cpu" "r4130")
+       (eq_attr "type" "load"))
+  "vr4130_dcache")
+
+(define_insn_reservation "vr4130_store" 1
+  (and (eq_attr "cpu" "r4130")
+       (eq_attr "type" "store"))
+  "vr4130_dcache")
+
+(define_insn_reservation "vr4130_mfhilo" 3
+  (and (eq_attr "cpu" "r4130")
+       (eq_attr "type" "mfhilo"))
+  "vr4130_muldiv")
+
+(define_insn_reservation "vr4130_mthilo" 1
+  (and (eq_attr "cpu" "r4130")
+       (eq_attr "type" "mthilo"))
+  "vr4130_muldiv")
+
+;; The product is available in LO & HI after one cycle.  Moving the result
+;; into an integer register will take an additional three cycles, see mflo
+;; & mfhi above.  Note that the same latencies and repeat rates apply if we
+;; use "mtlo; macc" instead of "mult; mflo".
+(define_insn_reservation "vr4130_mulsi" 4
+  (and (eq_attr "cpu" "r4130")
+       (and (eq_attr "type" "imul")
+	    (eq_attr "mode" "SI")))
+  "vr4130_muldiv + (vr4130_mulpre * 2)")
+
+;; As for vr4130_mulsi, but the product is available in LO and HI
+;; after 3 cycles.
+(define_insn_reservation "vr4130_muldi" 6
+  (and (eq_attr "cpu" "r4130")
+       (and (eq_attr "type" "imul")
+	    (eq_attr "mode" "DI")))
+  "(vr4130_muldiv * 3) + (vr4130_mulpre * 4)")
+
+;; maccs can execute in consecutive cycles without stalling, but it
+;; is 3 cycles before the integer destination can be read.
+(define_insn_reservation "vr4130_macc" 3
+  (and (eq_attr "cpu" "r4130")
+       (eq_attr "type" "imadd"))
+  "vr4130_muldiv")
+
+(define_bypass 1 "vr4130_mulsi,vr4130_macc" "vr4130_macc" "mips_linked_madd_p")
+(define_bypass 1 "vr4130_mulsi,vr4130_macc" "vr4130_mfhilo")
+(define_bypass 3 "vr4130_muldi" "vr4130_mfhilo")
+
+(define_insn_reservation "vr4130_divsi" 36
+  (and (eq_attr "cpu" "r4130")
+       (and (eq_attr "type" "idiv")
+	    (eq_attr "mode" "SI")))
+  "vr4130_muldiv * 36")
+
+(define_insn_reservation "vr4130_divdi" 72
+  (and (eq_attr "cpu" "r4130")
+       (and (eq_attr "type" "idiv")
+	    (eq_attr "mode" "DI")))
+  "vr4130_muldiv * 72")
+
+(define_insn_reservation "vr4130_branch" 0
+  (and (eq_attr "cpu" "r4130")
+       (eq_attr "type" "branch,jump,call"))
+  "vr4130_alu1 | vr4130_alu2")
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -171,7 +171,7 @@ extern const struct mips_cpu_info *mips_tune_info;
 #define MASK_FIX_R4400	   0x01000000	/* Work around R4400 errata.  */
 #define MASK_FIX_SB1	   0x02000000	/* Work around SB-1 errata.  */
 #define MASK_FIX_VR4120	   0x04000000   /* Work around VR4120 errata.  */
-
+#define MASK_VR4130_ALIGN  0x08000000	/* Perform VR4130 alignment opts.  */
 #define MASK_FP_EXCEPTIONS 0x10000000   /* FP exceptions are enabled.  */

 					/* Debug switches, not documented */
@@ -253,6 +253,7 @@ extern const struct mips_cpu_info *mips_tune_info;
 					/* Work around R4400 errata.  */
 #define TARGET_FIX_R4400	(target_flags & MASK_FIX_R4400)
 #define TARGET_FIX_VR4120	(target_flags & MASK_FIX_VR4120)
+#define TARGET_VR4130_ALIGN	(target_flags & MASK_VR4130_ALIGN)

 #define TARGET_FP_EXCEPTIONS	(target_flags & MASK_FP_EXCEPTIONS)

@@ -332,6 +333,8 @@ extern const struct mips_cpu_info *mips_tune_info;
 #define TUNE_MIPS3000               (mips_tune == PROCESSOR_R3000)
 #define TUNE_MIPS3900               (mips_tune == PROCESSOR_R3900)
 #define TUNE_MIPS4000               (mips_tune == PROCESSOR_R4000)
+#define TUNE_MIPS4120               (mips_tune == PROCESSOR_R4120)
+#define TUNE_MIPS4130               (mips_tune == PROCESSOR_R4130)
 #define TUNE_MIPS5000               (mips_tune == PROCESSOR_R5000)
 #define TUNE_MIPS5400               (mips_tune == PROCESSOR_R5400)
 #define TUNE_MIPS5500               (mips_tune == PROCESSOR_R5500)
@@ -371,7 +374,9 @@ extern const struct mips_cpu_info *mips_tune_info;

   Multiply-accumulate instructions are a bigger win for some targets
   than others, so this macro is defined on an opt-in basis.  */
-#define TUNE_MACC_CHAINS	    TUNE_MIPS5500
+#define TUNE_MACC_CHAINS	    (TUNE_MIPS5500		\
+				     || TUNE_MIPS4120		\
+				     || TUNE_MIPS4130)

 #define TARGET_OLDABI		    (mips_abi == ABI_32 || mips_abi == ABI_O64)
 #define TARGET_NEWABI		    (mips_abi == ABI_N32 || mips_abi == ABI_64)
@@ -619,6 +624,10 @@ extern const struct mips_cpu_info *mips_tune_info;
     N_("Don't generate fused multiply/add instructions")},		\
  {"fused-madd",         -MASK_NO_FUSED_MADD,                           \
     N_("Generate fused multiply/add instructions")},			\
+  {"vr4130-align",	  MASK_VR4130_ALIGN,				\
+     N_("Perform VR4130-specific alignment optimizations")},		\
+  {"no-vr4130-align",	 -MASK_VR4130_ALIGN,				\
+     N_("Don't perform VR4130-specific alignment optimizations")},	\
  {"fix4300",             MASK_4300_MUL_FIX,				\
     N_("Work around early 4300 hardware bug")},			\
  {"no-fix4300",         -MASK_4300_MUL_FIX,				\
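
Review note: the two new TARGET_SWITCHES entries follow the table's existing convention, in which a positive value sets bits in target_flags and a negated value clears them. The sketch below illustrates that convention in isolation. Only the mask value comes from the hunk above; `apply_switch` and `vr4130_align_p` are hypothetical helpers, not GCC functions.

```c
/* Mask value taken from the mips.h hunk.  */
#define MASK_VR4130_ALIGN 0x08000000

/* Hypothetical helper: apply one switch-table value to FLAGS.
   A positive VALUE sets those bits; a negative VALUE clears the
   bits given by its negation.  */
static int
apply_switch (int flags, int value)
{
  if (value < 0)
    return flags & ~(-value);
  return flags | value;
}

/* Mirrors the TARGET_VR4130_ALIGN test on a local flags word.  */
static int
vr4130_align_p (int flags)
{
  return (flags & MASK_VR4130_ALIGN) != 0;
}
```

With this model, applying `MASK_VR4130_ALIGN` corresponds to -mvr4130-align and applying `-MASK_VR4130_ALIGN` corresponds to -mno-vr4130-align; other bits in the flags word are left alone in both cases.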

--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -631,6 +631,7 @@

 ;; Include scheduling descriptions.

+(include "4130.md")
 (include "5400.md")
 (include "5500.md")
 (include "7000.md")
@@ -1584,6 +1585,37 @@
   (set_attr "mode"	"SI")
   (set_attr "length"   "8")])

+;; On the VR4120 and VR4130, it is better to use "mtlo $0; macc" instead
+;; of "mult; mflo".  They have the same latency, but the first form gives
+;; us an extra cycle to compute the operands.
+
+;; Operand 0: LO
+;; Operand 1: GPR (1st multiplication operand)
+;; Operand 2: GPR (2nd multiplication operand)
+;; Operand 3: HI
+;; Operand 4: GPR (destination)
+(define_peephole2
+  [(parallel
+       [(set (match_operand:SI 0 "register_operand" "")
+	     (mult:SI (match_operand:SI 1 "register_operand" "")
+		      (match_operand:SI 2 "register_operand" "")))
+        (clobber (match_operand:SI 3 "register_operand" ""))])
+   (set (match_operand:SI 4 "register_operand" "")
+	(unspec:SI [(match_dup 0) (match_dup 3)] UNSPEC_MFHILO))]
+  "ISA_HAS_MACC && !GENERATE_MULT3_SI"
+  [(set (match_dup 0)
+	(const_int 0))
+   (parallel
+       [(set (match_dup 0)
+	     (plus:SI (mult:SI (match_dup 1)
+			       (match_dup 2))
+		      (match_dup 0)))
+	(set (match_dup 4)
+	     (plus:SI (mult:SI (match_dup 1)
+			       (match_dup 2))
+		      (match_dup 0)))
+        (clobber (match_dup 3))])])
+
 ;; Multiply-accumulate patterns

 ;; For processors that can copy the output to a general register:
@@ -1673,7 +1705,10 @@
  else if (TARGET_MIPS5500)
    return "madd\t%1,%2";
  else
-    return "macc\t%.,%1,%2";
+    /* The VR4130 assumes that there is a two-cycle latency between a macc
+       that "writes" to $0 and an instruction that reads from it.  We avoid
+       this by assigning to $1 instead.  */
+    return "%[macc\t%@,%1,%2%]";
 }
  [(set_attr "type" "imadd")
   (set_attr "mode" "SI")])
@@ -1697,6 +1732,31 @@
  [(set_attr "type"     "imadd")
   (set_attr "mode"     "SI")])

+;; An msac-like instruction implemented using negation and a macc.
+(define_insn_and_split "*msac_using_macc"
+  [(set (match_operand:SI 0 "register_operand" "=l,d")
+        (minus:SI (match_operand:SI 1 "register_operand" "0,l")
+                  (mult:SI (match_operand:SI 2 "register_operand" "d,d")
+                           (match_operand:SI 3 "register_operand" "d,d"))))
+   (clobber (match_scratch:SI 4 "=h,h"))
+   (clobber (match_scratch:SI 5 "=X,1"))
+   (clobber (match_scratch:SI 6 "=d,d"))]
+  "ISA_HAS_MACC && !ISA_HAS_MSAC"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 6)
+	(neg:SI (match_dup 3)))
+   (parallel
+       [(set (match_dup 0)
+	     (plus:SI (mult:SI (match_dup 2)
+			       (match_dup 6))
+		      (match_dup 1)))
+	(clobber (match_dup 4))
+	(clobber (match_dup 5))])]
+  ""
+  [(set_attr "type"     "imadd")
+   (set_attr "length"	"8")])
+
 ;; Patterns generated by the define_peephole2 below.

 (define_insn "*macc2"
@@ -2367,7 +2427,8 @@
  else if (TARGET_MIPS5500)
    return "maddu\t%1,%2";
  else
-    return "maccu\t%.,%1,%2";
+    /* See comment in *macc.  */
+    return "%[maccu\t%@,%1,%2%]";
 }
  [(set_attr "type"   "imadd")
   (set_attr "mode"   "SI")])
@@ -2387,7 +2448,8 @@
  else if (TARGET_MIPS5500)
    return "madd\t%1,%2";
  else
-    return "macc\t%.,%1,%2";
+    /* See comment in *macc.  */
+    return "%[macc\t%@,%1,%2%]";
 }
  [(set_attr "type"   "imadd")
   (set_attr "mode"   "SI")])
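
Review note: two of the new mips.md patterns rest on simple arithmetic identities. The peephole2 replaces a plain multiply with an accumulate onto a zeroed LO (a*b = 0 + a*b), and *msac_using_macc rewrites acc - a*b as acc + a*(-b). The following sketch checks both identities with 32-bit wraparound arithmetic. It models only LO and ignores HI, and all function names are hypothetical.

```c
#include <stdint.h>

/* Model of "macc": LO += a * b, with the new LO also returned as the
   value copied to the destination GPR.  Only the low 32 bits are kept.  */
static uint32_t
macc (uint32_t *lo, uint32_t a, uint32_t b)
{
  *lo += a * b;
  return *lo;
}

/* The peephole's replacement for "mult; mflo": "mtlo $0; macc".  */
static uint32_t
mul_via_macc (uint32_t a, uint32_t b)
{
  uint32_t lo = 0;		/* mtlo $0 */
  return macc (&lo, a, b);	/* macc dest,a,b */
}

/* The *msac_using_macc split: negate one factor, then accumulate.  */
static uint32_t
msac_via_macc (uint32_t acc, uint32_t a, uint32_t b)
{
  uint32_t lo = acc;
  uint32_t neg_b = 0u - b;	/* The split's (neg:SI ...) step.  */
  return macc (&lo, a, neg_b);
}
```

Because the arithmetic is modulo 2^32, `msac_via_macc (acc, a, b)` equals `acc - a * b` for every input, which is why the split needs only one extra scratch register for the negated operand.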

--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -483,7 +483,8 @@ in the following sections.
 -mfix-vr4120  -mno-fix-vr4120  -mfix-sb1  -mno-fix-sb1 @gol
 -mflush-func=@var{func}  -mno-flush-func @gol
 -mbranch-likely  -mno-branch-likely @gol
-mfp-exceptions -mno-fp-exceptions}
+-mfp-exceptions -mno-fp-exceptions @gol
+-mvr4130-align -mno-vr4130-align}

 @emph{i386 and x86-64 Options}
 @gccoptlist{-mtune=@var{cpu-type}  -march=@var{cpu-type} @gol
@@ -8245,6 +8246,18 @@ enabled.
 For instance, on the SB-1, if FP exceptions are disabled, and we are emitting
 64-bit code, then we can use both FP pipes.  Otherwise, we can only use one
 FP pipe.
+
+@item -mvr4130-align
+@itemx -mno-vr4130-align
+@opindex mvr4130-align
+The VR4130 pipeline is two-way superscalar, but can only issue two
+instructions together if the first one is 8-byte aligned.  When this
+option is enabled, GCC will align pairs of instructions that it
+thinks should execute in parallel.
+
+This option only has an effect when optimizing for the VR4130.
+It normally makes code faster, but at the expense of making it bigger.
+It is enabled by default at optimization level @option{-O3}.
 @end table

 @node i386 and x86-64 Options