Commit 93cf5515, authored and committed by Richard Biener

re PR rtl-optimization/91154 (456.hmmer regression on Haswell caused by r272922)

2019-08-14  Richard Biener  <rguenther@suse.de>
        Uroš Bizjak  <ubizjak@gmail.com>

	PR target/91154
	* config/i386/i386-features.h (scalar_chain::scalar_chain): Add
	mode arguments.
	(scalar_chain::smode): New member.
	(scalar_chain::vmode): Likewise.
	(dimode_scalar_chain): Rename to...
	(general_scalar_chain): ... this.
	(general_scalar_chain::general_scalar_chain): Take mode arguments.
	(timode_scalar_chain::timode_scalar_chain): Initialize scalar_chain
	base with TImode and V1TImode.
	* config/i386/i386-features.c (scalar_chain::scalar_chain): Adjust.
	(general_scalar_chain::vector_const_cost): Adjust for SImode
	chains.
	(general_scalar_chain::compute_convert_gain): Likewise.  Add
	{S,U}{MIN,MAX} support.
	(general_scalar_chain::replace_with_subreg): Use vmode/smode.
	(general_scalar_chain::make_vector_copies): Likewise.  Handle
	non-DImode chains appropriately.
	(general_scalar_chain::convert_reg): Likewise.
	(general_scalar_chain::convert_op): Likewise.
	(general_scalar_chain::convert_insn): Likewise.  Add
	fatal_insn_not_found if the result is not recognized.
	(convertible_comparison_p): Pass in the scalar mode and use that.
	(general_scalar_to_vector_candidate_p): Likewise.  Rename from
	dimode_scalar_to_vector_candidate_p.  Add {S,U}{MIN,MAX} support.
	(scalar_to_vector_candidate_p): Remove by inlining into single
	caller.
	(general_remove_non_convertible_regs): Rename from
	dimode_remove_non_convertible_regs.
	(remove_non_convertible_regs): Remove by inlining into single caller.
	(convert_scalars_to_vector): Handle SImode and DImode chains
	in addition to TImode chains.
	* config/i386/i386.md (<maxmin><MAXMIN_IMODE>3): New expander.
	(*<maxmin><MAXMIN_IMODE>3_1): New insn-and-split.
	(*<maxmin>di3_doubleword): Likewise.

	* gcc.target/i386/pr91154.c: New testcase.
	* gcc.target/i386/minmax-3.c: Likewise.
	* gcc.target/i386/minmax-4.c: Likewise.
	* gcc.target/i386/minmax-5.c: Likewise.
	* gcc.target/i386/minmax-6.c: Likewise.
	* gcc.target/i386/minmax-1.c: Add -mno-stv.
	* gcc.target/i386/minmax-2.c: Likewise.

Co-Authored-By: Uros Bizjak <ubizjak@gmail.com>

From-SVN: r274481
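
For orientation, the kind of scalar source this change targets can be sketched as below. This is a hypothetical illustration, not one of the testcases added by the commit; the names `smax_sketch` and `umin_sketch` are made up. Whether GCC actually emits a vector min/max for such code depends on the target flags (per the candidate check in the patch, SImode chains need SSE4.1 and DImode chains need AVX512VL) and on the chain cost model.

```c
/* Hypothetical illustration only; not taken from the GCC testsuite.  */

/* Signed 32-bit maximum, a form that may be represented as an
   SImode SMAX operation.  */
int
smax_sketch (int a, int b)
{
  return a > b ? a : b;
}

/* Unsigned 32-bit minimum, a form that may be represented as an
   SImode UMIN operation.  */
unsigned
umin_sketch (unsigned a, unsigned b)
{
  return a < b ? a : b;
}
```

The semantics are the same whether or not the STV pass converts the operation; only the instruction selection differs.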
parent 1b187f36
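
The cost-model change in compute_convert_gain replaces the hard-coded factor 2 with a multiplier `m` that depends on the chain mode and the target word size. A minimal standalone sketch of that arithmetic follows. The helper names and the plain integer cost parameters are assumptions for illustration, not GCC's `ix86_cost` interface.

```c
/* Hypothetical stand-in for the gain arithmetic; the cost values are
   caller-supplied parameters, not GCC's ix86_cost tables.  */
#include <stdbool.h>

/* Number of GPR operations one scalar operation of the chain mode
   needs: 2 for DImode on a 32-bit target, otherwise 1.  */
unsigned
gpr_multiplier (bool is_dimode, bool target_64bit)
{
  return is_dimode ? (target_64bit ? 1 : 2) : 1;
}

/* Gain from replacing one scalar add-like operation by one SSE op.  */
int
binop_gain (bool is_dimode, bool target_64bit, int add_cost, int sse_op_cost)
{
  return (int) gpr_multiplier (is_dimode, target_64bit) * add_cost
	 - sse_op_cost;
}
```

With equal scalar and SSE costs, the sketch reports a gain only for DImode on a 32-bit target, where one vector op stands in for two GPR ops.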
@@ -276,8 +276,11 @@ unsigned scalar_chain::max_id = 0;
 /* Initialize new chain. */
-scalar_chain::scalar_chain ()
+scalar_chain::scalar_chain (enum machine_mode smode_, enum machine_mode vmode_)
 {
+  smode = smode_;
+  vmode = vmode_;
   chain_id = ++max_id;
   if (dump_file)
@@ -319,7 +322,7 @@ scalar_chain::add_to_queue (unsigned insn_uid)
    conversion. */
 void
-dimode_scalar_chain::mark_dual_mode_def (df_ref def)
+general_scalar_chain::mark_dual_mode_def (df_ref def)
 {
   gcc_assert (DF_REF_REG_DEF_P (def));
@@ -409,6 +412,9 @@ scalar_chain::add_insn (bitmap candidates, unsigned int insn_uid)
       && !HARD_REGISTER_P (SET_DEST (def_set)))
     bitmap_set_bit (defs, REGNO (SET_DEST (def_set)));
+  /* ??? The following is quadratic since analyze_register_chain
+     iterates over all refs to look for dual-mode regs. Instead this
+     should be done separately for all regs mentioned in the chain once. */
   df_ref ref;
   df_ref def;
   for (ref = DF_INSN_UID_DEFS (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref))
@@ -469,19 +475,21 @@ scalar_chain::build (bitmap candidates, unsigned insn_uid)
    instead of using a scalar one. */
 int
-dimode_scalar_chain::vector_const_cost (rtx exp)
+general_scalar_chain::vector_const_cost (rtx exp)
 {
   gcc_assert (CONST_INT_P (exp));
-  if (standard_sse_constant_p (exp, V2DImode))
-    return COSTS_N_INSNS (1);
-  return ix86_cost->sse_load[1];
+  if (standard_sse_constant_p (exp, vmode))
+    return ix86_cost->sse_op;
+  /* We have separate costs for SImode and DImode, use SImode costs
+     for smaller modes. */
+  return ix86_cost->sse_load[smode == DImode ? 1 : 0];
 }
 /* Compute a gain for chain conversion. */
 int
-dimode_scalar_chain::compute_convert_gain ()
+general_scalar_chain::compute_convert_gain ()
 {
   bitmap_iterator bi;
   unsigned insn_uid;
@@ -491,6 +499,13 @@ dimode_scalar_chain::compute_convert_gain ()
   if (dump_file)
     fprintf (dump_file, "Computing gain for chain #%d...\n", chain_id);
+  /* SSE costs distinguish between SImode and DImode loads/stores, for
+     int costs factor in the number of GPRs involved. When supporting
+     smaller modes than SImode the int load/store costs need to be
+     adjusted as well. */
+  unsigned sse_cost_idx = smode == DImode ? 1 : 0;
+  unsigned m = smode == DImode ? (TARGET_64BIT ? 1 : 2) : 1;
   EXECUTE_IF_SET_IN_BITMAP (insns, 0, insn_uid, bi)
     {
       rtx_insn *insn = DF_INSN_UID_GET (insn_uid)->insn;
@@ -500,18 +515,19 @@ dimode_scalar_chain::compute_convert_gain ()
       int igain = 0;
       if (REG_P (src) && REG_P (dst))
-        igain += 2 - ix86_cost->xmm_move;
+        igain += 2 * m - ix86_cost->xmm_move;
       else if (REG_P (src) && MEM_P (dst))
-        igain += 2 * ix86_cost->int_store[2] - ix86_cost->sse_store[1];
+        igain
+          += m * ix86_cost->int_store[2] - ix86_cost->sse_store[sse_cost_idx];
       else if (MEM_P (src) && REG_P (dst))
-        igain += 2 * ix86_cost->int_load[2] - ix86_cost->sse_load[1];
+        igain += m * ix86_cost->int_load[2] - ix86_cost->sse_load[sse_cost_idx];
       else if (GET_CODE (src) == ASHIFT
                || GET_CODE (src) == ASHIFTRT
                || GET_CODE (src) == LSHIFTRT)
         {
           if (CONST_INT_P (XEXP (src, 0)))
             igain -= vector_const_cost (XEXP (src, 0));
-          igain += 2 * ix86_cost->shift_const - ix86_cost->sse_op;
+          igain += m * ix86_cost->shift_const - ix86_cost->sse_op;
           if (INTVAL (XEXP (src, 1)) >= 32)
             igain -= COSTS_N_INSNS (1);
         }
@@ -521,11 +537,11 @@ dimode_scalar_chain::compute_convert_gain ()
                || GET_CODE (src) == XOR
                || GET_CODE (src) == AND)
         {
-          igain += 2 * ix86_cost->add - ix86_cost->sse_op;
+          igain += m * ix86_cost->add - ix86_cost->sse_op;
           /* Additional gain for andnot for targets without BMI. */
           if (GET_CODE (XEXP (src, 0)) == NOT
               && !TARGET_BMI)
-            igain += 2 * ix86_cost->add;
+            igain += m * ix86_cost->add;
           if (CONST_INT_P (XEXP (src, 0)))
             igain -= vector_const_cost (XEXP (src, 0));
@@ -534,7 +550,18 @@ dimode_scalar_chain::compute_convert_gain ()
         }
       else if (GET_CODE (src) == NEG
                || GET_CODE (src) == NOT)
-        igain += 2 * ix86_cost->add - ix86_cost->sse_op - COSTS_N_INSNS (1);
+        igain += m * ix86_cost->add - ix86_cost->sse_op - COSTS_N_INSNS (1);
+      else if (GET_CODE (src) == SMAX
+               || GET_CODE (src) == SMIN
+               || GET_CODE (src) == UMAX
+               || GET_CODE (src) == UMIN)
+        {
+          /* We do not have any conditional move cost, estimate it as a
+             reg-reg move. Comparisons are costed as adds. */
+          igain += m * (COSTS_N_INSNS (2) + ix86_cost->add);
+          /* Integer SSE ops are all costed the same. */
+          igain -= ix86_cost->sse_op;
+        }
       else if (GET_CODE (src) == COMPARE)
         {
           /* Assume comparison cost is the same. */
@@ -542,9 +569,11 @@ dimode_scalar_chain::compute_convert_gain ()
       else if (CONST_INT_P (src))
         {
           if (REG_P (dst))
-            igain += 2 * COSTS_N_INSNS (1);
+            /* DImode can be immediate for TARGET_64BIT and SImode always. */
+            igain += m * COSTS_N_INSNS (1);
           else if (MEM_P (dst))
-            igain += 2 * ix86_cost->int_store[2] - ix86_cost->sse_store[1];
+            igain += (m * ix86_cost->int_store[2]
+                      - ix86_cost->sse_store[sse_cost_idx]);
           igain -= vector_const_cost (src);
         }
       else
@@ -561,6 +590,7 @@ dimode_scalar_chain::compute_convert_gain ()
   if (dump_file)
     fprintf (dump_file, " Instruction conversion gain: %d\n", gain);
+  /* ??? What about integer to SSE? */
   EXECUTE_IF_SET_IN_BITMAP (defs_conv, 0, insn_uid, bi)
     cost += DF_REG_DEF_COUNT (insn_uid) * ix86_cost->sse_to_integer;
@@ -578,10 +608,10 @@ dimode_scalar_chain::compute_convert_gain ()
 /* Replace REG in X with a V2DI subreg of NEW_REG. */
 rtx
-dimode_scalar_chain::replace_with_subreg (rtx x, rtx reg, rtx new_reg)
+general_scalar_chain::replace_with_subreg (rtx x, rtx reg, rtx new_reg)
 {
   if (x == reg)
-    return gen_rtx_SUBREG (V2DImode, new_reg, 0);
+    return gen_rtx_SUBREG (vmode, new_reg, 0);
   const char *fmt = GET_RTX_FORMAT (GET_CODE (x));
   int i, j;
@@ -601,7 +631,7 @@ dimode_scalar_chain::replace_with_subreg (rtx x, rtx reg, rtx new_reg)
 /* Replace REG in INSN with a V2DI subreg of NEW_REG. */
 void
-dimode_scalar_chain::replace_with_subreg_in_insn (rtx_insn *insn,
+general_scalar_chain::replace_with_subreg_in_insn (rtx_insn *insn,
                                                   rtx reg, rtx new_reg)
 {
   replace_with_subreg (single_set (insn), reg, new_reg);
@@ -632,10 +662,10 @@ scalar_chain::emit_conversion_insns (rtx insns, rtx_insn *after)
    and replace its uses in a chain. */
 void
-dimode_scalar_chain::make_vector_copies (unsigned regno)
+general_scalar_chain::make_vector_copies (unsigned regno)
 {
   rtx reg = regno_reg_rtx[regno];
-  rtx vreg = gen_reg_rtx (DImode);
+  rtx vreg = gen_reg_rtx (smode);
   df_ref ref;
   for (ref = DF_REG_DEF_CHAIN (regno); ref; ref = DF_REF_NEXT_REG (ref))
@@ -644,37 +674,59 @@ dimode_scalar_chain::make_vector_copies (unsigned regno)
         start_sequence ();
         if (!TARGET_INTER_UNIT_MOVES_TO_VEC)
           {
-            rtx tmp = assign_386_stack_local (DImode, SLOT_STV_TEMP);
-            emit_move_insn (adjust_address (tmp, SImode, 0),
-                            gen_rtx_SUBREG (SImode, reg, 0));
-            emit_move_insn (adjust_address (tmp, SImode, 4),
-                            gen_rtx_SUBREG (SImode, reg, 4));
-            emit_move_insn (vreg, tmp);
+            rtx tmp = assign_386_stack_local (smode, SLOT_STV_TEMP);
+            if (smode == DImode && !TARGET_64BIT)
+              {
+                emit_move_insn (adjust_address (tmp, SImode, 0),
+                                gen_rtx_SUBREG (SImode, reg, 0));
+                emit_move_insn (adjust_address (tmp, SImode, 4),
+                                gen_rtx_SUBREG (SImode, reg, 4));
+              }
+            else
+              emit_move_insn (tmp, reg);
+            emit_insn (gen_rtx_SET
+                       (gen_rtx_SUBREG (vmode, vreg, 0),
+                        gen_rtx_VEC_MERGE (vmode,
+                                           gen_rtx_VEC_DUPLICATE (vmode,
+                                                                  tmp),
+                                           CONST0_RTX (vmode),
+                                           GEN_INT (HOST_WIDE_INT_1U))));
           }
-        else if (TARGET_SSE4_1)
+        else if (!TARGET_64BIT && smode == DImode)
           {
-            emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, vreg, 0),
-                                        CONST0_RTX (V4SImode),
-                                        gen_rtx_SUBREG (SImode, reg, 0)));
-            emit_insn (gen_sse4_1_pinsrd (gen_rtx_SUBREG (V4SImode, vreg, 0),
-                                          gen_rtx_SUBREG (V4SImode, vreg, 0),
-                                          gen_rtx_SUBREG (SImode, reg, 4),
-                                          GEN_INT (2)));
+            if (TARGET_SSE4_1)
+              {
+                emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, vreg, 0),
+                                            CONST0_RTX (V4SImode),
+                                            gen_rtx_SUBREG (SImode, reg, 0)));
+                emit_insn (gen_sse4_1_pinsrd (gen_rtx_SUBREG (V4SImode, vreg, 0),
+                                              gen_rtx_SUBREG (V4SImode, vreg, 0),
+                                              gen_rtx_SUBREG (SImode, reg, 4),
+                                              GEN_INT (2)));
+              }
+            else
+              {
+                rtx tmp = gen_reg_rtx (DImode);
+                emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, vreg, 0),
+                                            CONST0_RTX (V4SImode),
+                                            gen_rtx_SUBREG (SImode, reg, 0)));
+                emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, tmp, 0),
+                                            CONST0_RTX (V4SImode),
+                                            gen_rtx_SUBREG (SImode, reg, 4)));
+                emit_insn (gen_vec_interleave_lowv4si
+                           (gen_rtx_SUBREG (V4SImode, vreg, 0),
+                            gen_rtx_SUBREG (V4SImode, vreg, 0),
+                            gen_rtx_SUBREG (V4SImode, tmp, 0)));
+              }
           }
         else
-          {
-            rtx tmp = gen_reg_rtx (DImode);
-            emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, vreg, 0),
-                                        CONST0_RTX (V4SImode),
-                                        gen_rtx_SUBREG (SImode, reg, 0)));
-            emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, tmp, 0),
-                                        CONST0_RTX (V4SImode),
-                                        gen_rtx_SUBREG (SImode, reg, 4)));
-            emit_insn (gen_vec_interleave_lowv4si
-                       (gen_rtx_SUBREG (V4SImode, vreg, 0),
-                        gen_rtx_SUBREG (V4SImode, vreg, 0),
-                        gen_rtx_SUBREG (V4SImode, tmp, 0)));
-          }
+          emit_insn (gen_rtx_SET
+                     (gen_rtx_SUBREG (vmode, vreg, 0),
+                      gen_rtx_VEC_MERGE (vmode,
+                                         gen_rtx_VEC_DUPLICATE (vmode,
+                                                                reg),
+                                         CONST0_RTX (vmode),
+                                         GEN_INT (HOST_WIDE_INT_1U))));
         rtx_insn *seq = get_insns ();
         end_sequence ();
         rtx_insn *insn = DF_REF_INSN (ref);
@@ -703,7 +755,7 @@ dimode_scalar_chain::make_vector_copies (unsigned regno)
    in case register is used in not convertible insn. */
 void
-dimode_scalar_chain::convert_reg (unsigned regno)
+general_scalar_chain::convert_reg (unsigned regno)
 {
   bool scalar_copy = bitmap_bit_p (defs_conv, regno);
   rtx reg = regno_reg_rtx[regno];
@@ -715,7 +767,7 @@ dimode_scalar_chain::convert_reg (unsigned regno)
   bitmap_copy (conv, insns);
   if (scalar_copy)
-    scopy = gen_reg_rtx (DImode);
+    scopy = gen_reg_rtx (smode);
   for (ref = DF_REG_DEF_CHAIN (regno); ref; ref = DF_REF_NEXT_REG (ref))
     {
@@ -735,40 +787,55 @@ dimode_scalar_chain::convert_reg (unsigned regno)
           start_sequence ();
           if (!TARGET_INTER_UNIT_MOVES_FROM_VEC)
             {
-              rtx tmp = assign_386_stack_local (DImode, SLOT_STV_TEMP);
+              rtx tmp = assign_386_stack_local (smode, SLOT_STV_TEMP);
               emit_move_insn (tmp, reg);
-              emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 0),
-                              adjust_address (tmp, SImode, 0));
-              emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 4),
-                              adjust_address (tmp, SImode, 4));
+              if (!TARGET_64BIT && smode == DImode)
+                {
+                  emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 0),
+                                  adjust_address (tmp, SImode, 0));
+                  emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 4),
+                                  adjust_address (tmp, SImode, 4));
+                }
+              else
+                emit_move_insn (scopy, tmp);
             }
-          else if (TARGET_SSE4_1)
+          else if (!TARGET_64BIT && smode == DImode)
             {
-              rtx tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (1, const0_rtx));
-              emit_insn
-                (gen_rtx_SET
-                 (gen_rtx_SUBREG (SImode, scopy, 0),
-                  gen_rtx_VEC_SELECT (SImode,
-                                      gen_rtx_SUBREG (V4SImode, reg, 0), tmp)));
-
-              tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (1, const1_rtx));
-              emit_insn
-                (gen_rtx_SET
-                 (gen_rtx_SUBREG (SImode, scopy, 4),
-                  gen_rtx_VEC_SELECT (SImode,
-                                      gen_rtx_SUBREG (V4SImode, reg, 0), tmp)));
+              if (TARGET_SSE4_1)
+                {
+                  rtx tmp = gen_rtx_PARALLEL (VOIDmode,
+                                              gen_rtvec (1, const0_rtx));
+                  emit_insn
+                    (gen_rtx_SET
+                     (gen_rtx_SUBREG (SImode, scopy, 0),
+                      gen_rtx_VEC_SELECT (SImode,
+                                          gen_rtx_SUBREG (V4SImode, reg, 0),
+                                          tmp)));
+
+                  tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (1, const1_rtx));
+                  emit_insn
+                    (gen_rtx_SET
+                     (gen_rtx_SUBREG (SImode, scopy, 4),
+                      gen_rtx_VEC_SELECT (SImode,
+                                          gen_rtx_SUBREG (V4SImode, reg, 0),
+                                          tmp)));
+                }
+              else
+                {
+                  rtx vcopy = gen_reg_rtx (V2DImode);
+                  emit_move_insn (vcopy, gen_rtx_SUBREG (V2DImode, reg, 0));
+                  emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 0),
+                                  gen_rtx_SUBREG (SImode, vcopy, 0));
+                  emit_move_insn (vcopy,
+                                  gen_rtx_LSHIFTRT (V2DImode,
+                                                    vcopy, GEN_INT (32)));
+                  emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 4),
+                                  gen_rtx_SUBREG (SImode, vcopy, 0));
+                }
             }
           else
-            {
-              rtx vcopy = gen_reg_rtx (V2DImode);
-              emit_move_insn (vcopy, gen_rtx_SUBREG (V2DImode, reg, 0));
-              emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 0),
-                              gen_rtx_SUBREG (SImode, vcopy, 0));
-              emit_move_insn (vcopy,
-                              gen_rtx_LSHIFTRT (V2DImode, vcopy, GEN_INT (32)));
-              emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 4),
-                              gen_rtx_SUBREG (SImode, vcopy, 0));
-            }
+            emit_move_insn (scopy, reg);
           rtx_insn *seq = get_insns ();
           end_sequence ();
           emit_conversion_insns (seq, insn);
@@ -817,21 +884,21 @@ dimode_scalar_chain::convert_reg (unsigned regno)
    registers conversion. */
 void
-dimode_scalar_chain::convert_op (rtx *op, rtx_insn *insn)
+general_scalar_chain::convert_op (rtx *op, rtx_insn *insn)
 {
   *op = copy_rtx_if_shared (*op);
   if (GET_CODE (*op) == NOT)
     {
       convert_op (&XEXP (*op, 0), insn);
-      PUT_MODE (*op, V2DImode);
+      PUT_MODE (*op, vmode);
     }
   else if (MEM_P (*op))
     {
-      rtx tmp = gen_reg_rtx (DImode);
+      rtx tmp = gen_reg_rtx (GET_MODE (*op));
       emit_insn_before (gen_move_insn (tmp, *op), insn);
-      *op = gen_rtx_SUBREG (V2DImode, tmp, 0);
+      *op = gen_rtx_SUBREG (vmode, tmp, 0);
       if (dump_file)
         fprintf (dump_file, " Preloading operand for insn %d into r%d\n",
@@ -849,24 +916,30 @@ dimode_scalar_chain::convert_op (rtx *op, rtx_insn *insn)
             gcc_assert (!DF_REF_CHAIN (ref));
             break;
           }
-      *op = gen_rtx_SUBREG (V2DImode, *op, 0);
+      *op = gen_rtx_SUBREG (vmode, *op, 0);
     }
   else if (CONST_INT_P (*op))
     {
       rtx vec_cst;
-      rtx tmp = gen_rtx_SUBREG (V2DImode, gen_reg_rtx (DImode), 0);
+      rtx tmp = gen_rtx_SUBREG (vmode, gen_reg_rtx (smode), 0);
       /* Prefer all ones vector in case of -1. */
       if (constm1_operand (*op, GET_MODE (*op)))
-        vec_cst = CONSTM1_RTX (V2DImode);
+        vec_cst = CONSTM1_RTX (vmode);
       else
-        vec_cst = gen_rtx_CONST_VECTOR (V2DImode,
-                                        gen_rtvec (2, *op, const0_rtx));
+        {
+          unsigned n = GET_MODE_NUNITS (vmode);
+          rtx *v = XALLOCAVEC (rtx, n);
+          v[0] = *op;
+          for (unsigned i = 1; i < n; ++i)
+            v[i] = const0_rtx;
+          vec_cst = gen_rtx_CONST_VECTOR (vmode, gen_rtvec_v (n, v));
+        }
-      if (!standard_sse_constant_p (vec_cst, V2DImode))
+      if (!standard_sse_constant_p (vec_cst, vmode))
         {
           start_sequence ();
-          vec_cst = validize_mem (force_const_mem (V2DImode, vec_cst));
+          vec_cst = validize_mem (force_const_mem (vmode, vec_cst));
           rtx_insn *seq = get_insns ();
           end_sequence ();
           emit_insn_before (seq, insn);
@@ -878,14 +951,14 @@ dimode_scalar_chain::convert_op (rtx *op, rtx_insn *insn)
   else
     {
       gcc_assert (SUBREG_P (*op));
-      gcc_assert (GET_MODE (*op) == V2DImode);
+      gcc_assert (GET_MODE (*op) == vmode);
     }
 }
 /* Convert INSN to vector mode. */
 void
-dimode_scalar_chain::convert_insn (rtx_insn *insn)
+general_scalar_chain::convert_insn (rtx_insn *insn)
 {
   rtx def_set = single_set (insn);
   rtx src = SET_SRC (def_set);
@@ -896,9 +969,9 @@ dimode_scalar_chain::convert_insn (rtx_insn *insn)
     {
       /* There are no scalar integer instructions and therefore
          temporary register usage is required. */
-      rtx tmp = gen_reg_rtx (DImode);
+      rtx tmp = gen_reg_rtx (smode);
       emit_conversion_insns (gen_move_insn (dst, tmp), insn);
-      dst = gen_rtx_SUBREG (V2DImode, tmp, 0);
+      dst = gen_rtx_SUBREG (vmode, tmp, 0);
     }
   switch (GET_CODE (src))
@@ -907,7 +980,7 @@ dimode_scalar_chain::convert_insn (rtx_insn *insn)
     case ASHIFTRT:
     case LSHIFTRT:
       convert_op (&XEXP (src, 0), insn);
-      PUT_MODE (src, V2DImode);
+      PUT_MODE (src, vmode);
       break;
     case PLUS:
@@ -915,25 +988,29 @@ dimode_scalar_chain::convert_insn (rtx_insn *insn)
     case IOR:
     case XOR:
     case AND:
+    case SMAX:
+    case SMIN:
+    case UMAX:
+    case UMIN:
       convert_op (&XEXP (src, 0), insn);
       convert_op (&XEXP (src, 1), insn);
-      PUT_MODE (src, V2DImode);
+      PUT_MODE (src, vmode);
       break;
     case NEG:
       src = XEXP (src, 0);
       convert_op (&src, insn);
-      subreg = gen_reg_rtx (V2DImode);
-      emit_insn_before (gen_move_insn (subreg, CONST0_RTX (V2DImode)), insn);
-      src = gen_rtx_MINUS (V2DImode, subreg, src);
+      subreg = gen_reg_rtx (vmode);
+      emit_insn_before (gen_move_insn (subreg, CONST0_RTX (vmode)), insn);
+      src = gen_rtx_MINUS (vmode, subreg, src);
       break;
     case NOT:
       src = XEXP (src, 0);
       convert_op (&src, insn);
-      subreg = gen_reg_rtx (V2DImode);
-      emit_insn_before (gen_move_insn (subreg, CONSTM1_RTX (V2DImode)), insn);
-      src = gen_rtx_XOR (V2DImode, src, subreg);
+      subreg = gen_reg_rtx (vmode);
+      emit_insn_before (gen_move_insn (subreg, CONSTM1_RTX (vmode)), insn);
+      src = gen_rtx_XOR (vmode, src, subreg);
       break;
     case MEM:
@@ -947,17 +1024,17 @@ dimode_scalar_chain::convert_insn (rtx_insn *insn)
       break;
     case SUBREG:
-      gcc_assert (GET_MODE (src) == V2DImode);
+      gcc_assert (GET_MODE (src) == vmode);
       break;
     case COMPARE:
       src = SUBREG_REG (XEXP (XEXP (src, 0), 0));
-      gcc_assert ((REG_P (src) && GET_MODE (src) == DImode)
-                  || (SUBREG_P (src) && GET_MODE (src) == V2DImode));
+      gcc_assert ((REG_P (src) && GET_MODE (src) == GET_MODE_INNER (vmode))
+                  || (SUBREG_P (src) && GET_MODE (src) == vmode));
       if (REG_P (src))
-        subreg = gen_rtx_SUBREG (V2DImode, src, 0);
+        subreg = gen_rtx_SUBREG (vmode, src, 0);
       else
         subreg = copy_rtx_if_shared (src);
       emit_insn_before (gen_vec_interleave_lowv2di (copy_rtx_if_shared (subreg),
@@ -985,7 +1062,9 @@ dimode_scalar_chain::convert_insn (rtx_insn *insn)
   PATTERN (insn) = def_set;
   INSN_CODE (insn) = -1;
-  recog_memoized (insn);
+  int patt = recog_memoized (insn);
+  if (patt == -1)
+    fatal_insn_not_found (insn);
   df_insn_rescan (insn);
 }
@@ -1124,7 +1203,7 @@ timode_scalar_chain::convert_insn (rtx_insn *insn)
 }
 void
-dimode_scalar_chain::convert_registers ()
+general_scalar_chain::convert_registers ()
 {
   bitmap_iterator bi;
   unsigned id;
@@ -1194,7 +1273,7 @@ has_non_address_hard_reg (rtx_insn *insn)
      (const_int 0 [0]))) */
 static bool
-convertible_comparison_p (rtx_insn *insn)
+convertible_comparison_p (rtx_insn *insn, enum machine_mode mode)
 {
   if (!TARGET_SSE4_1)
     return false;
@@ -1227,12 +1306,12 @@ convertible_comparison_p (rtx_insn *insn)
   if (!SUBREG_P (op1)
       || !SUBREG_P (op2)
-      || GET_MODE (op1) != SImode
-      || GET_MODE (op2) != SImode
+      || GET_MODE (op1) != mode
+      || GET_MODE (op2) != mode
       || ((SUBREG_BYTE (op1) != 0
-           || SUBREG_BYTE (op2) != GET_MODE_SIZE (SImode))
+           || SUBREG_BYTE (op2) != GET_MODE_SIZE (mode))
           && (SUBREG_BYTE (op2) != 0
-              || SUBREG_BYTE (op1) != GET_MODE_SIZE (SImode))))
+              || SUBREG_BYTE (op1) != GET_MODE_SIZE (mode))))
     return false;
   op1 = SUBREG_REG (op1);
@@ -1240,7 +1319,7 @@ convertible_comparison_p (rtx_insn *insn)
   if (op1 != op2
       || !REG_P (op1)
-      || GET_MODE (op1) != DImode)
+      || GET_MODE (op1) != GET_MODE_WIDER_MODE (mode).else_blk ())
     return false;
   return true;
@@ -1249,7 +1328,7 @@ convertible_comparison_p (rtx_insn *insn)
 /* The DImode version of scalar_to_vector_candidate_p. */
 static bool
-dimode_scalar_to_vector_candidate_p (rtx_insn *insn)
+general_scalar_to_vector_candidate_p (rtx_insn *insn, enum machine_mode mode)
 {
   rtx def_set = single_set (insn);
@@ -1263,12 +1342,12 @@ dimode_scalar_to_vector_candidate_p (rtx_insn *insn)
   rtx dst = SET_DEST (def_set);
   if (GET_CODE (src) == COMPARE)
-    return convertible_comparison_p (insn);
+    return convertible_comparison_p (insn, mode);
   /* We are interested in DImode promotion only. */
-  if ((GET_MODE (src) != DImode
+  if ((GET_MODE (src) != mode
        && !CONST_INT_P (src))
-      || GET_MODE (dst) != DImode)
+      || GET_MODE (dst) != mode)
     return false;
   if (!REG_P (dst) && !MEM_P (dst))
@@ -1288,6 +1367,15 @@ dimode_scalar_to_vector_candidate_p (rtx_insn *insn)
         return false;
       break;
+    case SMAX:
+    case SMIN:
+    case UMAX:
+    case UMIN:
+      if ((mode == DImode && !TARGET_AVX512VL)
+          || (mode == SImode && !TARGET_SSE4_1))
+        return false;
+      /* Fallthru. */
     case PLUS:
     case MINUS:
     case IOR:
@@ -1298,7 +1386,7 @@ dimode_scalar_to_vector_candidate_p (rtx_insn *insn)
           && !CONST_INT_P (XEXP (src, 1)))
         return false;
-      if (GET_MODE (XEXP (src, 1)) != DImode
+      if (GET_MODE (XEXP (src, 1)) != mode
           && !CONST_INT_P (XEXP (src, 1)))
         return false;
       break;
@@ -1327,7 +1415,7 @@ dimode_scalar_to_vector_candidate_p (rtx_insn *insn)
               || !REG_P (XEXP (XEXP (src, 0), 0))))
         return false;
-      if (GET_MODE (XEXP (src, 0)) != DImode
+      if (GET_MODE (XEXP (src, 0)) != mode
           && !CONST_INT_P (XEXP (src, 0)))
         return false;
@@ -1391,22 +1479,16 @@ timode_scalar_to_vector_candidate_p (rtx_insn *insn)
   return false;
 }
-/* Return 1 if INSN may be converted into vector
-   instruction. */
-static bool
-scalar_to_vector_candidate_p (rtx_insn *insn)
-{
-  if (TARGET_64BIT)
-    return timode_scalar_to_vector_candidate_p (insn);
-  else
-    return dimode_scalar_to_vector_candidate_p (insn);
-}
-/* The DImode version of remove_non_convertible_regs. */
+/* For a given bitmap of insn UIDs scans all instruction and
+   remove insn from CANDIDATES in case it has both convertible
+   and not convertible definitions.
+   All insns in a bitmap are conversion candidates according to
+   scalar_to_vector_candidate_p. Currently it implies all insns
+   are single_set. */
 static void
-dimode_remove_non_convertible_regs (bitmap candidates)
+general_remove_non_convertible_regs (bitmap candidates)
 {
   bitmap_iterator bi;
   unsigned id;
@@ -1561,23 +1643,6 @@ timode_remove_non_convertible_regs (bitmap candidates)
   BITMAP_FREE (regs);
 }
-/* For a given bitmap of insn UIDs scans all instruction and
-   remove insn from CANDIDATES in case it has both convertible
-   and not convertible definitions.
-   All insns in a bitmap are conversion candidates according to
-   scalar_to_vector_candidate_p. Currently it implies all insns
-   are single_set. */
-static void
-remove_non_convertible_regs (bitmap candidates)
-{
-  if (TARGET_64BIT)
-    timode_remove_non_convertible_regs (candidates);
-  else
-    dimode_remove_non_convertible_regs (candidates);
-}
 /* Main STV pass function. Find and convert scalar
    instructions into vector mode when profitable. */
@@ -1585,11 +1650,14 @@ static unsigned int
 convert_scalars_to_vector ()
 {
   basic_block bb;
-  bitmap candidates;
   int converted_insns = 0;
   bitmap_obstack_initialize (NULL);
-  candidates = BITMAP_ALLOC (NULL);
+  const machine_mode cand_mode[3] = { SImode, DImode, TImode };
+  const machine_mode cand_vmode[3] = { V4SImode, V2DImode, V1TImode };
+  bitmap_head candidates[3]; /* { SImode, DImode, TImode } */
+  for (unsigned i = 0; i < 3; ++i)
+    bitmap_initialize (&candidates[i], &bitmap_default_obstack);
   calculate_dominance_info (CDI_DOMINATORS);
   df_set_flags (DF_DEFER_INSN_RESCAN);
@@ -1605,51 +1673,73 @@ convert_scalars_to_vector ()
     {
       rtx_insn *insn;
       FOR_BB_INSNS (bb, insn)
-        if (scalar_to_vector_candidate_p (insn))
+        if (TARGET_64BIT
+            && timode_scalar_to_vector_candidate_p (insn))
           {
             if (dump_file)
-              fprintf (dump_file, " insn %d is marked as a candidate\n",
+              fprintf (dump_file, " insn %d is marked as a TImode candidate\n",
                        INSN_UID (insn));
-            bitmap_set_bit (candidates, INSN_UID (insn));
+            bitmap_set_bit (&candidates[2], INSN_UID (insn));
+          }
+        else
+          {
+            /* Check {SI,DI}mode. */
+            for (unsigned i = 0; i <= 1; ++i)
+              if (general_scalar_to_vector_candidate_p (insn, cand_mode[i]))
+                {
+                  if (dump_file)
+                    fprintf (dump_file, " insn %d is marked as a %s candidate\n",
+                             INSN_UID (insn), i == 0 ? "SImode" : "DImode");
+                  bitmap_set_bit (&candidates[i], INSN_UID (insn));
+                  break;
+                }
           }
     }
-  remove_non_convertible_regs (candidates);
+  if (TARGET_64BIT)
+    timode_remove_non_convertible_regs (&candidates[2]);
+  for (unsigned i = 0; i <= 1; ++i)
+    general_remove_non_convertible_regs (&candidates[i]);
-  if (bitmap_empty_p (candidates))
-    if (dump_file)
+  for (unsigned i = 0; i <= 2; ++i)
+    if (!bitmap_empty_p (&candidates[i]))
+      break;
+    else if (i == 2 && dump_file)
       fprintf (dump_file, "There are no candidates for optimization.\n");
-  while (!bitmap_empty_p (candidates))
-    {
-      unsigned uid = bitmap_first_set_bit (candidates);
-      scalar_chain *chain;
-      if (TARGET_64BIT)
-        chain = new timode_scalar_chain;
-      else
-        chain = new dimode_scalar_chain;
-      /* Find instructions chain we want to convert to vector mode.
-         Check all uses and definitions to estimate all required
-         conversions. */
-      chain->build (candidates, uid);
-      if (chain->compute_convert_gain () > 0)
-        converted_insns += chain->convert ();
-      else
-        if (dump_file)
-          fprintf (dump_file, "Chain #%d conversion is not profitable\n",
-                   chain->chain_id);
-      delete chain;
-    }
+  for (unsigned i = 0; i <= 2; ++i)
+    while (!bitmap_empty_p (&candidates[i]))
+      {
+        unsigned uid = bitmap_first_set_bit (&candidates[i]);
+        scalar_chain *chain;
+        if (cand_mode[i] == TImode)
+          chain = new timode_scalar_chain;
+        else
+          chain = new general_scalar_chain (cand_mode[i], cand_vmode[i]);
+        /* Find instructions chain we want to convert to vector mode.
+           Check all uses and definitions to estimate all required
+           conversions. */
+        chain->build (&candidates[i], uid);
+        if (chain->compute_convert_gain () > 0)
+          converted_insns += chain->convert ();
+        else
+          if (dump_file)
+            fprintf (dump_file, "Chain #%d conversion is not profitable\n",
+                     chain->chain_id);
+        delete chain;
+      }
   if (dump_file)
     fprintf (dump_file, "Total insns converted: %d\n", converted_insns);
-  BITMAP_FREE (candidates);
+  for (unsigned i = 0; i <= 2; ++i)
+    bitmap_release (&candidates[i]);
   bitmap_obstack_release (NULL);
   df_process_deferred_rescans ();
......
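The candidate selection in the hunk above can be pictured with a small standalone sketch in plain C. It is not GCC code: the `bucket` enum, the `insn_summary` struct and both function names are hypothetical stand-ins, and each boolean flag merely represents the result of the corresponding `*_scalar_to_vector_candidate_p` predicate. It only illustrates the ordering the patch appears to use (TImode is tried first and only on 64-bit targets, then SImode, then DImode) and the ISA check added for the min/max codes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Bucket indices follow the patch's candidates[] layout:
   { SImode, DImode, TImode }.  NO_BUCKET is a hypothetical addition
   for insns that are not candidates at all.  */
enum bucket { SI_BUCKET = 0, DI_BUCKET = 1, TI_BUCKET = 2, NO_BUCKET = -1 };

/* Hypothetical summary of one insn; each flag stands in for the result
   of the corresponding *_scalar_to_vector_candidate_p predicate.  */
struct insn_summary
{
  bool timode_candidate;
  bool simode_candidate;
  bool dimode_candidate;
};

/* Mirrors the ISA check added for SMAX/SMIN/UMAX/UMIN.  Passing it is
   necessary but not sufficient: the real predicate goes on to check
   the operands.  */
static bool
minmax_isa_ok (enum bucket mode, bool have_sse4_1, bool have_avx512vl)
{
  if ((mode == DI_BUCKET && !have_avx512vl)
      || (mode == SI_BUCKET && !have_sse4_1))
    return false;
  return true;
}

/* Pick at most one bucket per insn, in the same order as the loop in
   convert_scalars_to_vector: TImode (64-bit only), SImode, DImode.  */
static enum bucket
classify_insn (const struct insn_summary *insn, bool target_64bit)
{
  assert (insn != NULL);
  if (target_64bit && insn->timode_candidate)
    return TI_BUCKET;
  if (insn->simode_candidate)
    return SI_BUCKET;
  if (insn->dimode_candidate)
    return DI_BUCKET;
  return NO_BUCKET;
}
```

Because each insn lands in at most one bucket, a chain is built from candidates of a single scalar mode, which is what lets `general_scalar_chain` carry one `smode`/`vmode` pair.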
@@ -127,11 +127,16 @@ namespace {
 class scalar_chain
 {
  public:
-  scalar_chain ();
+  scalar_chain (enum machine_mode, enum machine_mode);
   virtual ~scalar_chain ();
   static unsigned max_id;
+  /* Scalar mode. */
+  enum machine_mode smode;
+  /* Vector mode. */
+  enum machine_mode vmode;
   /* ID of a chain. */
   unsigned int chain_id;
   /* A queue of instructions to be included into a chain. */
@@ -159,9 +164,11 @@ class scalar_chain
   virtual void convert_registers () = 0;
 };
-class dimode_scalar_chain : public scalar_chain
+class general_scalar_chain : public scalar_chain
 {
  public:
+  general_scalar_chain (enum machine_mode smode_, enum machine_mode vmode_)
+    : scalar_chain (smode_, vmode_) {}
   int compute_convert_gain ();
  private:
   void mark_dual_mode_def (df_ref def);
@@ -178,6 +185,8 @@ class dimode_scalar_chain : public scalar_chain
 class timode_scalar_chain : public scalar_chain
 {
  public:
+  timode_scalar_chain () : scalar_chain (TImode, V1TImode) {}
   /* Convert from TImode to V1TImode is always faster. */
   int compute_convert_gain () { return 1; }
......
@@ -17719,6 +17719,110 @@
 (match_operand:SWI 3 "const_int_operand")]
 ""
 "if (ix86_expand_int_addcc (operands)) DONE; else FAIL;")
;; min/max patterns
(define_mode_iterator MAXMIN_IMODE
[(SI "TARGET_SSE4_1") (DI "TARGET_AVX512VL")])
(define_code_attr maxmin_rel
[(smax "GE") (smin "LE") (umax "GEU") (umin "LEU")])
(define_expand "<code><mode>3"
[(parallel
[(set (match_operand:MAXMIN_IMODE 0 "register_operand")
(maxmin:MAXMIN_IMODE
(match_operand:MAXMIN_IMODE 1 "register_operand")
(match_operand:MAXMIN_IMODE 2 "nonimmediate_operand")))
(clobber (reg:CC FLAGS_REG))])]
"TARGET_STV")
(define_insn_and_split "*<code><mode>3_1"
[(set (match_operand:MAXMIN_IMODE 0 "register_operand")
(maxmin:MAXMIN_IMODE
(match_operand:MAXMIN_IMODE 1 "register_operand")
(match_operand:MAXMIN_IMODE 2 "nonimmediate_operand")))
(clobber (reg:CC FLAGS_REG))]
"(TARGET_64BIT || <MODE>mode != DImode) && TARGET_STV
&& can_create_pseudo_p ()"
"#"
"&& 1"
[(set (match_dup 0)
(if_then_else:MAXMIN_IMODE (match_dup 3)
(match_dup 1)
(match_dup 2)))]
{
machine_mode mode = <MODE>mode;
if (!register_operand (operands[2], mode))
operands[2] = force_reg (mode, operands[2]);
enum rtx_code code = <maxmin_rel>;
machine_mode cmpmode = SELECT_CC_MODE (code, operands[1], operands[2]);
rtx flags = gen_rtx_REG (cmpmode, FLAGS_REG);
rtx tmp = gen_rtx_COMPARE (cmpmode, operands[1], operands[2]);
emit_insn (gen_rtx_SET (flags, tmp));
operands[3] = gen_rtx_fmt_ee (code, VOIDmode, flags, const0_rtx);
})
(define_insn_and_split "*<code>di3_doubleword"
[(set (match_operand:DI 0 "register_operand")
(maxmin:DI (match_operand:DI 1 "register_operand")
(match_operand:DI 2 "nonimmediate_operand")))
(clobber (reg:CC FLAGS_REG))]
"!TARGET_64BIT && TARGET_STV && TARGET_AVX512VL
&& can_create_pseudo_p ()"
"#"
"&& 1"
[(set (match_dup 0)
(if_then_else:SI (match_dup 6)
(match_dup 1)
(match_dup 2)))
(set (match_dup 3)
(if_then_else:SI (match_dup 6)
(match_dup 4)
(match_dup 5)))]
{
if (!register_operand (operands[2], DImode))
operands[2] = force_reg (DImode, operands[2]);
split_double_mode (DImode, &operands[0], 3, &operands[0], &operands[3]);
rtx cmplo[2] = { operands[1], operands[2] };
rtx cmphi[2] = { operands[4], operands[5] };
enum rtx_code code = <maxmin_rel>;
switch (code)
{
case LE: case LEU:
std::swap (cmplo[0], cmplo[1]);
std::swap (cmphi[0], cmphi[1]);
code = swap_condition (code);
/* FALLTHRU */
case GE: case GEU:
{
bool uns = (code == GEU);
rtx (*sbb_insn) (machine_mode, rtx, rtx, rtx)
= uns ? gen_sub3_carry_ccc : gen_sub3_carry_ccgz;
emit_insn (gen_cmp_1 (SImode, cmplo[0], cmplo[1]));
rtx tmp = gen_rtx_SCRATCH (SImode);
emit_insn (sbb_insn (SImode, tmp, cmphi[0], cmphi[1]));
rtx flags = gen_rtx_REG (uns ? CCCmode : CCGZmode, FLAGS_REG);
operands[6] = gen_rtx_fmt_ee (code, VOIDmode, flags, const0_rtx);
break;
}
default:
gcc_unreachable ();
}
})
;; Misc patterns (?)
......
2019-08-14 Richard Biener <rguenther@suse.de>
PR target/91154
* gcc.target/i386/pr91154.c: New testcase.
* gcc.target/i386/minmax-3.c: Likewise.
* gcc.target/i386/minmax-4.c: Likewise.
* gcc.target/i386/minmax-5.c: Likewise.
* gcc.target/i386/minmax-6.c: Likewise.
* gcc.target/i386/minmax-1.c: Add -mno-stv.
* gcc.target/i386/minmax-2.c: Likewise.
2019-08-14 Richard Sandiford <richard.sandiford@arm.com>
Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org>
......
 /* { dg-do compile } */
-/* { dg-options "-O2 -march=opteron" } */
+/* { dg-options "-O2 -march=opteron -mno-stv" } */
 /* { dg-final { scan-assembler "test" } } */
 /* { dg-final { scan-assembler-not "cmp" } } */
 #define max(a,b) (((a) > (b))? (a) : (b))
......
 /* { dg-do compile } */
-/* { dg-options "-O2" } */
+/* { dg-options "-O2 -mno-stv" } */
 /* { dg-final { scan-assembler "test" } } */
 /* { dg-final { scan-assembler-not "cmp" } } */
 #define max(a,b) (((a) > (b))? (a) : (b))
......
/* { dg-do compile } */
/* { dg-options "-O2 -mstv" } */
#define max(a,b) (((a) > (b))? (a) : (b))
#define min(a,b) (((a) < (b))? (a) : (b))
int ssi[1024];
unsigned int usi[1024];
long long sdi[1024];
unsigned long long udi[1024];
#define CHECK(FN, VARIANT) \
void \
FN ## VARIANT (void) \
{ \
for (int i = 1; i < 1024; ++i) \
VARIANT[i] = FN(VARIANT[i-1], VARIANT[i]); \
}
CHECK(max, ssi);
CHECK(min, ssi);
CHECK(max, usi);
CHECK(min, usi);
CHECK(max, sdi);
CHECK(min, sdi);
CHECK(max, udi);
CHECK(min, udi);
/* { dg-do compile } */
/* { dg-options "-O2 -mstv -msse4.1" } */
#include "minmax-3.c"
/* { dg-final { scan-assembler-times "pmaxsd" 1 } } */
/* { dg-final { scan-assembler-times "pmaxud" 1 } } */
/* { dg-final { scan-assembler-times "pminsd" 1 } } */
/* { dg-final { scan-assembler-times "pminud" 1 } } */
/* { dg-do compile } */
/* { dg-options "-O2 -mstv -mavx512vl" } */
#include "minmax-3.c"
/* { dg-final { scan-assembler-times "vpmaxsd" 1 } } */
/* { dg-final { scan-assembler-times "vpmaxud" 1 } } */
/* { dg-final { scan-assembler-times "vpminsd" 1 } } */
/* { dg-final { scan-assembler-times "vpminud" 1 } } */
/* { dg-final { scan-assembler-times "vpmaxsq" 1 { target lp64 } } } */
/* { dg-final { scan-assembler-times "vpmaxuq" 1 { target lp64 } } } */
/* { dg-final { scan-assembler-times "vpminsq" 1 { target lp64 } } } */
/* { dg-final { scan-assembler-times "vpminuq" 1 { target lp64 } } } */
/* { dg-do compile } */
/* { dg-options "-O2 -march=haswell" } */
unsigned short
UMVLine16Y_11 (short unsigned int * Pic, int y, int width)
{
if (y != width)
{
y = y < 0 ? 0 : y;
return Pic[y * width];
}
return Pic[y];
}
/* We do not want the RA to spill %esi for its dual-use but using
pmaxsd is OK. */
/* { dg-final { scan-assembler-not "rsp" { target { ! { ia32 } } } } } */
/* { dg-final { scan-assembler "pmaxsd" } } */
/* { dg-do compile } */
/* { dg-options "-O2 -msse4.1 -mstv" } */
void foo (int *dc, int *mc, int *tpdd, int *tpmd, int M)
{
int sc;
int k;
for (k = 1; k <= M; k++)
{
dc[k] = dc[k-1] + tpdd[k-1];
if ((sc = mc[k-1] + tpmd[k-1]) > dc[k]) dc[k] = sc;
if (dc[k] < -987654321) dc[k] = -987654321;
}
}
/* We want to convert the loop to SSE since SSE pmaxsd is faster than
compare + conditional move. */
/* { dg-final { scan-assembler-not "cmov" } } */
/* { dg-final { scan-assembler-times "pmaxsd" 2 } } */
/* { dg-final { scan-assembler-times "paddd" 2 } } */
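As a reading aid for the testcase above: the two `if` statements in the loop body are both signed maximum operations, which is presumably why the test expects exactly two `pmaxsd` and two `paddd` instructions. The sketch below is plain C with no intrinsics; `imax` and `foo_max` are hypothetical helper names that do not appear in the commit. It restates the recurrence with an explicit max so the equivalence can be checked on ordinary hardware.

```c
#include <assert.h>

/* The loop from the testcase, unchanged apart from layout.  */
void foo (int *dc, int *mc, int *tpdd, int *tpmd, int M)
{
  int sc;
  int k;
  for (k = 1; k <= M; k++)
    {
      dc[k] = dc[k-1] + tpdd[k-1];
      if ((sc = mc[k-1] + tpmd[k-1]) > dc[k]) dc[k] = sc;
      if (dc[k] < -987654321) dc[k] = -987654321;
    }
}

/* Hypothetical helper: signed 32-bit maximum, the scalar operation
   that SMAX in SImode represents.  */
static int
imax (int a, int b)
{
  return a > b ? a : b;
}

/* Hypothetical rewrite of foo: two additions and two maxima per
   iteration, matching the instruction counts the test scans for.  */
void foo_max (int *dc, const int *mc, const int *tpdd, const int *tpmd, int M)
{
  for (int k = 1; k <= M; k++)
    {
      int from_d = dc[k-1] + tpdd[k-1];
      int from_m = mc[k-1] + tpmd[k-1];
      dc[k] = imax (imax (from_d, from_m), -987654321);
    }
}
```

Both functions assume `dc` has at least `M + 1` elements and the other arrays at least `M`, and that the additions do not overflow.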