Commit 0972596e by Richard Sandiford Committed by Richard Sandiford

Add support for reductions in fully-masked loops

This patch removes the restriction that fully-masked loops cannot
have reductions.  The key thing here is to make sure that the
reduction accumulator doesn't include any values associated with
inactive lanes; the patch adds a bunch of conditional binary
operations for doing that.

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* doc/md.texi (cond_add@var{mode}, cond_sub@var{mode})
	(cond_and@var{mode}, cond_ior@var{mode}, cond_xor@var{mode})
	(cond_smin@var{mode}, cond_smax@var{mode}, cond_umin@var{mode})
	(cond_umax@var{mode}): Document.
	* optabs.def (cond_add_optab, cond_sub_optab, cond_and_optab)
	(cond_ior_optab, cond_xor_optab, cond_smin_optab, cond_smax_optab)
	(cond_umin_optab, cond_umax_optab): New optabs.
	* internal-fn.def (COND_ADD, COND_SUB, COND_MIN, COND_MAX, COND_AND)
	(COND_IOR, COND_XOR): New internal functions.
	* internal-fn.h (get_conditional_internal_fn): Declare.
	* internal-fn.c (cond_binary_direct): New macro.
	(expand_cond_binary_optab_fn): Likewise.
	(direct_cond_binary_optab_supported_p): Likewise.
	(get_conditional_internal_fn): New function.
	* tree-vect-loop.c (vectorizable_reduction): Handle fully-masked loops.
	Cope with reduction statements that are vectorized as calls rather
	than assignments.
	* config/aarch64/aarch64-sve.md (cond_<optab><mode>): New insns.
	* config/aarch64/iterators.md (UNSPEC_COND_ADD, UNSPEC_COND_SUB)
	(UNSPEC_COND_SMAX, UNSPEC_COND_UMAX, UNSPEC_COND_SMIN)
	(UNSPEC_COND_UMIN, UNSPEC_COND_AND, UNSPEC_COND_ORR)
	(UNSPEC_COND_EOR): New unspecs.
	(optab): Add mappings for them.
	(SVE_COND_INT_OP, SVE_COND_FP_OP): New int iterators.
	(sve_int_op, sve_fp_op): New int attributes.

gcc/testsuite/
	* gcc.dg/vect/pr60482.c: Remove XFAIL for variable-length vectors.
	* gcc.target/aarch64/sve/reduc_1.c: Expect the loop operations
	to be predicated.
	* gcc.target/aarch64/sve/slp_5.c: Check for a fully-masked loop.
	* gcc.target/aarch64/sve/slp_7.c: Likewise.
	* gcc.target/aarch64/sve/reduc_5.c: New test.
	* gcc.target/aarch64/sve/slp_13.c: Likewise.
	* gcc.target/aarch64/sve/slp_13_run.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r256626
parent 7cfb4d93
...@@ -2,6 +2,36 @@ ...@@ -2,6 +2,36 @@
Alan Hayward <alan.hayward@arm.com> Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com> David Sherwood <david.sherwood@arm.com>
* doc/md.texi (cond_add@var{mode}, cond_sub@var{mode})
(cond_and@var{mode}, cond_ior@var{mode}, cond_xor@var{mode})
(cond_smin@var{mode}, cond_smax@var{mode}, cond_umin@var{mode})
(cond_umax@var{mode}): Document.
* optabs.def (cond_add_optab, cond_sub_optab, cond_and_optab)
(cond_ior_optab, cond_xor_optab, cond_smin_optab, cond_smax_optab)
(cond_umin_optab, cond_umax_optab): New optabs.
* internal-fn.def (COND_ADD, COND_SUB, COND_MIN, COND_MAX, COND_AND)
(COND_IOR, COND_XOR): New internal functions.
* internal-fn.h (get_conditional_internal_fn): Declare.
* internal-fn.c (cond_binary_direct): New macro.
(expand_cond_binary_optab_fn): Likewise.
(direct_cond_binary_optab_supported_p): Likewise.
(get_conditional_internal_fn): New function.
* tree-vect-loop.c (vectorizable_reduction): Handle fully-masked loops.
Cope with reduction statements that are vectorized as calls rather
than assignments.
* config/aarch64/aarch64-sve.md (cond_<optab><mode>): New insns.
* config/aarch64/iterators.md (UNSPEC_COND_ADD, UNSPEC_COND_SUB)
(UNSPEC_COND_SMAX, UNSPEC_COND_UMAX, UNSPEC_COND_SMIN)
(UNSPEC_COND_UMIN, UNSPEC_COND_AND, UNSPEC_COND_ORR)
(UNSPEC_COND_EOR): New unspecs.
(optab): Add mappings for them.
(SVE_COND_INT_OP, SVE_COND_FP_OP): New int iterators.
(sve_int_op, sve_fp_op): New int attributes.
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
* optabs.def (while_ult_optab): New optab. * optabs.def (while_ult_optab): New optab.
* doc/md.texi (while_ult@var{m}@var{n}): Document. * doc/md.texi (while_ult@var{m}@var{n}): Document.
* internal-fn.def (WHILE_ULT): New internal function. * internal-fn.def (WHILE_ULT): New internal function.
......
...@@ -1417,6 +1417,18 @@ ...@@ -1417,6 +1417,18 @@
"<maxmin_uns_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>" "<maxmin_uns_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>"
) )
;; Predicated integer operations.
(define_insn "cond_<optab><mode>"
[(set (match_operand:SVE_I 0 "register_operand" "=w")
(unspec:SVE_I
[(match_operand:<VPRED> 1 "register_operand" "Upl")
(match_operand:SVE_I 2 "register_operand" "0")
(match_operand:SVE_I 3 "register_operand" "w")]
SVE_COND_INT_OP))]
"TARGET_SVE"
"<sve_int_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>"
)
;; Unpredicated integer add reduction. ;; Unpredicated integer add reduction.
(define_expand "reduc_plus_scal_<mode>" (define_expand "reduc_plus_scal_<mode>"
[(set (match_operand:<VEL> 0 "register_operand") [(set (match_operand:<VEL> 0 "register_operand")
...@@ -2094,6 +2106,18 @@ ...@@ -2094,6 +2106,18 @@
} }
) )
;; Predicated floating-point operations.
(define_insn "cond_<optab><mode>"
[(set (match_operand:SVE_F 0 "register_operand" "=w")
(unspec:SVE_F
[(match_operand:<VPRED> 1 "register_operand" "Upl")
(match_operand:SVE_F 2 "register_operand" "0")
(match_operand:SVE_F 3 "register_operand" "w")]
SVE_COND_FP_OP))]
"TARGET_SVE"
"<sve_fp_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>"
)
;; Shift an SVE vector left and insert a scalar into element 0. ;; Shift an SVE vector left and insert a scalar into element 0.
(define_insn "vec_shl_insert_<mode>" (define_insn "vec_shl_insert_<mode>"
[(set (match_operand:SVE_ALL 0 "register_operand" "=w, w") [(set (match_operand:SVE_ALL 0 "register_operand" "=w, w")
......
...@@ -432,6 +432,15 @@ ...@@ -432,6 +432,15 @@
UNSPEC_ANDF ; Used in aarch64-sve.md. UNSPEC_ANDF ; Used in aarch64-sve.md.
UNSPEC_IORF ; Used in aarch64-sve.md. UNSPEC_IORF ; Used in aarch64-sve.md.
UNSPEC_XORF ; Used in aarch64-sve.md. UNSPEC_XORF ; Used in aarch64-sve.md.
UNSPEC_COND_ADD ; Used in aarch64-sve.md.
UNSPEC_COND_SUB ; Used in aarch64-sve.md.
UNSPEC_COND_SMAX ; Used in aarch64-sve.md.
UNSPEC_COND_UMAX ; Used in aarch64-sve.md.
UNSPEC_COND_SMIN ; Used in aarch64-sve.md.
UNSPEC_COND_UMIN ; Used in aarch64-sve.md.
UNSPEC_COND_AND ; Used in aarch64-sve.md.
UNSPEC_COND_ORR ; Used in aarch64-sve.md.
UNSPEC_COND_EOR ; Used in aarch64-sve.md.
UNSPEC_COND_LT ; Used in aarch64-sve.md. UNSPEC_COND_LT ; Used in aarch64-sve.md.
UNSPEC_COND_LE ; Used in aarch64-sve.md. UNSPEC_COND_LE ; Used in aarch64-sve.md.
UNSPEC_COND_EQ ; Used in aarch64-sve.md. UNSPEC_COND_EQ ; Used in aarch64-sve.md.
...@@ -1452,6 +1461,15 @@ ...@@ -1452,6 +1461,15 @@
(define_int_iterator UNPACK_UNSIGNED [UNSPEC_UNPACKULO UNSPEC_UNPACKUHI]) (define_int_iterator UNPACK_UNSIGNED [UNSPEC_UNPACKULO UNSPEC_UNPACKUHI])
(define_int_iterator SVE_COND_INT_OP [UNSPEC_COND_ADD UNSPEC_COND_SUB
UNSPEC_COND_SMAX UNSPEC_COND_UMAX
UNSPEC_COND_SMIN UNSPEC_COND_UMIN
UNSPEC_COND_AND
UNSPEC_COND_ORR
UNSPEC_COND_EOR])
(define_int_iterator SVE_COND_FP_OP [UNSPEC_COND_ADD UNSPEC_COND_SUB])
(define_int_iterator SVE_COND_INT_CMP [UNSPEC_COND_LT UNSPEC_COND_LE (define_int_iterator SVE_COND_INT_CMP [UNSPEC_COND_LT UNSPEC_COND_LE
UNSPEC_COND_EQ UNSPEC_COND_NE UNSPEC_COND_EQ UNSPEC_COND_NE
UNSPEC_COND_GE UNSPEC_COND_GT UNSPEC_COND_GE UNSPEC_COND_GT
...@@ -1484,7 +1502,16 @@ ...@@ -1484,7 +1502,16 @@
(UNSPEC_XORF "xor") (UNSPEC_XORF "xor")
(UNSPEC_ANDV "and") (UNSPEC_ANDV "and")
(UNSPEC_IORV "ior") (UNSPEC_IORV "ior")
(UNSPEC_XORV "xor")]) (UNSPEC_XORV "xor")
(UNSPEC_COND_ADD "add")
(UNSPEC_COND_SUB "sub")
(UNSPEC_COND_SMAX "smax")
(UNSPEC_COND_UMAX "umax")
(UNSPEC_COND_SMIN "smin")
(UNSPEC_COND_UMIN "umin")
(UNSPEC_COND_AND "and")
(UNSPEC_COND_ORR "ior")
(UNSPEC_COND_EOR "xor")])
(define_int_attr maxmin_uns [(UNSPEC_UMAXV "umax") (define_int_attr maxmin_uns [(UNSPEC_UMAXV "umax")
(UNSPEC_UMINV "umin") (UNSPEC_UMINV "umin")
...@@ -1696,3 +1723,16 @@ ...@@ -1696,3 +1723,16 @@
(UNSPEC_COND_LS "vsd") (UNSPEC_COND_LS "vsd")
(UNSPEC_COND_HS "vsd") (UNSPEC_COND_HS "vsd")
(UNSPEC_COND_HI "vsd")]) (UNSPEC_COND_HI "vsd")])
(define_int_attr sve_int_op [(UNSPEC_COND_ADD "add")
(UNSPEC_COND_SUB "sub")
(UNSPEC_COND_SMAX "smax")
(UNSPEC_COND_UMAX "umax")
(UNSPEC_COND_SMIN "smin")
(UNSPEC_COND_UMIN "umin")
(UNSPEC_COND_AND "and")
(UNSPEC_COND_ORR "orr")
(UNSPEC_COND_EOR "eor")])
(define_int_attr sve_fp_op [(UNSPEC_COND_ADD "fadd")
(UNSPEC_COND_SUB "fsub")])
...@@ -6248,6 +6248,42 @@ move operand 2 or (operands 2 + operand 3) into operand 0 according to the ...@@ -6248,6 +6248,42 @@ move operand 2 or (operands 2 + operand 3) into operand 0 according to the
comparison in operand 1. If the comparison is false, operand 2 is moved into comparison in operand 1. If the comparison is false, operand 2 is moved into
operand 0, otherwise (operand 2 + operand 3) is moved. operand 0, otherwise (operand 2 + operand 3) is moved.
@cindex @code{cond_add@var{mode}} instruction pattern
@cindex @code{cond_sub@var{mode}} instruction pattern
@cindex @code{cond_and@var{mode}} instruction pattern
@cindex @code{cond_ior@var{mode}} instruction pattern
@cindex @code{cond_xor@var{mode}} instruction pattern
@cindex @code{cond_smin@var{mode}} instruction pattern
@cindex @code{cond_smax@var{mode}} instruction pattern
@cindex @code{cond_umin@var{mode}} instruction pattern
@cindex @code{cond_umax@var{mode}} instruction pattern
@item @samp{cond_add@var{mode}}
@itemx @samp{cond_sub@var{mode}}
@itemx @samp{cond_and@var{mode}}
@itemx @samp{cond_ior@var{mode}}
@itemx @samp{cond_xor@var{mode}}
@itemx @samp{cond_smin@var{mode}}
@itemx @samp{cond_smax@var{mode}}
@itemx @samp{cond_umin@var{mode}}
@itemx @samp{cond_umax@var{mode}}
Perform an elementwise operation on vector operands 2 and 3,
under the control of the vector mask in operand 1, and store the result
in operand 0. This is equivalent to:
@smallexample
for (i = 0; i < GET_MODE_NUNITS (@var{n}); i++)
op0[i] = op1[i] ? op2[i] @var{op} op3[i] : op2[i];
@end smallexample
where, for example, @var{op} is @code{+} for @samp{cond_add@var{mode}}.
When defined for floating-point modes, the contents of @samp{op3[i]}
are not interpreted if @var{op1[i]} is false, just like they would not
be in a normal C @samp{?:} condition.
Operands 0, 2 and 3 all have mode @var{m}, while operand 1 has the mode
returned by @code{TARGET_VECTORIZE_GET_MASK_MODE}.
@cindex @code{neg@var{mode}cc} instruction pattern @cindex @code{neg@var{mode}cc} instruction pattern
@item @samp{neg@var{mode}cc} @item @samp{neg@var{mode}cc}
Similar to @samp{mov@var{mode}cc} but for conditional negation. Conditionally Similar to @samp{mov@var{mode}cc} but for conditional negation. Conditionally
......
...@@ -88,6 +88,7 @@ init_internal_fns () ...@@ -88,6 +88,7 @@ init_internal_fns ()
#define mask_store_lanes_direct { 0, 0, false } #define mask_store_lanes_direct { 0, 0, false }
#define unary_direct { 0, 0, true } #define unary_direct { 0, 0, true }
#define binary_direct { 0, 0, true } #define binary_direct { 0, 0, true }
#define cond_binary_direct { 1, 1, true }
#define while_direct { 0, 2, false } #define while_direct { 0, 2, false }
const direct_internal_fn_info direct_internal_fn_array[IFN_LAST + 1] = { const direct_internal_fn_info direct_internal_fn_array[IFN_LAST + 1] = {
...@@ -2855,6 +2856,9 @@ expand_while_optab_fn (internal_fn, gcall *stmt, convert_optab optab) ...@@ -2855,6 +2856,9 @@ expand_while_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
#define expand_binary_optab_fn(FN, STMT, OPTAB) \ #define expand_binary_optab_fn(FN, STMT, OPTAB) \
expand_direct_optab_fn (FN, STMT, OPTAB, 2) expand_direct_optab_fn (FN, STMT, OPTAB, 2)
#define expand_cond_binary_optab_fn(FN, STMT, OPTAB) \
expand_direct_optab_fn (FN, STMT, OPTAB, 3)
/* RETURN_TYPE and ARGS are a return type and argument list that are /* RETURN_TYPE and ARGS are a return type and argument list that are
in principle compatible with FN (which satisfies direct_internal_fn_p). in principle compatible with FN (which satisfies direct_internal_fn_p).
Return the types that should be used to determine whether the Return the types that should be used to determine whether the
...@@ -2928,6 +2932,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types, ...@@ -2928,6 +2932,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
#define direct_unary_optab_supported_p direct_optab_supported_p #define direct_unary_optab_supported_p direct_optab_supported_p
#define direct_binary_optab_supported_p direct_optab_supported_p #define direct_binary_optab_supported_p direct_optab_supported_p
#define direct_cond_binary_optab_supported_p direct_optab_supported_p
#define direct_mask_load_optab_supported_p direct_optab_supported_p #define direct_mask_load_optab_supported_p direct_optab_supported_p
#define direct_load_lanes_optab_supported_p multi_vector_optab_supported_p #define direct_load_lanes_optab_supported_p multi_vector_optab_supported_p
#define direct_mask_load_lanes_optab_supported_p multi_vector_optab_supported_p #define direct_mask_load_lanes_optab_supported_p multi_vector_optab_supported_p
...@@ -3049,6 +3054,37 @@ static void (*const internal_fn_expanders[]) (internal_fn, gcall *) = { ...@@ -3049,6 +3054,37 @@ static void (*const internal_fn_expanders[]) (internal_fn, gcall *) = {
0 0
}; };
/* Return a function that performs the conditional form of CODE, i.e.:
LHS = RHS1 ? RHS2 CODE RHS3 : RHS2
(operating elementwise if the operands are vectors). Return IFN_LAST
if no such function exists. */
internal_fn
get_conditional_internal_fn (tree_code code)
{
switch (code)
{
case PLUS_EXPR:
return IFN_COND_ADD;
case MINUS_EXPR:
return IFN_COND_SUB;
case MIN_EXPR:
return IFN_COND_MIN;
case MAX_EXPR:
return IFN_COND_MAX;
case BIT_AND_EXPR:
return IFN_COND_AND;
case BIT_IOR_EXPR:
return IFN_COND_IOR;
case BIT_XOR_EXPR:
return IFN_COND_XOR;
default:
return IFN_LAST;
}
}
/* Expand STMT as though it were a call to internal function FN. */ /* Expand STMT as though it were a call to internal function FN. */
void void
......
...@@ -53,6 +53,11 @@ along with GCC; see the file COPYING3. If not see ...@@ -53,6 +53,11 @@ along with GCC; see the file COPYING3. If not see
- store_lanes: currently just vec_store_lanes - store_lanes: currently just vec_store_lanes
- mask_store_lanes: currently just vec_mask_store_lanes - mask_store_lanes: currently just vec_mask_store_lanes
- unary: a normal unary optab, such as vec_reverse_<mode>
- binary: a normal binary optab, such as vec_interleave_lo_<mode>
- cond_binary: a conditional binary optab, such as add<mode>cc
DEF_INTERNAL_SIGNED_OPTAB_FN defines an internal function that DEF_INTERNAL_SIGNED_OPTAB_FN defines an internal function that
maps to one of two optabs, depending on the signedness of an input. maps to one of two optabs, depending on the signedness of an input.
SIGNED_OPTAB and UNSIGNED_OPTAB are the optabs for signed and SIGNED_OPTAB and UNSIGNED_OPTAB are the optabs for signed and
...@@ -121,6 +126,19 @@ DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CONST | ECF_NOTHROW, while_ult, while) ...@@ -121,6 +126,19 @@ DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CONST | ECF_NOTHROW, while_ult, while)
DEF_INTERNAL_OPTAB_FN (VEC_SHL_INSERT, ECF_CONST | ECF_NOTHROW, DEF_INTERNAL_OPTAB_FN (VEC_SHL_INSERT, ECF_CONST | ECF_NOTHROW,
vec_shl_insert, binary) vec_shl_insert, binary)
DEF_INTERNAL_OPTAB_FN (COND_ADD, ECF_CONST, cond_add, cond_binary)
DEF_INTERNAL_OPTAB_FN (COND_SUB, ECF_CONST, cond_sub, cond_binary)
DEF_INTERNAL_SIGNED_OPTAB_FN (COND_MIN, ECF_CONST, first,
cond_smin, cond_umin, cond_binary)
DEF_INTERNAL_SIGNED_OPTAB_FN (COND_MAX, ECF_CONST, first,
cond_smax, cond_umax, cond_binary)
DEF_INTERNAL_OPTAB_FN (COND_AND, ECF_CONST | ECF_NOTHROW,
cond_and, cond_binary)
DEF_INTERNAL_OPTAB_FN (COND_IOR, ECF_CONST | ECF_NOTHROW,
cond_ior, cond_binary)
DEF_INTERNAL_OPTAB_FN (COND_XOR, ECF_CONST | ECF_NOTHROW,
cond_xor, cond_binary)
DEF_INTERNAL_OPTAB_FN (RSQRT, ECF_CONST, rsqrt, unary) DEF_INTERNAL_OPTAB_FN (RSQRT, ECF_CONST, rsqrt, unary)
DEF_INTERNAL_OPTAB_FN (REDUC_PLUS, ECF_CONST | ECF_NOTHROW, DEF_INTERNAL_OPTAB_FN (REDUC_PLUS, ECF_CONST | ECF_NOTHROW,
......
...@@ -190,6 +190,8 @@ direct_internal_fn_supported_p (internal_fn fn, tree type0, tree type1, ...@@ -190,6 +190,8 @@ direct_internal_fn_supported_p (internal_fn fn, tree type0, tree type1,
extern bool set_edom_supported_p (void); extern bool set_edom_supported_p (void);
extern internal_fn get_conditional_internal_fn (tree_code);
extern void expand_internal_call (gcall *); extern void expand_internal_call (gcall *);
extern void expand_internal_call (internal_fn, gcall *); extern void expand_internal_call (internal_fn, gcall *);
extern void expand_PHI (internal_fn, gcall *); extern void expand_PHI (internal_fn, gcall *);
......
...@@ -220,6 +220,15 @@ OPTAB_D (addcc_optab, "add$acc") ...@@ -220,6 +220,15 @@ OPTAB_D (addcc_optab, "add$acc")
OPTAB_D (negcc_optab, "neg$acc") OPTAB_D (negcc_optab, "neg$acc")
OPTAB_D (notcc_optab, "not$acc") OPTAB_D (notcc_optab, "not$acc")
OPTAB_D (movcc_optab, "mov$acc") OPTAB_D (movcc_optab, "mov$acc")
OPTAB_D (cond_add_optab, "cond_add$a")
OPTAB_D (cond_sub_optab, "cond_sub$a")
OPTAB_D (cond_and_optab, "cond_and$a")
OPTAB_D (cond_ior_optab, "cond_ior$a")
OPTAB_D (cond_xor_optab, "cond_xor$a")
OPTAB_D (cond_smin_optab, "cond_smin$a")
OPTAB_D (cond_smax_optab, "cond_smax$a")
OPTAB_D (cond_umin_optab, "cond_umin$a")
OPTAB_D (cond_umax_optab, "cond_umax$a")
OPTAB_D (cmov_optab, "cmov$a6") OPTAB_D (cmov_optab, "cmov$a6")
OPTAB_D (cstore_optab, "cstore$a4") OPTAB_D (cstore_optab, "cstore$a4")
OPTAB_D (ctrap_optab, "ctrap$a4") OPTAB_D (ctrap_optab, "ctrap$a4")
......
...@@ -2,6 +2,19 @@ ...@@ -2,6 +2,19 @@
Alan Hayward <alan.hayward@arm.com> Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com> David Sherwood <david.sherwood@arm.com>
* gcc.dg/vect/pr60482.c: Remove XFAIL for variable-length vectors.
* gcc.target/aarch64/sve/reduc_1.c: Expect the loop operations
to be predicated.
* gcc.target/aarch64/sve/slp_5.c: Check for a fully-masked loop.
* gcc.target/aarch64/sve/slp_7.c: Likewise.
* gcc.target/aarch64/sve/reduc_5.c: New test.
* gcc.target/aarch64/sve/slp_13.c: Likewise.
* gcc.target/aarch64/sve/slp_13_run.c: Likewise.
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
* gcc.dg/tree-ssa/cunroll-10.c: Disable vectorization. * gcc.dg/tree-ssa/cunroll-10.c: Disable vectorization.
* gcc.dg/tree-ssa/peel1.c: Likewise. * gcc.dg/tree-ssa/peel1.c: Likewise.
* gcc.dg/vect/vect-load-lanes-peeling-1.c: Remove XFAIL for * gcc.dg/vect/vect-load-lanes-peeling-1.c: Remove XFAIL for
......
...@@ -16,6 +16,4 @@ foo (double *x, int n) ...@@ -16,6 +16,4 @@ foo (double *x, int n)
return p; return p;
} }
/* Until fully-masked loops are supported, we always need an epilog /* { dg-final { scan-tree-dump-not "epilog loop required" "vect" } } */
loop for variable-length vectors. */
/* { dg-final { scan-tree-dump-not "epilog loop required" "vect" { xfail vect_variable_length } } } */
...@@ -105,10 +105,10 @@ reduc_##NAME##_##TYPE (TYPE *a, int n) \ ...@@ -105,10 +105,10 @@ reduc_##NAME##_##TYPE (TYPE *a, int n) \
TEST_BITWISE (DEF_REDUC_BITWISE) TEST_BITWISE (DEF_REDUC_BITWISE)
/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.b, z[0-9]+\.b, z[0-9]+\.b\n} 1 } } */ /* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.b, p[0-7]/m, z[0-9]+\.b, z[0-9]+\.b\n} 1 } } */
/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.h, z[0-9]+\.h, z[0-9]+\.h\n} 1 } } */ /* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, z[0-9]+\.h\n} 1 } } */
/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 2 } } */ /* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, z[0-9]+\.s\n} 2 } } */
/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 2 } } */ /* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, z[0-9]+\.d\n} 2 } } */
/* { dg-final { scan-assembler-times {\tsmin\tz[0-9]+\.b, p[0-7]/m, z[0-9]+\.b, z[0-9]+\.b\n} 1 } } */ /* { dg-final { scan-assembler-times {\tsmin\tz[0-9]+\.b, p[0-7]/m, z[0-9]+\.b, z[0-9]+\.b\n} 1 } } */
/* { dg-final { scan-assembler-times {\tsmin\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, z[0-9]+\.h\n} 1 } } */ /* { dg-final { scan-assembler-times {\tsmin\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, z[0-9]+\.h\n} 1 } } */
...@@ -130,9 +130,9 @@ TEST_BITWISE (DEF_REDUC_BITWISE) ...@@ -130,9 +130,9 @@ TEST_BITWISE (DEF_REDUC_BITWISE)
/* { dg-final { scan-assembler-times {\tumax\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */ /* { dg-final { scan-assembler-times {\tumax\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
/* { dg-final { scan-assembler-times {\tumax\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, z[0-9]+\.d\n} 1 } } */ /* { dg-final { scan-assembler-times {\tumax\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, z[0-9]+\.d\n} 1 } } */
/* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.h, z[0-9]+\.h, z[0-9]+\.h\n} 1 } } */ /* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, z[0-9]+\.h\n} 1 } } */
/* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */ /* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
/* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 1 } } */ /* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, z[0-9]+\.d\n} 1 } } */
/* { dg-final { scan-assembler-times {\tfmaxnm\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, z[0-9]+\.h\n} 1 } } */ /* { dg-final { scan-assembler-times {\tfmaxnm\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, z[0-9]+\.h\n} 1 } } */
/* { dg-final { scan-assembler-times {\tfmaxnm\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */ /* { dg-final { scan-assembler-times {\tfmaxnm\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
...@@ -142,11 +142,20 @@ TEST_BITWISE (DEF_REDUC_BITWISE) ...@@ -142,11 +142,20 @@ TEST_BITWISE (DEF_REDUC_BITWISE)
/* { dg-final { scan-assembler-times {\tfminnm\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */ /* { dg-final { scan-assembler-times {\tfminnm\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
/* { dg-final { scan-assembler-times {\tfminnm\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, z[0-9]+\.d\n} 1 } } */ /* { dg-final { scan-assembler-times {\tfminnm\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, z[0-9]+\.d\n} 1 } } */
/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 8 } } */ /* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.b, p[0-7]/m, z[0-9]+\.b, z[0-9]+\.b\n} 2 } } */
/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, z[0-9]+\.h\n} 2 } } */
/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, z[0-9]+\.s\n} 2 } } */
/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, z[0-9]+\.d\n} 2 } } */
/* { dg-final { scan-assembler-times {\torr\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 8 } } */ /* { dg-final { scan-assembler-times {\torr\tz[0-9]+\.b, p[0-7]/m, z[0-9]+\.b, z[0-9]+\.b\n} 2 } } */
/* { dg-final { scan-assembler-times {\torr\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, z[0-9]+\.h\n} 2 } } */
/* { dg-final { scan-assembler-times {\torr\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, z[0-9]+\.s\n} 2 } } */
/* { dg-final { scan-assembler-times {\torr\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, z[0-9]+\.d\n} 2 } } */
/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 8 } } */ /* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.b, p[0-7]/m, z[0-9]+\.b, z[0-9]+\.b\n} 2 } } */
/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, z[0-9]+\.h\n} 2 } } */
/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, z[0-9]+\.s\n} 2 } } */
/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, z[0-9]+\.d\n} 2 } } */
/* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.b\n} 1 } } */ /* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.b\n} 1 } } */
/* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.h\n} 1 } } */ /* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.h\n} 1 } } */
...@@ -180,17 +189,17 @@ TEST_BITWISE (DEF_REDUC_BITWISE) ...@@ -180,17 +189,17 @@ TEST_BITWISE (DEF_REDUC_BITWISE)
/* { dg-final { scan-assembler-times {\tfminnmv\ts[0-9]+, p[0-7], z[0-9]+\.s\n} 1 } } */ /* { dg-final { scan-assembler-times {\tfminnmv\ts[0-9]+, p[0-7], z[0-9]+\.s\n} 1 } } */
/* { dg-final { scan-assembler-times {\tfminnmv\td[0-9]+, p[0-7], z[0-9]+\.d\n} 1 } } */ /* { dg-final { scan-assembler-times {\tfminnmv\td[0-9]+, p[0-7], z[0-9]+\.d\n} 1 } } */
/* { dg-final { scan-assembler-times {\tandv\tb[0-9]+, p[0-7], z[0-9]+\.b} 2 } } */ /* { dg-final { scan-assembler-times {\tandv\tb[0-9]+, p[0-7], z[0-9]+\.b\n} 2 } } */
/* { dg-final { scan-assembler-times {\tandv\th[0-9]+, p[0-7], z[0-9]+\.h} 2 } } */ /* { dg-final { scan-assembler-times {\tandv\th[0-9]+, p[0-7], z[0-9]+\.h\n} 2 } } */
/* { dg-final { scan-assembler-times {\tandv\ts[0-9]+, p[0-7], z[0-9]+\.s} 2 } } */ /* { dg-final { scan-assembler-times {\tandv\ts[0-9]+, p[0-7], z[0-9]+\.s\n} 2 } } */
/* { dg-final { scan-assembler-times {\tandv\td[0-9]+, p[0-7], z[0-9]+\.d} 2 } } */ /* { dg-final { scan-assembler-times {\tandv\td[0-9]+, p[0-7], z[0-9]+\.d\n} 2 } } */
/* { dg-final { scan-assembler-times {\torv\tb[0-9]+, p[0-7], z[0-9]+\.b} 2 } } */ /* { dg-final { scan-assembler-times {\torv\tb[0-9]+, p[0-7], z[0-9]+\.b\n} 2 } } */
/* { dg-final { scan-assembler-times {\torv\th[0-9]+, p[0-7], z[0-9]+\.h} 2 } } */ /* { dg-final { scan-assembler-times {\torv\th[0-9]+, p[0-7], z[0-9]+\.h\n} 2 } } */
/* { dg-final { scan-assembler-times {\torv\ts[0-9]+, p[0-7], z[0-9]+\.s} 2 } } */ /* { dg-final { scan-assembler-times {\torv\ts[0-9]+, p[0-7], z[0-9]+\.s\n} 2 } } */
/* { dg-final { scan-assembler-times {\torv\td[0-9]+, p[0-7], z[0-9]+\.d} 2 } } */ /* { dg-final { scan-assembler-times {\torv\td[0-9]+, p[0-7], z[0-9]+\.d\n} 2 } } */
/* { dg-final { scan-assembler-times {\teorv\tb[0-9]+, p[0-7], z[0-9]+\.b} 2 } } */ /* { dg-final { scan-assembler-times {\teorv\tb[0-9]+, p[0-7], z[0-9]+\.b\n} 2 } } */
/* { dg-final { scan-assembler-times {\teorv\th[0-9]+, p[0-7], z[0-9]+\.h} 2 } } */ /* { dg-final { scan-assembler-times {\teorv\th[0-9]+, p[0-7], z[0-9]+\.h\n} 2 } } */
/* { dg-final { scan-assembler-times {\teorv\ts[0-9]+, p[0-7], z[0-9]+\.s} 2 } } */ /* { dg-final { scan-assembler-times {\teorv\ts[0-9]+, p[0-7], z[0-9]+\.s\n} 2 } } */
/* { dg-final { scan-assembler-times {\teorv\td[0-9]+, p[0-7], z[0-9]+\.d} 2 } } */ /* { dg-final { scan-assembler-times {\teorv\td[0-9]+, p[0-7], z[0-9]+\.d\n} 2 } } */
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -ffast-math" } */
#include <stdint.h>
#define REDUC(TYPE) \
TYPE reduc_##TYPE (TYPE *x, int count) \
{ \
TYPE sum = 0; \
for (int i = 0; i < count; ++i) \
sum -= x[i]; \
return sum; \
}
REDUC (int8_t)
REDUC (uint8_t)
REDUC (int16_t)
REDUC (uint16_t)
REDUC (int32_t)
REDUC (uint32_t)
REDUC (int64_t)
REDUC (uint64_t)
REDUC (float)
REDUC (double)
/* XFAILed until we support sub-int reductions for signed types. */
/* { dg-final { scan-assembler-times {\tsub\tz[0-9]+\.b, p[0-7]/m} 2 { xfail *-*-* } } } */
/* { dg-final { scan-assembler-times {\tsub\tz[0-9]+\.h, p[0-7]/m} 2 { xfail *-*-* } } } */
/* { dg-final { scan-assembler-times {\tsub\tz[0-9]+\.b, p[0-7]/m} 1 } } */
/* { dg-final { scan-assembler-times {\tsub\tz[0-9]+\.h, p[0-7]/m} 1 } } */
/* { dg-final { scan-assembler-times {\tsub\tz[0-9]+\.s, p[0-7]/m} 2 } } */
/* { dg-final { scan-assembler-times {\tsub\tz[0-9]+\.d, p[0-7]/m} 2 } } */
/* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.s, p[0-7]/m} 1 } } */
/* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.d, p[0-7]/m} 1 } } */
/* XFAILed until we support sub-int reductions for signed types. */
/* { dg-final { scan-assembler-times {\tsub\t} 8 { xfail *-*-* } } } */
/* { dg-final { scan-assembler-times {\tfsub\t} 2 } } */
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=scalable" } */
#include <stdint.h>
#define VEC_PERM(TYPE) \
TYPE __attribute__ ((noinline, noclone)) \
vec_slp_##TYPE (TYPE *restrict a, int n) \
{ \
TYPE res = 0; \
for (int i = 0; i < n; ++i) \
{ \
res += a[i * 2] * 3; \
res += a[i * 2 + 1] * 5; \
} \
return res; \
}
#define TEST_ALL(T) \
T (int8_t) \
T (uint8_t) \
T (int16_t) \
T (uint16_t) \
T (int32_t) \
T (uint32_t) \
T (int64_t) \
T (uint64_t)
TEST_ALL (VEC_PERM)
/* ??? We don't treat the int8_t and int16_t loops as reductions. */
/* ??? We don't treat the uint loops as SLP. */
/* The loop should be fully-masked. */
/* { dg-final { scan-assembler-times {\tld1b\t} 2 { xfail *-*-* } } } */
/* { dg-final { scan-assembler-times {\tld1h\t} 2 { xfail *-*-* } } } */
/* { dg-final { scan-assembler-times {\tld1w\t} 2 { xfail *-*-* } } } */
/* { dg-final { scan-assembler-times {\tld1w\t} 1 } } */
/* { dg-final { scan-assembler-times {\tld1d\t} 2 { xfail *-*-* } } } */
/* { dg-final { scan-assembler-times {\tld1d\t} 1 } } */
/* { dg-final { scan-assembler-not {\tldr} { xfail *-*-* } } } */
/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.b} 4 { xfail *-*-* } } } */
/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.h} 4 { xfail *-*-* } } } */
/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.s} 4 } } */
/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.d} 4 } } */
/* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.b\n} 2 { xfail *-*-* } } } */
/* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.h\n} 2 { xfail *-*-* } } } */
/* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.s\n} 2 } } */
/* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.d\n} 2 } } */
/* { dg-final { scan-assembler-not {\tuqdec} } } */
/* { dg-do run { target aarch64_sve_hw } } */
/* { dg-options "-O2 -ftree-vectorize" } */
#include "slp_13.c"
#define N1 (103 * 2)
#define N2 (111 * 2)
#define HARNESS(TYPE) \
{ \
TYPE a[N2]; \
TYPE expected = 0; \
for (unsigned int i = 0; i < N2; ++i) \
{ \
a[i] = i * 2 + i % 5; \
if (i < N1) \
expected += a[i] * (i & 1 ? 5 : 3); \
asm volatile (""); \
} \
if (vec_slp_##TYPE (a, N1 / 2) != expected) \
__builtin_abort (); \
}
int __attribute__ ((optimize (1)))
main (void)
{
TEST_ALL (HARNESS)
}
...@@ -56,3 +56,12 @@ TEST_ALL (VEC_PERM) ...@@ -56,3 +56,12 @@ TEST_ALL (VEC_PERM)
/* { dg-final { scan-assembler-times {\tfaddv\th[0-9]+, p[0-7], z[0-9]+\.h} 2 } } */ /* { dg-final { scan-assembler-times {\tfaddv\th[0-9]+, p[0-7], z[0-9]+\.h} 2 } } */
/* { dg-final { scan-assembler-times {\tfaddv\ts[0-9]+, p[0-7], z[0-9]+\.s} 2 } } */ /* { dg-final { scan-assembler-times {\tfaddv\ts[0-9]+, p[0-7], z[0-9]+\.s} 2 } } */
/* { dg-final { scan-assembler-times {\tfaddv\td[0-9]+, p[0-7], z[0-9]+\.d} 2 } } */ /* { dg-final { scan-assembler-times {\tfaddv\td[0-9]+, p[0-7], z[0-9]+\.d} 2 } } */
/* Should be 4 and 6 respectively, if we used reductions for int8_t and
int16_t. */
/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.b} 2 } } */
/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.h} 4 } } */
/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.s} 6 } } */
/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.d} 6 } } */
/* { dg-final { scan-assembler-not {\tuqdec} } } */
...@@ -64,3 +64,12 @@ TEST_ALL (VEC_PERM) ...@@ -64,3 +64,12 @@ TEST_ALL (VEC_PERM)
/* { dg-final { scan-assembler-times {\tfaddv\th[0-9]+, p[0-7], z[0-9]+\.h} 4 } } */ /* { dg-final { scan-assembler-times {\tfaddv\th[0-9]+, p[0-7], z[0-9]+\.h} 4 } } */
/* { dg-final { scan-assembler-times {\tfaddv\ts[0-9]+, p[0-7], z[0-9]+\.s} 4 } } */ /* { dg-final { scan-assembler-times {\tfaddv\ts[0-9]+, p[0-7], z[0-9]+\.s} 4 } } */
/* { dg-final { scan-assembler-times {\tfaddv\td[0-9]+, p[0-7], z[0-9]+\.d} 4 } } */ /* { dg-final { scan-assembler-times {\tfaddv\td[0-9]+, p[0-7], z[0-9]+\.d} 4 } } */
/* Should be 4 and 6 respectively, if we used reductions for int8_t and
int16_t. */
/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.b} 2 } } */
/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.h} 4 } } */
/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.s} 6 } } */
/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.d} 6 } } */
/* { dg-final { scan-assembler-not {\tuqdec} } } */
...@@ -6893,19 +6893,42 @@ vectorizable_reduction (gimple *stmt, gimple_stmt_iterator *gsi, ...@@ -6893,19 +6893,42 @@ vectorizable_reduction (gimple *stmt, gimple_stmt_iterator *gsi,
return false; return false;
} }
if (slp_node)
vec_num = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
else
vec_num = 1;
internal_fn cond_fn = get_conditional_internal_fn (code);
vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
if (!vec_stmt) /* transformation not required. */ if (!vec_stmt) /* transformation not required. */
{ {
if (LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo))
{
if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
"can't use a fully-masked loop due to "
"reduction operation.\n");
LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo) = false;
}
if (first_p) if (first_p)
vect_model_reduction_cost (stmt_info, reduc_fn, ncopies); vect_model_reduction_cost (stmt_info, reduc_fn, ncopies);
if (loop_vinfo && LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo))
{
if (cond_fn == IFN_LAST
|| !direct_internal_fn_supported_p (cond_fn, vectype_in,
OPTIMIZE_FOR_SPEED))
{
if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
"can't use a fully-masked loop because no"
" conditional operation is available.\n");
LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo) = false;
}
else if (reduc_index == -1)
{
if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
"can't use a fully-masked loop for chained"
" reductions.\n");
LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo) = false;
}
else
vect_record_loop_mask (loop_vinfo, masks, ncopies * vec_num,
vectype_in);
}
STMT_VINFO_TYPE (stmt_info) = reduc_vec_info_type; STMT_VINFO_TYPE (stmt_info) = reduc_vec_info_type;
return true; return true;
} }
...@@ -6919,16 +6942,15 @@ vectorizable_reduction (gimple *stmt, gimple_stmt_iterator *gsi, ...@@ -6919,16 +6942,15 @@ vectorizable_reduction (gimple *stmt, gimple_stmt_iterator *gsi,
if (code == COND_EXPR) if (code == COND_EXPR)
gcc_assert (ncopies == 1); gcc_assert (ncopies == 1);
bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
/* Create the destination vector */ /* Create the destination vector */
vec_dest = vect_create_destination_var (scalar_dest, vectype_out); vec_dest = vect_create_destination_var (scalar_dest, vectype_out);
prev_stmt_info = NULL; prev_stmt_info = NULL;
prev_phi_info = NULL; prev_phi_info = NULL;
if (slp_node) if (!slp_node)
vec_num = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
else
{ {
vec_num = 1;
vec_oprnds0.create (1); vec_oprnds0.create (1);
vec_oprnds1.create (1); vec_oprnds1.create (1);
if (op_type == ternary_op) if (op_type == ternary_op)
...@@ -7002,19 +7024,19 @@ vectorizable_reduction (gimple *stmt, gimple_stmt_iterator *gsi, ...@@ -7002,19 +7024,19 @@ vectorizable_reduction (gimple *stmt, gimple_stmt_iterator *gsi,
gcc_assert (reduc_index != -1 || ! single_defuse_cycle); gcc_assert (reduc_index != -1 || ! single_defuse_cycle);
if (single_defuse_cycle && reduc_index == 0) if (single_defuse_cycle && reduc_index == 0)
vec_oprnds0[0] = gimple_assign_lhs (new_stmt); vec_oprnds0[0] = gimple_get_lhs (new_stmt);
else else
vec_oprnds0[0] vec_oprnds0[0]
= vect_get_vec_def_for_stmt_copy (dts[0], vec_oprnds0[0]); = vect_get_vec_def_for_stmt_copy (dts[0], vec_oprnds0[0]);
if (single_defuse_cycle && reduc_index == 1) if (single_defuse_cycle && reduc_index == 1)
vec_oprnds1[0] = gimple_assign_lhs (new_stmt); vec_oprnds1[0] = gimple_get_lhs (new_stmt);
else else
vec_oprnds1[0] vec_oprnds1[0]
= vect_get_vec_def_for_stmt_copy (dts[1], vec_oprnds1[0]); = vect_get_vec_def_for_stmt_copy (dts[1], vec_oprnds1[0]);
if (op_type == ternary_op) if (op_type == ternary_op)
{ {
if (single_defuse_cycle && reduc_index == 2) if (single_defuse_cycle && reduc_index == 2)
vec_oprnds2[0] = gimple_assign_lhs (new_stmt); vec_oprnds2[0] = gimple_get_lhs (new_stmt);
else else
vec_oprnds2[0] vec_oprnds2[0]
= vect_get_vec_def_for_stmt_copy (dts[2], vec_oprnds2[0]); = vect_get_vec_def_for_stmt_copy (dts[2], vec_oprnds2[0]);
...@@ -7025,13 +7047,33 @@ vectorizable_reduction (gimple *stmt, gimple_stmt_iterator *gsi, ...@@ -7025,13 +7047,33 @@ vectorizable_reduction (gimple *stmt, gimple_stmt_iterator *gsi,
FOR_EACH_VEC_ELT (vec_oprnds0, i, def0) FOR_EACH_VEC_ELT (vec_oprnds0, i, def0)
{ {
tree vop[3] = { def0, vec_oprnds1[i], NULL_TREE }; tree vop[3] = { def0, vec_oprnds1[i], NULL_TREE };
if (op_type == ternary_op) if (masked_loop_p)
vop[2] = vec_oprnds2[i]; {
/* Make sure that the reduction accumulator is vop[0]. */
if (reduc_index == 1)
{
gcc_assert (commutative_tree_code (code));
std::swap (vop[0], vop[1]);
}
tree mask = vect_get_loop_mask (gsi, masks, vec_num * ncopies,
vectype_in, i * ncopies + j);
gcall *call = gimple_build_call_internal (cond_fn, 3, mask,
vop[0], vop[1]);
new_temp = make_ssa_name (vec_dest, call);
gimple_call_set_lhs (call, new_temp);
gimple_call_set_nothrow (call, true);
new_stmt = call;
}
else
{
if (op_type == ternary_op)
vop[2] = vec_oprnds2[i];
new_temp = make_ssa_name (vec_dest, new_stmt); new_temp = make_ssa_name (vec_dest, new_stmt);
new_stmt = gimple_build_assign (new_temp, code, new_stmt = gimple_build_assign (new_temp, code,
vop[0], vop[1], vop[2]); vop[0], vop[1], vop[2]);
vect_finish_stmt_generation (stmt, new_stmt, gsi); }
vect_finish_stmt_generation (stmt, new_stmt, gsi);
if (slp_node) if (slp_node)
{ {
...@@ -7056,7 +7098,7 @@ vectorizable_reduction (gimple *stmt, gimple_stmt_iterator *gsi, ...@@ -7056,7 +7098,7 @@ vectorizable_reduction (gimple *stmt, gimple_stmt_iterator *gsi,
/* Finalize the reduction-phi (set its arguments) and create the /* Finalize the reduction-phi (set its arguments) and create the
epilog reduction code. */ epilog reduction code. */
if ((!single_defuse_cycle || code == COND_EXPR) && !slp_node) if ((!single_defuse_cycle || code == COND_EXPR) && !slp_node)
vect_defs[0] = gimple_assign_lhs (*vec_stmt); vect_defs[0] = gimple_get_lhs (*vec_stmt);
vect_create_epilog_for_reduction (vect_defs, stmt, reduc_def_stmt, vect_create_epilog_for_reduction (vect_defs, stmt, reduc_def_stmt,
epilog_copies, reduc_fn, phis, epilog_copies, reduc_fn, phis,
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment