Commit c2700f74, authored and committed by Richard Sandiford

Allow the number of iterations to be smaller than VF

Fully-masked loops can be profitable even if the iteration
count is smaller than the vectorisation factor.  In this case
we're effectively doing a complete unroll followed by SLP.
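
As a rough illustration of the paragraph above, the following is a scalar emulation of a fully-masked loop, not code from GCC or from the tests in this commit. The name `model_masked_add` and the fixed `MODEL_VF` of 4 are assumptions made for the sketch. Each outer iteration stands for one vector iteration, and a lane only takes effect while its index is below `count`, so a 3-iteration loop completes in a single masked vector iteration.

```c
#include <stdbool.h>

/* Assumed vectorization factor, chosen for this illustration only.  */
#define MODEL_VF 4

/* Scalar emulation of a fully-masked loop computing
   dst[i] += base + i * step.  Hypothetical helper, not GCC code.  */
void
model_masked_add (int *dst, int base, int step, int count)
{
  /* Each outer iteration models one vector iteration.  */
  for (int i = 0; i < count; i += MODEL_VF)
    for (int lane = 0; lane < MODEL_VF; ++lane)
      {
	/* The predicate: a lane is active only while its index is in
	   range, so a partial final vector never touches dst[count].  */
	bool active = i + lane < count;
	if (active)
	  dst[i + lane] += base + (i + lane) * step;
      }
}
```

With `count` equal to 3 the outer loop runs once and the fourth lane stays inactive, which is the "complete unroll followed by SLP" shape the message describes.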

The documentation for min-vect-loop-bound says that the
default value was 0, but actually the default and minimum
were 1.  We need it to be 0 for this case since the parameter
counts a whole number of vector iterations.
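
The effect of the parameter can be sketched with a simplified model of the threshold calculation in tree-vect-loop.c. The names `model_threshold` and `model_passes_threshold` are hypothetical, and the model ignores everything else the cost analysis does; it only mirrors the `MAX (min_scalar_loop_bound, min_profitable_iters)` step and the comparison against a known iteration count.

```c
#include <stdbool.h>

/* Simplified model: min-vect-loop-bound counts whole vector iterations,
   so it is scaled by the assumed VF before being compared with the
   cost model's own minimum.  Hypothetical helper, not GCC code.  */
unsigned int
model_threshold (unsigned int min_vect_loop_bound, unsigned int assumed_vf,
		 unsigned int min_profitable_iters)
{
  unsigned int min_scalar_loop_bound = min_vect_loop_bound * assumed_vf;
  return (min_scalar_loop_bound > min_profitable_iters
	  ? min_scalar_loop_bound : min_profitable_iters);
}

/* A loop with a known iteration count is rejected when the count is
   below the threshold.  */
bool
model_passes_threshold (unsigned int niters, unsigned int th)
{
  return niters >= th;
}
```

Under these assumptions, a 3-iteration loop with an assumed VF of 4 and a cost-model minimum of 0 is rejected when the parameter is 1 (threshold 4) and accepted when it is 0 (threshold 0).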

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* doc/sourcebuild.texi (vect_fully_masked): Document.
	* params.def (PARAM_MIN_VECT_LOOP_BOUND): Change minimum and
	default value to 0.
	* tree-vect-loop.c (vect_analyze_loop_costing): New function,
	split out from...
	(vect_analyze_loop_2): ...here. Don't check the vectorization
	factor against the number of loop iterations if the loop is
	fully-masked.

gcc/testsuite/
	* lib/target-supports.exp (check_effective_target_vect_fully_masked):
	New proc.
	* gcc.dg/vect/slp-3.c: Expect all loops to be vectorized if
	vect_fully_masked.
	* gcc.target/aarch64/sve/loop_add_4.c: New test.
	* gcc.target/aarch64/sve/loop_add_4_run.c: Likewise.
	* gcc.target/aarch64/sve/loop_add_5.c: Likewise.
	* gcc.target/aarch64/sve/loop_add_5_run.c: Likewise.
	* gcc.target/aarch64/sve/miniloop_1.c: Likewise.
	* gcc.target/aarch64/sve/miniloop_2.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r256629
parent 8277ddf9
@@ -2,6 +2,19 @@
 Alan Hayward <alan.hayward@arm.com>
 David Sherwood <david.sherwood@arm.com>
+* doc/sourcebuild.texi (vect_fully_masked): Document.
+* params.def (PARAM_MIN_VECT_LOOP_BOUND): Change minimum and
+default value to 0.
+* tree-vect-loop.c (vect_analyze_loop_costing): New function,
+split out from...
+(vect_analyze_loop_2): ...here. Don't check the vectorization
+factor against the number of loop iterations if the loop is
+fully-masked.
+2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
+Alan Hayward <alan.hayward@arm.com>
+David Sherwood <david.sherwood@arm.com>
 * tree-ssa-loop-ivopts.c (USE_ADDRESS): Split into...
 (USE_REF_ADDRESS, USE_PTR_ADDRESS): ...these new use types.
 (dump_groups): Update accordingly.
......
@@ -1414,6 +1414,10 @@ Target supports hardware vectors of @code{long}.
 @item vect_long_long
 Target supports hardware vectors of @code{long long}.
+@item vect_fully_masked
+Target supports fully-masked (also known as fully-predicated) loops,
+so that vector loops can handle partial as well as full vectors.
 @item vect_masked_store
 Target supports vector masked stores.
......
@@ -139,7 +139,7 @@ DEFPARAM (PARAM_MAX_VARIABLE_EXPANSIONS,
 DEFPARAM (PARAM_MIN_VECT_LOOP_BOUND,
 "min-vect-loop-bound",
 "If -ftree-vectorize is used, the minimal loop bound of a loop to be considered for vectorization.",
-1, 1, 0)
+0, 0, 0)
 /* The maximum number of instructions to consider when looking for an
 instruction to fill a delay slot. If more than this arbitrary
......
@@ -2,6 +2,21 @@
 Alan Hayward <alan.hayward@arm.com>
 David Sherwood <david.sherwood@arm.com>
+* lib/target-supports.exp (check_effective_target_vect_fully_masked):
+New proc.
+* gcc.dg/vect/slp-3.c: Expect all loops to be vectorized if
+vect_fully_masked.
+* gcc.target/aarch64/sve/loop_add_4.c: New test.
+* gcc.target/aarch64/sve/loop_add_4_run.c: Likewise.
+* gcc.target/aarch64/sve/loop_add_5.c: Likewise.
+* gcc.target/aarch64/sve/loop_add_5_run.c: Likewise.
+* gcc.target/aarch64/sve/miniloop_1.c: Likewise.
+* gcc.target/aarch64/sve/miniloop_2.c: Likewise.
+2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
+Alan Hayward <alan.hayward@arm.com>
+David Sherwood <david.sherwood@arm.com>
 * gcc.dg/tree-ssa/scev-9.c: Expected REFERENCE ADDRESS
 instead of just ADDRESS.
 * gcc.dg/tree-ssa/scev-10.c: Likewise.
......
@@ -141,6 +141,8 @@ int main (void)
 return 0;
 }
-/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { target { ! vect_fully_masked } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 4 loops" 1 "vect" { target vect_fully_masked } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { target { ! vect_fully_masked } } } }*/
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { target vect_fully_masked } } } */
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=scalable" } */
#include <stdint.h>
#define LOOP(TYPE, NAME, STEP) \
__attribute__((noinline, noclone)) \
void \
test_##TYPE##_##NAME (TYPE *dst, TYPE base, int count) \
{ \
for (int i = 0; i < count; ++i, base += STEP) \
dst[i] += base; \
}
#define TEST_TYPE(T, TYPE) \
T (TYPE, m17, -17) \
T (TYPE, m16, -16) \
T (TYPE, m15, -15) \
T (TYPE, m1, -1) \
T (TYPE, 1, 1) \
T (TYPE, 15, 15) \
T (TYPE, 16, 16) \
T (TYPE, 17, 17)
#define TEST_ALL(T) \
TEST_TYPE (T, int8_t) \
TEST_TYPE (T, int16_t) \
TEST_TYPE (T, int32_t) \
TEST_TYPE (T, int64_t)
TEST_ALL (LOOP)
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.b, w[0-9]+, #-16\n} 1 } } */
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.b, w[0-9]+, #-15\n} 1 } } */
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.b, w[0-9]+, #1\n} 1 } } */
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.b, w[0-9]+, #15\n} 1 } } */
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.b, w[0-9]+, w[0-9]+\n} 3 } } */
/* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.b, p[0-7]+/z, \[x[0-9]+, x[0-9]+\]} 8 } } */
/* { dg-final { scan-assembler-times {\tst1b\tz[0-9]+\.b, p[0-7]+, \[x[0-9]+, x[0-9]+\]} 8 } } */
/* { dg-final { scan-assembler-times {\tincb\tx[0-9]+\n} 8 } } */
/* { dg-final { scan-assembler-not {\tdecb\tz[0-9]+\.b} } } */
/* We don't need to increment the vector IV for steps -16 and 16, since the
increment is always a multiple of 256. */
/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.b, z[0-9]+\.b, z[0-9]+\.b\n} 14 } } */
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.h, w[0-9]+, #-16\n} 1 } } */
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.h, w[0-9]+, #-15\n} 1 } } */
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.h, w[0-9]+, #1\n} 1 } } */
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.h, w[0-9]+, #15\n} 1 } } */
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.h, w[0-9]+, w[0-9]+\n} 3 } } */
/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.h, p[0-7]+/z, \[x[0-9]+, x[0-9]+, lsl 1\]} 8 } } */
/* { dg-final { scan-assembler-times {\tst1h\tz[0-9]+\.h, p[0-7]+, \[x[0-9]+, x[0-9]+, lsl 1\]} 8 } } */
/* { dg-final { scan-assembler-times {\tincb\tx[0-9]+\n} 8 } } */
/* { dg-final { scan-assembler-times {\tdech\tz[0-9]+\.h, all, mul #16\n} 1 } } */
/* { dg-final { scan-assembler-times {\tdech\tz[0-9]+\.h, all, mul #15\n} 1 } } */
/* { dg-final { scan-assembler-times {\tdech\tz[0-9]+\.h\n} 1 } } */
/* { dg-final { scan-assembler-times {\tinch\tz[0-9]+\.h\n} 1 } } */
/* { dg-final { scan-assembler-times {\tinch\tz[0-9]+\.h, all, mul #15\n} 1 } } */
/* { dg-final { scan-assembler-times {\tinch\tz[0-9]+\.h, all, mul #16\n} 1 } } */
/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.h, z[0-9]+\.h, z[0-9]+\.h\n} 10 } } */
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.s, w[0-9]+, #-16\n} 1 } } */
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.s, w[0-9]+, #-15\n} 1 } } */
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.s, w[0-9]+, #1\n} 1 } } */
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.s, w[0-9]+, #15\n} 1 } } */
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.s, w[0-9]+, w[0-9]+\n} 3 } } */
/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]+/z, \[x[0-9]+, x[0-9]+, lsl 2\]} 8 } } */
/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7]+, \[x[0-9]+, x[0-9]+, lsl 2\]} 8 } } */
/* { dg-final { scan-assembler-times {\tincw\tx[0-9]+\n} 8 } } */
/* { dg-final { scan-assembler-times {\tdecw\tz[0-9]+\.s, all, mul #16\n} 1 } } */
/* { dg-final { scan-assembler-times {\tdecw\tz[0-9]+\.s, all, mul #15\n} 1 } } */
/* { dg-final { scan-assembler-times {\tdecw\tz[0-9]+\.s\n} 1 } } */
/* { dg-final { scan-assembler-times {\tincw\tz[0-9]+\.s\n} 1 } } */
/* { dg-final { scan-assembler-times {\tincw\tz[0-9]+\.s, all, mul #15\n} 1 } } */
/* { dg-final { scan-assembler-times {\tincw\tz[0-9]+\.s, all, mul #16\n} 1 } } */
/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 10 } } */
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.d, x[0-9]+, #-16\n} 1 } } */
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.d, x[0-9]+, #-15\n} 1 } } */
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.d, x[0-9]+, #1\n} 1 } } */
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.d, x[0-9]+, #15\n} 1 } } */
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.d, x[0-9]+, x[0-9]+\n} 3 } } */
/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]+/z, \[x[0-9]+, x[0-9]+, lsl 3\]} 8 } } */
/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7]+, \[x[0-9]+, x[0-9]+, lsl 3\]} 8 } } */
/* { dg-final { scan-assembler-times {\tincd\tx[0-9]+\n} 8 } } */
/* { dg-final { scan-assembler-times {\tdecd\tz[0-9]+\.d, all, mul #16\n} 1 } } */
/* { dg-final { scan-assembler-times {\tdecd\tz[0-9]+\.d, all, mul #15\n} 1 } } */
/* { dg-final { scan-assembler-times {\tdecd\tz[0-9]+\.d\n} 1 } } */
/* { dg-final { scan-assembler-times {\tincd\tz[0-9]+\.d\n} 1 } } */
/* { dg-final { scan-assembler-times {\tincd\tz[0-9]+\.d, all, mul #15\n} 1 } } */
/* { dg-final { scan-assembler-times {\tincd\tz[0-9]+\.d, all, mul #16\n} 1 } } */
/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 10 } } */
/* { dg-do run { target aarch64_sve_hw } } */
/* { dg-options "-O2 -ftree-vectorize" } */
#include "loop_add_4.c"
#define N 131
#define BASE 41
#define TEST_LOOP(TYPE, NAME, STEP) \
{ \
TYPE a[N]; \
for (int i = 0; i < N; ++i) \
{ \
a[i] = i * i + i % 5; \
asm volatile ("" ::: "memory"); \
} \
test_##TYPE##_##NAME (a, BASE, N); \
for (int i = 0; i < N; ++i) \
{ \
TYPE expected = i * i + i % 5 + BASE + i * STEP; \
if (a[i] != expected) \
__builtin_abort (); \
} \
}
int __attribute__ ((optimize (1)))
main (void)
{
TEST_ALL (TEST_LOOP)
}
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=256" } */
#include "loop_add_4.c"
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.b, w[0-9]+, #-16\n} 1 { xfail *-*-* } } } */
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.b, w[0-9]+, #-15\n} 1 { xfail *-*-* } } } */
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.b, w[0-9]+, #1\n} 1 } } */
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.b, w[0-9]+, #15\n} 1 { xfail *-*-* } } } */
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.b, w[0-9]+, w[0-9]+\n} 3 { xfail *-*-* } } } */
/* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.b, p[0-7]+/z, \[x[0-9]+, x[0-9]+\]} 8 } } */
/* { dg-final { scan-assembler-times {\tst1b\tz[0-9]+\.b, p[0-7]+, \[x[0-9]+, x[0-9]+\]} 8 } } */
/* The induction vector is invariant for steps of -16 and 16. */
/* { dg-final { scan-assembler-not {\tsub\tz[0-9]+\.b, z[0-9]+\.b, #} } } */
/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.b, z[0-9]+\.b, #} 6 } } */
/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.b, z[0-9]+\.b, z[0-9]+\.b\n} 8 } } */
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.h, w[0-9]+, #-16\n} 1 { xfail *-*-* } } } */
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.h, w[0-9]+, #-15\n} 1 { xfail *-*-* } } } */
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.h, w[0-9]+, #1\n} 1 } } */
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.h, w[0-9]+, #15\n} 1 } } */
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.h, w[0-9]+, w[0-9]+\n} 3 { xfail *-*-* } } } */
/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.h, p[0-7]+/z, \[x[0-9]+, x[0-9]+, lsl 1\]} 8 } } */
/* { dg-final { scan-assembler-times {\tst1h\tz[0-9]+\.h, p[0-7]+, \[x[0-9]+, x[0-9]+, lsl 1\]} 8 } } */
/* The (-)17 * 16 is out of range. */
/* { dg-final { scan-assembler-times {\tsub\tz[0-9]+\.h, z[0-9]+\.h, #} 2 } } */
/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.h, z[0-9]+\.h, #} 4 } } */
/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.h, z[0-9]+\.h, z[0-9]+\.h\n} 10 } } */
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.s, w[0-9]+, #-16\n} 1 } } */
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.s, w[0-9]+, #-15\n} 1 } } */
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.s, w[0-9]+, #1\n} 1 } } */
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.s, w[0-9]+, #15\n} 1 } } */
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.s, w[0-9]+, w[0-9]+\n} 3 } } */
/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]+/z, \[x[0-9]+, x[0-9]+, lsl 2\]} 8 } } */
/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7]+, \[x[0-9]+, x[0-9]+, lsl 2\]} 8 } } */
/* { dg-final { scan-assembler-times {\tsub\tz[0-9]+\.s, z[0-9]+\.s, #} 4 } } */
/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.s, z[0-9]+\.s, #} 4 } } */
/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 8 } } */
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.d, x[0-9]+, #-16\n} 1 } } */
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.d, x[0-9]+, #-15\n} 1 } } */
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.d, x[0-9]+, #1\n} 1 } } */
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.d, x[0-9]+, #15\n} 1 } } */
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.d, x[0-9]+, x[0-9]+\n} 3 } } */
/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]+/z, \[x[0-9]+, x[0-9]+, lsl 3\]} 8 } } */
/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7]+, \[x[0-9]+, x[0-9]+, lsl 3\]} 8 } } */
/* { dg-final { scan-assembler-times {\tsub\tz[0-9]+\.d, z[0-9]+\.d, #} 4 } } */
/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.d, z[0-9]+\.d, #} 4 } } */
/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 8 } } */
/* { dg-do run { target aarch64_sve_hw } } */
/* { dg-options "-O2 -ftree-vectorize" } */
/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=256" { target aarch64_sve256_hw } } */
#include "loop_add_4_run.c"
/* { dg-do assemble { target aarch64_asm_sve_ok } } */
/* { dg-options "-O2 -ftree-vectorize --save-temps" } */
void loop (int * __restrict__ a, int * __restrict__ b, int * __restrict__ c,
int * __restrict__ d, int * __restrict__ e, int * __restrict__ f,
int * __restrict__ g, int * __restrict__ h)
{
int i = 0;
for (i = 0; i < 3; i++)
{
a[i] += i;
b[i] += i;
c[i] += i;
d[i] += i;
e[i] += i;
f[i] += a[i] + 7;
g[i] += b[i] - 3;
h[i] += c[i] + 3;
}
}
/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, } 8 } } */
/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, } 8 } } */
/* { dg-do assemble { target aarch64_asm_sve_ok } } */
/* { dg-options "-O2 -ftree-vectorize --save-temps -msve-vector-bits=256" } */
#include "miniloop_1.c"
/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, } 8 } } */
/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, } 8 } } */
@@ -6494,6 +6494,12 @@ proc check_effective_target_vect_natural_alignment { } {
 return $et_vect_natural_alignment
 }
+# Return true if fully-masked loops are supported.
+proc check_effective_target_vect_fully_masked { } {
+return [check_effective_target_aarch64_sve]
+}
 # Return 1 if the target doesn't prefer any alignment beyond element
 # alignment during vectorization.
......
@@ -1896,6 +1896,101 @@ vect_analyze_loop_operations (loop_vec_info loop_vinfo)
 return true;
 }
+/* Analyze the cost of the loop described by LOOP_VINFO. Decide if it
+is worthwhile to vectorize. Return 1 if definitely yes, 0 if
+definitely no, or -1 if it's worth retrying. */
+static int
+vect_analyze_loop_costing (loop_vec_info loop_vinfo)
+{
+struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+unsigned int assumed_vf = vect_vf_for_cost (loop_vinfo);
+/* Only fully-masked loops can have iteration counts less than the
+vectorization factor. */
+if (!LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
+{
+HOST_WIDE_INT max_niter;
+if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
+max_niter = LOOP_VINFO_INT_NITERS (loop_vinfo);
+else
+max_niter = max_stmt_executions_int (loop);
+if (max_niter != -1
+&& (unsigned HOST_WIDE_INT) max_niter < assumed_vf)
+{
+if (dump_enabled_p ())
+dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"not vectorized: iteration count smaller than "
+"vectorization factor.\n");
+return 0;
+}
+}
+int min_profitable_iters, min_profitable_estimate;
+vect_estimate_min_profitable_iters (loop_vinfo, &min_profitable_iters,
+&min_profitable_estimate);
+if (min_profitable_iters < 0)
+{
+if (dump_enabled_p ())
+dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"not vectorized: vectorization not profitable.\n");
+if (dump_enabled_p ())
+dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"not vectorized: vector version will never be "
+"profitable.\n");
+return -1;
+}
+int min_scalar_loop_bound = (PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND)
+* assumed_vf);
+/* Use the cost model only if it is more conservative than user specified
+threshold. */
+unsigned int th = (unsigned) MAX (min_scalar_loop_bound,
+min_profitable_iters);
+LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = th;
+if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+&& LOOP_VINFO_INT_NITERS (loop_vinfo) < th)
+{
+if (dump_enabled_p ())
+dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"not vectorized: vectorization not profitable.\n");
+if (dump_enabled_p ())
+dump_printf_loc (MSG_NOTE, vect_location,
+"not vectorized: iteration count smaller than user "
+"specified loop bound parameter or minimum profitable "
+"iterations (whichever is more conservative).\n");
+return 0;
+}
+HOST_WIDE_INT estimated_niter = estimated_stmt_executions_int (loop);
+if (estimated_niter == -1)
+estimated_niter = likely_max_stmt_executions_int (loop);
+if (estimated_niter != -1
+&& ((unsigned HOST_WIDE_INT) estimated_niter
+< MAX (th, (unsigned) min_profitable_estimate)))
+{
+if (dump_enabled_p ())
+dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"not vectorized: estimated iteration count too "
+"small.\n");
+if (dump_enabled_p ())
+dump_printf_loc (MSG_NOTE, vect_location,
+"not vectorized: estimated iteration count smaller "
+"than specified loop bound parameter or minimum "
+"profitable iterations (whichever is more "
+"conservative).\n");
+return -1;
+}
+return 1;
+}
 /* Function vect_analyze_loop_2.
@@ -1906,6 +2001,7 @@ static bool
 vect_analyze_loop_2 (loop_vec_info loop_vinfo, bool &fatal)
 {
 bool ok;
+int res;
 unsigned int max_vf = MAX_VECTORIZATION_FACTOR;
 poly_uint64 min_vf = 2;
 unsigned int n_stmts = 0;
@@ -2063,9 +2159,7 @@ vect_analyze_loop_2 (loop_vec_info loop_vinfo, bool &fatal)
 vect_compute_single_scalar_iteration_cost (loop_vinfo);
 poly_uint64 saved_vectorization_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
-HOST_WIDE_INT estimated_niter;
 unsigned th;
-int min_scalar_loop_bound;
 /* Check the SLP opportunities in the loop, analyze and build SLP trees. */
 ok = vect_analyze_slp (loop_vinfo, n_stmts);
@@ -2095,7 +2189,6 @@ start_over:
 /* Now the vectorization factor is final. */
 poly_uint64 vectorization_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
 gcc_assert (known_ne (vectorization_factor, 0U));
-unsigned int assumed_vf = vect_vf_for_cost (loop_vinfo);
 if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) && dump_enabled_p ())
 {
@@ -2108,17 +2201,6 @@ start_over:
 HOST_WIDE_INT max_niter
 = likely_max_stmt_executions_int (LOOP_VINFO_LOOP (loop_vinfo));
-if ((LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
-&& (LOOP_VINFO_INT_NITERS (loop_vinfo) < assumed_vf))
-|| (max_niter != -1
-&& (unsigned HOST_WIDE_INT) max_niter < assumed_vf))
-{
-if (dump_enabled_p ())
-dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"not vectorized: iteration count smaller than "
-"vectorization factor.\n");
-return false;
-}
 /* Analyze the alignment of the data-refs in the loop.
 Fail if a data reference is found that cannot be vectorized. */
@@ -2232,65 +2314,16 @@ start_over:
 }
 }
-/* Analyze cost. Decide if worth while to vectorize. */
-int min_profitable_estimate, min_profitable_iters;
-vect_estimate_min_profitable_iters (loop_vinfo, &min_profitable_iters,
-&min_profitable_estimate);
-if (min_profitable_iters < 0)
+/* Check the costings of the loop make vectorizing worthwhile. */
+res = vect_analyze_loop_costing (loop_vinfo);
+if (res < 0)
+goto again;
+if (!res)
 {
 if (dump_enabled_p ())
 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"not vectorized: vectorization not profitable.\n");
-if (dump_enabled_p ())
-dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"not vectorized: vector version will never be "
-"profitable.\n");
-goto again;
-}
-min_scalar_loop_bound = (PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND)
-* assumed_vf);
-/* Use the cost model only if it is more conservative than user specified
-threshold. */
-th = (unsigned) MAX (min_scalar_loop_bound, min_profitable_iters);
-LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = th;
-if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
-&& LOOP_VINFO_INT_NITERS (loop_vinfo) < th)
-{
-if (dump_enabled_p ())
-dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"not vectorized: vectorization not profitable.\n");
-if (dump_enabled_p ())
-dump_printf_loc (MSG_NOTE, vect_location,
-"not vectorized: iteration count smaller than user "
-"specified loop bound parameter or minimum profitable "
-"iterations (whichever is more conservative).\n");
-goto again;
-}
-estimated_niter
-= estimated_stmt_executions_int (LOOP_VINFO_LOOP (loop_vinfo));
-if (estimated_niter == -1)
-estimated_niter = max_niter;
-if (estimated_niter != -1
-&& ((unsigned HOST_WIDE_INT) estimated_niter
-< MAX (th, (unsigned) min_profitable_estimate)))
-{
-if (dump_enabled_p ())
-dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"not vectorized: estimated iteration count too "
-"small.\n");
-if (dump_enabled_p ())
-dump_printf_loc (MSG_NOTE, vect_location,
-"not vectorized: estimated iteration count smaller "
-"than specified loop bound parameter or minimum "
-"profitable iterations (whichever is more "
-"conservative).\n");
-goto again;
+"Loop costings not worthwhile.\n");
+return false;
 }
 /* Decide whether we need to create an epilogue loop to handle
@@ -3881,7 +3914,6 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
 * assumed_vf
 - vec_inside_cost * peel_iters_prologue
 - vec_inside_cost * peel_iters_epilogue);
-
 if (min_profitable_iters <= 0)
 min_profitable_iters = 0;
 else
......