Commit a57776a1, authored and committed by Richard Sandiford

Support for aliasing with variable strides

This patch adds runtime alias checks for loops with variable strides,
so that we can vectorise them even without a restrict qualifier.
There are several parts to doing this:

1) For accesses like:

     x[i * n] += 1;

   we need to check whether n (and thus the DR_STEP) is nonzero.
   vect_analyze_data_ref_dependence records values that need to be
   checked in this way, then prune_runtime_alias_test_list records a
   bounds check on DR_STEP being outside the range [0, 0].

2) For accesses like:

     x[i * n] = x[i * n + 1] + 1;

   we simply need to test whether abs (n) >= 2.
   prune_runtime_alias_test_list looks for cases like this and tries
   to guess whether it is better to use this kind of check or a check
   for non-overlapping ranges.  (We could do an OR of the two conditions
   at runtime, but that isn't implemented yet.)

3) Checks for overlapping ranges need to cope with variable strides.
   At present the "length" of each segment in a range check is
   represented as an offset from the base that lies outside the
   touched range, in the same direction as DR_STEP.  The length
   can therefore be negative and is sometimes conservative.

   With variable steps this is easier to reason about if we split
   the length into two parts:

     seg_len:
       distance travelled from the first iteration of interest
       to the last, e.g. DR_STEP * (VF - 1)

     access_size:
       the number of bytes accessed in each iteration

   with access_size always being a positive constant and seg_len
   possibly being variable.  We can then combine alias checks
   for two accesses that are a constant number of bytes apart by
   adjusting the access size to account for the gap.  This leaves
   the segment length unchanged, which allows the check to be combined
   with further accesses.

   When seg_len is positive, the runtime alias check has the form:

        base_a >= base_b + seg_len_b + access_size_b
     || base_b >= base_a + seg_len_a + access_size_a

   In many accesses the base will be aligned to the access size, which
   allows us to skip the addition:

        base_a > base_b + seg_len_b
     || base_b > base_a + seg_len_a

   A similar saving is possible with "negative" lengths.

   The patch therefore tracks the alignment in addition to seg_len
   and access_size.
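
As a rough illustration of the two forms of the check in part 3, here is
a minimal sketch.  The struct segment type and both function names are
hypothetical; they are not GCC's dr_with_seg_len or anything else added
by this patch.  The sketch assumes that seg_len is non-negative and that
the address arithmetic does not wrap.  The cheaper form additionally
assumes that both accesses have the same size and that every base and
seg_len is a multiple of that size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one access.  The field names follow the
   commit message rather than any real GCC structure.  */
struct segment
{
  uintptr_t base;      /* address touched by the first iteration */
  size_t seg_len;      /* distance from the first iteration to the last */
  size_t access_size;  /* bytes accessed in each iteration */
};

/* General form: return true if A and B cannot overlap.  */
static bool
no_alias_p (struct segment a, struct segment b)
{
  assert (a.access_size > 0 && b.access_size > 0);
  return (a.base >= b.base + b.seg_len + b.access_size
          || b.base >= a.base + a.seg_len + a.access_size);
}

/* Cheaper form that skips the addition of the access size.  This is an
   assumption-laden variant: it is only equivalent to no_alias_p when the
   access sizes match and all bases and lengths are multiples of them.  */
static bool
no_alias_aligned_p (struct segment a, struct segment b)
{
  assert (a.access_size > 0 && a.access_size == b.access_size);
  assert (a.base % a.access_size == 0 && b.base % b.access_size == 0);
  assert (a.seg_len % a.access_size == 0 && b.seg_len % b.access_size == 0);
  return (a.base > b.base + b.seg_len
          || b.base > a.base + a.seg_len);
}
```

Under those assumptions the second form accepts the same inputs as the
first, because the gap between two distinct multiples of the access size
cannot be smaller than the access size itself.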

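Parts 1 and 2 can both be read as a lower bound on the magnitude of the
stride: at least 1 for part 1 and at least 2 for part 2.  The sketch
below illustrates part 2 under stated assumptions.  All names are
hypothetical and do not come from the patch, strides are counted in
elements rather than bytes, and blocked_loop merely emulates a vector
loop with a factor of 4 by doing every load in a group before any store.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define COUNT 8   /* number of scalar iterations */
#define VF 4      /* emulated vectorisation factor (an assumption) */

/* Hypothetical helper: true if |step| >= min_value, computed without
   overflowing for the most negative step.  */
static bool
abs_step_at_least_p (long step, unsigned long min_value)
{
  unsigned long magnitude
    = step < 0 ? -(unsigned long) step : (unsigned long) step;
  return magnitude >= min_value;
}

/* Reference loop from part 2; X points at the element used by i == 0.  */
static void
scalar_loop (int *x, long n)
{
  for (long i = 0; i < COUNT; ++i)
    x[i * n] = x[i * n + 1] + 1;
}

/* Emulated vector loop: each group of VF iterations performs all of its
   loads before any of its stores.  */
static void
blocked_loop (int *x, long n)
{
  for (long i = 0; i < COUNT; i += VF)
    {
      int tmp[VF];
      for (long k = 0; k < VF; ++k)
        tmp[k] = x[(i + k) * n + 1] + 1;
      for (long k = 0; k < VF; ++k)
        x[(i + k) * n] = tmp[k];
    }
}

/* Versioned loop: take the blocked path only when abs (n) >= 2.  */
static void
versioned_loop (int *x, long n)
{
  if (abs_step_at_least_p (n, 2))
    blocked_loop (x, n);
  else
    scalar_loop (x, n);
}

/* Run LOOP and scalar_loop on identical buffers and compare the results.
   Only valid for |n| <= 4, so that every access stays inside the buffers.  */
static bool
matches_scalar_p (void (*loop) (int *, long), long n)
{
  int a[64], b[64];
  assert (n >= -4 && n <= 4);
  for (int i = 0; i < 64; ++i)
    a[i] = b[i] = i * 5 / 2;
  loop (a + 32, n);
  scalar_loop (b + 32, n);
  return memcmp (a, b, sizeof (a)) == 0;
}
```

The bound is sufficient rather than exact.  In this emulation n == 1
would also be safe, whereas n == -1 is not, because each iteration then
reads the element stored by the previous one.
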
2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-vectorizer.h (vec_lower_bound): New structure.
	(_loop_vec_info): Add check_nonzero and lower_bounds.
	(LOOP_VINFO_CHECK_NONZERO): New macro.
	(LOOP_VINFO_LOWER_BOUNDS): Likewise.
	(LOOP_REQUIRES_VERSIONING_FOR_ALIAS): Check lower_bounds too.
	* tree-data-ref.h (dr_with_seg_len): Add access_size and align
	fields.  Make seg_len the distance travelled, not including the
	access size.
	(dr_direction_indicator): Declare.
	(dr_zero_step_indicator): Likewise.
	(dr_known_forward_stride_p): Likewise.
	* tree-data-ref.c: Include stringpool.h, tree-vrp.h and
	tree-ssanames.h.
	(runtime_alias_check_p): Allow runtime alias checks with
	variable strides.
	(operator ==): Compare access_size and align.
	(prune_runtime_alias_test_list): Rework for new distinction between
	the access_size and seg_len.
	(create_intersect_range_checks_index): Likewise.  Cope with polynomial
	segment lengths.
	(get_segment_min_max): New function.
	(create_intersect_range_checks): Use it.
	(dr_step_indicator): New function.
	(dr_direction_indicator): Likewise.
	(dr_zero_step_indicator): Likewise.
	(dr_known_forward_stride_p): Likewise.
	* tree-loop-distribution.c (data_ref_segment_size): Return
	DR_STEP * (niters - 1).
	(compute_alias_check_pairs): Update call to the dr_with_seg_len
	constructor.
	* tree-vect-data-refs.c (vect_check_nonzero_value): New function.
	(vect_preserves_scalar_order_p): New function, split out from...
	(vect_analyze_data_ref_dependence): ...here.  Check for zero steps.
	(vect_vfa_segment_size): Return DR_STEP * (length_factor - 1).
	(vect_vfa_access_size): New function.
	(vect_vfa_align): Likewise.
	(vect_compile_time_alias): Take access_size_a and access_size_b
	arguments.
	(dump_lower_bound): New function.
	(vect_check_lower_bound): Likewise.
	(vect_small_gap_p): Likewise.
	(vectorizable_with_step_bound_p): Likewise.
	(vect_prune_runtime_alias_test_list): Ignore cross-iteration
	dependencies if the vectorization factor is 1.  Convert the checks
	for nonzero steps into checks on the bounds of DR_STEP.  Try using
	a bounds check for variable steps if the minimum required step is
	relatively small.  Update calls to the dr_with_seg_len
	constructor and to vect_compile_time_alias.
	* tree-vect-loop-manip.c (vect_create_cond_for_lower_bounds): New
	function.
	(vect_loop_versioning): Call it.
	* tree-vect-loop.c (vect_analyze_loop_2): Clear LOOP_VINFO_LOWER_BOUNDS
	when retrying.
	(vect_estimate_min_profitable_iters): Account for any bounds checks.

gcc/testsuite/
	* gcc.dg/vect/bb-slp-cond-1.c: Expect loop vectorization rather
	than SLP vectorization.
	* gcc.dg/vect/vect-alias-check-10.c: New test.
	* gcc.dg/vect/vect-alias-check-11.c: Likewise.
	* gcc.dg/vect/vect-alias-check-12.c: Likewise.
	* gcc.dg/vect/vect-alias-check-8.c: Likewise.
	* gcc.dg/vect/vect-alias-check-9.c: Likewise.
	* gcc.target/aarch64/sve/strided_load_8.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_1.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_1.h: Likewise.
	* gcc.target/aarch64/sve/var_stride_1_run.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_2.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_2_run.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_3.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_3_run.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_4.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_4_run.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_5.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_5_run.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_6.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_6_run.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_7.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_7_run.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_8.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_8_run.c: Likewise.
	* gfortran.dg/vect/vect-alias-check-1.F90: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r256644
parent f307441a
@@ -45,10 +45,6 @@ int main ()
   return 0;
 }
-/* Basic blocks of if-converted loops are vectorized from within the loop
-   vectorizer pass. In this case it is really a deficiency in loop
-   vectorization data dependence analysis that causes us to require
-   basic block vectorization in the first place. */
-/* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "vect" { target vect_element_align } } } */
+/* { dg-final { scan-tree-dump {(no need for alias check [^\n]* when VF is 1|no alias between [^\n]* when [^\n]* is outside \(-16, 16\))} "vect" { target vect_element_align } } } */
+/* { dg-final { scan-tree-dump-times "loop vectorized" 1 "vect" { target vect_element_align } } } */
/* { dg-do run } */
#define N 87
#define M 6
typedef signed char sc;
typedef unsigned char uc;
typedef signed short ss;
typedef unsigned short us;
typedef int si;
typedef unsigned int ui;
typedef signed long long sll;
typedef unsigned long long ull;
#define FOR_EACH_TYPE(M) \
  M (sc) M (uc) \
  M (ss) M (us) \
  M (si) M (ui) \
  M (sll) M (ull) \
  M (float) M (double)
#define TEST_VALUE(I) ((I) * 5 / 2)
#define ADD_TEST(TYPE) \
  void __attribute__((noinline, noclone)) \
  test_##TYPE (TYPE *a, int step) \
  { \
    for (int i = 0; i < N; ++i) \
      { \
        a[i * step + 0] = a[i * step + 0] + 1; \
        a[i * step + 1] = a[i * step + 1] + 2; \
        a[i * step + 2] = a[i * step + 2] + 4; \
        a[i * step + 3] = a[i * step + 3] + 8; \
      } \
  } \
  void __attribute__((noinline, noclone)) \
  ref_##TYPE (TYPE *a, int step) \
  { \
    for (int i = 0; i < N; ++i) \
      { \
        a[i * step + 0] = a[i * step + 0] + 1; \
        a[i * step + 1] = a[i * step + 1] + 2; \
        a[i * step + 2] = a[i * step + 2] + 4; \
        a[i * step + 3] = a[i * step + 3] + 8; \
        asm volatile (""); \
      } \
  }
#define DO_TEST(TYPE) \
  for (int j = -M; j <= M; ++j) \
    { \
      TYPE a[N * M], b[N * M]; \
      for (int i = 0; i < N * M; ++i) \
        a[i] = b[i] = TEST_VALUE (i); \
      int offset = (j < 0 ? N * M - 4 : 0); \
      test_##TYPE (a + offset, j); \
      ref_##TYPE (b + offset, j); \
      if (__builtin_memcmp (a, b, sizeof (a)) != 0) \
        __builtin_abort (); \
    }
FOR_EACH_TYPE (ADD_TEST)
int
main (void)
{
  FOR_EACH_TYPE (DO_TEST)
  return 0;
}
/* { dg-do run } */
#define N 87
#define M 6
typedef signed char sc;
typedef unsigned char uc;
typedef signed short ss;
typedef unsigned short us;
typedef int si;
typedef unsigned int ui;
typedef signed long long sll;
typedef unsigned long long ull;
#define FOR_EACH_TYPE(M) \
  M (sc) M (uc) \
  M (ss) M (us) \
  M (si) M (ui) \
  M (sll) M (ull) \
  M (float) M (double)
#define TEST_VALUE1(I) ((I) * 5 / 2)
#define TEST_VALUE2(I) ((I) * 11 / 5)
#define ADD_TEST(TYPE) \
  void __attribute__((noinline, noclone)) \
  test_##TYPE (TYPE *restrict a, TYPE *restrict b, \
               int step) \
  { \
    for (int i = 0; i < N; ++i) \
      { \
        TYPE r1 = a[i * step + 0] += 1; \
        a[i * step + 1] += 2; \
        a[i * step + 2] += 4; \
        a[i * step + 3] += 8; \
        b[i] += r1; \
      } \
  } \
  \
  void __attribute__((noinline, noclone)) \
  ref_##TYPE (TYPE *restrict a, TYPE *restrict b, \
              int step) \
  { \
    for (int i = 0; i < N; ++i) \
      { \
        TYPE r1 = a[i * step + 0] += 1; \
        a[i * step + 1] += 2; \
        a[i * step + 2] += 4; \
        a[i * step + 3] += 8; \
        b[i] += r1; \
        asm volatile (""); \
      } \
  }
#define DO_TEST(TYPE) \
  for (int j = -M; j <= M; ++j) \
    { \
      TYPE a1[N * M], a2[N * M], b1[N], b2[N]; \
      for (int i = 0; i < N * M; ++i) \
        a1[i] = a2[i] = TEST_VALUE1 (i); \
      for (int i = 0; i < N; ++i) \
        b1[i] = b2[i] = TEST_VALUE2 (i); \
      int offset = (j < 0 ? N * M - 4 : 0); \
      test_##TYPE (a1 + offset, b1, j); \
      ref_##TYPE (a2 + offset, b2, j); \
      if (__builtin_memcmp (a1, a2, sizeof (a1)) != 0) \
        __builtin_abort (); \
      if (__builtin_memcmp (b1, b2, sizeof (b1)) != 0) \
        __builtin_abort (); \
    }
FOR_EACH_TYPE (ADD_TEST)
int
main (void)
{
  FOR_EACH_TYPE (DO_TEST)
  return 0;
}
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* step[^ ]* is outside \(-2, 2\)} "vect" { target vect_int } } } */
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* step[^ ]* is outside \(-3, 3\)} "vect" { target vect_int } } } */
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* step[^ ]* is outside \(-4, 4\)} "vect" { target vect_int } } } */
/* { dg-final { scan-tree-dump {run-time check [^\n]* abs \([^*]*\) >= 4} "vect" { target vect_int } } } */
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* step[^ ]* \* 2[)]* is outside \(-4, 4\)} "vect" { target vect_int } } } */
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* step[^ ]* \* 2[)]* is outside \(-6, 6\)} "vect" { target vect_int } } } */
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* step[^ ]* \* 2[)]* is outside \(-8, 8\)} "vect" { target vect_int } } } */
/* { dg-final { scan-tree-dump {run-time check [^\n]* abs \([^*]* \* 2[)]* >= 8} "vect" { target vect_int } } } */
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* step[^ ]* \* 4[)]* is outside \(-8, 8\)} "vect" { target { vect_int || vect_float } } } } */
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* step[^ ]* \* 4[)]* is outside \(-12, 12\)} "vect" { target { vect_int || vect_float } } } } */
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* step[^ ]* \* 4[)]* is outside \(-16, 16\)} "vect" { target { vect_int || vect_float } } } } */
/* { dg-final { scan-tree-dump {run-time check [^\n]* abs \([^*]* \* 4[)]* >= 16} "vect" { target { vect_int || vect_float } } } } */
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* step[^ ]* \* 8[)]* is outside \(-16, 16\)} "vect" { target vect_double } } } */
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* step[^ ]* \* 8[)]* is outside \(-24, 24\)} "vect" { target vect_double } } } */
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* step[^ ]* \* 8[)]* is outside \(-32, 32\)} "vect" { target vect_double } } } */
/* { dg-final { scan-tree-dump {run-time check [^\n]* abs \([^*]* \* 8[)]* >= 32} "vect" { target vect_double } } } */
/* { dg-do run } */
#define N 87
#define M 7
typedef signed char sc;
typedef unsigned char uc;
typedef signed short ss;
typedef unsigned short us;
typedef int si;
typedef unsigned int ui;
typedef signed long long sll;
typedef unsigned long long ull;
#define FOR_EACH_TYPE(M) \
  M (sc) M (uc) \
  M (ss) M (us) \
  M (si) M (ui) \
  M (sll) M (ull) \
  M (float) M (double)
#define TEST_VALUE1(I) ((I) * 5 / 2)
#define TEST_VALUE2(I) ((I) * 11 / 5)
#define ADD_TEST(TYPE) \
  void __attribute__((noinline, noclone)) \
  test_##TYPE (TYPE *restrict a, TYPE *restrict b, \
               int step) \
  { \
    step = step & M; \
    for (int i = 0; i < N; ++i) \
      { \
        TYPE r1 = a[i * step + 0] += 1; \
        a[i * step + 1] += 2; \
        a[i * step + 2] += 4; \
        a[i * step + 3] += 8; \
        b[i] += r1; \
      } \
  } \
  \
  void __attribute__((noinline, noclone)) \
  ref_##TYPE (TYPE *restrict a, TYPE *restrict b, \
              int step) \
  { \
    for (unsigned short i = 0; i < N; ++i) \
      { \
        TYPE r1 = a[i * step + 0] += 1; \
        a[i * step + 1] += 2; \
        a[i * step + 2] += 4; \
        a[i * step + 3] += 8; \
        b[i] += r1; \
        asm volatile (""); \
      } \
  }
#define DO_TEST(TYPE) \
  for (int j = 0; j <= M; ++j) \
    { \
      TYPE a1[N * M], a2[N * M], b1[N], b2[N]; \
      for (int i = 0; i < N * M; ++i) \
        a1[i] = a2[i] = TEST_VALUE1 (i); \
      for (int i = 0; i < N; ++i) \
        b1[i] = b2[i] = TEST_VALUE2 (i); \
      test_##TYPE (a1, b1, j); \
      ref_##TYPE (a2, b2, j); \
      if (__builtin_memcmp (a1, a2, sizeof (a1)) != 0) \
        __builtin_abort (); \
      if (__builtin_memcmp (b1, b2, sizeof (b1)) != 0) \
        __builtin_abort (); \
    }
FOR_EACH_TYPE (ADD_TEST)
int
main (void)
{
  FOR_EACH_TYPE (DO_TEST)
  return 0;
}
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* [_a-z][^ ]* is outside \[0, 2\)} "vect" { target vect_int } } } */
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* [_a-z][^ ]* is outside \[0, 3\)} "vect" { target vect_int } } } */
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* [_a-z][^ ]* is outside \[0, 4\)} "vect" { target vect_int } } } */
/* { dg-final { scan-tree-dump {run-time check [^\n]* unsigned \([^*]*\) >= 4} "vect" { target vect_int } } } */
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* [_a-z][^ ]* \* 2[)]* is outside \[0, 4\)} "vect" { target vect_int } } } */
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* [_a-z][^ ]* \* 2[)]* is outside \[0, 6\)} "vect" { target vect_int } } } */
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* [_a-z][^ ]* \* 2[)]* is outside \[0, 8\)} "vect" { target vect_int } } } */
/* { dg-final { scan-tree-dump {run-time check [^\n]* unsigned \([^*]* \* 2[)]* >= 8} "vect" { target vect_int } } } */
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* [_a-z][^ ]* \* 4[)]* is outside \[0, 8\)} "vect" { target { vect_int || vect_float } }} } */
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* [_a-z][^ ]* \* 4[)]* is outside \[0, 12\)} "vect" { target { vect_int || vect_float } }} } */
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* [_a-z][^ ]* \* 4[)]* is outside \[0, 16\)} "vect" { target { vect_int || vect_float } }} } */
/* { dg-final { scan-tree-dump {run-time check [^\n]* unsigned \([^*]* \* 4[)]* >= 16} "vect" { target { vect_int || vect_float } }} } */
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* [_a-z][^ ]* \* 8[)]* is outside \[0, 16\)} "vect" { target vect_double } } } */
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* [_a-z][^ ]* \* 8[)]* is outside \[0, 24\)} "vect" { target vect_double } } } */
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* [_a-z][^ ]* \* 8[)]* is outside \[0, 32\)} "vect" { target vect_double } } } */
/* { dg-final { scan-tree-dump {run-time check [^\n]* unsigned \([^*]* \* 8[)]* >= 32} "vect" { target vect_double } } } */
/* { dg-do run } */
#define N 200
#define DIST 32
typedef signed char sc;
typedef unsigned char uc;
typedef signed short ss;
typedef unsigned short us;
typedef int si;
typedef unsigned int ui;
typedef signed long long sll;
typedef unsigned long long ull;
#define FOR_EACH_TYPE(M) \
  M (sc) M (uc) \
  M (ss) M (us) \
  M (si) M (ui) \
  M (sll) M (ull) \
  M (float) M (double)
#define TEST_VALUE(I) ((I) * 5 / 2)
#define ADD_TEST(TYPE) \
  TYPE a_##TYPE[N * 2]; \
  void __attribute__((noinline, noclone)) \
  test_##TYPE (int x, int y) \
  { \
    for (int i = 0; i < N; ++i) \
      a_##TYPE[i + x] += a_##TYPE[i + y]; \
  }
#define DO_TEST(TYPE) \
  for (int i = 0; i < DIST * 2; ++i) \
    { \
      for (int j = 0; j < N + DIST * 2; ++j) \
        a_##TYPE[j] = TEST_VALUE (j); \
      test_##TYPE (i, DIST); \
      for (int j = 0; j < N + DIST * 2; ++j) \
        { \
          TYPE expected; \
          if (j < i || j >= i + N) \
            expected = TEST_VALUE (j); \
          else if (i <= DIST) \
            expected = ((TYPE) TEST_VALUE (j) \
                        + (TYPE) TEST_VALUE (j - i + DIST)); \
          else \
            expected = ((TYPE) TEST_VALUE (j) \
                        + a_##TYPE[j - i + DIST]); \
          if (expected != a_##TYPE[j]) \
            __builtin_abort (); \
        } \
    }
FOR_EACH_TYPE (ADD_TEST)
int
main (void)
{
  FOR_EACH_TYPE (DO_TEST)
  return 0;
}
/* { dg-do run } */
#define N 200
#define M 4
typedef signed char sc;
typedef unsigned char uc;
typedef signed short ss;
typedef unsigned short us;
typedef int si;
typedef unsigned int ui;
typedef signed long long sll;
typedef unsigned long long ull;
#define FOR_EACH_TYPE(M) \
  M (sc) M (uc) \
  M (ss) M (us) \
  M (si) M (ui) \
  M (sll) M (ull) \
  M (float) M (double)
#define TEST_VALUE(I) ((I) * 5 / 2)
#define ADD_TEST(TYPE) \
  void __attribute__((noinline, noclone)) \
  test_##TYPE (TYPE *a, TYPE *b) \
  { \
    for (int i = 0; i < N; i += 2) \
      { \
        a[i + 0] = b[i + 0] + 2; \
        a[i + 1] = b[i + 1] + 3; \
      } \
  }
#define DO_TEST(TYPE) \
  for (int j = 1; j < M; ++j) \
    { \
      TYPE a[N + M]; \
      for (int i = 0; i < N + M; ++i) \
        a[i] = TEST_VALUE (i); \
      test_##TYPE (a + j, a); \
      for (int i = 0; i < N; i += 2) \
        if (a[i + j] != (TYPE) (a[i] + 2) \
            || a[i + j + 1] != (TYPE) (a[i + 1] + 3)) \
          __builtin_abort (); \
    }
FOR_EACH_TYPE (ADD_TEST)
int
main (void)
{
  FOR_EACH_TYPE (DO_TEST)
  return 0;
}
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize" } */
void
foo (double *x, int m)
{
  for (int i = 0; i < 256; ++i)
    x[i * m] += x[i * m];
}
/* { dg-final { scan-assembler-times {\tcbz\tw1,} 1 } } */
/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, } 1 } } */
/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, } 1 } } */
/* { dg-final { scan-assembler-times {\tldr\t} 1 } } */
/* { dg-final { scan-assembler-times {\tstr\t} 1 } } */
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize" } */
#define TYPE int
#define SIZE 257
void __attribute__ ((weak))
f (TYPE *x, TYPE *y, unsigned short n, long m __attribute__((unused)))
{
  for (int i = 0; i < SIZE; ++i)
    x[i * n] += y[i * n];
}
/* { dg-final { scan-assembler {\tld1w\tz[0-9]+} } } */
/* { dg-final { scan-assembler {\tst1w\tz[0-9]+} } } */
/* { dg-final { scan-assembler {\tldr\tw[0-9]+} } } */
/* { dg-final { scan-assembler {\tstr\tw[0-9]+} } } */
/* Should multiply by (VF-1)*4 rather than (257-1)*4. */
/* { dg-final { scan-assembler-not {, 1024} } } */
/* { dg-final { scan-assembler-not {\t.bfiz\t} } } */
/* { dg-final { scan-assembler-not {lsl[^\n]*[, ]10} } } */
/* { dg-final { scan-assembler-not {\tcmp\tx[0-9]+, 0} } } */
/* { dg-final { scan-assembler-not {\tcmp\tw[0-9]+, 0} } } */
/* { dg-final { scan-assembler-not {\tcsel\tx[0-9]+} } } */
/* Two range checks and a check for n being zero. */
/* { dg-final { scan-assembler-times {\tcmp\t} 1 } } */
/* { dg-final { scan-assembler-times {\tccmp\t} 2 } } */
extern void abort (void) __attribute__ ((noreturn));
#define MARGIN 6
void __attribute__ ((weak, optimize ("no-tree-vectorize")))
test (int n, int m, int offset)
{
  int abs_n = (n < 0 ? -n : n);
  int abs_m = (m < 0 ? -m : m);
  int max_i = (abs_n > abs_m ? abs_n : abs_m);
  int abs_offset = (offset < 0 ? -offset : offset);
  int size = MARGIN * 2 + max_i * SIZE + abs_offset;
  TYPE *array = (TYPE *) __builtin_alloca (size * sizeof (TYPE));
  for (int i = 0; i < size; ++i)
    array[i] = i;
  int base_x = offset < 0 ? MARGIN - offset : MARGIN;
  int base_y = offset < 0 ? MARGIN : MARGIN + offset;
  int start_x = n < 0 ? base_x - n * (SIZE - 1) : base_x;
  int start_y = m < 0 ? base_y - m * (SIZE - 1) : base_y;
  f (&array[start_x], &array[start_y], n, m);
  int j = 0;
  int start = (n < 0 ? size - 1 : 0);
  int end = (n < 0 ? -1 : size);
  int inc = (n < 0 ? -1 : 1);
  for (int i = start; i != end; i += inc)
    {
      if (j == SIZE || i != start_x + j * n)
        {
          if (array[i] != i)
            abort ();
        }
      else if (n == 0)
        {
          TYPE sum = i;
          for (; j < SIZE; j++)
            {
              int next_y = start_y + j * m;
              if (n >= 0 ? next_y < i : next_y > i)
                sum += array[next_y];
              else if (next_y == i)
                sum += sum;
              else
                sum += next_y;
            }
          if (array[i] != sum)
            abort ();
        }
      else
        {
          int next_y = start_y + j * m;
          TYPE base = i;
          if (n >= 0 ? next_y < i : next_y > i)
            base += array[next_y];
          else
            base += next_y;
          if (array[i] != base)
            abort ();
          j += 1;
        }
    }
}
/* { dg-do run { target { aarch64_sve_hw } } } */
/* { dg-options "-O2 -ftree-vectorize" } */
#include "var_stride_1.c"
#include "var_stride_1.h"
int
main (void)
{
  for (int n = 0; n < 10; ++n)
    for (int offset = -33; offset <= 33; ++offset)
      test (n, n, offset);
  return 0;
}
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize" } */
#define TYPE int
#define SIZE 257
void __attribute__ ((weak))
f (TYPE *x, TYPE *y, unsigned short n, unsigned short m)
{
  for (int i = 0; i < SIZE; ++i)
    x[i * n] += y[i * m];
}
/* { dg-final { scan-assembler {\tld1w\tz[0-9]+} } } */
/* { dg-final { scan-assembler {\tst1w\tz[0-9]+} } } */
/* { dg-final { scan-assembler {\tldr\tw[0-9]+} } } */
/* { dg-final { scan-assembler {\tstr\tw[0-9]+} } } */
/* Should multiply by (257-1)*4 rather than (VF-1)*4. */
/* { dg-final { scan-assembler-times {\tadd\tx[0-9]+, x[0-9]+, x[0-9]+, lsl 10\n} 2 } } */
/* { dg-final { scan-assembler-not {\tcmp\tx[0-9]+, 0} } } */
/* { dg-final { scan-assembler-not {\tcmp\tw[0-9]+, 0} } } */
/* { dg-final { scan-assembler-not {\tcsel\tx[0-9]+} } } */
/* Two range checks and a check for n being zero. (m being zero is OK.) */
/* { dg-final { scan-assembler-times {\tcmp\t} 1 } } */
/* { dg-final { scan-assembler-times {\tccmp\t} 2 } } */
/* { dg-do run { target { aarch64_sve_hw } } } */
/* { dg-options "-O2 -ftree-vectorize" } */
#include "var_stride_2.c"
#include "var_stride_1.h"
int
main (void)
{
  for (int n = 0; n < 10; ++n)
    for (int m = 0; m < 10; ++m)
      for (int offset = -17; offset <= 17; ++offset)
        {
          test (n, m, offset);
          test (n, m, offset + n * (SIZE - 1));
        }
  return 0;
}
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize" } */
#define TYPE int
#define SIZE 257
void __attribute__ ((weak))
f (TYPE *x, TYPE *y, int n, long m __attribute__((unused)))
{
  for (int i = 0; i < SIZE; ++i)
    x[i * n] += y[i * n];
}
/* { dg-final { scan-assembler {\tld1w\tz[0-9]+} } } */
/* { dg-final { scan-assembler {\tst1w\tz[0-9]+} } } */
/* { dg-final { scan-assembler {\tldr\tw[0-9]+} } } */
/* { dg-final { scan-assembler {\tstr\tw[0-9]+} } } */
/* Should multiply by (VF-1)*4 rather than (257-1)*4. */
/* { dg-final { scan-assembler-not {, 1024} } } */
/* { dg-final { scan-assembler-not {\t.bfiz\t} } } */
/* { dg-final { scan-assembler-not {lsl[^\n]*[, ]10} } } */
/* { dg-final { scan-assembler-not {\tcmp\tx[0-9]+, 0} } } */
/* { dg-final { scan-assembler {\tcmp\tw2, 0} } } */
/* { dg-final { scan-assembler-times {\tcsel\tx[0-9]+} 2 } } */
/* Two range checks and a check for n being zero. */
/* { dg-final { scan-assembler {\tcmp\t} } } */
/* { dg-final { scan-assembler-times {\tccmp\t} 2 } } */
/* { dg-do run { target { aarch64_sve_hw } } } */
/* { dg-options "-O2 -ftree-vectorize" } */
#include "var_stride_3.c"
#include "var_stride_1.h"
int
main (void)
{
  for (int n = -10; n < 10; ++n)
    for (int offset = -33; offset <= 33; ++offset)
      test (n, n, offset);
  return 0;
}
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize" } */
#define TYPE int
#define SIZE 257
void __attribute__ ((weak))
f (TYPE *x, TYPE *y, int n, int m)
{
  for (int i = 0; i < SIZE; ++i)
    x[i * n] += y[i * m];
}
/* { dg-final { scan-assembler {\tld1w\tz[0-9]+} } } */
/* { dg-final { scan-assembler {\tst1w\tz[0-9]+} } } */
/* { dg-final { scan-assembler {\tldr\tw[0-9]+} } } */
/* { dg-final { scan-assembler {\tstr\tw[0-9]+} } } */
/* Should multiply by (257-1)*4 rather than (VF-1)*4. */
/* { dg-final { scan-assembler-times {\tlsl\tx[0-9]+, x[0-9]+, 10\n} 2 } } */
/* { dg-final { scan-assembler {\tcmp\tw2, 0} } } */
/* { dg-final { scan-assembler {\tcmp\tw3, 0} } } */
/* { dg-final { scan-assembler-times {\tcsel\tx[0-9]+} 4 } } */
/* Two range checks and a check for n being zero. (m being zero is OK.) */
/* { dg-final { scan-assembler {\tcmp\t} } } */
/* { dg-final { scan-assembler-times {\tccmp\t} 2 } } */
/* { dg-do run { target { aarch64_sve_hw } } } */
/* { dg-options "-O2 -ftree-vectorize" } */
#include "var_stride_4.c"
#include "var_stride_1.h"
int
main (void)
{
  for (int n = -10; n < 10; ++n)
    for (int m = -10; m < 10; ++m)
      for (int offset = -17; offset <= 17; ++offset)
        {
          test (n, m, offset);
          test (n, m, offset + n * (SIZE - 1));
        }
  return 0;
}
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize" } */
#define TYPE double
#define SIZE 257
void __attribute__ ((weak))
f (TYPE *x, TYPE *y, long n, long m __attribute__((unused)))
{
  for (int i = 0; i < SIZE; ++i)
    x[i * n] += y[i * n];
}
/* { dg-final { scan-assembler {\tld1d\tz[0-9]+} } } */
/* { dg-final { scan-assembler {\tst1d\tz[0-9]+} } } */
/* { dg-final { scan-assembler {\tldr\td[0-9]+} } } */
/* { dg-final { scan-assembler {\tstr\td[0-9]+} } } */
/* Should multiply by (VF-1)*8 rather than (257-1)*8. */
/* { dg-final { scan-assembler-not {, 2048} } } */
/* { dg-final { scan-assembler-not {\t.bfiz\t} } } */
/* { dg-final { scan-assembler-not {lsl[^\n]*[, ]11} } } */
/* { dg-final { scan-assembler {\tcmp\tx[0-9]+, 0} } } */
/* { dg-final { scan-assembler-not {\tcmp\tw[0-9]+, 0} } } */
/* { dg-final { scan-assembler-times {\tcsel\tx[0-9]+} 2 } } */
/* Two range checks and a check for n being zero. */
/* { dg-final { scan-assembler {\tcmp\t} } } */
/* { dg-final { scan-assembler-times {\tccmp\t} 2 } } */
/* { dg-do run { target { aarch64_sve_hw } } } */
/* { dg-options "-O2 -ftree-vectorize" } */
#include "var_stride_5.c"
#include "var_stride_1.h"
int
main (void)
{
  for (int n = -10; n < 10; ++n)
    for (int offset = -33; offset <= 33; ++offset)
      test (n, n, offset);
  return 0;
}
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize" } */
#define TYPE long
#define SIZE 257
void __attribute__ ((weak))
f (TYPE *x, TYPE *y, long n, long m)
{
  for (int i = 0; i < SIZE; ++i)
    x[i * n] += y[i * m];
}
/* { dg-final { scan-assembler {\tld1d\tz[0-9]+} } } */
/* { dg-final { scan-assembler {\tst1d\tz[0-9]+} } } */
/* { dg-final { scan-assembler {\tldr\tx[0-9]+} } } */
/* { dg-final { scan-assembler {\tstr\tx[0-9]+} } } */
/* Should multiply by (257-1)*8 rather than (VF-1)*8. */
/* { dg-final { scan-assembler-times {lsl\tx[0-9]+, x[0-9]+, 11} 2 } } */
/* { dg-final { scan-assembler {\tcmp\tx[0-9]+, 0} } } */
/* { dg-final { scan-assembler-not {\tcmp\tw[0-9]+, 0} } } */
/* { dg-final { scan-assembler-times {\tcsel\tx[0-9]+} 4 } } */
/* Two range checks and a check for n being zero. (m being zero is OK.) */
/* { dg-final { scan-assembler {\tcmp\t} } } */
/* { dg-final { scan-assembler-times {\tccmp\t} 2 } } */
/* { dg-do run { target { aarch64_sve_hw } } } */
/* { dg-options "-O2 -ftree-vectorize" } */
#include "var_stride_6.c"
#include "var_stride_1.h"
int
main (void)
{
  for (int n = -10; n < 10; ++n)
    for (int m = -10; m < 10; ++m)
      for (int offset = -17; offset <= 17; ++offset)
        {
          test (n, m, offset);
          test (n, m, offset + n * (SIZE - 1));
        }
  return 0;
}
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize" } */
#define TYPE double
#define SIZE 257
void __attribute__ ((weak))
f (TYPE *x, TYPE *y, long n, long m __attribute__((unused)))
{
  for (int i = 0; i < SIZE; ++i)
    x[i * n] += y[i];
}
/* { dg-final { scan-assembler {\tld1d\tz[0-9]+} } } */
/* { dg-final { scan-assembler {\tst1d\tz[0-9]+} } } */
/* { dg-final { scan-assembler {\tldr\td[0-9]+} } } */
/* { dg-final { scan-assembler {\tstr\td[0-9]+} } } */
/* Should multiply by (257-1)*8 rather than (VF-1)*8. */
/* { dg-final { scan-assembler-times {\tadd\tx[0-9]+, x1, 2048} 1 } } */
/* { dg-final { scan-assembler-times {lsl\tx[0-9]+, x[0-9]+, 11} 1 } } */
/* { dg-final { scan-assembler {\tcmp\tx[0-9]+, 0} } } */
/* { dg-final { scan-assembler-not {\tcmp\tw[0-9]+, 0} } } */
/* { dg-final { scan-assembler-times {\tcsel\tx[0-9]+} 2 } } */
/* Two range checks and a check for n being zero. */
/* { dg-final { scan-assembler {\tcmp\t} } } */
/* { dg-final { scan-assembler-times {\tccmp\t} 2 } } */
/* { dg-do run { target { aarch64_sve_hw } } } */
/* { dg-options "-O2 -ftree-vectorize" } */
#include "var_stride_7.c"
#include "var_stride_1.h"
int
main (void)
{
  for (int n = -10; n < 10; ++n)
    for (int offset = -33; offset <= 33; ++offset)
      test (n, 1, offset);
  return 0;
}
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize" } */
#define TYPE long
#define SIZE 257
void
f (TYPE *x, TYPE *y, long n __attribute__((unused)), long m)
{
  for (int i = 0; i < SIZE; ++i)
    x[i] += y[i * m];
}
/* { dg-final { scan-assembler {\tld1d\tz[0-9]+} } } */
/* { dg-final { scan-assembler {\tst1d\tz[0-9]+} } } */
/* { dg-final { scan-assembler {\tldr\tx[0-9]+} } } */
/* { dg-final { scan-assembler {\tstr\tx[0-9]+} } } */
/* Should multiply by (257-1)*8 rather than (VF-1)*8. */
/* { dg-final { scan-assembler-times {\tadd\tx[0-9]+, x0, 2048} 1 } } */
/* { dg-final { scan-assembler-times {lsl\tx[0-9]+, x[0-9]+, 11} 1 } } */
/* { dg-final { scan-assembler {\tcmp\tx[0-9]+, 0} } } */
/* { dg-final { scan-assembler-not {\tcmp\tw[0-9]+, 0} } } */
/* { dg-final { scan-assembler-times {\tcsel\tx[0-9]+} 2 } } */
/* Two range checks only; doesn't matter whether n is zero. */
/* { dg-final { scan-assembler {\tcmp\t} } } */
/* { dg-final { scan-assembler-times {\tccmp\t} 1 } } */
/* { dg-do run { target { aarch64_sve_hw } } } */
/* { dg-options "-O2 -ftree-vectorize" } */
#include "var_stride_8.c"
#include "var_stride_1.h"
int
main (void)
{
  for (int n = -10; n < 10; ++n)
    for (int offset = -33; offset <= 33; ++offset)
      test (1, n, offset);
  return 0;
}
! { dg-do run }
! { dg-additional-options "-fno-inline" }
#define N 200
#define TEST_VALUE(I) ((I) * 5 / 2)
subroutine setup(a)
  real :: a(N)
  do i = 1, N
     a(i) = TEST_VALUE(i)
  end do
end subroutine

subroutine check(a, x, gap)
  real :: a(N), temp, x
  integer :: gap
  do i = 1, N - gap
     temp = a(i + gap) + x
     if (a(i) /= temp) call abort
  end do
  do i = N - gap + 1, N
     temp = TEST_VALUE(i)
     if (a(i) /= temp) call abort
  end do
end subroutine

subroutine testa(a, x, base, n)
  real :: a(n), x
  integer :: base, n
  do i = n, 2, -1
     a(base + i - 1) = a(base + i) + x
  end do
end subroutine testa

subroutine testb(a, x, base, n)
  real :: a(n), x
  integer :: base
  do i = n, 4, -1
     a(base + i - 3) = a(base + i) + x
  end do
end subroutine testb

subroutine testc(a, x, base, n)
  real :: a(n), x
  integer :: base
  do i = n, 8, -1
     a(base + i - 7) = a(base + i) + x
  end do
end subroutine testc

subroutine testd(a, x, base, n)
  real :: a(n), x
  integer :: base
  do i = n, 16, -1
     a(base + i - 15) = a(base + i) + x
  end do
end subroutine testd

subroutine teste(a, x, base, n)
  real :: a(n), x
  integer :: base
  do i = n, 32, -1
     a(base + i - 31) = a(base + i) + x
  end do
end subroutine teste

subroutine testf(a, x, base, n)
  real :: a(n), x
  integer :: base
  do i = n, 64, -1
     a(base + i - 63) = a(base + i) + x
  end do
end subroutine testf

program main
  real :: a(N)

  call setup(a)
  call testa(a, 91.0, 0, N)
  call check(a, 91.0, 1)

  call setup(a)
  call testb(a, 55.0, 0, N)
  call check(a, 55.0, 3)

  call setup(a)
  call testc(a, 72.0, 0, N)
  call check(a, 72.0, 7)

  call setup(a)
  call testd(a, 69.0, 0, N)
  call check(a, 69.0, 15)

  call setup(a)
  call teste(a, 44.0, 0, N)
  call check(a, 44.0, 31)

  call setup(a)
  call testf(a, 39.0, 0, N)
  call check(a, 39.0, 63)
end program
@@ -203,11 +203,20 @@ typedef struct data_reference *data_reference_p;
 struct dr_with_seg_len
 {
-  dr_with_seg_len (data_reference_p d, tree len)
-    : dr (d), seg_len (len) {}
+  dr_with_seg_len (data_reference_p d, tree len, unsigned HOST_WIDE_INT size,
+                   unsigned int a)
+    : dr (d), seg_len (len), access_size (size), align (a) {}
 
   data_reference_p dr;
+  /* The offset of the last access that needs to be checked minus
+     the offset of the first.  */
   tree seg_len;
+  /* A value that, when added to abs (SEG_LEN), gives the total number of
+     bytes in the segment.  */
+  poly_uint64 access_size;
+  /* The minimum common alignment of DR's start address, SEG_LEN and
+     ACCESS_SIZE.  */
+  unsigned int align;
 };
 
 /* This struct contains two dr_with_seg_len objects with aliasing data
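
To make the roles of seg_len and access_size concrete, here is a minimal sketch in plain C rather than GCC trees. The `struct segment` type and the helper names are hypothetical and not part of the patch; the sketch assumes byte addresses fit in `int64_t` and treats a segment as the half-open byte range spanning its first and last access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical plain-C stand-in for dr_with_seg_len.  */
struct segment
{
  int64_t base;         /* address of the first access */
  int64_t seg_len;      /* offset of the last access minus the first */
  uint64_t access_size; /* bytes touched by each access */
};

/* Lowest byte address touched by the segment.  */
static int64_t
segment_low (const struct segment *s)
{
  return s->seg_len < 0 ? s->base + s->seg_len : s->base;
}

/* One past the highest byte address touched by the segment.  */
static int64_t
segment_high (const struct segment *s)
{
  int64_t last = s->seg_len < 0 ? s->base : s->base + s->seg_len;
  return last + (int64_t) s->access_size;
}

/* True if the two segments cannot overlap.  */
static bool
segments_disjoint (const struct segment *a, const struct segment *b)
{
  assert (a->access_size > 0 && b->access_size > 0);
  return segment_low (a) >= segment_high (b)
         || segment_low (b) >= segment_high (a);
}
```

In this sketch `segment_high (s) - segment_low (s)` equals `abs (seg_len) + access_size` for either sign of seg_len, which is the relationship the comment on access_size describes.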
@@ -475,6 +484,10 @@ extern void prune_runtime_alias_test_list (vec<dr_with_seg_len_pair_t> *,
                                            poly_uint64);
 extern void create_runtime_alias_checks (struct loop *,
                                          vec<dr_with_seg_len_pair_t> *, tree*);
+extern tree dr_direction_indicator (struct data_reference *);
+extern tree dr_zero_step_indicator (struct data_reference *);
+extern bool dr_known_forward_stride_p (struct data_reference *);
 
 /* Return true when the base objects of data references A and B are
    the same memory object.  */
......
@@ -2330,16 +2330,12 @@ break_alias_scc_partitions (struct graph *rdg,
 static tree
 data_ref_segment_size (struct data_reference *dr, tree niters)
 {
-  tree segment_length;
-
-  if (integer_zerop (DR_STEP (dr)))
-    segment_length = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr)));
-  else
-    segment_length = size_binop (MULT_EXPR,
-                                 fold_convert (sizetype, DR_STEP (dr)),
-                                 fold_convert (sizetype, niters));
-
-  return segment_length;
+  niters = size_binop (MINUS_EXPR,
+                       fold_convert (sizetype, niters),
+                       size_one_node);
+  return size_binop (MULT_EXPR,
+                     fold_convert (sizetype, DR_STEP (dr)),
+                     fold_convert (sizetype, niters));
 }
 
 /* Return true if LOOP's latch is dominated by statement for data reference
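
The new data_ref_segment_size returns DR_STEP * (niters - 1), the distance from the first access to the last, and leaves the per-access size to be tracked separately. A minimal sketch of that arithmetic on plain integers follows; the name `segment_length` is hypothetical, and it uses signed 64-bit values where GCC uses sizetype trees.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: distance in bytes from the first access to the
   last, for NITERS accesses that are STEP_BYTES apart.  */
static int64_t
segment_length (int64_t step_bytes, uint64_t niters)
{
  assert (niters >= 1 && niters <= INT64_MAX);
  return step_bytes * (int64_t) (niters - 1);
}
```

For example, 257 accesses to 8-byte elements with unit stride give `segment_length (8, 257) == 2048`, the `(257-1)*8` constant that several of the var_stride tests scan for.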
@@ -2394,9 +2390,16 @@ compute_alias_check_pairs (struct loop *loop, vec<ddr_p> *alias_ddrs,
       else
         seg_length_b = data_ref_segment_size (dr_b, niters);
 
+      unsigned HOST_WIDE_INT access_size_a
+        = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_a))));
+      unsigned HOST_WIDE_INT access_size_b
+        = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_b))));
+      unsigned int align_a = TYPE_ALIGN_UNIT (TREE_TYPE (DR_REF (dr_a)));
+      unsigned int align_b = TYPE_ALIGN_UNIT (TREE_TYPE (DR_REF (dr_b)));
+
       dr_with_seg_len_pair_t dr_with_seg_len_pair
-        (dr_with_seg_len (dr_a, seg_length_a),
-         dr_with_seg_len (dr_b, seg_length_b));
+        (dr_with_seg_len (dr_a, seg_length_a, access_size_a, align_a),
+         dr_with_seg_len (dr_b, seg_length_b, access_size_b, align_b));
 
       /* Canonicalize pairs by sorting the two DR members.  */
       if (comp_res > 0)
......
@@ -2875,6 +2875,31 @@ vect_create_cond_for_unequal_addrs (loop_vec_info loop_vinfo, tree *cond_expr)
     }
 }
 
+/* Create an expression that is true when all lower-bound conditions for
+   the vectorized loop are met.  Chain this condition with *COND_EXPR.  */
+
+static void
+vect_create_cond_for_lower_bounds (loop_vec_info loop_vinfo, tree *cond_expr)
+{
+  vec<vec_lower_bound> lower_bounds = LOOP_VINFO_LOWER_BOUNDS (loop_vinfo);
+  for (unsigned int i = 0; i < lower_bounds.length (); ++i)
+    {
+      tree expr = lower_bounds[i].expr;
+      tree type = unsigned_type_for (TREE_TYPE (expr));
+      expr = fold_convert (type, expr);
+
+      poly_uint64 bound = lower_bounds[i].min_value;
+      if (!lower_bounds[i].unsigned_p)
+        {
+          expr = fold_build2 (PLUS_EXPR, type, expr,
+                              build_int_cstu (type, bound - 1));
+          bound += bound - 1;
+        }
+      tree part_cond_expr = fold_build2 (GE_EXPR, boolean_type_node, expr,
+                                         build_int_cstu (type, bound));
+      chain_cond_expr (cond_expr, part_cond_expr);
+    }
+}
+
 /* Function vect_create_cond_for_alias_checks.
 
    Create a conditional expression that represents the run-time checks for
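
For signed values, vect_create_cond_for_lower_bounds tests abs (EXPR) >= MIN_VALUE with a single unsigned comparison: it adds MIN_VALUE - 1 to the value in the unsigned type and compares the result against 2 * MIN_VALUE - 1. The sketch below reproduces that bias trick on plain 64-bit integers. The function name `abs_at_least` is hypothetical, and the sketch assumes a constant bound between 1 and 2^62 so that the biased bound cannot overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if abs (VALUE) >= BOUND, using one unsigned
   comparison.  Values in the open range (-BOUND, BOUND) map to
   [0, 2 * BOUND - 2] after the bias; everything else lands at or above
   2 * BOUND - 1, with negative values wrapping to large unsigned ones.  */
static bool
abs_at_least (int64_t value, uint64_t bound)
{
  assert (bound >= 1 && bound <= ((uint64_t) 1 << 62));
  uint64_t biased = (uint64_t) value + (bound - 1);
  return biased >= bound + (bound - 1);
}
```

With `bound == 1` this reduces to a nonzero test, which corresponds to the "x[i * n] += 1" case where the step only has to be nonzero.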
@@ -2986,6 +3011,7 @@ vect_loop_versioning (loop_vec_info loop_vinfo,
   if (version_alias)
     {
       vect_create_cond_for_unequal_addrs (loop_vinfo, &cond_expr);
+      vect_create_cond_for_lower_bounds (loop_vinfo, &cond_expr);
       vect_create_cond_for_alias_checks (loop_vinfo, &cond_expr);
     }
......
@@ -2475,6 +2475,7 @@ again:
         }
     }
   /* Free optimized alias test DDRS.  */
+  LOOP_VINFO_LOWER_BOUNDS (loop_vinfo).truncate (0);
   LOOP_VINFO_COMP_ALIAS_DDRS (loop_vinfo).release ();
   LOOP_VINFO_CHECK_UNEQUAL_ADDRS (loop_vinfo).release ();
   /* Reset target cost data.  */
@@ -3673,6 +3674,18 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
           /* Count LEN - 1 ANDs and LEN comparisons.  */
           (void) add_stmt_cost (target_cost_data, len * 2 - 1, scalar_stmt,
                                 NULL, 0, vect_prologue);
+      len = LOOP_VINFO_LOWER_BOUNDS (loop_vinfo).length ();
+      if (len)
+        {
+          /* Count LEN - 1 ANDs and LEN comparisons.  */
+          unsigned int nstmts = len * 2 - 1;
+          /* +1 for each bias that needs adding.  */
+          for (unsigned int i = 0; i < len; ++i)
+            if (!LOOP_VINFO_LOWER_BOUNDS (loop_vinfo)[i].unsigned_p)
+              nstmts += 1;
+          (void) add_stmt_cost (target_cost_data, nstmts, scalar_stmt,
+                                NULL, 0, vect_prologue);
+        }
       dump_printf (MSG_NOTE,
                    "cost model: Adding cost of checks for loop "
                    "versioning aliasing.\n");
......
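
The cost-model hunk in vect_estimate_min_profitable_iters charges LEN comparisons, LEN - 1 ANDs, and one extra addition for each lower bound whose value may be negative. A small sketch of that count, using a hypothetical `lower_bound_check_stmts` helper and a plain array of flags in place of LOOP_VINFO_LOWER_BOUNDS:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: number of scalar statements charged for LEN
   lower-bound checks, where UNSIGNED_P[i] says whether check I can skip
   the bias addition.  */
static unsigned int
lower_bound_check_stmts (const bool *unsigned_p, unsigned int len)
{
  if (len == 0)
    return 0;
  assert (unsigned_p != NULL);
  /* LEN comparisons chained together with LEN - 1 ANDs.  */
  unsigned int nstmts = len * 2 - 1;
  /* One extra addition for each value that needs the bias.  */
  for (unsigned int i = 0; i < len; ++i)
    if (!unsigned_p[i])
      nstmts += 1;
  return nstmts;
}
```

Three bounds of which two are signed would therefore be charged 5 + 2 = 7 prologue statements under this model.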
@@ -174,6 +174,18 @@ typedef struct _slp_instance {
    loop to be valid.  */
 typedef std::pair<tree, tree> vec_object_pair;
 
+/* Records that vectorization is only possible if abs (EXPR) >= MIN_VALUE.
+   UNSIGNED_P is true if we can assume that abs (EXPR) == EXPR.  */
+struct vec_lower_bound {
+  vec_lower_bound () {}
+  vec_lower_bound (tree e, bool u, poly_uint64 m)
+    : expr (e), unsigned_p (u), min_value (m) {}
+
+  tree expr;
+  bool unsigned_p;
+  poly_uint64 min_value;
+};
+
 /* Vectorizer state common between loop and basic-block vectorization.  */
 struct vec_info {
   enum vec_kind { bb, loop };
@@ -406,6 +418,14 @@ typedef struct _loop_vec_info : public vec_info {
   /* Check that the addresses of each pair of objects is unequal.  */
   auto_vec<vec_object_pair> check_unequal_addrs;
 
+  /* List of values that are required to be nonzero.  This is used to check
+     whether things like "x[i * n] += 1;" are safe and eventually gets added
+     to the checks for lower bounds below.  */
+  auto_vec<tree> check_nonzero;
+
+  /* List of values that need to be checked for a minimum value.  */
+  auto_vec<vec_lower_bound> lower_bounds;
+
   /* Statements in the loop that have data references that are candidates for a
      runtime (loop versioning) misalignment check.  */
   auto_vec<gimple *> may_misalign_stmts;
@@ -514,6 +534,8 @@ typedef struct _loop_vec_info : public vec_info {
 #define LOOP_VINFO_MAY_ALIAS_DDRS(L)       (L)->may_alias_ddrs
 #define LOOP_VINFO_COMP_ALIAS_DDRS(L)      (L)->comp_alias_ddrs
 #define LOOP_VINFO_CHECK_UNEQUAL_ADDRS(L)  (L)->check_unequal_addrs
+#define LOOP_VINFO_CHECK_NONZERO(L)        (L)->check_nonzero
+#define LOOP_VINFO_LOWER_BOUNDS(L)         (L)->lower_bounds
 #define LOOP_VINFO_GROUPED_STORES(L)       (L)->grouped_stores
 #define LOOP_VINFO_SLP_INSTANCES(L)        (L)->slp_instances
 #define LOOP_VINFO_SLP_UNROLLING_FACTOR(L) (L)->slp_unrolling_factor
@@ -534,7 +556,8 @@ typedef struct _loop_vec_info : public vec_info {
   ((L)->may_misalign_stmts.length () > 0)
 #define LOOP_REQUIRES_VERSIONING_FOR_ALIAS(L) \
   ((L)->comp_alias_ddrs.length () > 0 \
-   || (L)->check_unequal_addrs.length () > 0)
+   || (L)->check_unequal_addrs.length () > 0 \
+   || (L)->lower_bounds.length () > 0)
 #define LOOP_REQUIRES_VERSIONING_FOR_NITERS(L) \
   (LOOP_VINFO_NITERS_ASSUMPTIONS (L))
 #define LOOP_REQUIRES_VERSIONING(L) \
......