Commit 535e7c11 by Richard Sandiford, committed by Richard Sandiford

Handle peeling for alignment with masking

This patch adds support for aligning vectors by using a partial
first iteration.  E.g. if the start pointer is 3 elements beyond
an aligned address, the first iteration will have a mask in which
the first three elements are false.

On SVE, the optimisation is only useful for vector-length-specific
code.  Vector-length-agnostic code doesn't try to align vectors
since the vector length might not be a power of 2.
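
As a rough illustration (my own sketch, not code from the patch), the lane
masks such a loop would use can be modelled in plain C: with 8-element
vectors and a start pointer 3 elements past an aligned boundary, lanes 0-2
of the first mask are false and every later iteration works on fully
aligned vectors.

/* Illustrative model only; VF, niters and skip are assumptions local to
   this example, not names used by the vectorizer.  */
#include <stdio.h>

#define VF 8	/* elements per vector iteration (assumed) */

int
main (void)
{
  unsigned int niters = 21;	/* scalar iterations to perform */
  unsigned int skip = 3;	/* misalignment in elements */

  /* Lane i of vector iteration v handles scalar iteration v * VF + i - skip;
     it is active iff that value lies in [0, niters).  */
  for (unsigned int v = 0; v * VF < niters + skip; ++v)
    {
      printf ("iteration %u mask:", v);
      for (unsigned int i = 0; i < VF; ++i)
	{
	  long elt = (long) (v * VF + i) - (long) skip;
	  printf (" %d", elt >= 0 && elt < (long) niters ? 1 : 0);
	}
      printf ("\n");
    }
  return 0;
}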

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-vectorizer.h (_loop_vec_info::mask_skip_niters): New field.
	(LOOP_VINFO_MASK_SKIP_NITERS): New macro.
	(vect_use_loop_mask_for_alignment_p): New function.
	(vect_prepare_for_masked_peels, vect_gen_while_not): Declare.
	* tree-vect-loop-manip.c (vect_set_loop_masks_directly): Add an
	niters_skip argument.  Make sure that the first niters_skip elements
	of the first iteration are inactive.
	(vect_set_loop_condition_masked): Handle LOOP_VINFO_MASK_SKIP_NITERS.
	Update call to vect_set_loop_masks_directly.
	(get_misalign_in_elems): New function, split out from...
	(vect_gen_prolog_loop_niters): ...here.
	(vect_update_init_of_dr): Take a code argument that specifies whether
	the adjustment should be added or subtracted.
	(vect_update_inits_of_drs): Likewise.
	(vect_prepare_for_masked_peels): New function.
	(vect_do_peeling): Skip prologue peeling if we're using a mask
	instead.  Update call to vect_update_inits_of_drs.
	* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
	mask_skip_niters.
	(vect_analyze_loop_2): Allow fully-masked loops with peeling for
	alignment.  Do not include the number of peeled iterations in
	the minimum threshold in that case.
	(vectorizable_induction): Adjust the start value down by
	LOOP_VINFO_MASK_SKIP_NITERS iterations.
	(vect_transform_loop): Call vect_prepare_for_masked_peels.
	Take the number of skipped iterations into account when calculating
	the loop bounds.
	* tree-vect-stmts.c (vect_gen_while_not): New function.

gcc/testsuite/
	* gcc.target/aarch64/sve/nopeel_1.c: New test.
	* gcc.target/aarch64/sve/peel_ind_1.c: Likewise.
	* gcc.target/aarch64/sve/peel_ind_1_run.c: Likewise.
	* gcc.target/aarch64/sve/peel_ind_2.c: Likewise.
	* gcc.target/aarch64/sve/peel_ind_2_run.c: Likewise.
	* gcc.target/aarch64/sve/peel_ind_3.c: Likewise.
	* gcc.target/aarch64/sve/peel_ind_3_run.c: Likewise.
	* gcc.target/aarch64/sve/peel_ind_4.c: Likewise.
	* gcc.target/aarch64/sve/peel_ind_4_run.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r256630
parent c2700f74
gcc/ChangeLog, gcc/testsuite/ChangeLog:
	New entries added; their text is identical to the gcc/ and
	gcc/testsuite/ sections of the commit message above.
/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=256" } */
#include <stdint.h>
#define TEST(NAME, TYPE) \
void \
NAME##1 (TYPE *x, int n) \
{ \
for (int i = 0; i < n; ++i) \
x[i] += 1; \
} \
TYPE NAME##_array[1024]; \
void \
NAME##2 (void) \
{ \
for (int i = 1; i < 200; ++i) \
NAME##_array[i] += 1; \
}
TEST (s8, int8_t)
TEST (u8, uint8_t)
TEST (s16, int16_t)
TEST (u16, uint16_t)
TEST (s32, int32_t)
TEST (u32, uint32_t)
TEST (s64, int64_t)
TEST (u64, uint64_t)
TEST (f16, _Float16)
TEST (f32, float)
TEST (f64, double)
/* No scalar memory accesses. */
/* { dg-final { scan-assembler-not {[wx][0-9]*, \[} } } */
/* 2 for each NAME##1 test, one in the header and one in the main loop
and 1 for each NAME##2 test, in the main loop only. */
/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.b,} 6 } } */
/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.h,} 9 } } */
/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.s,} 9 } } */
/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.d,} 9 } } */
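
The whilelo counts above follow from simple bookkeeping; the sketch below
(my own reading of the test, not part of it) just multiplies the three
WHILELOs expected per type by the number of types of each element size.

/* Assumed reading: each NAME##1 loop needs two WHILELOs (one in the
   header, one in the main loop) and each NAME##2 loop needs one.  */
#include <stdio.h>

int
main (void)
{
  int per_type = 2 + 1;	/* NAME##1 + NAME##2 */
  int types_b = 2;	/* int8_t, uint8_t */
  int types_h = 3;	/* int16_t, uint16_t, _Float16 */
  int types_s = 3;	/* int32_t, uint32_t, float */
  int types_d = 3;	/* int64_t, uint64_t, double */
  printf (".b %d  .h %d  .s %d  .d %d\n",
	  per_type * types_b, per_type * types_h,
	  per_type * types_s, per_type * types_d);
  return 0;
}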

gcc/testsuite/gcc.target/aarch64/sve/peel_ind_1.c (new file)

/* { dg-do compile } */
/* Pick an arbitrary target for which unaligned accesses are more
   expensive.  */
/* { dg-options "-O3 -msve-vector-bits=256 -mtune=thunderx" } */

#define N 512
#define START 1
#define END 505

int x[N] __attribute__((aligned(32)));

void __attribute__((noinline, noclone))
foo (void)
{
  unsigned int v = 0;
  for (unsigned int i = START; i < END; ++i)
    {
      x[i] = v;
      v += 5;
    }
}

/* We should operate on aligned vectors.  */
/* { dg-final { scan-assembler {\tadrp\tx[0-9]+, x\n} } } */
/* We should use an induction that starts at -5, with only the last
   7 elements of the first iteration being active.  */
/* { dg-final { scan-assembler {\tindex\tz[0-9]+\.s, #-5, #5\n} } } */
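
A worked version of the expected numbers above (an illustrative sketch, not
part of the test): with 256-bit vectors of 32-bit ints, x[START] is one
element past a 32-byte boundary, so one lane is masked off and the
induction value for lane 0 is the start value adjusted down by one step.

/* Sketch only; the variable names here are not taken from the compiler.  */
#include <stdio.h>

int
main (void)
{
  int vector_bits = 256, elt_bits = 32;
  int vf = vector_bits / elt_bits;	/* 8 lanes */
  int start_index = 1;			/* first element written is x[START] */
  int skip = start_index % vf;		/* leading lanes masked off: 1 */
  int step = 5, init = 0;		/* v starts at 0 and grows by 5 */

  printf ("active lanes in first iteration: %d\n", vf - skip);		/* 7 */
  printf ("lane 0 of the induction vector:  %d\n", init - skip * step);	/* -5 */
  return 0;
}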

gcc/testsuite/gcc.target/aarch64/sve/peel_ind_1_run.c (new file)

/* { dg-do run { target aarch64_sve_hw } } */
/* { dg-options "-O3 -mtune=thunderx" } */
/* { dg-options "-O3 -mtune=thunderx -msve-vector-bits=256" { target aarch64_sve256_hw } } */

#include "peel_ind_1.c"

int __attribute__ ((optimize (1)))
main (void)
{
  foo ();
  for (int i = 0; i < N; ++i)
    {
      if (x[i] != (i < START || i >= END ? 0 : (i - START) * 5))
	__builtin_abort ();
      asm volatile ("" ::: "memory");
    }
  return 0;
}

gcc/testsuite/gcc.target/aarch64/sve/peel_ind_2.c (new file)

/* { dg-do compile } */
/* Pick an arbitrary target for which unaligned accesses are more
   expensive.  */
/* { dg-options "-O3 -msve-vector-bits=256 -mtune=thunderx" } */

#define N 512
#define START 7
#define END 22

int x[N] __attribute__((aligned(32)));

void __attribute__((noinline, noclone))
foo (void)
{
  for (unsigned int i = START; i < END; ++i)
    x[i] = i;
}

/* We should operate on aligned vectors.  */
/* { dg-final { scan-assembler {\tadrp\tx[0-9]+, x\n} } } */
/* We should unroll the loop three times.  */
/* { dg-final { scan-assembler-times "\tst1w\t" 3 } } */

gcc/testsuite/gcc.target/aarch64/sve/peel_ind_2_run.c (new file)

/* { dg-do run { target aarch64_sve_hw } } */
/* { dg-options "-O3 -mtune=thunderx" } */
/* { dg-options "-O3 -mtune=thunderx -msve-vector-bits=256" { target aarch64_sve256_hw } } */

#include "peel_ind_2.c"

int __attribute__ ((optimize (1)))
main (void)
{
  foo ();
  for (int i = 0; i < N; ++i)
    {
      if (x[i] != (i < START || i >= END ? 0 : i))
	__builtin_abort ();
      asm volatile ("" ::: "memory");
    }
  return 0;
}

gcc/testsuite/gcc.target/aarch64/sve/peel_ind_3.c (new file)

/* { dg-do compile } */
/* Pick an arbitrary target for which unaligned accesses are more
   expensive.  */
/* { dg-options "-O3 -msve-vector-bits=256 -mtune=thunderx" } */

#define N 32
#define MAX_START 8
#define COUNT 16

int x[MAX_START][N] __attribute__((aligned(32)));

void __attribute__((noinline, noclone))
foo (int start)
{
  for (int i = start; i < start + COUNT; ++i)
    x[start][i] = i;
}

/* We should operate on aligned vectors.  */
/* { dg-final { scan-assembler {\tadrp\tx[0-9]+, x\n} } } */
/* { dg-final { scan-assembler {\tubfx\t} } } */
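
Here the start index is a runtime value, so the number of leading lanes to
mask off has to be computed from the address itself; the scanned UBFX is
presumably the bitfield extract that pulls those low address bits out.
A rough model of that computation (my own sketch, not compiler code):

/* Assumed parameters: 32-byte vectors of 4-byte ints.  */
#include <stdint.h>
#include <stdio.h>

static unsigned int
misalign_in_elems (const void *p)
{
  /* Low bits of the address, in element units: (addr & 31) >> 2.  */
  return ((uintptr_t) p & 31) >> 2;
}

int
main (void)
{
  int x[16] __attribute__((aligned(32)));
  printf ("%u %u\n", misalign_in_elems (&x[0]), misalign_in_elems (&x[5]));
  return 0;
}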

gcc/testsuite/gcc.target/aarch64/sve/peel_ind_3_run.c (new file)

/* { dg-do run { target aarch64_sve_hw } } */
/* { dg-options "-O3 -mtune=thunderx" } */
/* { dg-options "-O3 -mtune=thunderx -msve-vector-bits=256" { target aarch64_sve256_hw } } */

#include "peel_ind_3.c"

int __attribute__ ((optimize (1)))
main (void)
{
  for (int start = 0; start < MAX_START; ++start)
    {
      foo (start);
      for (int i = 0; i < N; ++i)
	{
	  if (x[start][i] != (i < start || i >= start + COUNT ? 0 : i))
	    __builtin_abort ();
	  asm volatile ("" ::: "memory");
	}
    }
  return 0;
}

gcc/testsuite/gcc.target/aarch64/sve/peel_ind_4.c (new file)

/* { dg-do compile } */
/* Pick an arbitrary target for which unaligned accesses are more
   expensive.  */
/* { dg-options "-Ofast -msve-vector-bits=256 -mtune=thunderx -fno-vect-cost-model" } */

#define START 1
#define END 505

void __attribute__((noinline, noclone))
foo (double *x)
{
  double v = 10.0;
  for (unsigned int i = START; i < END; ++i)
    {
      x[i] = v;
      v += 5.0;
    }
}

/* We should operate on aligned vectors.  */
/* { dg-final { scan-assembler {\tubfx\t} } } */

gcc/testsuite/gcc.target/aarch64/sve/peel_ind_4_run.c (new file)

/* { dg-do run { target aarch64_sve_hw } } */
/* { dg-options "-Ofast -mtune=thunderx" } */
/* { dg-options "-Ofast -mtune=thunderx -msve-vector-bits=256" { target aarch64_sve256_hw } } */

#include "peel_ind_4.c"

int __attribute__ ((optimize (1)))
main (void)
{
  double x[END + 1];
  for (int i = 0; i < END + 1; ++i)
    {
      x[i] = i;
      asm volatile ("" ::: "memory");
    }
  foo (x);
  for (int i = 0; i < END + 1; ++i)
    {
      double expected;
      if (i < START || i >= END)
	expected = i;
      else
	expected = 10 + (i - START) * 5;
      if (x[i] != expected)
	__builtin_abort ();
      asm volatile ("" ::: "memory");
    }
  return 0;
}

gcc/tree-vect-loop.c:

@@ -1121,6 +1121,7 @@ _loop_vec_info::_loop_vec_info (struct loop *loop_in)
     versioning_threshold (0),
     vectorization_factor (0),
     max_vectorization_factor (0),
+    mask_skip_niters (NULL_TREE),
     mask_compare_type (NULL_TREE),
     unaligned_dr (NULL),
     peeling_for_alignment (0),
@@ -2269,16 +2270,6 @@ start_over:
 			 " gaps is required.\n");
     }
 
-  if (LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo)
-      && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo))
-    {
-      LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo) = false;
-      if (dump_enabled_p ())
-	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-			 "can't use a fully-masked loop because peeling for"
-			 " alignment is required.\n");
-    }
-
   /* Decide whether to use a fully-masked loop for this vectorization
      factor.  */
   LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
@@ -2379,18 +2370,21 @@ start_over:
      increase threshold for this case if necessary.  */
   if (LOOP_REQUIRES_VERSIONING (loop_vinfo))
     {
-      poly_uint64 niters_th;
+      poly_uint64 niters_th = 0;
 
-      /* Niters for peeled prolog loop.  */
-      if (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) < 0)
+      if (!vect_use_loop_mask_for_alignment_p (loop_vinfo))
 	{
-	  struct data_reference *dr = LOOP_VINFO_UNALIGNED_DR (loop_vinfo);
-	  tree vectype = STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr)));
-	  niters_th = TYPE_VECTOR_SUBPARTS (vectype) - 1;
+	  /* Niters for peeled prolog loop.  */
+	  if (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) < 0)
+	    {
+	      struct data_reference *dr = LOOP_VINFO_UNALIGNED_DR (loop_vinfo);
+	      tree vectype
+		= STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr)));
+	      niters_th += TYPE_VECTOR_SUBPARTS (vectype) - 1;
+	    }
+	  else
+	    niters_th += LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo);
 	}
-      else
-	niters_th = LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo);
 
       /* Niters for at least one iteration of vectorized loop.  */
       if (!LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
@@ -7336,9 +7330,28 @@ vectorizable_induction (gimple *phi,
       init_expr = PHI_ARG_DEF_FROM_EDGE (phi,
 					 loop_preheader_edge (iv_loop));
 
-      /* Convert the step to the desired type.  */
+      /* Convert the initial value and step to the desired type.  */
       stmts = NULL;
+      init_expr = gimple_convert (&stmts, TREE_TYPE (vectype), init_expr);
       step_expr = gimple_convert (&stmts, TREE_TYPE (vectype), step_expr);
+
+      /* If we are using the loop mask to "peel" for alignment then we need
+	 to adjust the start value here.  */
+      tree skip_niters = LOOP_VINFO_MASK_SKIP_NITERS (loop_vinfo);
+      if (skip_niters != NULL_TREE)
+	{
+	  if (FLOAT_TYPE_P (vectype))
+	    skip_niters = gimple_build (&stmts, FLOAT_EXPR, TREE_TYPE (vectype),
+					skip_niters);
+	  else
+	    skip_niters = gimple_convert (&stmts, TREE_TYPE (vectype),
+					  skip_niters);
+	  tree skip_step = gimple_build (&stmts, MULT_EXPR, TREE_TYPE (vectype),
+					 skip_niters, step_expr);
+	  init_expr = gimple_build (&stmts, MINUS_EXPR, TREE_TYPE (vectype),
+				    init_expr, skip_step);
+	}
+
       if (stmts)
 	{
 	  new_bb = gsi_insert_seq_on_edge_immediate (pe, stmts);
@@ -8209,6 +8222,11 @@ vect_transform_loop (loop_vec_info loop_vinfo)
     split_edge (loop_preheader_edge (loop));
 
+  if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
+      && vect_use_loop_mask_for_alignment_p (loop_vinfo))
+    /* This will deal with any possible peeling.  */
+    vect_prepare_for_masked_peels (loop_vinfo);
+
   /* FORNOW: the vectorizer supports only loops which body consist
      of one basic block (header + empty latch).  When the vectorizer will
      support more involved loop forms, the order by which the BBs are
@@ -8488,29 +8506,40 @@ vect_transform_loop (loop_vec_info loop_vinfo)
   /* +1 to convert latch counts to loop iteration counts,
      -min_epilogue_iters to remove iterations that cannot be performed
      by the vector code.  */
-  int bias = 1 - min_epilogue_iters;
+  int bias_for_lowest = 1 - min_epilogue_iters;
+  int bias_for_assumed = bias_for_lowest;
+  int alignment_npeels = LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo);
+  if (alignment_npeels && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
+    {
+      /* When the amount of peeling is known at compile time, the first
+	 iteration will have exactly alignment_npeels active elements.
+	 In the worst case it will have at least one.  */
+      int min_first_active = (alignment_npeels > 0 ? alignment_npeels : 1);
+      bias_for_lowest += lowest_vf - min_first_active;
+      bias_for_assumed += assumed_vf - min_first_active;
+    }
   /* In these calculations the "- 1" converts loop iteration counts
      back to latch counts.  */
   if (loop->any_upper_bound)
     loop->nb_iterations_upper_bound
       = (final_iter_may_be_partial
-	 ? wi::udiv_ceil (loop->nb_iterations_upper_bound + bias,
+	 ? wi::udiv_ceil (loop->nb_iterations_upper_bound + bias_for_lowest,
 			  lowest_vf) - 1
-	 : wi::udiv_floor (loop->nb_iterations_upper_bound + bias,
+	 : wi::udiv_floor (loop->nb_iterations_upper_bound + bias_for_lowest,
 			   lowest_vf) - 1);
   if (loop->any_likely_upper_bound)
     loop->nb_iterations_likely_upper_bound
       = (final_iter_may_be_partial
-	 ? wi::udiv_ceil (loop->nb_iterations_likely_upper_bound + bias,
-			  lowest_vf) - 1
-	 : wi::udiv_floor (loop->nb_iterations_likely_upper_bound + bias,
-			   lowest_vf) - 1);
+	 ? wi::udiv_ceil (loop->nb_iterations_likely_upper_bound
+			  + bias_for_lowest, lowest_vf) - 1
+	 : wi::udiv_floor (loop->nb_iterations_likely_upper_bound
+			   + bias_for_lowest, lowest_vf) - 1);
   if (loop->any_estimate)
     loop->nb_iterations_estimate
       = (final_iter_may_be_partial
-	 ? wi::udiv_ceil (loop->nb_iterations_estimate + bias,
+	 ? wi::udiv_ceil (loop->nb_iterations_estimate + bias_for_assumed,
 			  assumed_vf) - 1
-	 : wi::udiv_floor (loop->nb_iterations_estimate + bias,
+	 : wi::udiv_floor (loop->nb_iterations_estimate + bias_for_assumed,
 			   assumed_vf) - 1);
 
   if (dump_enabled_p ())
......
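
For peel_ind_1.c the bias adjustment in the vect_transform_loop hunk above
works out as follows (an illustrative calculation, not compiler output):
504 scalar iterations, VF = 8, 7 active lanes in the first masked iteration
and, assuming no epilogue is required, min_epilogue_iters == 0, so
bias_for_lowest is 1 - 0 + (8 - 7) = 2 and the ceiling division for a
possibly partial final iteration gives a latch-count bound of 63, i.e.
64 vector iterations in total.

/* The same arithmetic as a check, with the assumed values spelled out.  */
#include <stdio.h>

int
main (void)
{
  unsigned long latch_bound = 503;	/* scalar iterations - 1 */
  int vf = 8, min_epilogue_iters = 0, min_first_active = 7;
  int bias = 1 - min_epilogue_iters + (vf - min_first_active);
  unsigned long vec_latch = (latch_bound + bias + vf - 1) / vf - 1;
  printf ("vector latch-count bound: %lu\n", vec_latch);	/* 63 */
  return 0;
}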

gcc/tree-vect-stmts.c:

@@ -9991,3 +9991,16 @@ vect_gen_while (tree mask, tree start_index, tree end_index)
   gimple_call_set_lhs (call, mask);
   return call;
 }
+
+/* Generate a vector mask of type MASK_TYPE for which index I is false iff
+   J + START_INDEX < END_INDEX for all J <= I.  Add the statements to SEQ.  */
+
+tree
+vect_gen_while_not (gimple_seq *seq, tree mask_type, tree start_index,
+		    tree end_index)
+{
+  tree tmp = make_ssa_name (mask_type);
+  gcall *call = vect_gen_while (tmp, start_index, end_index);
+  gimple_seq_add_stmt (seq, call);
+  return gimple_build (seq, BIT_NOT_EXPR, mask_type, tmp);
+}
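
vect_gen_while_not simply inverts the mask produced by vect_gen_while
(WHILE_ULT), which is presumably combined with the normal loop mask so that
the leading niters_skip lanes of the first iteration end up inactive.
A lane-by-lane model of the two masks (an assumption-laden sketch, not the
GIMPLE the function emits):

#include <stdio.h>

#define LANES 8	/* assumed vector length in elements */

int
main (void)
{
  unsigned int start = 0, end = 3;	/* e.g. the number of lanes to skip */
  printf ("lane  while_ult  while_not\n");
  for (unsigned int i = 0; i < LANES; ++i)
    {
      int while_ult = (start + i) < end;	/* lane i of WHILE_ULT (start, end) */
      printf ("%4u  %9d  %9d\n", i, while_ult, !while_ult);
    }
  return 0;
}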

gcc/tree-vectorizer.h:

@@ -351,6 +351,12 @@ typedef struct _loop_vec_info : public vec_info {
      on inactive scalars.  */
   vec_loop_masks masks;
 
+  /* If we are using a loop mask to align memory addresses, this variable
+     contains the number of vector elements that we should skip in the
+     first iteration of the vector loop (i.e. the number of leading
+     elements that should be false in the first mask).  */
+  tree mask_skip_niters;
+
   /* Type of the variables to use in the WHILE_ULT call for fully-masked
      loops.  */
   tree mask_compare_type;
@@ -480,6 +486,7 @@ typedef struct _loop_vec_info : public vec_info {
 #define LOOP_VINFO_VECT_FACTOR(L)          (L)->vectorization_factor
 #define LOOP_VINFO_MAX_VECT_FACTOR(L)      (L)->max_vectorization_factor
 #define LOOP_VINFO_MASKS(L)                (L)->masks
+#define LOOP_VINFO_MASK_SKIP_NITERS(L)     (L)->mask_skip_niters
 #define LOOP_VINFO_MASK_COMPARE_TYPE(L)    (L)->mask_compare_type
 #define LOOP_VINFO_PTR_MASK(L)             (L)->ptr_mask
 #define LOOP_VINFO_LOOP_NEST(L)            (L)->loop_nest
@@ -1230,6 +1237,17 @@ unlimited_cost_model (loop_p loop)
   return (flag_vect_cost_model == VECT_COST_MODEL_UNLIMITED);
 }
 
+/* Return true if the loop described by LOOP_VINFO is fully-masked and
+   if the first iteration should use a partial mask in order to achieve
+   alignment.  */
+
+static inline bool
+vect_use_loop_mask_for_alignment_p (loop_vec_info loop_vinfo)
+{
+  return (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
+	  && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo));
+}
+
 /* Return the number of vectors of type VECTYPE that are needed to get
    NUNITS elements.  NUNITS should be based on the vectorization factor,
    so it is always a known multiple of the number of elements in VECTYPE.  */
@@ -1328,6 +1346,7 @@ extern void vect_loop_versioning (loop_vec_info, unsigned int, bool,
 						 poly_uint64);
 extern struct loop *vect_do_peeling (loop_vec_info, tree, tree,
 				     tree *, tree *, tree *, int, bool, bool);
+extern void vect_prepare_for_masked_peels (loop_vec_info);
 extern source_location find_loop_location (struct loop *);
 extern bool vect_can_advance_ivs_p (loop_vec_info);
@@ -1393,6 +1412,7 @@ extern tree vect_gen_perm_mask_any (tree, const vec_perm_indices &);
 extern tree vect_gen_perm_mask_checked (tree, const vec_perm_indices &);
 extern void optimize_mask_stores (struct loop*);
 extern gcall *vect_gen_while (tree, tree, tree);
+extern tree vect_gen_while_not (gimple_seq *, tree, tree, tree);
 
 /* In tree-vect-data-refs.c.  */
 extern bool vect_can_force_dr_alignment_p (const_tree, unsigned int);
......