Commit d9f21f6a, authored and committed by Richard Sandiford

poly_int: vectoriser vf and uf

This patch changes the type of the vectorisation factor and SLP
unrolling factor to poly_uint64.  This in turn required some knock-on
changes in signedness elsewhere.
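
As a rough illustration (a sketch, not code from the patch): a
poly_uint64 describes a value of the form C0 + C1 * X, where X is a
runtime invariant such as the number of vector chunks beyond the
minimum on a variable-length target.  Such values cannot always be
compared at compile time, so converted code replaces plain operators
with the poly_int predicates, along these lines (do_fixed_length_path
and do_variable_length_path are made-up placeholders):

    /* Sketch only.  On a fixed-length target the VF degenerates to a
       compile-time constant, so is_constant always succeeds there.  */
    poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
    unsigned HOST_WIDE_INT const_vf;
    if (vf.is_constant (&const_vf))
      /* const_vf is an ordinary integer; the old int logic applies.  */
      do_fixed_length_path (const_vf);
    else if (maybe_gt (vf, 1U))
      /* The VF exceeds 1 for at least one runtime value of X.  */
      do_variable_length_path (vf);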

Cost decisions are generally based on estimated_poly_value,
which for VF is wrapped up as vect_vf_for_cost.
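
For example, the reworked vect_do_peeling (see the diff below) no
longer reads the raw VF for its branch-probability heuristic; it asks
for an integer estimate once and reasons with that:

    /* Taken from the vect_do_peeling change in this patch: costs and
       branch probabilities use an integer estimate of the runtime VF.  */
    int estimated_vf = vect_vf_for_cost (loop_vinfo);
    if (estimated_vf == 2)
      estimated_vf = 3;
    prob_prolog = prob_epilog = profile_probability::guessed_always ()
			  .apply_scale (estimated_vf - 1, estimated_vf);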

The patch doesn't on its own enable variable-length vectorisation.
It just makes the minimum changes necessary for the code to build
with the new VF and UF types.  Later patches also make the
vectoriser cope with variable TYPE_VECTOR_SUBPARTS and variable
GET_MODE_NUNITS, at which point the code really does handle
variable-length vectors.

The patch also changes MAX_VECTORIZATION_FACTOR to INT_MAX,
to avoid hard-coding a particular architectural limit.

The patch includes a new test because a development version of the patch
accidentally used file print routines instead of dump_*, which would
fail with -fopt-info.
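
The dump routines also matter because polynomial values need dump_dec
rather than a "%d" format; the vect_make_slp_decision change below
shows the pattern:

    /* From this patch: print a poly_uint64 unrolling factor through
       the dump API (safe under -fopt-info) instead of via "%d".  */
    dump_printf_loc (MSG_NOTE, vect_location,
		     "Decided to SLP %d instances. Unrolling factor ",
		     decided_to_slp);
    dump_dec (MSG_NOTE, unrolling_factor);
    dump_printf (MSG_NOTE, "\n");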

2018-01-03  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-vectorizer.h (_slp_instance::unrolling_factor): Change
	from an unsigned int to a poly_uint64.
	(_loop_vec_info::slp_unrolling_factor): Likewise.
	(_loop_vec_info::vectorization_factor): Change from an int
	to a poly_uint64.
	(MAX_VECTORIZATION_FACTOR): Bump from 64 to INT_MAX.
	(vect_get_num_vectors): New function.
	(vect_update_max_nunits, vect_vf_for_cost): Likewise.
	(vect_get_num_copies): Use vect_get_num_vectors.
	(vect_analyze_data_ref_dependences): Change max_vf from an int *
	to an unsigned int *.
	(vect_analyze_data_refs): Change min_vf from an int * to a
	poly_uint64 *.
	(vect_transform_slp_perm_load): Take the vf as a poly_uint64 rather
	than an unsigned HOST_WIDE_INT.
	* tree-vect-data-refs.c (vect_analyze_possibly_independent_ddr)
	(vect_analyze_data_ref_dependence): Change max_vf from an int *
	to an unsigned int *.
	(vect_analyze_data_ref_dependences): Likewise.
	(vect_compute_data_ref_alignment): Handle polynomial vf.
	(vect_enhance_data_refs_alignment): Likewise.
	(vect_prune_runtime_alias_test_list): Likewise.
	(vect_shift_permute_load_chain): Likewise.
	(vect_supportable_dr_alignment): Likewise.
	(dependence_distance_ge_vf): Take the vectorization factor as a
	poly_uint64 rather than an unsigned HOST_WIDE_INT.
	(vect_analyze_data_refs): Change min_vf from an int * to a
	poly_uint64 *.
	* tree-vect-loop-manip.c (vect_gen_scalar_loop_niters): Take
	vfm1 as a poly_uint64 rather than an int.  Make the same change
	for the returned bound_scalar.
	(vect_gen_vector_loop_niters): Handle polynomial vf.
	(vect_do_peeling): Likewise.  Update call to
	vect_gen_scalar_loop_niters and handle polynomial bound_scalars.
	(vect_gen_vector_loop_niters_mult_vf): Assert that the vf must
	be constant.
	* tree-vect-loop.c (vect_determine_vectorization_factor)
	(vect_update_vf_for_slp, vect_analyze_loop_2): Handle polynomial vf.
	(vect_get_known_peeling_cost): Likewise.
	(vect_estimate_min_profitable_iters, vectorizable_reduction): Likewise.
	(vect_worthwhile_without_simd_p, vectorizable_induction): Likewise.
	(vect_transform_loop): Likewise.  Use the lowest possible VF when
	updating the upper bounds of the loop.
	(vect_min_worthwhile_factor): Make static.  Return an unsigned int
	rather than an int.
	* tree-vect-slp.c (vect_attempt_slp_rearrange_stmts): Cope with
	polynomial unroll factors.
	(vect_analyze_slp_cost_1, vect_analyze_slp_instance): Likewise.
	(vect_make_slp_decision): Likewise.
	(vect_supported_load_permutation_p): Likewise, and polynomial
	vf too.
	(vect_analyze_slp_cost): Handle polynomial vf.
	(vect_slp_analyze_node_operations): Likewise.
	(vect_slp_analyze_bb_1): Likewise.
	(vect_transform_slp_perm_load): Take the vf as a poly_uint64 rather
	than an unsigned HOST_WIDE_INT.
	* tree-vect-stmts.c (vectorizable_simd_clone_call, vectorizable_store)
	(vectorizable_load): Handle polynomial vf.
	* tree-vectorizer.c (simduid_to_vf::vf): Change from an int to
	a poly_uint64.
	(adjust_simduid_builtins, shrink_simd_arrays): Update accordingly.

gcc/testsuite/
	* gcc.dg/vect-opt-info-1.c: New test.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r256126

--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect-opt-info-1.c
@@ -0,0 +1,11 @@
+/* { dg-options "-std=c99 -fopt-info -O3" } */
+
+void
+vadd (int *dst, int *op1, int *op2, int count)
+{
+  for (int i = 0; i < count; ++i)
+    dst[i] = op1[i] + op2[i];
+}
+
+/* { dg-message "loop vectorized" "" { target *-*-* } 6 } */
+/* { dg-message "loop versioned for vectorization because of possible aliasing" "" { target *-*-* } 6 } */

--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -1234,8 +1234,9 @@ vect_build_loop_niters (loop_vec_info loop_vinfo, bool *new_var_p)
 
 static tree
 vect_gen_scalar_loop_niters (tree niters_prolog, int int_niters_prolog,
-			     int bound_prolog, int vfm1, int th,
-			     int *bound_scalar, bool check_profitability)
+			     int bound_prolog, poly_int64 vfm1, int th,
+			     poly_uint64 *bound_scalar,
+			     bool check_profitability)
 {
   tree type = TREE_TYPE (niters_prolog);
   tree niters = fold_build2 (PLUS_EXPR, type, niters_prolog,
@@ -1250,21 +1251,23 @@ vect_gen_scalar_loop_niters (tree niters_prolog, int int_niters_prolog,
       /* Peeling for constant times.  */
       if (int_niters_prolog >= 0)
 	{
-	  *bound_scalar = (int_niters_prolog + vfm1 < th
-			    ? th
-			    : vfm1 + int_niters_prolog);
+	  *bound_scalar = upper_bound (int_niters_prolog + vfm1, th);
 	  return build_int_cst (type, *bound_scalar);
 	}
       /* Peeling for unknown times.  Note BOUND_PROLOG is the upper
 	 bound (inlcuded) of niters of prolog loop.  */
-      if (th >= vfm1 + bound_prolog)
+      if (known_ge (th, vfm1 + bound_prolog))
 	{
 	  *bound_scalar = th;
 	  return build_int_cst (type, th);
 	}
-      /* Need to do runtime comparison, but BOUND_SCALAR remains the same.  */
-      else if (th > vfm1)
-	return fold_build2 (MAX_EXPR, type, build_int_cst (type, th), niters);
+      /* Need to do runtime comparison.  */
+      else if (maybe_gt (th, vfm1))
+	{
+	  *bound_scalar = upper_bound (*bound_scalar, th);
+	  return fold_build2 (MAX_EXPR, type,
+			      build_int_cst (type, th), niters);
+	}
     }
   return niters;
 }
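
A hedged note on the predicates used above: for poly_int operands,
known_ge (a, b) holds only if a >= b for every runtime value of the
indeterminate, maybe_gt (a, b) holds if a > b for at least one, and
upper_bound (a, b) yields a value at least as large as both.  A worked
example, assuming a single nonnegative indeterminate X:

    /* Suppose vfm1 = 3 + 4*X and th = 7 (with X >= 0 at runtime):
	 known_ge (th, vfm1)    is false -- it fails for X >= 2;
	 maybe_gt (th, vfm1)    is true  -- it holds for X == 0;
	 upper_bound (vfm1, th) is 7 + 4*X, which covers both.  */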

@@ -1292,7 +1295,7 @@ vect_gen_vector_loop_niters (loop_vec_info loop_vinfo, tree niters,
 {
   tree ni_minus_gap, var;
   tree niters_vector, step_vector, type = TREE_TYPE (niters);
-  int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+  poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
   edge pe = loop_preheader_edge (LOOP_VINFO_LOOP (loop_vinfo));
   tree log_vf = NULL_TREE;
@@ -1315,14 +1318,15 @@ vect_gen_vector_loop_niters (loop_vec_info loop_vinfo, tree niters,
   else
     ni_minus_gap = niters;
 
-  if (1)
+  unsigned HOST_WIDE_INT const_vf;
+  if (vf.is_constant (&const_vf))
     {
       /* Create: niters >> log2(vf) */
       /* If it's known that niters == number of latch executions + 1 doesn't
	 overflow, we can generate niters >> log2(vf); otherwise we generate
	 (niters - vf) >> log2(vf) + 1 by using the fact that we know ratio
	 will be at least one.  */
-      log_vf = build_int_cst (type, exact_log2 (vf));
+      log_vf = build_int_cst (type, exact_log2 (const_vf));
       if (niters_no_overflow)
	niters_vector = fold_build2 (RSHIFT_EXPR, type, ni_minus_gap, log_vf);
       else
@@ -1373,7 +1377,8 @@ vect_gen_vector_loop_niters_mult_vf (loop_vec_info loop_vinfo,
				     tree niters_vector,
				     tree *niters_vector_mult_vf_ptr)
 {
-  int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+  /* We should be using a step_vector of VF if VF is variable.  */
+  int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo).to_constant ();
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   tree type = TREE_TYPE (niters_vector);
   tree log_vf = build_int_cst (type, exact_log2 (vf));
@@ -1790,8 +1795,9 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
   tree type = TREE_TYPE (niters), guard_cond;
   basic_block guard_bb, guard_to;
   profile_probability prob_prolog, prob_vector, prob_epilog;
-  int bound_prolog = 0, bound_scalar = 0, bound = 0;
-  int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+  int bound_prolog = 0;
+  poly_uint64 bound_scalar = 0;
+  int estimated_vf;
   int prolog_peeling = LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo);
   bool epilog_peeling = (LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)
			 || LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo));
@@ -1800,11 +1806,12 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
     return NULL;
 
   prob_vector = profile_probability::guessed_always ().apply_scale (9, 10);
-  if ((vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo)) == 2)
-    vf = 3;
+  estimated_vf = vect_vf_for_cost (loop_vinfo);
+  if (estimated_vf == 2)
+    estimated_vf = 3;
   prob_prolog = prob_epilog = profile_probability::guessed_always ()
-			.apply_scale (vf - 1, vf);
-  vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+			.apply_scale (estimated_vf - 1, estimated_vf);
+  poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
 
   struct loop *prolog, *epilog = NULL, *loop = LOOP_VINFO_LOOP (loop_vinfo);
   struct loop *first_loop = loop;
@@ -1824,13 +1831,15 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
   /* Skip to epilog if scalar loop may be preferred.  It's only needed
      when we peel for epilog loop and when it hasn't been checked with
      loop versioning.  */
-  bool skip_vector = (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
-		      && !LOOP_REQUIRES_VERSIONING (loop_vinfo));
+  bool skip_vector = ((!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+		       && !LOOP_REQUIRES_VERSIONING (loop_vinfo))
+		      || !vf.is_constant ());
   /* Epilog loop must be executed if the number of iterations for epilog
      loop is known at compile time, otherwise we need to add a check at
      the end of vector loop and skip to the end of epilog loop.  */
   bool skip_epilog = (prolog_peeling < 0
-		      || !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo));
+		      || !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+		      || !vf.is_constant ());
   /* PEELING_FOR_GAPS is special because epilog loop must be executed.  */
   if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
     skip_epilog = false;
@@ -1849,8 +1858,10 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
	 needs to be scaled back later.  */
      basic_block bb_before_loop = loop_preheader_edge (loop)->src;
      if (prob_vector.initialized_p ())
-	scale_bbs_frequencies (&bb_before_loop, 1, prob_vector);
-      scale_loop_profile (loop, prob_vector, bound);
+	{
+	  scale_bbs_frequencies (&bb_before_loop, 1, prob_vector);
+	  scale_loop_profile (loop, prob_vector, 0);
+	}
     }
 
   tree niters_prolog = build_int_cst (type, 0);
@@ -2036,15 +2047,20 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
	      scale_bbs_frequencies (&bb_before_epilog, 1, prob_epilog);
	    }
-	  scale_loop_profile (epilog, prob_epilog, bound);
+	  scale_loop_profile (epilog, prob_epilog, 0);
	}
      else
	slpeel_update_phi_nodes_for_lcssa (epilog);
 
-      bound = LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) ? vf - 1 : vf - 2;
-      /* We share epilog loop with scalar version loop.  */
-      bound = MAX (bound, bound_scalar - 1);
-      record_niter_bound (epilog, bound, false, true);
+      unsigned HOST_WIDE_INT bound1, bound2;
+      if (vf.is_constant (&bound1) && bound_scalar.is_constant (&bound2))
+	{
+	  bound1 -= LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) ? 1 : 2;
+	  if (bound2)
+	    /* We share epilog loop with scalar version loop.  */
+	    bound1 = MAX (bound1, bound2 - 1);
+	  record_niter_bound (epilog, bound1, false, true);
+	}
 
       delete_update_ssa ();
       adjust_vec_debug_stmts ();

--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -1505,14 +1505,14 @@ vect_attempt_slp_rearrange_stmts (slp_instance slp_instn)
			  node->load_permutation);
 
   /* We are done, no actual permutations need to be generated.  */
-  unsigned int unrolling_factor = SLP_INSTANCE_UNROLLING_FACTOR (slp_instn);
+  poly_uint64 unrolling_factor = SLP_INSTANCE_UNROLLING_FACTOR (slp_instn);
   FOR_EACH_VEC_ELT (SLP_INSTANCE_LOADS (slp_instn), i, node)
     {
       gimple *first_stmt = SLP_TREE_SCALAR_STMTS (node)[0];
       first_stmt = GROUP_FIRST_ELEMENT (vinfo_for_stmt (first_stmt));
       /* But we have to keep those permutations that are required because
          of handling of gaps.  */
-      if (unrolling_factor == 1
+      if (known_eq (unrolling_factor, 1U)
	  || (group_size == GROUP_SIZE (vinfo_for_stmt (first_stmt))
	      && GROUP_GAP (vinfo_for_stmt (first_stmt)) == 0))
	SLP_TREE_LOAD_PERMUTATION (node).release ();
@@ -1639,10 +1639,10 @@ vect_supported_load_permutation_p (slp_instance slp_instn)
	 and the vectorization factor is not yet final.
	 ???  The SLP instance unrolling factor might not be the maximum one.  */
      unsigned n_perms;
-      unsigned test_vf
-	= least_common_multiple (SLP_INSTANCE_UNROLLING_FACTOR (slp_instn),
+      poly_uint64 test_vf
+	= force_common_multiple (SLP_INSTANCE_UNROLLING_FACTOR (slp_instn),
				 LOOP_VINFO_VECT_FACTOR
				 (STMT_VINFO_LOOP_VINFO (vinfo_for_stmt (stmt))));
      FOR_EACH_VEC_ELT (SLP_INSTANCE_LOADS (slp_instn), i, node)
	if (node->load_permutation.exists ()
	    && !vect_transform_slp_perm_load (node, vNULL, NULL, test_vf,
@@ -1755,7 +1755,8 @@ vect_analyze_slp_cost_1 (slp_instance instance, slp_tree node,
	  gcc_assert (ncopies_for_cost
		      <= (GROUP_SIZE (stmt_info) - GROUP_GAP (stmt_info)
			  + nunits - 1) / nunits);
-	  ncopies_for_cost *= SLP_INSTANCE_UNROLLING_FACTOR (instance);
+	  poly_uint64 uf = SLP_INSTANCE_UNROLLING_FACTOR (instance);
+	  ncopies_for_cost *= estimated_poly_value (uf);
	}
      /* Record the cost for the vector loads.  */
      vect_model_load_cost (stmt_info, ncopies_for_cost,
@@ -1859,10 +1860,13 @@ vect_analyze_slp_cost (slp_instance instance, void *data)
   unsigned group_size = SLP_INSTANCE_GROUP_SIZE (instance);
   slp_tree node = SLP_INSTANCE_TREE (instance);
   stmt_vec_info stmt_info = vinfo_for_stmt (SLP_TREE_SCALAR_STMTS (node)[0]);
-  /* Adjust the group_size by the vectorization factor which is always one
-     for basic-block vectorization.  */
+  /* Get the estimated vectorization factor, which is always one for
+     basic-block vectorization.  */
+  unsigned int assumed_vf;
   if (STMT_VINFO_LOOP_VINFO (stmt_info))
-    group_size *= LOOP_VINFO_VECT_FACTOR (STMT_VINFO_LOOP_VINFO (stmt_info));
+    assumed_vf = vect_vf_for_cost (STMT_VINFO_LOOP_VINFO (stmt_info));
+  else
+    assumed_vf = 1;
   unsigned nunits = TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info));
   /* For reductions look at a reduction operand in case the reduction
      operation is widening like DOT_PROD or SAD.  */
@@ -1879,7 +1883,8 @@ vect_analyze_slp_cost (slp_instance instance, void *data)
	default:;
	}
     }
-  ncopies_for_cost = least_common_multiple (nunits, group_size) / nunits;
+  ncopies_for_cost = least_common_multiple (nunits,
+					    group_size * assumed_vf) / nunits;
 
   prologue_cost_vec.create (10);
   body_cost_vec.create (10);
@@ -1971,7 +1976,7 @@ vect_analyze_slp_instance (vec_info *vinfo,
   slp_instance new_instance;
   slp_tree node;
   unsigned int group_size = GROUP_SIZE (vinfo_for_stmt (stmt));
-  unsigned int unrolling_factor = 1, nunits;
+  unsigned int nunits;
   tree vectype, scalar_type = NULL_TREE;
   gimple *next;
   unsigned int i;
@@ -2059,10 +2064,10 @@ vect_analyze_slp_instance (vec_info *vinfo,
   if (node != NULL)
     {
       /* Calculate the unrolling factor based on the smallest type.  */
-      unrolling_factor
+      poly_uint64 unrolling_factor
	= least_common_multiple (max_nunits, group_size) / group_size;
 
-      if (unrolling_factor != 1
+      if (maybe_ne (unrolling_factor, 1U)
	  && is_a <bb_vec_info> (vinfo))
	{
@@ -2115,7 +2120,7 @@ vect_analyze_slp_instance (vec_info *vinfo,
	      /* The load requires permutation when unrolling exposes
	         a gap either because the group is larger than the SLP
	         group-size or because there is a gap between the groups.  */
-	      && (unrolling_factor == 1
+	      && (known_eq (unrolling_factor, 1U)
		  || (group_size == GROUP_SIZE (vinfo_for_stmt (first_stmt))
		      && GROUP_GAP (vinfo_for_stmt (first_stmt)) == 0)))
	    {
@@ -2290,7 +2295,8 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size)
 bool
 vect_make_slp_decision (loop_vec_info loop_vinfo)
 {
-  unsigned int i, unrolling_factor = 1;
+  unsigned int i;
+  poly_uint64 unrolling_factor = 1;
   vec<slp_instance> slp_instances = LOOP_VINFO_SLP_INSTANCES (loop_vinfo);
   slp_instance instance;
   int decided_to_slp = 0;
@@ -2302,8 +2308,11 @@ vect_make_slp_decision (loop_vec_info loop_vinfo)
   FOR_EACH_VEC_ELT (slp_instances, i, instance)
     {
       /* FORNOW: SLP if you can.  */
-      if (unrolling_factor < SLP_INSTANCE_UNROLLING_FACTOR (instance))
-	unrolling_factor = SLP_INSTANCE_UNROLLING_FACTOR (instance);
+      /* All unroll factors have the form current_vector_size * X for some
+	 rational X, so they must have a common multiple.  */
+      unrolling_factor
+	= force_common_multiple (unrolling_factor,
+				 SLP_INSTANCE_UNROLLING_FACTOR (instance));
 
       /* Mark all the stmts that belong to INSTANCE as PURE_SLP stmts.  Later we
	 call vect_detect_hybrid_slp () to find stmts that need hybrid SLP and
@@ -2315,9 +2324,13 @@ vect_make_slp_decision (loop_vec_info loop_vinfo)
   LOOP_VINFO_SLP_UNROLLING_FACTOR (loop_vinfo) = unrolling_factor;
 
   if (decided_to_slp && dump_enabled_p ())
-    dump_printf_loc (MSG_NOTE, vect_location,
-		     "Decided to SLP %d instances. Unrolling factor %d\n",
-		     decided_to_slp, unrolling_factor);
+    {
+      dump_printf_loc (MSG_NOTE, vect_location,
+		       "Decided to SLP %d instances. Unrolling factor ",
+		       decided_to_slp);
+      dump_dec (MSG_NOTE, unrolling_factor);
+      dump_printf (MSG_NOTE, "\n");
+    }
 
   return (decided_to_slp > 0);
 }
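
A note on force_common_multiple, hedged since it belongs to the wider
poly_int infrastructure rather than this patch alone: unlike taking a
running maximum, it returns a common multiple of its two arguments and
asserts that one exists; the comment added above explains why one
always does here.  For instance, with a single indeterminate X:

    /* With instance factors 2 and 4 + 4*X, a common multiple exists:
       4 + 4*X is divisible by 2 (it is 2 * (2 + 2*X)) and by itself,
       so the accumulated unrolling_factor becomes 4 + 4*X.  For two
       plain constants the result is the usual LCM, e.g. 2 and 3 -> 6.  */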

@@ -2627,7 +2640,7 @@ vect_slp_analyze_node_operations (vec_info *vinfo, slp_tree node,
      = SLP_TREE_NUMBER_OF_VEC_STMTS (SLP_TREE_CHILDREN (node)[0]);
   else
     {
-      int vf;
+      poly_uint64 vf;
      if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
	vf = loop_vinfo->vectorization_factor;
      else
@@ -2635,7 +2648,7 @@ vect_slp_analyze_node_operations (vec_info *vinfo, slp_tree node,
      unsigned int group_size = SLP_INSTANCE_GROUP_SIZE (node_instance);
      tree vectype = STMT_VINFO_VECTYPE (stmt_info);
      SLP_TREE_NUMBER_OF_VEC_STMTS (node)
-	= vf * group_size / TYPE_VECTOR_SUBPARTS (vectype);
+	= vect_get_num_vectors (vf * group_size, vectype);
     }
 
   /* Push SLP node def-type to stmt operands.  */
@@ -2841,7 +2854,7 @@ vect_slp_analyze_bb_1 (gimple_stmt_iterator region_begin,
   bb_vec_info bb_vinfo;
   slp_instance instance;
   int i;
-  int min_vf = 2;
+  poly_uint64 min_vf = 2;
 
   /* The first group of checks is independent of the vector size.  */
   fatal = true;
@@ -3545,8 +3558,8 @@ vect_get_slp_defs (vec<tree> ops, slp_tree slp_node,
 
 bool
 vect_transform_slp_perm_load (slp_tree node, vec<tree> dr_chain,
-			      gimple_stmt_iterator *gsi, int vf,
+			      gimple_stmt_iterator *gsi, poly_uint64 vf,
			      slp_instance slp_node_instance, bool analyze_only,
			      unsigned *n_perms)
 {
   gimple *stmt = SLP_TREE_SCALAR_STMTS (node)[0];
@@ -3557,6 +3570,7 @@ vect_transform_slp_perm_load (slp_tree node, vec<tree> dr_chain,
   int group_size = SLP_INSTANCE_GROUP_SIZE (slp_node_instance);
   int mask_element;
   machine_mode mode;
+  unsigned HOST_WIDE_INT const_vf;
 
   if (!STMT_VINFO_GROUPED_ACCESS (stmt_info))
     return false;
@@ -3565,6 +3579,11 @@ vect_transform_slp_perm_load (slp_tree node, vec<tree> dr_chain,
   mode = TYPE_MODE (vectype);
 
+  /* At the moment, all permutations are represented using per-element
+     indices, so we can't cope with variable vectorization factors.  */
+  if (!vf.is_constant (&const_vf))
+    return false;
+
   /* The generic VEC_PERM_EXPR code always uses an integral type of the
      same size as the vector element being permuted.  */
   mask_element_type = lang_hooks.types.type_for_mode
@@ -3607,7 +3626,7 @@ vect_transform_slp_perm_load (slp_tree node, vec<tree> dr_chain,
   bool noop_p = true;
   *n_perms = 0;
 
-  for (int j = 0; j < vf; j++)
+  for (unsigned int j = 0; j < const_vf; j++)
     {
       for (int k = 0; k < group_size; k++)
	{

--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -3361,6 +3361,16 @@ vectorizable_simd_clone_call (gimple *stmt, gimple_stmt_iterator *gsi,
      arginfo.quick_push (thisarginfo);
    }
 
+  unsigned HOST_WIDE_INT vf;
+  if (!LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant (&vf))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "not considering SIMD clones; not yet supported"
+			 " for variable-width vectors.\n");
+      return NULL;
+    }
+
   unsigned int badness = 0;
   struct cgraph_node *bestn = NULL;
   if (STMT_VINFO_SIMD_CLONE_INFO (stmt_info).exists ())
@@ -3370,13 +3380,11 @@ vectorizable_simd_clone_call (gimple *stmt, gimple_stmt_iterator *gsi,
	 n = n->simdclone->next_clone)
      {
	unsigned int this_badness = 0;
-	if (n->simdclone->simdlen
-	    > (unsigned) LOOP_VINFO_VECT_FACTOR (loop_vinfo)
+	if (n->simdclone->simdlen > vf
	    || n->simdclone->nargs != nargs)
	  continue;
-	if (n->simdclone->simdlen
-	    < (unsigned) LOOP_VINFO_VECT_FACTOR (loop_vinfo))
-	  this_badness += (exact_log2 (LOOP_VINFO_VECT_FACTOR (loop_vinfo))
+	if (n->simdclone->simdlen < vf)
+	  this_badness += (exact_log2 (vf)
			   - exact_log2 (n->simdclone->simdlen)) * 1024;
	if (n->simdclone->inbranch)
	  this_badness += 2048;
@@ -3465,7 +3473,7 @@ vectorizable_simd_clone_call (gimple *stmt, gimple_stmt_iterator *gsi,
 
   fndecl = bestn->decl;
   nunits = bestn->simdclone->simdlen;
-  ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
+  ncopies = vf / nunits;
 
   /* If the function isn't const, only allow it in simd loops where user
      has asserted that at least nunits consecutive iterations can be
@@ -5694,7 +5702,7 @@ vectorizable_store (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
   gather_scatter_info gs_info;
   enum vect_def_type scatter_src_dt = vect_unknown_def_type;
   gimple *new_stmt;
-  int vf;
+  poly_uint64 vf;
   vec_load_store_type vls_type;
   tree ref_type;
@@ -6664,7 +6672,8 @@ vectorizable_load (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
   tree dataref_offset = NULL_TREE;
   gimple *ptr_incr = NULL;
   int ncopies;
-  int i, j, group_size, group_gap_adj;
+  int i, j, group_size;
+  poly_int64 group_gap_adj;
   tree msq = NULL_TREE, lsq;
   tree offset = NULL_TREE;
   tree byte_offset = NULL_TREE;
@@ -6682,7 +6691,7 @@ vectorizable_load (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
   bool slp_perm = false;
   enum tree_code code;
   bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
-  int vf;
+  poly_uint64 vf;
   tree aggr_type;
   gather_scatter_info gs_info;
   vec_info *vinfo = stmt_info->vinfo;
@@ -6752,8 +6761,8 @@ vectorizable_load (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
	 on the unrolled body effectively re-orders stmts.  */
      if (ncopies > 1
	  && STMT_VINFO_MIN_NEG_DIST (stmt_info) != 0
-	  && ((unsigned)LOOP_VINFO_VECT_FACTOR (loop_vinfo)
-	      > STMT_VINFO_MIN_NEG_DIST (stmt_info)))
+	  && maybe_gt (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
+		       STMT_VINFO_MIN_NEG_DIST (stmt_info)))
	{
	  if (dump_enabled_p ())
	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -6793,8 +6802,8 @@ vectorizable_load (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
	     on the unrolled body effectively re-orders stmts.  */
	  if (!PURE_SLP_STMT (stmt_info)
	      && STMT_VINFO_MIN_NEG_DIST (stmt_info) != 0
-	      && ((unsigned)LOOP_VINFO_VECT_FACTOR (loop_vinfo)
-		  > STMT_VINFO_MIN_NEG_DIST (stmt_info)))
+	      && maybe_gt (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
+			   STMT_VINFO_MIN_NEG_DIST (stmt_info)))
	    {
	      if (dump_enabled_p ())
		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -7156,7 +7165,10 @@ vectorizable_load (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
	 fits in.  */
      if (slp_perm)
	{
-	  ncopies = (group_size * vf + nunits - 1) / nunits;
+	  /* We don't yet generate SLP_TREE_LOAD_PERMUTATIONs for
+	     variable VF.  */
+	  unsigned int const_vf = vf.to_constant ();
+	  ncopies = (group_size * const_vf + nunits - 1) / nunits;
	  dr_chain.create (ncopies);
	}
      else
@@ -7274,7 +7286,10 @@ vectorizable_load (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
	 fits in.  */
      if (slp_perm)
	{
-	  vec_num = (group_size * vf + nunits - 1) / nunits;
+	  /* We don't yet generate SLP_TREE_LOAD_PERMUTATIONs for
+	     variable VF.  */
+	  unsigned int const_vf = vf.to_constant ();
+	  vec_num = (group_size * const_vf + nunits - 1) / nunits;
	  group_gap_adj = vf * group_size - nunits * vec_num;
	}
      else
@@ -7740,11 +7755,13 @@ vectorizable_load (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
		 we need to skip the gaps after we manage to fully load
		 all elements.  group_gap_adj is GROUP_SIZE here.  */
	      group_elt += nunits;
-	      if (group_gap_adj != 0 && ! slp_perm
-		  && group_elt == group_size - group_gap_adj)
+	      if (maybe_ne (group_gap_adj, 0U)
+		  && !slp_perm
+		  && known_eq (group_elt, group_size - group_gap_adj))
		{
-		  wide_int bump_val = (wi::to_wide (TYPE_SIZE_UNIT (elem_type))
-				       * group_gap_adj);
+		  poly_wide_int bump_val
+		    = (wi::to_wide (TYPE_SIZE_UNIT (elem_type))
+		       * group_gap_adj);
		  tree bump = wide_int_to_tree (sizetype, bump_val);
		  dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi,
						 stmt, bump);
@@ -7753,10 +7770,11 @@ vectorizable_load (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
	    }
	  /* Bump the vector pointer to account for a gap or for excess
	     elements loaded for a permuted SLP load.  */
-	  if (group_gap_adj != 0 && slp_perm)
+	  if (maybe_ne (group_gap_adj, 0U) && slp_perm)
	    {
-	      wide_int bump_val = (wi::to_wide (TYPE_SIZE_UNIT (elem_type))
-				   * group_gap_adj);
+	      poly_wide_int bump_val
+		= (wi::to_wide (TYPE_SIZE_UNIT (elem_type))
+		   * group_gap_adj);
	      tree bump = wide_int_to_tree (sizetype, bump_val);
	      dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi,
					     stmt, bump);

--- a/gcc/tree-vectorizer.c
+++ b/gcc/tree-vectorizer.c
@@ -91,7 +91,7 @@ vec<stmt_vec_info> stmt_vec_info_vec;
 struct simduid_to_vf : free_ptr_hash<simduid_to_vf>
 {
   unsigned int simduid;
-  int vf;
+  poly_uint64 vf;
 
   /* hash_table support.  */
   static inline hashval_t hash (const simduid_to_vf *);
@@ -161,7 +161,7 @@ adjust_simduid_builtins (hash_table<simduid_to_vf> *htab)
      for (i = gsi_start_bb (bb); !gsi_end_p (i); )
	{
-	  unsigned int vf = 1;
+	  poly_uint64 vf = 1;
	  enum internal_fn ifn;
	  gimple *stmt = gsi_stmt (i);
	  tree t;
@@ -338,7 +338,7 @@ shrink_simd_arrays
     if ((*iter)->simduid != -1U)
      {
	tree decl = (*iter)->decl;
-	int vf = 1;
+	poly_uint64 vf = 1;
	if (simduid_to_vf_htab)
	  {
	    simduid_to_vf *p = NULL, data;

--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -129,7 +129,7 @@ typedef struct _slp_instance {
   unsigned int group_size;
 
   /* The unrolling factor required to vectorized this SLP instance.  */
-  unsigned int unrolling_factor;
+  poly_uint64 unrolling_factor;
 
   /* The group of nodes that contain loads of this SLP instance.  */
   vec<slp_tree> loads;
@@ -245,7 +245,7 @@ typedef struct _loop_vec_info : public vec_info {
   poly_uint64 versioning_threshold;
 
   /* Unrolling factor  */
-  int vectorization_factor;
+  poly_uint64 vectorization_factor;
 
   /* Maximum runtime vectorization factor, or MAX_VECTORIZATION_FACTOR
      if there is no particular limit.  */
@@ -297,7 +297,7 @@ typedef struct _loop_vec_info : public vec_info {
   /* The unrolling factor needed to SLP the loop. In case of that pure SLP is
      applied to the loop, i.e., no unrolling is needed, this is 1.  */
-  unsigned slp_unrolling_factor;
+  poly_uint64 slp_unrolling_factor;
 
   /* Cost of a single scalar iteration.  */
   int single_scalar_iteration_cost;
@@ -815,8 +815,7 @@ struct dataref_aux {
    conversion.  */
 #define MAX_INTERM_CVT_STEPS	3
 
-/* The maximum vectorization factor supported by any target (V64QI).  */
-#define MAX_VECTORIZATION_FACTOR 64
+#define MAX_VECTORIZATION_FACTOR INT_MAX
 
 /* Nonzero if TYPE represents a (scalar) boolean type or type
    in the middle-end compatible with it (unsigned precision 1 integral
@@ -1109,6 +1108,16 @@ unlimited_cost_model (loop_p loop)
   return (flag_vect_cost_model == VECT_COST_MODEL_UNLIMITED);
 }
 
+/* Return the number of vectors of type VECTYPE that are needed to get
+   NUNITS elements.  NUNITS should be based on the vectorization factor,
+   so it is always a known multiple of the number of elements in VECTYPE.  */
+
+static inline unsigned int
+vect_get_num_vectors (poly_uint64 nunits, tree vectype)
+{
+  return exact_div (nunits, TYPE_VECTOR_SUBPARTS (vectype)).to_constant ();
+}
+
 /* Return the number of copies needed for loop vectorization when
    a statement operates on vectors of type VECTYPE.  This is the
    vectorization factor divided by the number of elements in
@@ -1117,10 +1126,32 @@ unlimited_cost_model (loop_p loop)
 static inline unsigned int
 vect_get_num_copies (loop_vec_info loop_vinfo, tree vectype)
 {
-  gcc_checking_assert (LOOP_VINFO_VECT_FACTOR (loop_vinfo)
-		       % TYPE_VECTOR_SUBPARTS (vectype) == 0);
-  return (LOOP_VINFO_VECT_FACTOR (loop_vinfo)
-	  / TYPE_VECTOR_SUBPARTS (vectype));
+  return vect_get_num_vectors (LOOP_VINFO_VECT_FACTOR (loop_vinfo), vectype);
+}
+
+/* Update maximum unit count *MAX_NUNITS so that it accounts for
+   the number of units in vector type VECTYPE.  *MAX_NUNITS can be 1
+   if we haven't yet recorded any vector types.  */
+
+static inline void
+vect_update_max_nunits (poly_uint64 *max_nunits, tree vectype)
+{
+  /* All unit counts have the form current_vector_size * X for some
+     rational X, so two unit sizes must have a common multiple.
+     Everything is a multiple of the initial value of 1.  */
+  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
+  *max_nunits = force_common_multiple (*max_nunits, nunits);
+}
+
+/* Return the vectorization factor that should be used for costing
+   purposes while vectorizing the loop described by LOOP_VINFO.
+   Pick a reasonable estimate if the vectorization factor isn't
+   known at compile time.  */
+
+static inline unsigned int
+vect_vf_for_cost (loop_vec_info loop_vinfo)
+{
+  return estimated_poly_value (LOOP_VINFO_VECT_FACTOR (loop_vinfo));
 }
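
A hedged usage sketch of the three new helpers (the variable names here
are placeholders; vect_slp_analyze_node_operations above uses the same
vect_get_num_vectors pattern):

    /* Sketch: deriving per-statement vector counts from a polynomial VF.  */
    poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);

    /* VF divided by the vector length; exact_div asserts divisibility,
       so the result is a compile-time constant even for variable VF.  */
    unsigned int ncopies = vect_get_num_copies (loop_vinfo, vectype);

    /* Number of vectors needed for GROUP_SIZE scalars per iteration.  */
    unsigned int nvectors = vect_get_num_vectors (vf * group_size, vectype);

    /* Integer estimate of the VF, for cost decisions only.  */
    unsigned int assumed_vf = vect_vf_for_cost (loop_vinfo);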

@@ -1223,7 +1254,7 @@ extern enum dr_alignment_support vect_supportable_dr_alignment
                           (struct data_reference *, bool);
 extern tree vect_get_smallest_scalar_type (gimple *, HOST_WIDE_INT *,
                                           HOST_WIDE_INT *);
-extern bool vect_analyze_data_ref_dependences (loop_vec_info, int *);
+extern bool vect_analyze_data_ref_dependences (loop_vec_info, unsigned int *);
 extern bool vect_slp_analyze_instance_dependence (slp_instance);
 extern bool vect_enhance_data_refs_alignment (loop_vec_info);
 extern bool vect_analyze_data_refs_alignment (loop_vec_info);
@@ -1233,7 +1264,7 @@ extern bool vect_analyze_data_ref_accesses (vec_info *);
 extern bool vect_prune_runtime_alias_test_list (loop_vec_info);
 extern bool vect_check_gather_scatter (gimple *, loop_vec_info,
				       gather_scatter_info *);
-extern bool vect_analyze_data_refs (vec_info *, int *);
+extern bool vect_analyze_data_refs (vec_info *, poly_uint64 *);
 extern void vect_record_base_alignments (vec_info *);
 extern tree vect_create_data_ref_ptr (gimple *, tree, struct loop *, tree,
				      tree *, gimple_stmt_iterator *,
@@ -1291,8 +1322,8 @@ extern int vect_get_known_peeling_cost (loop_vec_info, int, int *,
 /* In tree-vect-slp.c.  */
 extern void vect_free_slp_instance (slp_instance);
 extern bool vect_transform_slp_perm_load (slp_tree, vec<tree> ,
-					  gimple_stmt_iterator *, int,
-					  slp_instance, bool, unsigned *);
+					  gimple_stmt_iterator *, poly_uint64,
+					  slp_instance, bool, unsigned *);
 extern bool vect_slp_analyze_operations (vec_info *);
 extern bool vect_schedule_slp (vec_info *);
 extern bool vect_analyze_slp (vec_info *, unsigned);