Commit 805e2059 by Ira Rosen Committed by Ira Rosen

tree-vectorizer.h (enum vect_def_type): Start enumeration from 1.

	* tree-vectorizer.h (enum vect_def_type): Start enumeration from 1.
	(struct _slp_tree, struct _slp_instance): Define new data structures
	along with macros for their access.
	(struct _loop_vec_info): Define new fields: strided_stores,
	slp_instances, and slp_unrolling_factor, along with macros for their
	access.
	(enum slp_vect_type): New.
	(struct _stmt_vec_info): Define new field, slp_type, and macros for its
	access.
	(STMT_VINFO_STRIDED_ACCESS): New macro.
	(vect_free_slp_tree): Declare.
	(vectorizable_load): Add an argument of type slp_tree.
	(vectorizable_store, vectorizable_operation, vectorizable_conversion,
	vectorizable_assignment): Likewise.
	(vect_model_simple_cost, vect_model_store_cost, vect_model_load_cost):
	Declare (make extern).
	* tree-vectorizer.c (new_stmt_vec_info): Initialize the new field.
	(new_loop_vec_info): Likewise.
	(destroy_loop_vec_info): Free memory allocated for SLP structures.
	* tree-vect-analyze.c: Include recog.h.
	(vect_update_slp_costs_according_to_vf): New.
	(vect_analyze_operations): Add argument for calls to vectorizable_ ()
	functions. For strided-access stmts that are not pure SLP, check that
	the group size is a power of 2. Update the vectorization factor
	according to SLP. Call vect_update_slp_costs_according_to_vf.
	(vect_analyze_group_access): New.
	(vect_analyze_data_ref_access): Call vect_analyze_group_access.
	(vect_free_slp_tree): New function.
	(vect_get_and_check_slp_defs, vect_build_slp_tree, vect_print_slp_tree,
	vect_mark_slp_stmts, vect_analyze_slp_instance, vect_analyze_slp,
	vect_make_slp_decision, vect_detect_hybrid_slp_stmts,
	vect_detect_hybrid_slp): Likewise.
	(vect_analyze_loop): Call vect_analyze_slp, vect_make_slp_decision
	and vect_detect_hybrid_slp.
	* tree-vect-transform.c (vect_estimate_min_profitable_iters): Take
	SLP costs into account.
	(vect_get_cost_fields): New function.
	(vect_model_simple_cost): Make extern, add SLP parameter and handle
	SLP.
	(vect_model_store_cost, vect_model_load_cost): Likewise.
	(vect_get_constant_vectors): New function.
	(vect_get_slp_vect_defs, vect_get_slp_defs,
	vect_get_vec_defs_for_stmt_copy, vect_get_vec_defs_for_stmt_copy,
	vect_get_vec_defs): Likewise.
	(vectorizable_reduction): Don't handle SLP for now.
	(vectorizable_call): Don't handle SLP for now. Add argument to
	vect_model_simple_cost.
	(vectorizable_conversion): Handle SLP (call vect_get_vec_defs to
	get SLPed and vectorized defs). Fix indentation and spacing.
	(vectorizable_assignment): Handle SLP.
	(vectorizable_induction): Don't handle SLP for now.
	(vectorizable_operation): Likewise.
	(vectorizable_type_demotion): Add argument to
	vect_model_simple_cost.
	(vectorizable_type_promotion): Likewise.
	(vectorizable_store, vectorizable_load): Handle SLP.
	(vectorizable_condition): Don't handle SLP for now.
	(vect_transform_stmt): Add a new argument for SLP. Check that there is
	no SLP transformation required for unsupported cases. Add SLP
	argument for supported cases.
	(vect_remove_stores): New function.
	(vect_schedule_slp_instance, vect_schedule_slp): Likewise.
	(vect_transform_loop): Schedule SLP instances.
	* Makefile.in (tree-vect-analyze.o): Depend on recog.h.

From-SVN: r128289
parent ae2bd7d2
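The ChangeLog says that vect_analyze_operations updates the vectorization factor according to SLP and then calls vect_update_slp_costs_according_to_vf. The C sketch below models only that arithmetic, not GCC's implementation. The helper names (gcd_u, lcm_u, update_vf_for_slp, scale_slp_cost) are hypothetical, and the two rules it encodes are assumptions about the design. The first rule: a loop whose relevant stmts are all pure SLP takes the SLP unrolling factor as its VF, while a hybrid loop takes the least common multiple of both. The second rule: an SLP instance's cost is multiplied by VF / unrolling factor, on the assumption that cost grows linearly with the number of copies.

```c
#include <assert.h>

/* Greatest common divisor by Euclid's algorithm.  */
unsigned gcd_u (unsigned a, unsigned b)
{
  while (b != 0)
    {
      unsigned t = a % b;
      a = b;
      b = t;
    }
  return a;
}

/* Least common multiple of two non-zero values.  */
unsigned lcm_u (unsigned a, unsigned b)
{
  assert (a != 0 && b != 0);
  return a / gcd_u (a, b) * b;
}

/* Hypothetical model of the VF update: a pure-SLP loop uses the SLP
   unrolling factor directly; otherwise both the loop-based VF and the
   SLP unrolling factor must divide the final VF.  */
unsigned update_vf_for_slp (unsigned loop_vf, unsigned slp_unrolling,
                            int only_slp_in_loop)
{
  return only_slp_in_loop ? slp_unrolling : lcm_u (loop_vf, slp_unrolling);
}

/* Hypothetical model of rescaling an SLP instance's cost once the final
   VF is known, assuming cost is linear in the number of copies.  */
unsigned scale_slp_cost (unsigned cost, unsigned vf, unsigned slp_unrolling)
{
  assert (slp_unrolling != 0 && vf % slp_unrolling == 0);
  return cost * (vf / slp_unrolling);
}
```

For example, a hybrid loop with a loop-based VF of 4 and an SLP unrolling factor of 6 would end up with VF 12 under these rules.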
2007-09-09 Ira Rosen <irar@il.ibm.com>
* tree-vectorizer.h (enum vect_def_type): Start enumeration from 1.
(struct _slp_tree, struct _slp_instance): Define new data structures
along with macros for their access.
(struct _loop_vec_info): Define new fields: strided_stores,
slp_instances, and slp_unrolling_factor, along with macros for their access.
(enum slp_vect_type): New.
(struct _stmt_vec_info): Define new field, slp_type, and macros for its
access.
(STMT_VINFO_STRIDED_ACCESS): New macro.
(vect_free_slp_tree): Declare.
(vectorizable_load): Add an argument of type slp_tree.
(vectorizable_store, vectorizable_operation, vectorizable_conversion,
vectorizable_assignment): Likewise.
(vect_model_simple_cost, vect_model_store_cost, vect_model_load_cost):
Declare (make extern).
* tree-vectorizer.c (new_stmt_vec_info): Initialize the new field.
(new_loop_vec_info): Likewise.
(destroy_loop_vec_info): Free memory allocated for SLP structures.
* tree-vect-analyze.c: Include recog.h.
(vect_update_slp_costs_according_to_vf): New.
(vect_analyze_operations): Add argument for calls to vectorizable_ ()
functions. For strided-access stmts that are not pure SLP, check that
the group size is a power of 2. Update the vectorization factor
according to SLP. Call vect_update_slp_costs_according_to_vf.
(vect_analyze_group_access): New.
(vect_analyze_data_ref_access): Call vect_analyze_group_access.
(vect_free_slp_tree): New function.
(vect_get_and_check_slp_defs, vect_build_slp_tree, vect_print_slp_tree,
vect_mark_slp_stmts, vect_analyze_slp_instance, vect_analyze_slp,
vect_make_slp_decision, vect_detect_hybrid_slp_stmts,
vect_detect_hybrid_slp): Likewise.
(vect_analyze_loop): Call vect_analyze_slp, vect_make_slp_decision
and vect_detect_hybrid_slp.
* tree-vect-transform.c (vect_estimate_min_profitable_iters): Take
SLP costs into account.
(vect_get_cost_fields): New function.
(vect_model_simple_cost): Make extern, add SLP parameter and handle
SLP.
(vect_model_store_cost, vect_model_load_cost): Likewise.
(vect_get_constant_vectors): New function.
(vect_get_slp_vect_defs, vect_get_slp_defs,
vect_get_vec_defs_for_stmt_copy, vect_get_vec_defs_for_stmt_copy,
vect_get_vec_defs): Likewise.
(vectorizable_reduction): Don't handle SLP for now.
(vectorizable_call): Don't handle SLP for now. Add argument to
vect_model_simple_cost.
(vectorizable_conversion): Handle SLP (call vect_get_vec_defs to
get SLPed and vectorized defs). Fix indentation and spacing.
(vectorizable_assignment): Handle SLP.
(vectorizable_induction): Don't handle SLP for now.
(vectorizable_operation): Likewise.
(vectorizable_type_demotion): Add argument to
vect_model_simple_cost.
(vectorizable_type_promotion): Likewise.
(vectorizable_store, vectorizable_load): Handle SLP.
(vectorizable_condition): Don't handle SLP for now.
(vect_transform_stmt): Add a new argument for SLP. Check that there is
no SLP transformation required for unsupported cases. Add SLP
argument for supported cases.
(vect_remove_stores): New function.
(vect_schedule_slp_instance, vect_schedule_slp): Likewise.
(vect_transform_loop): Schedule SLP instances.
* Makefile.in (tree-vect-analyze.o): Depend on recog.h.
2007-09-09 Andrew Haley <aph@redhat.com>
* optabs.c (sign_expand_binop): Set libcall_gen = NULL in the
@@ -2246,7 +2246,7 @@ tree-data-ref.o: tree-data-ref.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
$(TREE_FLOW_H) $(TREE_DUMP_H) $(TIMEVAR_H) $(CFGLOOP_H) \
$(TREE_DATA_REF_H) $(SCEV_H) tree-pass.h tree-chrec.h langhooks.h
tree-vect-analyze.o: tree-vect-analyze.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
-$(TM_H) $(GGC_H) $(OPTABS_H) $(TREE_H) $(BASIC_BLOCK_H) \
+$(TM_H) $(GGC_H) $(OPTABS_H) $(TREE_H) $(RECOG_H) $(BASIC_BLOCK_H) \
$(DIAGNOSTIC_H) $(TREE_FLOW_H) $(TREE_DUMP_H) $(TIMEVAR_H) $(CFGLOOP_H) \
tree-vectorizer.h $(TREE_DATA_REF_H) $(SCEV_H) $(EXPR_H) tree-chrec.h
tree-vect-patterns.o: tree-vect-patterns.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
......
2007-09-09 Ira Rosen <irar@il.ibm.com>
* gcc.dg/vect/vect.exp: Compile tests starting with slp-.
Remove "vect" part from test names for -ffast-math, -fno-math-errno,
-fwrapv, -ftrapv tests. Add -fno-tree-scev-cprop for slp- tests.
Compile tests with -fno-tree-pre.
* gcc.dg/vect/costmodel/ppc/ppc-costmodel-vect.exp: Run SLP tests.
* lib/target-supports.exp (check_effective_target_vect_strided): New.
* gcc.dg/vect/slp-1.c, gcc.dg/vect/slp-2.c, gcc.dg/vect/slp-3.c,
gcc.dg/vect/slp-4.c, gcc.dg/vect/slp-5.c, gcc.dg/vect/slp-6.c,
gcc.dg/vect/slp-7.c, gcc.dg/vect/slp-8.c, gcc.dg/vect/slp-9.c,
gcc.dg/vect/slp-10.c, gcc.dg/vect/slp-11.c, gcc.dg/vect/slp-12.c,
gcc.dg/vect/slp-13.c, gcc.dg/vect/slp-14.c, gcc.dg/vect/slp-15.c,
gcc.dg/vect/slp-16.c, gcc.dg/vect/slp-17.c, gcc.dg/vect/slp-18.c,
gcc.dg/vect/slp-19.c, gcc.dg/vect/slp-20.c, gcc.dg/vect/slp-21.c,
gcc.dg/vect/slp-22.c, gcc.dg/vect/slp-23.c, gcc.dg/vect/slp-24.c,
gcc.dg/vect/slp-25.c, gcc.dg/vect/slp-26.c, gcc.dg/vect/slp-28.c,
gcc.dg/vect/fast-math-slp-27.c, gcc.dg/vect/no-tree-pre-slp-29.c,
gcc.dg/vect/no-scevccp-slp-30.c, gcc.dg/vect/no-scevccp-slp-31.c,
gcc.dg/vect/no-math-errno-slp-32.c, gcc.dg/vect/slp-33.c,
gcc.dg/vect/slp-34.c, gcc.dg/vect/slp-35.c, gcc.dg/vect/slp-36.c,
gcc.dg/vect/slp-37.c, gcc.dg/vect/vect-vfa-slp.c,
gcc.dg/vect/costmodel/ppc/costmodel-slp-12.c,
gcc.dg/vect/costmodel/ppc/costmodel-slp-33.c: New testcases.
* gcc.dg/vect/vect-vfa-03.c: Change the test to prevent SLP.
2007-09-09 Joseph Myers <joseph@codesourcery.com>
* lib/file-format.exp (gcc_target_object_format): Use remote_exec
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include <stdio.h>
#include "../../tree-vect.h"
#define N 8
int
main1 ()
{
int i;
unsigned int out[N*8], a0, a1, a2, a3, a4, a5, a6, a7, b1, b0, b2, b3, b4, b5, b6, b7;
unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
float out2[N*8], fa[N*4];
unsigned int ia[N], ib[N*2];
for (i = 0; i < N; i++)
{
a0 = in[i*8] + 5;
a1 = in[i*8 + 1] + 6;
a2 = in[i*8 + 2] + 7;
a3 = in[i*8 + 3] + 8;
a4 = in[i*8 + 4] + 9;
a5 = in[i*8 + 5] + 10;
a6 = in[i*8 + 6] + 11;
a7 = in[i*8 + 7] + 12;
b0 = a0 * 3;
b1 = a1 * 2;
b2 = a2 * 12;
b3 = a3 * 5;
b4 = a4 * 8;
b5 = a5 * 4;
b6 = a6 * 3;
b7 = a7 * 2;
out[i*8] = b0 - 2;
out[i*8 + 1] = b1 - 3;
out[i*8 + 2] = b2 - 2;
out[i*8 + 3] = b3 - 1;
out[i*8 + 4] = b4 - 8;
out[i*8 + 5] = b5 - 7;
out[i*8 + 6] = b6 - 3;
out[i*8 + 7] = b7 - 7;
ia[i] = b6;
}
/* check results: */
for (i = 0; i < N; i++)
{
if (out[i*8] != (in[i*8] + 5) * 3 - 2
|| out[i*8 + 1] != (in[i*8 + 1] + 6) * 2 - 3
|| out[i*8 + 2] != (in[i*8 + 2] + 7) * 12 - 2
|| out[i*8 + 3] != (in[i*8 + 3] + 8) * 5 - 1
|| out[i*8 + 4] != (in[i*8 + 4] + 9) * 8 - 8
|| out[i*8 + 5] != (in[i*8 + 5] + 10) * 4 - 7
|| out[i*8 + 6] != (in[i*8 + 6] + 11) * 3 - 3
|| out[i*8 + 7] != (in[i*8 + 7] + 12) * 2 - 7
|| ia[i] != (in[i*8 + 6] + 11) * 3)
abort ();
}
for (i = 0; i < N*2; i++)
{
out[i*4] = (in[i*4] + 2) * 3;
out[i*4 + 1] = (in[i*4 + 1] + 2) * 7;
out[i*4 + 2] = (in[i*4 + 2] + 7) * 3;
out[i*4 + 3] = (in[i*4 + 3] + 7) * 7;
ib[i] = 7;
}
/* check results: */
for (i = 0; i < N*2; i++)
{
if (out[i*4] != (in[i*4] + 2) * 3
|| out[i*4 + 1] != (in[i*4 + 1] + 2) * 7
|| out[i*4 + 2] != (in[i*4 + 2] + 7) * 3
|| out[i*4 + 3] != (in[i*4 + 3] + 7) * 7
|| ib[i] != 7)
abort ();
}
for (i = 0; i < N*4; i++)
{
out2[i*2] = (float) (in[i*2] * 2 + 11) ;
out2[i*2 + 1] = (float) (in[i*2 + 1] * 3 + 7);
fa[i] = (float) in[i*2+1];
}
/* check results: */
for (i = 0; i < N*4; i++)
{
if (out2[i*2] != (float) (in[i*2] * 2 + 11)
|| out2[i*2 + 1] != (float) (in[i*2 + 1] * 3 + 7)
|| fa[i] != (float) in[i*2+1])
abort ();
}
return 0;
}
int main (void)
{
check_vect ();
main1 ();
return 0;
}
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" {target { vect_strided && vect_int_mult } } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" {target { vect_strided && vect_int_mult } } } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include "../../tree-vect.h"
#define N 32
struct s{
short a; /* aligned */
char b[N-1]; /* unaligned (offset 2B) */
};
int main1 ()
{
int i;
struct s tmp;
/* unaligned */
for (i = 0; i < N/4; i++)
{
tmp.b[2*i] = 5;
tmp.b[2*i+1] = 15;
}
/* check results: */
for (i = 0; i <N/4; i++)
{
if (tmp.b[2*i] != 5
|| tmp.b[2*i+1] != 15)
abort ();
}
return 0;
}
int main (void)
{
check_vect ();
return main1 ();
}
/* { dg-final { scan-tree-dump-times "vectorization not profitable" 1 "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
@@ -64,6 +64,8 @@ dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/costmodel-pr*.\[cS\]]] \
"" $DEFAULT_VECTCFLAGS
dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/costmodel-vect-*.\[cS\]]] \
"" $DEFAULT_VECTCFLAGS
dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/costmodel-slp-*.\[cS\]]] \
"" $DEFAULT_VECTCFLAGS
#### Tests with special options
global SAVED_DEFAULT_VECTCFLAGS
......
/* { dg-do compile } */
/* { dg-require-effective-target vect_float } */
float x[256];
void foo(void)
{
int i;
for (i=0; i<128; ++i)
{
x[2*i] = x[2*i] * x[2*i];
x[2*i+1] = x[2*i+1] * x[2*i+1];
}
}
/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { target vect_strided } } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
/* { dg-do compile } */
/* { dg-require-effective-target vect_double } */
double x[256];
void foo(void)
{
int i;
for (i=0; i<128; ++i)
{
x[2*i] = __builtin_pow (x[2*i], 0.5);
x[2*i+1] = __builtin_pow (x[2*i+1], 0.5);
}
}
/* { dg-final { scan-tree-dump "pattern recognized" "vect" { xfail spu*-*-* } } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include <stdio.h>
#include "tree-vect.h"
#define N 128
int
main1 ()
{
int i, j;
unsigned short out[N*8], a[N];
for (j = 0; j < N; j++)
{
for (i = 0; i < N; i++)
{
out[i*4] = 8;
out[i*4 + 1] = 18;
out[i*4 + 2] = 28;
out[i*4 + 3] = 38;
}
a[j] = 8;
}
/* check results: */
for (j = 0; j < N; j++)
{
for (i = 0; i < N; i++)
{
if (out[i*4] != 8
|| out[i*4 + 1] != 18
|| out[i*4 + 2] != 28
|| out[i*4 + 3] != 38)
abort();
}
if (a[j] != 8)
abort ();
}
return 0;
}
int main (void)
{
check_vect ();
main1 ();
return 0;
}
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
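vect_get_constant_vectors is listed as a new function in tree-vect-transform.c. The test above stores the constants 8, 18, 28 and 38 into a group of four unsigned short elements, so under SLP those constants must be packed into vector operands. The following is a sketch of the resulting lane layout only. It assumes 128-bit vectors (8 unsigned short lanes) and ignores how GCC actually builds the vector trees. build_constant_vectors is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fill NVECTORS vectors of NUNITS lanes each (stored
   back to back in VECS) with the constant operands of an SLP group of
   GROUP_SIZE stmts.  Flattened lane K takes the constant of stmt
   K % GROUP_SIZE, so the group pattern repeats across lanes and, when
   GROUP_SIZE does not divide NUNITS, across vector boundaries.  */
void build_constant_vectors (const unsigned short *group_consts,
                             size_t group_size, unsigned short *vecs,
                             size_t nunits, size_t nvectors)
{
  size_t total = nunits * nvectors;

  /* The vectors must hold a whole number of copies of the group.  */
  assert (group_size > 0 && total % group_size == 0);
  for (size_t k = 0; k < total; k++)
    vecs[k] = group_consts[k % group_size];
}
```

With group size 4 and 8 lanes, one vector {8, 18, 28, 38, 8, 18, 28, 38} covers two scalar iterations. A group of 5 (as in a later test) needs five such vectors to cover eight iterations.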
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include <stdio.h>
#include "tree-vect.h"
#define N 128
int
main1 ()
{
int i, j;
unsigned short out[N*8], a[N][N];
for (i = 0; i < N; i++)
{
for (j = 0; j < N; j++)
{
a[i][j] = 8;
}
out[i*4] = 8;
out[i*4 + 1] = 18;
out[i*4 + 2] = 28;
out[i*4 + 3] = 38;
}
/* check results: */
for (i = 0; i < N; i++)
{
for (j = 0; j < N; j++)
{
if (a[i][j] != 8)
abort ();
}
if (out[i*4] != 8
|| out[i*4 + 1] != 18
|| out[i*4 + 2] != 28
|| out[i*4 + 3] != 38)
abort();
}
return 0;
}
int main (void)
{
check_vect ();
main1 ();
return 0;
}
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include <stdio.h>
#include "tree-vect.h"
#define N 8
unsigned short in2[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
int
main1 (unsigned short *in)
{
int i;
unsigned short out[N*8];
for (i = 0; i < N; i++)
{
out[i*4] = in[i*4];
out[i*4 + 1] = in[i*4 + 1];
out[i*4 + 2] = in[i*4 + 2];
out[i*4 + 3] = in[i*4 + 3];
}
/* check results: */
for (i = 0; i < N; i++)
{
if (out[i*4] != in[i*4]
|| out[i*4 + 1] != in[i*4 + 1]
|| out[i*4 + 2] != in[i*4 + 2]
|| out[i*4 + 3] != in[i*4 + 3])
abort ();
}
return 0;
}
int
main2 (unsigned short * __restrict__ in, unsigned short * __restrict__ out)
{
int i;
for (i = 0; i < N; i++)
{
out[i*4] = in[i*4];
out[i*4 + 1] = in[i*4 + 1];
out[i*4 + 2] = in[i*4 + 2];
out[i*4 + 3] = in[i*4 + 3];
}
/* check results: */
for (i = 0; i < N; i++)
{
if (out[i*4] != in[i*4]
|| out[i*4 + 1] != in[i*4 + 1]
|| out[i*4 + 2] != in[i*4 + 2]
|| out[i*4 + 3] != in[i*4 + 3])
abort ();
}
return 0;
}
int main (void)
{
unsigned short out[N*8];
check_vect ();
main1 (&in2[5]);
main2 (&in2[3], &out[3]);
return 0;
}
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { xfail vect_no_align } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { xfail vect_no_align } } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include <stdio.h>
#include "tree-vect.h"
#define N 128
int
main1 ()
{
int i;
unsigned short out[N*8];
for (i = 0; i < N; i++)
{
out[i*4] = 8;
out[i*4 + 1] = 18;
out[i*4 + 2] = 28;
out[i*4 + 3] = 38;
}
/* check results: */
for (i = 0; i < N; i++)
{
if (out[i*4] != 8
|| out[i*4 + 1] != 18
|| out[i*4 + 2] != 28
|| out[i*4 + 3] != 38)
abort ();
}
for (i = 0; i < N; i++)
{
out[i*8] = 8;
out[i*8 + 1] = 7;
out[i*8 + 2] = 81;
out[i*8 + 3] = 28;
out[i*8 + 4] = 18;
out[i*8 + 5] = 85;
out[i*8 + 6] = 5;
out[i*8 + 7] = 4;
}
/* check results: */
for (i = 0; i < N; i++)
{
if (out[i*8] != 8
|| out[i*8 + 1] != 7
|| out[i*8 + 2] != 81
|| out[i*8 + 3] != 28
|| out[i*8 + 4] != 18
|| out[i*8 + 5] != 85
|| out[i*8 + 6] != 5
|| out[i*8 + 7] != 4)
abort ();
}
/* SLP with unrolling by 8. */
for (i = 0; i < N; i++)
{
out[i*5] = 8;
out[i*5 + 1] = 7;
out[i*5 + 2] = 81;
out[i*5 + 3] = 28;
out[i*5 + 4] = 18;
}
/* check results: */
for (i = 0; i < N; i++)
{
if (out[i*5] != 8
|| out[i*5 + 1] != 7
|| out[i*5 + 2] != 81
|| out[i*5 + 3] != 28
|| out[i*5 + 4] != 18)
abort ();
}
/* SLP with unrolling by 8. */
for (i = 0; i < N/2; i++)
{
out[i*9] = 8;
out[i*9 + 1] = 7;
out[i*9 + 2] = 81;
out[i*9 + 3] = 28;
out[i*9 + 4] = 18;
out[i*9 + 5] = 85;
out[i*9 + 6] = 5;
out[i*9 + 7] = 4;
out[i*9 + 8] = 14;
}
/* check results: */
for (i = 0; i < N/2; i++)
{
if (out[i*9] != 8
|| out[i*9 + 1] != 7
|| out[i*9 + 2] != 81
|| out[i*9 + 3] != 28
|| out[i*9 + 4] != 18
|| out[i*9 + 5] != 85
|| out[i*9 + 6] != 5
|| out[i*9 + 7] != 4
|| out[i*9 + 8] != 14)
abort ();
}
return 0;
}
int main (void)
{
check_vect ();
main1 ();
return 0;
}
/* { dg-final { scan-tree-dump-times "vectorized 4 loops" 1 "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
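The "SLP with unrolling by 8" comments in this test follow from requiring that an SLP group fill a whole number of vectors. The sketch below assumes 128-bit vectors, so nunits = 8 for unsigned short, and computes the unrolling factor as lcm(nunits, group_size) / group_size. The function names are hypothetical and the code is not taken from GCC.

```c
#include <assert.h>

/* Greatest common divisor by Euclid's algorithm.  */
unsigned gcd_unsigned (unsigned a, unsigned b)
{
  while (b != 0)
    {
      unsigned t = a % b;
      a = b;
      b = t;
    }
  return a;
}

/* Hypothetical helper: how many copies of an SLP group of GROUP_SIZE
   stmts are needed to fill a whole number of NUNITS-lane vectors.  */
unsigned slp_unrolling_factor (unsigned nunits, unsigned group_size)
{
  assert (nunits > 0 && group_size > 0);
  unsigned lcm = nunits / gcd_unsigned (nunits, group_size) * group_size;
  return lcm / group_size;
}
```

For the four loops above (group sizes 4, 8, 5 and 9), this gives unrolling factors of 2, 1, 8 and 8.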
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include <stdio.h>
#include "tree-vect.h"
#define N 8
int
main1 ()
{
int i;
unsigned int out[N*8], a0, a1, a2, a3, a4, a5, a6, a7, b1, b0, b2, b3, b4, b5, b6, b7;
unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
float out2[N*8];
for (i = 0; i < N; i++)
{
a0 = in[i*8] + 5;
a1 = in[i*8 + 1] + 6;
a2 = in[i*8 + 2] + 7;
a3 = in[i*8 + 3] + 8;
a4 = in[i*8 + 4] + 9;
a5 = in[i*8 + 5] + 10;
a6 = in[i*8 + 6] + 11;
a7 = in[i*8 + 7] + 12;
b0 = a0 * 3;
b1 = a1 * 2;
b2 = a2 * 12;
b3 = a3 * 5;
b4 = a4 * 8;
b5 = a5 * 4;
b6 = a6 * 3;
b7 = a7 * 2;
out[i*8] = b0 - 2;
out[i*8 + 1] = b1 - 3;
out[i*8 + 2] = b2 - 2;
out[i*8 + 3] = b3 - 1;
out[i*8 + 4] = b4 - 8;
out[i*8 + 5] = b5 - 7;
out[i*8 + 6] = b6 - 3;
out[i*8 + 7] = b7 - 7;
}
/* check results: */
for (i = 0; i < N; i++)
{
if (out[i*8] != (in[i*8] + 5) * 3 - 2
|| out[i*8 + 1] != (in[i*8 + 1] + 6) * 2 - 3
|| out[i*8 + 2] != (in[i*8 + 2] + 7) * 12 - 2
|| out[i*8 + 3] != (in[i*8 + 3] + 8) * 5 - 1
|| out[i*8 + 4] != (in[i*8 + 4] + 9) * 8 - 8
|| out[i*8 + 5] != (in[i*8 + 5] + 10) * 4 - 7
|| out[i*8 + 6] != (in[i*8 + 6] + 11) * 3 - 3
|| out[i*8 + 7] != (in[i*8 + 7] + 12) * 2 - 7)
abort ();
}
for (i = 0; i < N*2; i++)
{
out[i*4] = (in[i*4] + 2) * 3;
out[i*4 + 1] = (in[i*4 + 1] + 2) * 7;
out[i*4 + 2] = (in[i*4 + 2] + 7) * 3;
out[i*4 + 3] = (in[i*4 + 3] + 7) * 7;
}
/* check results: */
for (i = 0; i < N*2; i++)
{
if (out[i*4] != (in[i*4] + 2) * 3
|| out[i*4 + 1] != (in[i*4 + 1] + 2) * 7
|| out[i*4 + 2] != (in[i*4 + 2] + 7) * 3
|| out[i*4 + 3] != (in[i*4 + 3] + 7) * 7)
abort ();
}
for (i = 0; i < N*4; i++)
{
out2[i*2] = (float) (in[i*2] * 2 + 5) ;
out2[i*2 + 1] = (float) (in[i*2 + 1] * 3 + 7);
}
/* check results: */
for (i = 0; i < N*4; i++)
{
if (out2[i*2] != (float) (in[i*2] * 2 + 5)
|| out2[i*2 + 1] != (float) (in[i*2 + 1] * 3 + 7))
abort ();
}
return 0;
}
int main (void)
{
check_vect ();
main1 ();
return 0;
}
/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" {target {vect_intfloat_cvt && vect_int_mult} } } } */
/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" {target {{! { vect_intfloat_cvt}} && vect_int_mult} } } } */
/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" {target {{! { vect_intfloat_cvt}} && {!{vect_int_mult}}} } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" {target {vect_intfloat_cvt && vect_int_mult} } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" {target {{! { vect_intfloat_cvt}} && vect_int_mult} } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" {target {{! { vect_intfloat_cvt}} && {!{vect_int_mult}}} } } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include <stdio.h>
#include "tree-vect.h"
#define N 8
int
main1 ()
{
int i;
unsigned int out[N*8], a0, a1, a2, a3, a4, a5, a6, a7, b1, b0, b2, b3, b4, b5, b6, b7;
unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
float out2[N*8];
/* Different operations - not SLPable. */
for (i = 0; i < N; i++)
{
a0 = in[i*8] + 5;
a1 = in[i*8 + 1] * 6;
a2 = in[i*8 + 2] + 7;
a3 = in[i*8 + 3] + 8;
a4 = in[i*8 + 4] + 9;
a5 = in[i*8 + 5] + 10;
a6 = in[i*8 + 6] + 11;
a7 = in[i*8 + 7] + 12;
b0 = a0 * 3;
b1 = a1 * 2;
b2 = a2 * 12;
b3 = a3 * 5;
b4 = a4 * 8;
b5 = a5 * 4;
b6 = a6 * 3;
b7 = a7 * 2;
out[i*8] = b0 - 2;
out[i*8 + 1] = b1 - 3;
out[i*8 + 2] = b2 - 2;
out[i*8 + 3] = b3 - 1;
out[i*8 + 4] = b4 - 8;
out[i*8 + 5] = b5 - 7;
out[i*8 + 6] = b6 - 3;
out[i*8 + 7] = b7 - 7;
}
/* check results: */
for (i = 0; i < N; i++)
{
if (out[i*8] != (in[i*8] + 5) * 3 - 2
|| out[i*8 + 1] != (in[i*8 + 1] * 6) * 2 - 3
|| out[i*8 + 2] != (in[i*8 + 2] + 7) * 12 - 2
|| out[i*8 + 3] != (in[i*8 + 3] + 8) * 5 - 1
|| out[i*8 + 4] != (in[i*8 + 4] + 9) * 8 - 8
|| out[i*8 + 5] != (in[i*8 + 5] + 10) * 4 - 7
|| out[i*8 + 6] != (in[i*8 + 6] + 11) * 3 - 3
|| out[i*8 + 7] != (in[i*8 + 7] + 12) * 2 - 7)
abort ();
}
/* Requires permutation - not SLPable. */
for (i = 0; i < N*2; i++)
{
out[i*4] = (in[i*4] + 2) * 3;
out[i*4 + 1] = (in[i*4 + 2] + 2) * 7;
out[i*4 + 2] = (in[i*4 + 1] + 7) * 3;
out[i*4 + 3] = (in[i*4 + 3] + 3) * 4;
}
/* check results: */
for (i = 0; i < N*2; i++)
{
if (out[i*4] != (in[i*4] + 2) * 3
|| out[i*4 + 1] != (in[i*4 + 2] + 2) * 7
|| out[i*4 + 2] != (in[i*4 + 1] + 7) * 3
|| out[i*4 + 3] != (in[i*4 + 3] + 3) * 4)
abort ();
}
/* Different operations - not SLPable. */
for (i = 0; i < N*4; i++)
{
out2[i*2] = ((float) in[i*2] * 2 + 6) ;
out2[i*2 + 1] = (float) (in[i*2 + 1] * 3 + 7);
}
/* check results: */
for (i = 0; i < N*4; i++)
{
if (out2[i*2] != ((float) in[i*2] * 2 + 6)
|| out2[i*2 + 1] != (float) (in[i*2 + 1] * 3 + 7))
abort ();
}
return 0;
}
int main (void)
{
check_vect ();
main1 ();
return 0;
}
/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { target { vect_strided && vect_int_mult } } } } */
/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" {target { ! { vect_int_mult && vect_strided } } } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
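The comments in this test give two reasons why loops are rejected for SLP. In one case the lanes perform different operations; in the other, their loads would need a permutation. The model below is deliberately simplified and covers only those two checks. It assumes each lane can be summarized by one operation and one load offset within its interleaved group. struct lane, enum op_kind and lanes_slpable are hypothetical names. The real analysis in vect_build_slp_tree and vect_get_and_check_slp_defs is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-lane summary: the operation applied and the offset,
   within its interleaved group, of the element the lane loads.  */
enum op_kind { OP_PLUS, OP_MULT };

struct lane
{
  enum op_kind op;
  size_t load_offset;
};

/* Returns 1 if, under this model, the lanes can form one SLP node: every
   lane applies the same operation and lane K loads element K of its
   group, so no permutation is needed.  */
int lanes_slpable (const struct lane *lanes, size_t group_size)
{
  assert (group_size > 0);
  for (size_t k = 0; k < group_size; k++)
    {
      if (lanes[k].op != lanes[0].op)
        return 0;   /* Not isomorphic: different operations.  */
      if (lanes[k].load_offset != k)
        return 0;   /* Loads out of order: would need a permutation.  */
    }
  return 1;
}
```

In the second loop above, lane 1 reads in[i*4 + 2] and lane 2 reads in[i*4 + 1], which corresponds to load offsets {0, 2, 1, 3}.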
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include <stdio.h>
#include "tree-vect.h"
#define N 8
int
main1 ()
{
int i;
unsigned int out[N*8], a0, a1, a2, a3, a4, a5, a6, a7, b1, b0, b2, b3, b4, b5, b6, b7;
unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
unsigned int ia[N], ib[N*2];
for (i = 0; i < N; i++)
{
a0 = in[i*8] + 5;
a1 = in[i*8 + 1] + 6;
a2 = in[i*8 + 2] + 7;
a3 = in[i*8 + 3] + 8;
a4 = in[i*8 + 4] + 9;
a5 = in[i*8 + 5] + 10;
a6 = in[i*8 + 6] + 11;
a7 = in[i*8 + 7] + 12;
b0 = a0 * 3;
b1 = a1 * 2;
b2 = a2 * 12;
b3 = a3 * 5;
b4 = a4 * 8;
b5 = a5 * 4;
b6 = a6 * 3;
b7 = a7 * 2;
out[i*8] = b0 - 2;
out[i*8 + 1] = b1 - 3;
out[i*8 + 2] = b2 - 2;
out[i*8 + 3] = b3 - 1;
out[i*8 + 4] = b4 - 8;
out[i*8 + 5] = b5 - 7;
out[i*8 + 6] = b6 - 3;
out[i*8 + 7] = b7 - 7;
ia[i] = b6;
}
/* check results: */
for (i = 0; i < N; i++)
{
if (out[i*8] != (in[i*8] + 5) * 3 - 2
|| out[i*8 + 1] != (in[i*8 + 1] + 6) * 2 - 3
|| out[i*8 + 2] != (in[i*8 + 2] + 7) * 12 - 2
|| out[i*8 + 3] != (in[i*8 + 3] + 8) * 5 - 1
|| out[i*8 + 4] != (in[i*8 + 4] + 9) * 8 - 8
|| out[i*8 + 5] != (in[i*8 + 5] + 10) * 4 - 7
|| out[i*8 + 6] != (in[i*8 + 6] + 11) * 3 - 3
|| out[i*8 + 7] != (in[i*8 + 7] + 12) * 2 - 7
|| ia[i] != (in[i*8 + 6] + 11) * 3)
abort ();
}
for (i = 0; i < N*2; i++)
{
out[i*4] = (in[i*4] + 2) * 3;
out[i*4 + 1] = (in[i*4 + 1] + 2) * 7;
out[i*4 + 2] = (in[i*4 + 2] + 7) * 3;
out[i*4 + 3] = (in[i*4 + 3] + 7) * 7;
ib[i] = 7;
}
/* check results: */
for (i = 0; i < N*2; i++)
{
if (out[i*4] != (in[i*4] + 2) * 3
|| out[i*4 + 1] != (in[i*4 + 1] + 2) * 7
|| out[i*4 + 2] != (in[i*4 + 2] + 7) * 3
|| out[i*4 + 3] != (in[i*4 + 3] + 7) * 7
|| ib[i] != 7)
abort ();
}
return 0;
}
int main (void)
{
check_vect ();
main1 ();
return 0;
}
/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" {target { vect_strided && vect_int_mult} } } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" {target { {! {vect_strided}} && vect_int_mult } } } } */
/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" {target { ! vect_int_mult } } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" {target { vect_strided && vect_int_mult } } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" {target { {! {vect_strided}} && vect_int_mult } } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" {target { ! vect_int_mult } } } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
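In this test, b6 feeds both the SLP group of stores to out and the separate store ia[i] = b6. The stmts computing lane 6 must therefore also be available to loop-based vectorization, which is the situation the ChangeLog calls hybrid SLP (vect_detect_hybrid_slp). The sketch below models the marking step and makes two assumptions. First, the SLP instance is a simple chain of nodes: level 0 holds the stores, and each deeper level holds the operands of the level above. Second, a lane is marked hybrid from the offending stmt down to its leaves. The enum loosely mirrors the new slp_vect_type, but all names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Loosely modeled on the slp_vect_type enum named in the ChangeLog.  */
enum slp_kind { LOOP_VECT = 0, PURE_SLP, HYBRID };

#define MAX_LEVELS 5
#define MAX_LANES 8

/* Hypothetical linear SLP instance: level 0 holds the grouped stores and
   level D + 1 holds the operand stmts of level D, lane by lane.  */
struct slp_chain
{
  size_t depth, group_size;
  enum slp_kind kind[MAX_LEVELS][MAX_LANES];
  int used_outside_slp[MAX_LEVELS][MAX_LANES];
};

/* Start with every stmt pure SLP and no uses outside the instance.  */
void init_pure_slp (struct slp_chain *c, size_t depth, size_t group_size)
{
  assert (depth <= MAX_LEVELS && group_size <= MAX_LANES);
  c->depth = depth;
  c->group_size = group_size;
  for (size_t d = 0; d < MAX_LEVELS; d++)
    for (size_t k = 0; k < MAX_LANES; k++)
      {
        c->kind[d][k] = PURE_SLP;
        c->used_outside_slp[d][k] = 0;
      }
}

/* Mark LANE hybrid at LEVEL and at every deeper level (its operands).  */
void mark_lane_hybrid (struct slp_chain *c, size_t level, size_t lane)
{
  for (size_t d = level; d < c->depth; d++)
    c->kind[d][lane] = HYBRID;
}

/* A pure-SLP stmt whose result is used by a non-SLP stmt makes its lane
   hybrid from that level down.  */
void detect_hybrid (struct slp_chain *c)
{
  for (size_t d = 0; d < c->depth; d++)
    for (size_t k = 0; k < c->group_size; k++)
      if (c->kind[d][k] == PURE_SLP && c->used_outside_slp[d][k])
        mark_lane_hybrid (c, d, k);
}
```

In the first loop of this test, level 2 would stand for the multiplications (b6 = a6 * 3), level 3 for the additions and level 4 for the loads. The stores (level 0) and the subtractions (level 1) in lane 6 stay pure SLP.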
/* { dg-require-effective-target vect_intfloat_cvt } */
#include <stdarg.h>
#include <stdio.h>
#include "tree-vect.h"
#define N 64
int
main1 ()
{
int i;
unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
float out2[N*8], fa[N*4];
for (i = 0; i < N; i++)
{
out2[i*2] = (float) (in[i*2] * 2 + 11) ;
out2[i*2 + 1] = (float) (in[i*2 + 1] * 3 + 7);
fa[i] = (float) in[i*2+1];
}
/* check results: */
for (i = 0; i < N; i++)
{
if (out2[i*2] != (float) (in[i*2] * 2 + 11)
|| out2[i*2 + 1] != (float) (in[i*2 + 1] * 3 + 7)
|| fa[i] != (float) in[i*2+1])
abort ();
}
return 0;
}
int main (void)
{
check_vect ();
main1 ();
return 0;
}
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" {target { vect_strided && vect_int_mult } } } } */
/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" {target { { ! { vect_int_mult }} || { ! {vect_strided}}} } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" {target { vect_strided && vect_int_mult } } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" {target { { ! { vect_int_mult }} || { ! {vect_strided}}} } } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include <stdio.h>
#include "tree-vect.h"
#define N 8
int
main1 ()
{
int i;
unsigned short out[N*8];
unsigned short in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
unsigned int in2[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
unsigned int out2[N*8];
/* Induction is not SLPable yet. */
for (i = 0; i < N; i++)
{
out[i*8] = in[i*8] + i;
out[i*8 + 1] = in[i*8 + 1] + i;
out[i*8 + 2] = in[i*8 + 2] + i;
out[i*8 + 3] = in[i*8 + 3] + i;
out[i*8 + 4] = in[i*8 + 4] + i;
out[i*8 + 5] = in[i*8 + 5] + i;
out[i*8 + 6] = in[i*8 + 6] + i;
out[i*8 + 7] = in[i*8 + 7] + i;
}
/* check results: */
for (i = 0; i < N; i++)
{
if (out[i*8] != in[i*8] + i
|| out[i*8 + 1] != in[i*8 + 1] + i
|| out[i*8 + 2] != in[i*8 + 2] + i
|| out[i*8 + 3] != in[i*8 + 3] + i
|| out[i*8 + 4] != in[i*8 + 4] + i
|| out[i*8 + 5] != in[i*8 + 5] + i
|| out[i*8 + 6] != in[i*8 + 6] + i
|| out[i*8 + 7] != in[i*8 + 7] + i)
abort ();
}
/* Induction is not SLPable yet and strided group size must be a power of 2
to get vectorized. */
for (i = 0; i < N/2; i++)
{
out2[i*12] = in2[i*12] + i;
out2[i*12 + 1] = in2[i*12 + 1] + i;
out2[i*12 + 2] = in2[i*12 + 2] + i;
out2[i*12 + 3] = in2[i*12 + 3] + i;
out2[i*12 + 4] = in2[i*12 + 4] + i;
out2[i*12 + 5] = in2[i*12 + 5] + i;
out2[i*12 + 6] = in2[i*12 + 6] + i;
out2[i*12 + 7] = in2[i*12 + 7] + i;
out2[i*12 + 8] = in2[i*12 + 8] + i;
out2[i*12 + 9] = in2[i*12 + 9] + i;
out2[i*12 + 10] = in2[i*12 + 10] + i;
out2[i*12 + 11] = in2[i*12 + 11] + i;
}
/* check results: */
for (i = 0; i < N/2; i++)
{
if (out2[i*12] != in2[i*12] + i
|| out2[i*12 + 1] != in2[i*12 + 1] + i
|| out2[i*12 + 2] != in2[i*12 + 2] + i
|| out2[i*12 + 3] != in2[i*12 + 3] + i
|| out2[i*12 + 4] != in2[i*12 + 4] + i
|| out2[i*12 + 5] != in2[i*12 + 5] + i
|| out2[i*12 + 6] != in2[i*12 + 6] + i
|| out2[i*12 + 7] != in2[i*12 + 7] + i
|| out2[i*12 + 8] != in2[i*12 + 8] + i
|| out2[i*12 + 9] != in2[i*12 + 9] + i
|| out2[i*12 + 10] != in2[i*12 + 10] + i
|| out2[i*12 + 11] != in2[i*12 + 11] + i)
abort ();
}
/* Not power of 2 but SLPable. */
for (i = 0; i < N/2; i++)
{
out2[i*12] = in2[i*12] + 1;
out2[i*12 + 1] = in2[i*12 + 1] + 2;
out2[i*12 + 2] = in2[i*12 + 2] + 3;
out2[i*12 + 3] = in2[i*12 + 3] + 4;
out2[i*12 + 4] = in2[i*12 + 4] + 5;
out2[i*12 + 5] = in2[i*12 + 5] + 6;
out2[i*12 + 6] = in2[i*12 + 6] + 7;
out2[i*12 + 7] = in2[i*12 + 7] + 8;
out2[i*12 + 8] = in2[i*12 + 8] + 9;
out2[i*12 + 9] = in2[i*12 + 9] + 10;
out2[i*12 + 10] = in2[i*12 + 10] + 11;
out2[i*12 + 11] = in2[i*12 + 11] + 12;
}
/* check results: */
for (i = 0; i < N/2; i++)
{
if (out2[i*12] != in2[i*12] + 1
|| out2[i*12 + 1] != in2[i*12 + 1] + 2
|| out2[i*12 + 2] != in2[i*12 + 2] + 3
|| out2[i*12 + 3] != in2[i*12 + 3] + 4
|| out2[i*12 + 4] != in2[i*12 + 4] + 5
|| out2[i*12 + 5] != in2[i*12 + 5] + 6
|| out2[i*12 + 6] != in2[i*12 + 6] + 7
|| out2[i*12 + 7] != in2[i*12 + 7] + 8
|| out2[i*12 + 8] != in2[i*12 + 8] + 9
|| out2[i*12 + 9] != in2[i*12 + 9] + 10
|| out2[i*12 + 10] != in2[i*12 + 10] + 11
|| out2[i*12 + 11] != in2[i*12 + 11] + 12)
abort ();
}
return 0;
}
int main (void)
{
check_vect ();
main1 ();
return 0;
}
/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { xfail *-*-* } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
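The ChangeLog says that for strided-access stmts that are not pure SLP, vect_analyze_operations checks that the group size is a power of 2. This test exercises both sides of that rule. The group of 12 with an induction is not SLPable and is rejected, while the SLPable group of 12 is still vectorized. Below is a minimal model of that decision, with hypothetical function names. The assumption that non-SLP interleaving is what imposes the power-of-2 restriction is suggested by the vect_interleave and vect_extract_even_odd targets in the dg-final line, but is not stated in the source.

```c
#include <assert.h>

/* Nonzero if N is a power of two; zero is not.  */
int is_power_of_two (unsigned n)
{
  return n != 0 && (n & (n - 1)) == 0;
}

/* Hypothetical model of the check: a strided (interleaved) group that is
   not pure SLP must have a power-of-2 size, while pure-SLP groups are
   exempt.  */
int strided_group_ok (unsigned group_size, int pure_slp)
{
  assert (group_size > 0);
  return pure_slp || is_power_of_two (group_size);
}
```

Applied to the three loops above (group 8 not SLP, group 12 not SLP, group 12 SLP), this accepts the first and third loops. That agrees with the expected "vectorized 2 loops" and one loop vectorized using SLP.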
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include <stdio.h>
#include "tree-vect.h"
#define N 8
int
main1 (int n)
{
int i;
unsigned int out[N*8], a0, a1, a2, a3, a4, a5, a6, a7, b1, b0, b2, b3, b4, b5, b6, b7;
unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
unsigned short in2[N*16] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
unsigned short out2[N*16];
/* Multiple types are not SLPable yet. */
for (i = 0; i < n; i++)
{
a0 = in[i*8] + 5;
a1 = in[i*8 + 1] + 6;
a2 = in[i*8 + 2] + 7;
a3 = in[i*8 + 3] + 8;
a4 = in[i*8 + 4] + 9;
a5 = in[i*8 + 5] + 10;
a6 = in[i*8 + 6] + 11;
a7 = in[i*8 + 7] + 12;
b0 = a0 * 3;
b1 = a1 * 2;
b2 = a2 * 12;
b3 = a3 * 5;
b4 = a4 * 8;
b5 = a5 * 4;
b6 = a6 * 3;
b7 = a7 * 2;
out[i*8] = b0 - 2;
out[i*8 + 1] = b1 - 3;
out[i*8 + 2] = b2 - 2;
out[i*8 + 3] = b3 - 1;
out[i*8 + 4] = b4 - 8;
out[i*8 + 5] = b5 - 7;
out[i*8 + 6] = b6 - 3;
out[i*8 + 7] = b7 - 7;
out2[i*16] = in2[i*16] + 2;
out2[i*16 + 1] = in2[i*16 + 1] + 3;
out2[i*16 + 2] = in2[i*16 + 2] + 4;
out2[i*16 + 3] = in2[i*16 + 3] + 3;
out2[i*16 + 4] = in2[i*16 + 4] + 2;
out2[i*16 + 5] = in2[i*16 + 5] + 3;
out2[i*16 + 6] = in2[i*16 + 6] + 2;
out2[i*16 + 7] = in2[i*16 + 7] + 4;
out2[i*16 + 8] = in2[i*16 + 8] + 2;
out2[i*16 + 9] = in2[i*16 + 9] + 5;
out2[i*16 + 10] = in2[i*16 + 10] + 2;
out2[i*16 + 11] = in2[i*16 + 11] + 3;
out2[i*16 + 12] = in2[i*16 + 12] + 4;
out2[i*16 + 13] = in2[i*16 + 13] + 4;
out2[i*16 + 14] = in2[i*16 + 14] + 3;
out2[i*16 + 15] = in2[i*16 + 15] + 2;
}
/* check results: */
for (i = 0; i < n; i++)
{
if (out[i*8] != (in[i*8] + 5) * 3 - 2
|| out[i*8 + 1] != (in[i*8 + 1] + 6) * 2 - 3
|| out[i*8 + 2] != (in[i*8 + 2] + 7) * 12 - 2
|| out[i*8 + 3] != (in[i*8 + 3] + 8) * 5 - 1
|| out[i*8 + 4] != (in[i*8 + 4] + 9) * 8 - 8
|| out[i*8 + 5] != (in[i*8 + 5] + 10) * 4 - 7
|| out[i*8 + 6] != (in[i*8 + 6] + 11) * 3 - 3
|| out[i*8 + 7] != (in[i*8 + 7] + 12) * 2 - 7)
abort ();
if (out2[i*16] != in2[i*16] + 2
|| out2[i*16 + 1] != in2[i*16 + 1] + 3
|| out2[i*16 + 2] != in2[i*16 + 2] + 4
|| out2[i*16 + 3] != in2[i*16 + 3] + 3
|| out2[i*16 + 4] != in2[i*16 + 4] + 2
|| out2[i*16 + 5] != in2[i*16 + 5] + 3
|| out2[i*16 + 6] != in2[i*16 + 6] + 2
|| out2[i*16 + 7] != in2[i*16 + 7] + 4
|| out2[i*16 + 8] != in2[i*16 + 8] + 2
|| out2[i*16 + 9] != in2[i*16 + 9] + 5
|| out2[i*16 + 10] != in2[i*16 + 10] + 2
|| out2[i*16 + 11] != in2[i*16 + 11] + 3
|| out2[i*16 + 12] != in2[i*16 + 12] + 4
|| out2[i*16 + 13] != in2[i*16 + 13] + 4
|| out2[i*16 + 14] != in2[i*16 + 14] + 3
|| out2[i*16 + 15] != in2[i*16 + 15] + 2)
abort ();
}
return 0;
}
int main (void)
{
check_vect ();
main1 (N);
return 0;
}
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_strided && vect_int_mult } } } } */
/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" {target { ! { vect_strided && vect_int_mult } } } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { xfail *-*-* } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include <stdio.h>
#include "tree-vect.h"
#define N 8
int
main1 (int n)
{
int i;
unsigned int out[N*8], a0, a1, a2, a3, a4, a5, a6, a7, b1, b0, b2, b3, b4, b5, b6, b7;
unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
unsigned int in2[N*16] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
unsigned int out2[N*16];
for (i = 0; i < n; i++)
{
a0 = in[i*8] + 5;
a1 = in[i*8 + 1] + 6;
a2 = in[i*8 + 2] + 7;
a3 = in[i*8 + 3] + 8;
a4 = in[i*8 + 4] + 9;
a5 = in[i*8 + 5] + 10;
a6 = in[i*8 + 6] + 11;
a7 = in[i*8 + 7] + 12;
b0 = a0 * 3;
b1 = a1 * 2;
b2 = a2 * 12;
b3 = a3 * 5;
b4 = a4 * 8;
b5 = a5 * 4;
b6 = a6 * 3;
b7 = a7 * 2;
out[i*8] = b0 - 2;
out[i*8 + 1] = b1 - 3;
out[i*8 + 2] = b2 - 2;
out[i*8 + 3] = b3 - 1;
out[i*8 + 4] = b4 - 8;
out[i*8 + 5] = b5 - 7;
out[i*8 + 6] = b6 - 3;
out[i*8 + 7] = b7 - 7;
out2[i*16] = in2[i*16] * 2;
out2[i*16 + 1] = in2[i*16 + 1] * 3;
out2[i*16 + 2] = in2[i*16 + 2] * 4;
out2[i*16 + 3] = in2[i*16 + 3] * 3;
out2[i*16 + 4] = in2[i*16 + 4] * 2;
out2[i*16 + 5] = in2[i*16 + 5] * 3;
out2[i*16 + 6] = in2[i*16 + 6] * 2;
out2[i*16 + 7] = in2[i*16 + 7] * 4;
out2[i*16 + 8] = in2[i*16 + 8] * 2;
out2[i*16 + 9] = in2[i*16 + 9] * 5;
out2[i*16 + 10] = in2[i*16 + 10] * 2;
out2[i*16 + 11] = in2[i*16 + 11] * 3;
out2[i*16 + 12] = in2[i*16 + 12] * 4;
out2[i*16 + 13] = in2[i*16 + 13] * 4;
out2[i*16 + 14] = in2[i*16 + 14] * 3;
out2[i*16 + 15] = in2[i*16 + 15] * 2;
}
/* check results: */
for (i = 0; i < n; i++)
{
if (out[i*8] != (in[i*8] + 5) * 3 - 2
|| out[i*8 + 1] != (in[i*8 + 1] + 6) * 2 - 3
|| out[i*8 + 2] != (in[i*8 + 2] + 7) * 12 - 2
|| out[i*8 + 3] != (in[i*8 + 3] + 8) * 5 - 1
|| out[i*8 + 4] != (in[i*8 + 4] + 9) * 8 - 8
|| out[i*8 + 5] != (in[i*8 + 5] + 10) * 4 - 7
|| out[i*8 + 6] != (in[i*8 + 6] + 11) * 3 - 3
|| out[i*8 + 7] != (in[i*8 + 7] + 12) * 2 - 7)
abort ();
if (out2[i*16] != in2[i*16] * 2
|| out2[i*16 + 1] != in2[i*16 + 1] * 3
|| out2[i*16 + 2] != in2[i*16 + 2] * 4
|| out2[i*16 + 3] != in2[i*16 + 3] * 3
|| out2[i*16 + 4] != in2[i*16 + 4] * 2
|| out2[i*16 + 5] != in2[i*16 + 5] * 3
|| out2[i*16 + 6] != in2[i*16 + 6] * 2
|| out2[i*16 + 7] != in2[i*16 + 7] * 4
|| out2[i*16 + 8] != in2[i*16 + 8] * 2
|| out2[i*16 + 9] != in2[i*16 + 9] * 5
|| out2[i*16 + 10] != in2[i*16 + 10] * 2
|| out2[i*16 + 11] != in2[i*16 + 11] * 3
|| out2[i*16 + 12] != in2[i*16 + 12] * 4
|| out2[i*16 + 13] != in2[i*16 + 13] * 4
|| out2[i*16 + 14] != in2[i*16 + 14] * 3
|| out2[i*16 + 15] != in2[i*16 + 15] * 2)
abort ();
}
return 0;
}
int main (void)
{
check_vect ();
main1 (N);
return 0;
}
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" {target vect_int_mult } } } */
/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" {target { ! { vect_int_mult } } } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" {target vect_int_mult } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" {target { ! { vect_int_mult } } } } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include <stdio.h>
#include "tree-vect.h"
#define N 8
int
main1 ()
{
int i;
unsigned int out[N*8], a0, a1, a2, a3, a4, a5, a6, a7, b1, b0, b2, b3, b4, b5, b6, b7;
unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
unsigned int in2[N*16] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
unsigned int out2[N*16];
/* SLP group whose size is not a multiple of the vector size.
Unrolling by 2. */
for (i = 0; i < N; i++)
{
a0 = in[i*2] + 5;
a1 = in[i*2 + 1] + 6;
b0 = a0 * 3;
b1 = a1 * 2;
out[i*2] = b0 - 2;
out[i*2 + 1] = b1 - 3;
out2[i*6] = in2[i*6] * 2;
out2[i*6 + 1] = in2[i*6 + 1] * 3;
out2[i*6 + 2] = in2[i*6 + 2] * 4;
out2[i*6 + 3] = in2[i*6 + 3] * 2;
out2[i*6 + 4] = in2[i*6 + 4] * 4;
out2[i*6 + 5] = in2[i*6 + 5] * 3;
}
/* check results: */
for (i = 0; i < N; i++)
{
if (out[i*2] != (in[i*2] + 5) * 3 - 2
|| out[i*2 + 1] != (in[i*2 + 1] + 6) * 2 - 3
|| out2[i*6] != in2[i*6] * 2
|| out2[i*6 + 1] != in2[i*6 + 1] * 3
|| out2[i*6 + 2] != in2[i*6 + 2] * 4
|| out2[i*6 + 3] != in2[i*6 + 3] * 2
|| out2[i*6 + 4] != in2[i*6 + 4] * 4
|| out2[i*6 + 5] != in2[i*6 + 5] * 3)
abort ();
}
return 0;
}
int main (void)
{
check_vect ();
main1 ();
return 0;
}
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_int_mult } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_int_mult } } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include <stdio.h>
#include "tree-vect.h"
#define N 8
int
main1 ()
{
int i;
unsigned short out[N*8];
unsigned short in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
unsigned short in2[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
unsigned short out2[N*8];
for (i = 0; i < N*2; i++)
{
out[i*2] = in[i*2] + 5;
out[i*2 + 1] = in[i*2 + 1] + 6;
out2[i*4] = in2[i*4] + 2;
out2[i*4 + 1] = in2[i*4 + 1] + 2;
out2[i*4 + 2] = in2[i*4 + 2] + 1;
out2[i*4 + 3] = in2[i*4 + 3] + 3;
}
/* check results: */
for (i = 0; i < N*2; i++)
{
if (out[i*2] != in[i*2] + 5
|| out[i*2 + 1] != in[i*2 + 1] + 6
|| out2[i*4] != in2[i*4] + 2
|| out2[i*4 + 1] != in2[i*4 + 1] + 2
|| out2[i*4 + 2] != in2[i*4 + 2] + 1
|| out2[i*4 + 3] != in2[i*4 + 3] + 3)
abort ();
}
return 0;
}
int main (void)
{
check_vect ();
main1 ();
return 0;
}
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include <stdio.h>
#include "tree-vect.h"
#define N 8
int
main1 ()
{
int i;
unsigned int out[N*8], a0, a1, a2, a3, a4, a5, a6, a7, b1, b0, b2, b3, b4, b5, b6, b7;
unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
float out2[N*8];
for (i = 0; i < N; i++)
{
a0 = in[i*8] + 5;
a1 = in[i*8 + 1] + 6;
a2 = in[i*8 + 2] + 7;
a3 = in[i*8 + 3] + 8;
a4 = in[i*8 + 4] + 9;
a5 = in[i*8 + 5] + 10;
a6 = in[i*8 + 6] + 11;
a7 = in[i*8 + 7] + 12;
b0 = a0 * 3;
b1 = a1 * 2;
b2 = a2 * 12;
b3 = a3 * 5;
b4 = a4 * 8;
b5 = a5 * 4;
b6 = a6 * 3;
b7 = a7 * 2;
out[i*8] = b0 - 2;
out[i*8 + 1] = b1 - 3;
out[i*8 + 2] = b2 - 2;
out[i*8 + 3] = b3 - 1;
out[i*8 + 4] = b4 - 8;
out[i*8 + 5] = b5 - 7;
out[i*8 + 6] = b6 - 3;
out[i*8 + 7] = b7 - 7;
out2[i*8] = (float) b0;
out2[i*8 + 1] = (float) b1;
out2[i*8 + 2] = (float) b2;
out2[i*8 + 3] = (float) b3;
out2[i*8 + 4] = (float) b4;
out2[i*8 + 5] = (float) b5;
out2[i*8 + 6] = (float) b6;
out2[i*8 + 7] = (float) b7;
}
/* check results: */
for (i = 0; i < N; i++)
{
if (out[i*8] != (in[i*8] + 5) * 3 - 2
|| out[i*8 + 1] != (in[i*8 + 1] + 6) * 2 - 3
|| out[i*8 + 2] != (in[i*8 + 2] + 7) * 12 - 2
|| out[i*8 + 3] != (in[i*8 + 3] + 8) * 5 - 1
|| out[i*8 + 4] != (in[i*8 + 4] + 9) * 8 - 8
|| out[i*8 + 5] != (in[i*8 + 5] + 10) * 4 - 7
|| out[i*8 + 6] != (in[i*8 + 6] + 11) * 3 - 3
|| out[i*8 + 7] != (in[i*8 + 7] + 12) * 2 - 7)
abort ();
if (out2[i*8] != (float) ((in[i*8] + 5) * 3)
|| out2[i*8 + 1] != (float) ((in[i*8 + 1] + 6) * 2)
|| out2[i*8 + 2] != (float) ((in[i*8 + 2] + 7) * 12)
|| out2[i*8 + 3] != (float) ((in[i*8 + 3] + 8) * 5)
|| out2[i*8 + 4] != (float) ((in[i*8 + 4] + 9) * 8)
|| out2[i*8 + 5] != (float) ((in[i*8 + 5] + 10) * 4)
|| out2[i*8 + 6] != (float) ((in[i*8 + 6] + 11) * 3)
|| out2[i*8 + 7] != (float) ((in[i*8 + 7] + 12) * 2))
abort ();
}
return 0;
}
int main (void)
{
check_vect ();
main1 ();
return 0;
}
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_strided } } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { vect_strided } } } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include <stdio.h>
#include "tree-vect.h"
#define N 16
int
main1 ()
{
unsigned int i;
unsigned int out[N*8];
unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
unsigned int ia[N*2], a0, a1, a2, a3;
for (i = 0; i < N; i++)
{
out[i*8] = in[i*8];
out[i*8 + 1] = in[i*8 + 1];
out[i*8 + 2] = in[i*8 + 2];
out[i*8 + 3] = in[i*8 + 3];
out[i*8 + 4] = in[i*8 + 4];
out[i*8 + 5] = in[i*8 + 5];
out[i*8 + 6] = in[i*8 + 6];
out[i*8 + 7] = in[i*8 + 7];
ia[i] = in[i*8 + 2];
}
/* check results: */
for (i = 0; i < N; i++)
{
if (out[i*8] != in[i*8]
|| out[i*8 + 1] != in[i*8 + 1]
|| out[i*8 + 2] != in[i*8 + 2]
|| out[i*8 + 3] != in[i*8 + 3]
|| out[i*8 + 4] != in[i*8 + 4]
|| out[i*8 + 5] != in[i*8 + 5]
|| out[i*8 + 6] != in[i*8 + 6]
|| out[i*8 + 7] != in[i*8 + 7]
|| ia[i] != in[i*8 + 2])
abort ();
}
for (i = 0; i < N*2; i++)
{
a0 = in[i*4] + 1;
a1 = in[i*4 + 1] + 2;
a2 = in[i*4 + 2] + 3;
a3 = in[i*4 + 3] + 4;
out[i*4] = a0;
out[i*4 + 1] = a1;
out[i*4 + 2] = a2;
out[i*4 + 3] = a3;
ia[i] = a2;
}
/* check results: */
for (i = 0; i < N*2; i++)
{
if (out[i*4] != in[i*4] + 1
|| out[i*4 + 1] != in[i*4 + 1] + 2
|| out[i*4 + 2] != in[i*4 + 2] + 3
|| out[i*4 + 3] != in[i*4 + 3] + 4
|| ia[i] != in[i*4 + 2] + 3)
abort ();
}
/* The last stmt requires interleaving of a group whose size is not a
power of 2, so the loop is not vectorizable. */
for (i = 0; i < N/2; i++)
{
out[i*12] = in[i*12];
out[i*12 + 1] = in[i*12 + 1];
out[i*12 + 2] = in[i*12 + 2];
out[i*12 + 3] = in[i*12 + 3];
out[i*12 + 4] = in[i*12 + 4];
out[i*12 + 5] = in[i*12 + 5];
out[i*12 + 6] = in[i*12 + 6];
out[i*12 + 7] = in[i*12 + 7];
out[i*12 + 8] = in[i*12 + 8];
out[i*12 + 9] = in[i*12 + 9];
out[i*12 + 10] = in[i*12 + 10];
out[i*12 + 11] = in[i*12 + 11];
ia[i] = in[i*12 + 7];
}
/* check results: */
for (i = 0; i < N/2; i++)
{
if (out[i*12] != in[i*12]
|| out[i*12 + 1] != in[i*12 + 1]
|| out[i*12 + 2] != in[i*12 + 2]
|| out[i*12 + 3] != in[i*12 + 3]
|| out[i*12 + 4] != in[i*12 + 4]
|| out[i*12 + 5] != in[i*12 + 5]
|| out[i*12 + 6] != in[i*12 + 6]
|| out[i*12 + 7] != in[i*12 + 7]
|| out[i*12 + 8] != in[i*12 + 8]
|| out[i*12 + 9] != in[i*12 + 9]
|| out[i*12 + 10] != in[i*12 + 10]
|| out[i*12 + 11] != in[i*12 + 11]
|| ia[i] != in[i*12 + 7])
abort ();
}
/* Hybrid SLP with unrolling by 2. */
for (i = 0; i < N; i++)
{
out[i*6] = in[i*6];
out[i*6 + 1] = in[i*6 + 1];
out[i*6 + 2] = in[i*6 + 2];
out[i*6 + 3] = in[i*6 + 3];
out[i*6 + 4] = in[i*6 + 4];
out[i*6 + 5] = in[i*6 + 5];
ia[i] = i;
}
/* check results: */
for (i = 0; i < N/2; i++)
{
if (out[i*6] != in[i*6]
|| out[i*6 + 1] != in[i*6 + 1]
|| out[i*6 + 2] != in[i*6 + 2]
|| out[i*6 + 3] != in[i*6 + 3]
|| out[i*6 + 4] != in[i*6 + 4]
|| out[i*6 + 5] != in[i*6 + 5]
|| ia[i] != i)
abort ();
}
return 0;
}
int main (void)
{
check_vect ();
main1 ();
return 0;
}
/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { target vect_strided } } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { ! { vect_strided } } } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { target vect_strided } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { ! { vect_strided } } } } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include <stdio.h>
#include "tree-vect.h"
#define N 128
int
main1 (unsigned short a0, unsigned short a1, unsigned short a2,
unsigned short a3, unsigned short a4, unsigned short a5,
unsigned short a6, unsigned short a7, unsigned short a8,
unsigned short a9, unsigned short a10, unsigned short a11,
unsigned short a12, unsigned short a13, unsigned short a14,
unsigned short a15)
{
int i;
unsigned short out[N*16];
for (i = 0; i < N; i++)
{
out[i*4] = a8;
out[i*4 + 1] = a1;
out[i*4 + 2] = a2;
out[i*4 + 3] = a3;
}
/* check results: */
for (i = 0; i < N; i++)
{
if (out[i*4] != a8
|| out[i*4 + 1] != a1
|| out[i*4 + 2] != a2
|| out[i*4 + 3] != a3)
abort ();
}
for (i = 0; i < N; i++)
{
out[i*16] = a8;
out[i*16 + 1] = a7;
out[i*16 + 2] = a1;
out[i*16 + 3] = a2;
out[i*16 + 4] = a8;
out[i*16 + 5] = a5;
out[i*16 + 6] = a5;
out[i*16 + 7] = a4;
out[i*16 + 8] = a12;
out[i*16 + 9] = a13;
out[i*16 + 10] = a14;
out[i*16 + 11] = a15;
out[i*16 + 12] = a6;
out[i*16 + 13] = a9;
out[i*16 + 14] = a0;
out[i*16 + 15] = a7;
}
/* check results: */
for (i = 0; i < N; i++)
{
if (out[i*16] != a8
|| out[i*16 + 1] != a7
|| out[i*16 + 2] != a1
|| out[i*16 + 3] != a2
|| out[i*16 + 4] != a8
|| out[i*16 + 5] != a5
|| out[i*16 + 6] != a5
|| out[i*16 + 7] != a4
|| out[i*16 + 8] != a12
|| out[i*16 + 9] != a13
|| out[i*16 + 10] != a14
|| out[i*16 + 11] != a15
|| out[i*16 + 12] != a6
|| out[i*16 + 13] != a9
|| out[i*16 + 14] != a0
|| out[i*16 + 15] != a7)
abort ();
}
/* SLP with unrolling by 8. */
for (i = 0; i < N; i++)
{
out[i*3] = a8;
out[i*3 + 1] = a1;
out[i*3 + 2] = a2;
}
/* check results: */
for (i = 0; i < N; i++)
{
if (out[i*3] != a8
|| out[i*3 + 1] != a1
|| out[i*3 + 2] != a2)
abort ();
}
/* SLP with unrolling by 8. */
for (i = 0; i < N; i++)
{
out[i*11] = a8;
out[i*11 + 1] = a7;
out[i*11 + 2] = a1;
out[i*11 + 3] = a2;
out[i*11 + 4] = a8;
out[i*11 + 5] = a5;
out[i*11 + 6] = a5;
out[i*11 + 7] = a4;
out[i*11 + 8] = a12;
out[i*11 + 9] = a13;
out[i*11 + 10] = a14;
}
/* check results: */
for (i = 0; i < N; i++)
{
if (out[i*11] != a8
|| out[i*11 + 1] != a7
|| out[i*11 + 2] != a1
|| out[i*11 + 3] != a2
|| out[i*11 + 4] != a8
|| out[i*11 + 5] != a5
|| out[i*11 + 6] != a5
|| out[i*11 + 7] != a4
|| out[i*11 + 8] != a12
|| out[i*11 + 9] != a13
|| out[i*11 + 10] != a14)
abort ();
}
return 0;
}
int main (void)
{
check_vect ();
main1 (15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0);
return 0;
}
/* { dg-final { scan-tree-dump-times "vectorized 4 loops" 1 "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include <stdio.h>
#include "tree-vect.h"
#define N 128
int
main1 (unsigned short a0, unsigned short a1, unsigned short a2,
unsigned short a3, unsigned short a4, unsigned short a5,
unsigned short a6, unsigned short a7, unsigned short a8)
{
int i;
unsigned short out[N*8], out2[N*8], b0, b1, b2, b3, b4, b5, b6, b7, b8;
for (i = 0; i < N; i++)
{
b0 = a0 + 8;
b1 = a1 + 7;
b2 = a2 + 6;
b3 = a3 + 5;
b4 = a4 + 4;
b5 = a5 + 3;
out[i*4] = b0;
out[i*4 + 1] = b1;
out[i*4 + 2] = b2;
out[i*4 + 3] = b3;
out2[i*4] = b0;
out2[i*4 + 1] = b1;
out2[i*4 + 2] = b4;
out2[i*4 + 3] = b5;
}
/* check results: */
for (i = 0; i < N; i++)
{
if (out[i*4] != b0
|| out[i*4 + 1] != b1
|| out[i*4 + 2] != b2
|| out[i*4 + 3] != b3)
abort ();
if (out2[i*4] != b0
|| out2[i*4 + 1] != b1
|| out2[i*4 + 2] != b4
|| out2[i*4 + 3] != b5)
abort ();
}
for (i = 0; i < N; i++)
{
b0 = a0 + 8;
b1 = a1 + 7;
b2 = a2 + 6;
b3 = a3 + 5;
b4 = a4 + 4;
b5 = a5 + 3;
b6 = a6 + 2;
b7 = a7 + 1;
b8 = a8 + 9;
out[i*4] = b0;
out[i*4 + 1] = b1;
out[i*4 + 2] = b2;
out[i*4 + 3] = b3;
out2[i*8] = b0;
out2[i*8 + 1] = b1;
out2[i*8 + 2] = b4;
out2[i*8 + 3] = b5;
out2[i*8 + 4] = b6;
out2[i*8 + 5] = b2;
out2[i*8 + 6] = b7;
out2[i*8 + 7] = b8;
}
/* check results: */
for (i = 0; i < N; i++)
{
if (out[i*4] != b0
|| out[i*4 + 1] != b1
|| out[i*4 + 2] != b2
|| out[i*4 + 3] != b3)
abort ();
if (out2[i*8] != b0
|| out2[i*8 + 1] != b1
|| out2[i*8 + 2] != b4
|| out2[i*8 + 3] != b5
|| out2[i*8 + 4] != b6
|| out2[i*8 + 5] != b2
|| out2[i*8 + 6] != b7
|| out2[i*8 + 7] != b8)
abort ();
}
return 0;
}
int main (void)
{
check_vect ();
main1 (8,7,6,5,4,3,2,1,0);
return 0;
}
/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include <stdio.h>
#include "tree-vect.h"
#define N 128
int
main1 ()
{
unsigned short i;
unsigned short out[N*8], out2[N*8], b0, b1, b2, b3, b4, a0, a1, a2, a3, b5;
unsigned short in[N*8];
for (i = 0; i < N*8; i++)
{
in[i] = i;
}
/* Different operations in both cases - vectorization with interleaving. */
for (i = 0; i < N; i++)
{
a0 = in[i*4];
a1 = in[i*4 + 1];
a2 = in[i*4 + 2];
a3 = in[i*4 + 3];
b0 = a0 * 8;
b1 = a1 + 7;
b2 = a2 + 6;
b3 = a3 * 5;
b4 = a2 + 4;
b5 = a3 + 3;
out[i*4] = b0;
out[i*4 + 1] = b1;
out[i*4 + 2] = b2;
out[i*4 + 3] = b3;
out2[i*4] = b0;
out2[i*4 + 1] = b1;
out2[i*4 + 2] = b4;
out2[i*4 + 3] = b5;
}
/* check results: */
for (i = 0; i < N; i++)
{
a0 = in[i*4];
a1 = in[i*4 + 1];
a2 = in[i*4 + 2];
a3 = in[i*4 + 3];
b0 = a0 * 8;
b1 = a1 + 7;
b2 = a2 + 6;
b3 = a3 * 5;
b4 = a2 + 4;
b5 = a3 + 3;
if (out[i*4] != b0
|| out[i*4 + 1] != b1
|| out[i*4 + 2] != b2
|| out[i*4 + 3] != b3)
abort ();
if (out2[i*4] != b0
|| out2[i*4 + 1] != b1
|| out2[i*4 + 2] != b4
|| out2[i*4 + 3] != b5)
abort ();
}
/* Different operations in the first case - vectorization with interleaving. */
for (i = 0; i < N; i++)
{
a0 = in[i*4];
a1 = in[i*4 + 1];
a2 = in[i*4 + 2];
a3 = in[i*4 + 3];
b0 = a0 + 8;
b1 = a1 + 7;
b2 = a2 + 6;
b3 = a3 * 5;
b4 = a2 + 4;
b5 = a3 + 3;
out[i*4] = b0;
out[i*4 + 1] = b1;
out[i*4 + 2] = b2;
out[i*4 + 3] = b3;
out2[i*4] = b0;
out2[i*4 + 1] = b1;
out2[i*4 + 2] = b4;
out2[i*4 + 3] = b5;
}
/* check results: */
for (i = 0; i < N; i++)
{
a0 = in[i*4];
a1 = in[i*4 + 1];
a2 = in[i*4 + 2];
a3 = in[i*4 + 3];
b0 = a0 + 8;
b1 = a1 + 7;
b2 = a2 + 6;
b3 = a3 * 5;
b4 = a2 + 4;
b5 = a3 + 3;
if (out[i*4] != b0
|| out[i*4 + 1] != b1
|| out[i*4 + 2] != b2
|| out[i*4 + 3] != b3)
abort ();
if (out2[i*4] != b0
|| out2[i*4 + 1] != b1
|| out2[i*4 + 2] != b4
|| out2[i*4 + 3] != b5)
abort ();
}
/* Different operations in the second case - vectorization with interleaving. */
for (i = 0; i < N; i++)
{
a0 = in[i*4];
a1 = in[i*4 + 1];
a2 = in[i*4 + 2];
a3 = in[i*4 + 3];
b0 = a0 + 8;
b1 = a1 + 7;
b2 = a2 + 6;
b3 = a3 + 5;
b4 = a2 * 4;
b5 = a3 + 3;
out[i*4] = b0;
out[i*4 + 1] = b1;
out[i*4 + 2] = b2;
out[i*4 + 3] = b3;
out2[i*4] = b0;
out2[i*4 + 1] = b1;
out2[i*4 + 2] = b4;
out2[i*4 + 3] = b5;
}
/* check results: */
for (i = 0; i < N; i++)
{
a0 = in[i*4];
a1 = in[i*4 + 1];
a2 = in[i*4 + 2];
a3 = in[i*4 + 3];
b0 = a0 + 8;
b1 = a1 + 7;
b2 = a2 + 6;
b3 = a3 + 5;
b4 = a2 * 4;
b5 = a3 + 3;
if (out[i*4] != b0
|| out[i*4 + 1] != b1
|| out[i*4 + 2] != b2
|| out[i*4 + 3] != b3)
abort ();
if (out2[i*4] != b0
|| out2[i*4 + 1] != b1
|| out2[i*4 + 2] != b4
|| out2[i*4 + 3] != b5)
abort ();
}
return 0;
}
int main (void)
{
check_vect ();
main1 ();
return 0;
}
/* { dg-final { scan-tree-dump-times "vectorized 4 loops" 1 "vect" { target vect_strided } } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { ! { vect_strided } } } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_strided } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! { vect_strided } } } } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include <stdio.h>
#include "tree-vect.h"
#define N 128
int
main1 (unsigned short a0, unsigned short a1, unsigned short a2,
unsigned short a3, unsigned short a4, unsigned short a5,
unsigned short a6, unsigned short a7, unsigned short a8)
{
int i;
unsigned short out[N*8], out2[N*8], out3[N*8], b0, b1, b2, b3, b4, b5, b6, b7, b8;
for (i = 0; i < N; i++)
{
b0 = a0 + 8;
b1 = a1 + 7;
b2 = a2 + 6;
b3 = a3 + 5;
b4 = a4 + 4;
b5 = a5 + 3;
out[i*4] = b0;
out[i*4 + 1] = b1;
out[i*4 + 2] = b2;
out[i*4 + 3] = b3;
out2[i*4] = b0;
out2[i*4 + 1] = b1;
out2[i*4 + 2] = b4;
out2[i*4 + 3] = b5;
out3[i*4] = b2;
out3[i*4 + 1] = b1;
out3[i*4 + 2] = b4;
out3[i*4 + 3] = b5;
}
/* check results: */
for (i = 0; i < N; i++)
{
if (out[i*4] != b0
|| out[i*4 + 1] != b1
|| out[i*4 + 2] != b2
|| out[i*4 + 3] != b3)
abort ();
if (out2[i*4] != b0
|| out2[i*4 + 1] != b1
|| out2[i*4 + 2] != b4
|| out2[i*4 + 3] != b5)
abort ();
if (out3[i*4] != b2
|| out3[i*4 + 1] != b1
|| out3[i*4 + 2] != b4
|| out3[i*4 + 3] != b5)
abort ();
}
for (i = 0; i < N; i++)
{
b0 = a0 + 8;
b1 = a1 + 7;
b2 = a2 + 6;
b3 = a3 + 5;
b4 = a4 + 4;
b5 = a5 + 3;
b6 = a6 + 2;
b7 = a7 + 1;
b8 = a8 + 9;
out[i*4] = b0;
out[i*4 + 1] = b1;
out[i*4 + 2] = b2;
out[i*4 + 3] = b3;
out2[i*8] = b0;
out2[i*8 + 1] = b1;
out2[i*8 + 2] = b4;
out2[i*8 + 3] = b5;
out2[i*8 + 4] = b6;
out2[i*8 + 5] = b2;
out2[i*8 + 6] = b7;
out2[i*8 + 7] = b8;
out3[2*i + 1] = a0;
out3[2*i] = b8;
}
/* check results: */
for (i = 0; i < N; i++)
{
if (out[i*4] != b0
|| out[i*4 + 1] != b1
|| out[i*4 + 2] != b2
|| out[i*4 + 3] != b3)
abort ();
if (out2[i*8] != b0
|| out2[i*8 + 1] != b1
|| out2[i*8 + 2] != b4
|| out2[i*8 + 3] != b5
|| out2[i*8 + 4] != b6
|| out2[i*8 + 5] != b2
|| out2[i*8 + 6] != b7
|| out2[i*8 + 7] != b8)
abort ();
if (out3[2*i] != b8
|| out3[2*i+1] != a0)
abort();
}
return 0;
}
int main (void)
{
check_vect ();
main1 (8,7,6,5,4,3,2,1,0);
return 0;
}
/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 6 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include "tree-vect.h"
#define N 128
typedef struct {
int a;
int b;
int c;
int d;
int e;
int f;
int g;
int h;
} s;
int
main1 (s *arr)
{
int i;
s *ptr = arr;
s res[N];
for (i = 0; i < N; i++)
{
res[i].c = ptr->c + ptr->c;
res[i].a = ptr->a + ptr->a;
res[i].d = ptr->d + ptr->d;
res[i].b = ptr->b + ptr->b;
res[i].f = ptr->f + ptr->f;
res[i].e = ptr->e + ptr->e;
res[i].h = ptr->h + ptr->h;
res[i].g = ptr->g + ptr->g;
ptr++;
}
/* check results: */
for (i = 0; i < N; i++)
{
if (res[i].c != arr[i].c + arr[i].c
|| res[i].a != arr[i].a + arr[i].a
|| res[i].d != arr[i].d + arr[i].d
|| res[i].b != arr[i].b + arr[i].b
|| res[i].f != arr[i].f + arr[i].f
|| res[i].e != arr[i].e + arr[i].e
|| res[i].h != arr[i].h + arr[i].h
|| res[i].g != arr[i].g + arr[i].g)
abort();
}
ptr = arr;
for (i = 0; i < N; i++)
{
res[i].c = ptr->c + ptr->c;
res[i].a = ptr->a + ptr->a;
res[i].d = ptr->d + ptr->d;
res[i].b = ptr->b + ptr->b;
res[i].f = ptr->f + ptr->f;
res[i].e = ptr->e + ptr->e;
res[i].h = ptr->e + ptr->e;
res[i].g = ptr->g + ptr->g;
ptr++;
}
/* check results: */
for (i = 0; i < N; i++)
{
if (res[i].c != arr[i].c + arr[i].c
|| res[i].a != arr[i].a + arr[i].a
|| res[i].d != arr[i].d + arr[i].d
|| res[i].b != arr[i].b + arr[i].b
|| res[i].f != arr[i].f + arr[i].f
|| res[i].e != arr[i].e + arr[i].e
|| res[i].h != arr[i].e + arr[i].e
|| res[i].g != arr[i].g + arr[i].g)
abort();
}
return 0;
}
int main (void)
{
int i;
s arr[N];
check_vect ();
for (i = 0; i < N; i++)
{
arr[i].a = i;
arr[i].b = i * 2;
arr[i].c = 17;
arr[i].d = i+34;
arr[i].e = i * 3 + 5;
arr[i].f = i * 5;
arr[i].g = i - 3;
arr[i].h = 56;
if (arr[i].a == 178)
abort();
}
main1 (arr);
return 0;
}
/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target { vect_strided && { ! vect_no_align } } } } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { ! { vect_strided || vect_no_align} } } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail vect_no_align } } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include <stdio.h>
#include "tree-vect.h"
#define N 16
#define DIFF 242
typedef struct {
unsigned char a;
unsigned char b;
unsigned char c;
unsigned char d;
} s;
void
main1 (unsigned char x, unsigned char max_result, unsigned char min_result, s *arr)
{
int i;
unsigned char ub[N*2] = {1,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,1,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
unsigned char uc[N] = {1,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
unsigned char udiff = 2;
unsigned char umax = x;
unsigned char umin = x;
unsigned char ua1[N*2];
s *pIn = arr;
s out[N];
for (i = 0; i < N; i++) {
udiff += (unsigned char)(ub[i] - uc[i]);
ua1[2*i+1] = ub[2*i+1];
ua1[2*i] = ub[2*i];
out[i].d = pIn->d - 1;
out[i].b = pIn->b - 4;
out[i].c = pIn->c - 8;
out[i].a = pIn->a - 3;
pIn++;
}
for (i = 0; i < N; i++) {
if (ua1[2*i] != ub[2*i]
|| ua1[2*i+1] != ub[2*i+1]
|| out[i].a != arr[i].a - 3
|| out[i].b != arr[i].b - 4
|| out[i].c != arr[i].c - 8
|| out[i].d != arr[i].d - 1)
abort();
}
/* check results: */
if (udiff != DIFF)
abort ();
}
int main (void)
{
int i;
s arr[N];
for (i = 0; i < N; i++)
{
arr[i].a = i + 9;
arr[i].b = i * 2 + 10;
arr[i].c = 17;
arr[i].d = i+34;
if (arr[i].a == 178)
abort();
}
check_vect ();
main1 (100, 100, 1, arr);
main1 (0, 15, 0, arr);
return 0;
}
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail vect_no_align } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { xfail vect_no_align } } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include "tree-vect.h"
#define N 128
/* Unaligned stores. */
int main1 (int n)
{
int i;
int ia[N+2];
short sa[N+2];
for (i = 1; i <= N/2; i++)
{
ia[2*i] = 25;
ia[2*i + 1] = 5;
}
/* check results: */
for (i = 1; i <= N/2; i++)
{
if (ia[2*i] != 25
|| ia[2*i + 1] != 5)
abort ();
}
for (i = 1; i <= n/2; i++)
{
sa[2*i] = 25;
sa[2*i + 1] = 5;
}
/* check results: */
for (i = 1; i <= n/2; i++)
{
if (sa[2*i] != 25
|| sa[2*i + 1] != 5)
abort ();
}
return 0;
}
int main (void)
{
check_vect ();
return main1 (N);
}
/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 0 "vect" } } */
/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 2 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include <stdio.h>
#include "tree-vect.h"
#define N 8
int
main1 ()
{
int i;
unsigned short in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
unsigned short out[N*8], a[N], b[N] = {3,6,9,12,15,18,21,24};
/* Partial SLP is not supported. */
for (i = 0; i < N; i++)
{
out[i*4] = in[i*4];
out[i*4 + 1] = in[i*4 + 1];
out[i*4 + 2] = in[i*4 + 2];
out[i*4 + 3] = in[i*4 + 3];
a[i] = b[i] / 3;
}
/* check results: */
for (i = 0; i < N; i++)
{
if (out[i*4] != in[i*4]
|| out[i*4 + 1] != in[i*4 + 1]
|| out[i*4 + 2] != in[i*4 + 2]
|| out[i*4 + 3] != in[i*4 + 3]
|| a[i] != b[i] / 3)
abort ();
}
return 0;
}
int main (void)
{
check_vect ();
main1 ();
return 0;
}
/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include <stdio.h>
#include "tree-vect.h"
#define N 32
int
main1 ()
{
int i;
unsigned short in[N] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
unsigned short in2[N] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
unsigned short in3[N] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
unsigned short check[N] = {0,1,2,3,5,6,7,8,10,11,12,13,15,16,17,18,20,21,22,23,25,26,27,28,30,31,32,33,35,36,37,38};
unsigned short check3[N] = {0,1,2,3,4,5,6,7,8,9,10,11,5,6,7,8,9,10,11,12,13,14,15,16,10,11,12,13,14,15,16,17};
for (i = 0; i < N/4; i++)
{
in[i*4] = in[i*4] + 5;
in[i*4 + 1] = in[i*4 + 1] + 5;
in[i*4 + 2] = in[i*4 + 2] + 5;
in[i*4 + 3] = in[i*4 + 3] + 5;
}
/* check results: */
for (i = 0; i < N; i++)
{
if (in[i] != i+5)
abort ();
}
/* Not vectorizable because of data dependencies. */
for (i = 1; i < N/4; i++)
{
in2[i*4] = in2[(i-1)*4] + 5;
in2[i*4 + 1] = in2[(i-1)*4 + 1] + 5;
in2[i*4 + 2] = in2[(i-1)*4 + 2] + 5;
in2[i*4 + 3] = in2[(i-1)*4 + 3] + 5;
}
/* check results: */
for (i = 4; i < N; i++)
{
if (in2[i] != check[i])
abort ();
}
/* Not vectorizable because of data dependencies: distance 3 is greater than
the actual VF with SLP (2), but the analysis fails to detect that for now. */
for (i = 3; i < N/4; i++)
{
in3[i*4] = in3[(i-3)*4] + 5;
in3[i*4 + 1] = in3[(i-3)*4 + 1] + 5;
in3[i*4 + 2] = in3[(i-3)*4 + 2] + 5;
in3[i*4 + 3] = in3[(i-3)*4 + 3] + 5;
}
/* check results: */
for (i = 12; i < N; i++)
{
if (in3[i] != check3[i])
abort ();
}
return 0;
}
int main (void)
{
check_vect ();
main1 ();
return 0;
}
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
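The comments in the test above rest on a simple safety condition. A loop-carried dependence whose distance, measured in scalar iterations, is at least the vectorization factor cannot be violated, because every value it reads was produced by an earlier vector iteration. Below is a minimal sketch of that condition, assuming a positive distance that has already been computed. `dependence_allows_vf` is a hypothetical helper, not the vectorizer's dependence code.

```c
#include <stdbool.h>

/* Hypothetical helper, not GCC code.  DIST is the distance of a
   loop-carried dependence in scalar iterations (assumed positive) and
   VF is the vectorization factor.  One vector iteration executes VF
   consecutive scalar iterations together, so a value written DIST
   iterations earlier is guaranteed to be ready only if it came from a
   previous vector iteration, i.e. DIST >= VF.  */
static bool
dependence_allows_vf (unsigned dist, unsigned vf)
{
  return dist >= vf;
}
```

For the `in2` loop (distance 1, SLP VF 2), the condition fails. For the `in3` loop (distance 3, SLP VF 2), it holds, which is why the comment there calls the rejection a current limitation of the analysis.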
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include <stdio.h>
#include "tree-vect.h"
#define N 8
int
main1 ()
{
int i;
unsigned short out[N*8];
unsigned short in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
for (i = 0; i < N; i++)
{
out[i*8] = in[i*8];
out[i*8 + 1] = in[i*8 + 1];
out[i*8 + 2] = in[i*8 + 2];
out[i*8 + 3] = in[i*8 + 3];
out[i*8 + 4] = in[i*8 + 4];
out[i*8 + 5] = in[i*8 + 5];
out[i*8 + 6] = in[i*8 + 6];
out[i*8 + 7] = in[i*8 + 7];
}
/* check results: */
for (i = 0; i < N; i++)
{
if (out[i*8] != in[i*8]
|| out[i*8 + 1] != in[i*8 + 1]
|| out[i*8 + 2] != in[i*8 + 2]
|| out[i*8 + 3] != in[i*8 + 3]
|| out[i*8 + 4] != in[i*8 + 4]
|| out[i*8 + 5] != in[i*8 + 5]
|| out[i*8 + 6] != in[i*8 + 6]
|| out[i*8 + 7] != in[i*8 + 7])
abort ();
}
for (i = 0; i < N*2; i++)
{
out[i*4] = in[i*4];
out[i*4 + 1] = in[i*4 + 1];
out[i*4 + 2] = in[i*4 + 2];
out[i*4 + 3] = in[i*4 + 3];
}
/* check results: */
for (i = 0; i < N*2; i++)
{
if (out[i*4] != in[i*4]
|| out[i*4 + 1] != in[i*4 + 1]
|| out[i*4 + 2] != in[i*4 + 2]
|| out[i*4 + 3] != in[i*4 + 3])
abort ();
}
for (i = 0; i < N/2; i++)
{
out[i*16] = in[i*16];
out[i*16 + 1] = in[i*16 + 1];
out[i*16 + 2] = in[i*16 + 2];
out[i*16 + 3] = in[i*16 + 3];
out[i*16 + 4] = in[i*16 + 4];
out[i*16 + 5] = in[i*16 + 5];
out[i*16 + 6] = in[i*16 + 6];
out[i*16 + 7] = in[i*16 + 7];
out[i*16 + 8] = in[i*16 + 8];
out[i*16 + 9] = in[i*16 + 9];
out[i*16 + 10] = in[i*16 + 10];
out[i*16 + 11] = in[i*16 + 11];
out[i*16 + 12] = in[i*16 + 12];
out[i*16 + 13] = in[i*16 + 13];
out[i*16 + 14] = in[i*16 + 14];
out[i*16 + 15] = in[i*16 + 15];
}
/* check results: */
for (i = 0; i < N/2; i++)
{
if (out[i*16] != in[i*16]
|| out[i*16 + 1] != in[i*16 + 1]
|| out[i*16 + 2] != in[i*16 + 2]
|| out[i*16 + 3] != in[i*16 + 3]
|| out[i*16 + 4] != in[i*16 + 4]
|| out[i*16 + 5] != in[i*16 + 5]
|| out[i*16 + 6] != in[i*16 + 6]
|| out[i*16 + 7] != in[i*16 + 7]
|| out[i*16 + 8] != in[i*16 + 8]
|| out[i*16 + 9] != in[i*16 + 9]
|| out[i*16 + 10] != in[i*16 + 10]
|| out[i*16 + 11] != in[i*16 + 11]
|| out[i*16 + 12] != in[i*16 + 12]
|| out[i*16 + 13] != in[i*16 + 13]
|| out[i*16 + 14] != in[i*16 + 14]
|| out[i*16 + 15] != in[i*16 + 15])
abort ();
}
/* SLP with unrolling by 8. */
for (i = 0; i < N/2; i++)
{
out[i*9] = in[i*9];
out[i*9 + 1] = in[i*9 + 1];
out[i*9 + 2] = in[i*9 + 2];
out[i*9 + 3] = in[i*9 + 3];
out[i*9 + 4] = in[i*9 + 4];
out[i*9 + 5] = in[i*9 + 5];
out[i*9 + 6] = in[i*9 + 6];
out[i*9 + 7] = in[i*9 + 7];
out[i*9 + 8] = in[i*9 + 8];
}
/* check results: */
for (i = 0; i < N/2; i++)
{
if (out[i*9] != in[i*9]
|| out[i*9 + 1] != in[i*9 + 1]
|| out[i*9 + 2] != in[i*9 + 2]
|| out[i*9 + 3] != in[i*9 + 3]
|| out[i*9 + 4] != in[i*9 + 4]
|| out[i*9 + 5] != in[i*9 + 5]
|| out[i*9 + 6] != in[i*9 + 6]
|| out[i*9 + 7] != in[i*9 + 7]
|| out[i*9 + 8] != in[i*9 + 8])
abort ();
}
return 0;
}
int main (void)
{
check_vect ();
main1 ();
return 0;
}
/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
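The "SLP with unrolling by N" comments count how many copies of a statement group are needed before its lanes exactly fill whole vectors. Below is a minimal sketch of that arithmetic. It assumes 128-bit vectors (8 `unsigned short` or 4 `unsigned int` lanes) and takes the factor to be lcm(nunits, group_size) / group_size. The helper names are hypothetical, and this is an illustration rather than the vectorizer's own code.

```c
#include <assert.h>

/* Greatest common divisor (Euclid's algorithm).  */
static unsigned
gcd_u (unsigned a, unsigned b)
{
  while (b != 0)
    {
      unsigned t = a % b;
      a = b;
      b = t;
    }
  return a;
}

/* Hypothetical helper: number of copies of a group of GROUP_SIZE scalar
   stmts needed so that their lanes fill whole vectors of NUNITS
   elements, i.e. lcm (nunits, group_size) / group_size.  */
static unsigned
slp_unrolling_factor (unsigned nunits, unsigned group_size)
{
  assert (nunits > 0 && group_size > 0);
  /* lcm (n, g) / g == n / gcd (n, g), which avoids overflow.  */
  return nunits / gcd_u (nunits, group_size);
}
```

For example, the group of 9 shorts above gives 72 / 9 = 8 copies, and the 7-int group in the next test gives 28 / 7 = 4. Groups of 4 or 16 shorts need 2 and 1 copies respectively. When one loop contains several SLP instances, as with the groups of 3 and 5 shorts later on, the loop presumably has to be unrolled by the largest per-instance factor. Both of those factors are 8.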
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include <stdio.h>
#include "tree-vect.h"
#define N 8
int
main1 ()
{
int i;
unsigned int out[N*8], a0, a1, a2, a3, a4, a5, a6, a7, b1, b0, b2, b3, b4, b5, b6, b7;
unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
float out2[N*8];
/* SLP with unrolling by 4. */
for (i = 0; i < N; i++)
{
a0 = in[i*7] + 5;
a1 = in[i*7 + 1] + 6;
a2 = in[i*7 + 2] + 7;
a3 = in[i*7 + 3] + 8;
a4 = in[i*7 + 4] + 9;
a5 = in[i*7 + 5] + 10;
a6 = in[i*7 + 6] + 11;
b0 = a0 * 3;
b1 = a1 * 2;
b2 = a2 * 12;
b3 = a3 * 5;
b4 = a4 * 8;
b5 = a5 * 4;
b6 = a6 * 3;
out[i*7] = b0 - 2;
out[i*7 + 1] = b1 - 3;
out[i*7 + 2] = b2 - 2;
out[i*7 + 3] = b3 - 1;
out[i*7 + 4] = b4 - 8;
out[i*7 + 5] = b5 - 7;
out[i*7 + 6] = b6 - 3;
}
/* check results: */
for (i = 0; i < N; i++)
{
if (out[i*7] != (in[i*7] + 5) * 3 - 2
|| out[i*7 + 1] != (in[i*7 + 1] + 6) * 2 - 3
|| out[i*7 + 2] != (in[i*7 + 2] + 7) * 12 - 2
|| out[i*7 + 3] != (in[i*7 + 3] + 8) * 5 - 1
|| out[i*7 + 4] != (in[i*7 + 4] + 9) * 8 - 8
|| out[i*7 + 5] != (in[i*7 + 5] + 10) * 4 - 7
|| out[i*7 + 6] != (in[i*7 + 6] + 11) * 3 - 3)
abort ();
}
/* SLP with unrolling by 4. */
for (i = 0; i < N*2; i++)
{
out[i*3] = (in[i*3] + 2) * 3;
out[i*3 + 1] = (in[i*3 + 1] + 2) * 7;
out[i*3 + 2] = (in[i*3 + 2] + 7) * 3;
}
/* check results: */
for (i = 0; i < N*2; i++)
{
if (out[i*3] != (in[i*3] + 2) * 3
|| out[i*3 + 1] != (in[i*3 + 1] + 2) * 7
|| out[i*3 + 2] != (in[i*3 + 2] + 7) * 3)
abort ();
}
/* SLP with unrolling by 4. */
for (i = 0; i < N*2; i++)
{
out2[i*3] = (float) (in[i*3] * 2 + 5) ;
out2[i*3 + 1] = (float) (in[i*3 + 1] * 3 + 7);
out2[i*3 + 2] = (float) (in[i*3 + 2] * 5 + 4);
}
/* check results: */
for (i = 0; i < N*2; i++)
{
if (out2[i*3] != (float) (in[i*3] * 2 + 5)
|| out2[i*3 + 1] != (float) (in[i*3 + 1] * 3 + 7)
|| out2[i*3 + 2] != (float) (in[i*3 + 2] * 5 + 4))
abort ();
}
return 0;
}
int main (void)
{
check_vect ();
main1 ();
return 0;
}
/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" {target {vect_intfloat_cvt && vect_int_mult} } } } */
/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" {target {{! { vect_intfloat_cvt}} && vect_int_mult} } } } */
/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" {target {{! { vect_intfloat_cvt}} && {!{vect_int_mult}}} } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" {target {vect_intfloat_cvt && vect_int_mult} } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" {target {{! { vect_intfloat_cvt}} && vect_int_mult} } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" {target {{! { vect_intfloat_cvt}} && {!{vect_int_mult}}} } } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include <stdio.h>
#include "tree-vect.h"
#define N 8
int
main1 ()
{
int i;
unsigned short out[N*8];
unsigned short in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
unsigned short in2[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
unsigned short out2[N*8];
/* SLP with unrolling by 8. */
for (i = 0; i < N; i++)
{
out[i*3] = in[i*3] + 5;
out[i*3 + 1] = in[i*3 + 1] + 6;
out[i*3 + 2] = in[i*3 + 2] + 16;
out2[i*5] = in2[i*5] + 2;
out2[i*5 + 1] = in2[i*5 + 1] + 2;
out2[i*5 + 2] = in2[i*5 + 2] + 1;
out2[i*5 + 3] = in2[i*5 + 3] + 3;
out2[i*5 + 4] = in2[i*5 + 4] + 13;
}
/* check results: */
for (i = 0; i < N; i++)
{
if (out[i*3] != in[i*3] + 5
|| out[i*3 + 1] != in[i*3 + 1] + 6
|| out[i*3 + 2] != in[i*3 + 2] + 16
|| out2[i*5] != in2[i*5] + 2
|| out2[i*5 + 1] != in2[i*5 + 1] + 2
|| out2[i*5 + 2] != in2[i*5 + 2] + 1
|| out2[i*5 + 3] != in2[i*5 + 3] + 3
|| out2[i*5 + 4] != in2[i*5 + 4] + 13)
abort ();
}
return 0;
}
int main (void)
{
check_vect ();
main1 ();
return 0;
}
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include "tree-vect.h"
#define N 128
typedef struct {
int a;
int b;
int c;
int d;
int e;
} s;
int
main1 (s *arr)
{
int i;
s *ptr = arr;
s res[N];
/* SLP with unrolling by 4. */
for (i = 0; i < N; i++)
{
res[i].c = ptr->c + ptr->c;
res[i].a = ptr->a + ptr->a;
res[i].d = ptr->d + ptr->d;
res[i].b = ptr->b + ptr->b;
res[i].e = ptr->e + ptr->e;
ptr++;
}
/* check results: */
for (i = 0; i < N; i++)
{
if (res[i].c != arr[i].c + arr[i].c
|| res[i].a != arr[i].a + arr[i].a
|| res[i].d != arr[i].d + arr[i].d
|| res[i].b != arr[i].b + arr[i].b
|| res[i].e != arr[i].e + arr[i].e)
abort();
}
}
int main (void)
{
int i;
s arr[N];
check_vect ();
for (i = 0; i < N; i++)
{
arr[i].a = i;
arr[i].b = i * 2;
arr[i].c = 17;
arr[i].d = i+34;
arr[i].e = i * 3 + 5;
if (arr[i].a == 178)
abort();
}
main1 (arr);
return 0;
}
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail vect_no_align } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail vect_no_align } } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
/* { dg-do compile } */
/* { dg-require-effective-target vect_shift } */
#define N 32
/* All the loops are vectorizable on platforms with vector shift argument. */
void
test_1 (void)
{
static unsigned int bm[N];
static unsigned int cm[N];
int j;
/* Vectorizable on platforms with scalar shift argument. */
for (j = 0; j < N/2; j++)
{
bm[2*j] <<= 8;
bm[2*j+1] <<= 8;
}
/* Not vectorizable on platforms with scalar shift argument. */
for (j = 0; j < N/2; j++)
{
cm[2*j] <<= 8;
cm[2*j+1] <<= 7;
}
}
void
test_2 (int a, int b)
{
static unsigned int bm[N];
static unsigned int cm[N];
int j;
/* Vectorizable on platforms with scalar shift argument. */
for (j = 0; j < N/2; j++)
{
bm[2*j] <<= a;
bm[2*j+1] <<= a;
}
/* Not vectorizable on platforms with scalar shift argument. */
for (j = 0; j < N/2; j++)
{
cm[2*j] <<= a;
cm[2*j+1] <<= b;
}
}
void
test_3 (void)
{
static unsigned int bm[N];
int am[N];
int j;
/* Not vectorizable on platforms with scalar shift argument. */
for (j = 0; j < N/2; j++)
{
bm[2*j] <<= am[j];
bm[2*j+1] <<= am[j];
}
/* Not vectorizable on platforms with scalar shift argument. */
for (j = 0; j < N/2; j++)
{
bm[2*j] <<= am[2*j];
bm[2*j+1] <<= am[2*j+1];
}
}
/* { dg-final { cleanup-tree-dump "vect" } } */
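On a target whose vector shift takes a single scalar count, every lane of a vector must be shifted by the same amount. The sketch below checks that condition for the counts that would be packed into one vector. It models the counts as known values. For the `a`/`b` and `am[]` loops, the compiler has to answer the same question from the operands instead. In the first `test_3` loop, for instance, the two lanes of a group share `am[j]`, but the unrolled copies use different `j`. `shift_counts_uniform` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: return true if all NLANES shift counts that would
   be packed into one vector are equal, so a single scalar shift operand
   can serve every lane.  */
static bool
shift_counts_uniform (const unsigned *counts, size_t nlanes)
{
  for (size_t i = 1; i < nlanes; i++)
    if (counts[i] != counts[0])
      return false;
  return true;
}
```

The `bm` loop in `test_1` yields {8, 8, 8, 8} and qualifies. The `cm` loop yields {8, 7, 8, 7} and does not.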
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include <stdlib.h>
#include "tree-vect.h"
#define N 128
typedef struct {
int a;
int b;
void *c;
} s1;
int
foo1 (s1 *arr)
{
int i;
s1 *ptr = arr;
/* Different constant types, so not SLPable. The group size is not a power
of 2, so interleaving is not supported either. */
for (i = 0; i < N; i++)
{
ptr->a = 6;
ptr->b = 7;
ptr->c = NULL;
ptr++;
}
/* check results: */
for (i = 0; i < N; i++)
{
if (arr[i].a != 6
|| arr[i].b != 7
|| arr[i].c != NULL)
abort();
}
}
int main (void)
{
int i;
s1 arr1[N];
check_vect ();
for (i = 0; i < N; i++)
{
arr1[i].a = i;
arr1[i].b = i * 2;
arr1[i].c = (void *)arr1;
if (arr1[i].a == 178)
abort();
}
foo1 (arr1);
return 0;
}
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
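The `foo1` comment relies on the requirement that an interleaving group, when not handled by SLP, must have a power-of-2 size. The three-field `s1` group fails that requirement. Below is a minimal sketch of such a check. The helper name is hypothetical.

```c
#include <stdbool.h>

/* Hypothetical helper: true if GROUP_SIZE is a nonzero power of 2.
   A power of 2 has exactly one bit set, so clearing its lowest set bit
   with n & (n - 1) leaves zero.  */
static bool
group_size_power_of_2_p (unsigned group_size)
{
  return group_size != 0 && (group_size & (group_size - 1)) == 0;
}
```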
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include <stdio.h>
#include "tree-vect.h"
#define N 16
int
main1 ()
{
int i;
unsigned short out[N*8];
unsigned short in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
unsigned int ia[N*2];
for (i = 0; i < N; i++)
{
out[i*8] = in[i*8];
out[i*8 + 1] = in[i*8 + 1];
out[i*8 + 2] = in[i*8 + 2];
out[i*8 + 3] = in[i*8 + 3];
out[i*8 + 4] = in[i*8 + 4];
out[i*8 + 5] = in[i*8 + 5];
out[i*8 + 6] = in[i*8 + 6];
out[i*8 + 7] = in[i*8 + 7];
ia[i] = 7;
}
/* check results: */
for (i = 0; i < N; i++)
{
if (out[i*8] != in[i*8]
|| out[i*8 + 1] != in[i*8 + 1]
|| out[i*8 + 2] != in[i*8 + 2]
|| out[i*8 + 3] != in[i*8 + 3]
|| out[i*8 + 4] != in[i*8 + 4]
|| out[i*8 + 5] != in[i*8 + 5]
|| out[i*8 + 6] != in[i*8 + 6]
|| out[i*8 + 7] != in[i*8 + 7]
|| ia[i] != 7)
abort ();
}
for (i = 0; i < N*2; i++)
{
out[i*4] = in[i*4];
out[i*4 + 1] = in[i*4 + 1];
out[i*4 + 2] = in[i*4 + 2];
out[i*4 + 3] = in[i*4 + 3];
ia[i] = 12;
}
/* check results: */
for (i = 0; i < N*2; i++)
{
if (out[i*4] != in[i*4]
|| out[i*4 + 1] != in[i*4 + 1]
|| out[i*4 + 2] != in[i*4 + 2]
|| out[i*4 + 3] != in[i*4 + 3]
|| ia[i] != 12)
abort ();
}
for (i = 0; i < N/2; i++)
{
out[i*16] = in[i*16];
out[i*16 + 1] = in[i*16 + 1];
out[i*16 + 2] = in[i*16 + 2];
out[i*16 + 3] = in[i*16 + 3];
out[i*16 + 4] = in[i*16 + 4];
out[i*16 + 5] = in[i*16 + 5];
out[i*16 + 6] = in[i*16 + 6];
out[i*16 + 7] = in[i*16 + 7];
out[i*16 + 8] = in[i*16 + 8];
out[i*16 + 9] = in[i*16 + 9];
out[i*16 + 10] = in[i*16 + 10];
out[i*16 + 11] = in[i*16 + 11];
out[i*16 + 12] = in[i*16 + 12];
out[i*16 + 13] = in[i*16 + 13];
out[i*16 + 14] = in[i*16 + 14];
out[i*16 + 15] = in[i*16 + 15];
ia[i] = 21;
}
/* check results: */
for (i = 0; i < N/2; i++)
{
if (out[i*16] != in[i*16]
|| out[i*16 + 1] != in[i*16 + 1]
|| out[i*16 + 2] != in[i*16 + 2]
|| out[i*16 + 3] != in[i*16 + 3]
|| out[i*16 + 4] != in[i*16 + 4]
|| out[i*16 + 5] != in[i*16 + 5]
|| out[i*16 + 6] != in[i*16 + 6]
|| out[i*16 + 7] != in[i*16 + 7]
|| out[i*16 + 8] != in[i*16 + 8]
|| out[i*16 + 9] != in[i*16 + 9]
|| out[i*16 + 10] != in[i*16 + 10]
|| out[i*16 + 11] != in[i*16 + 11]
|| out[i*16 + 12] != in[i*16 + 12]
|| out[i*16 + 13] != in[i*16 + 13]
|| out[i*16 + 14] != in[i*16 + 14]
|| out[i*16 + 15] != in[i*16 + 15]
|| ia[i] != 21)
abort ();
}
return 0;
}
int main (void)
{
check_vect ();
main1 ();
return 0;
}
/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include <stdio.h>
#include "tree-vect.h"
#define N 16
int
main1 ()
{
int i;
unsigned int out[N*8];
unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
unsigned short ia[N];
unsigned int ib[N*2];
/* Not SLPable for now: multiple types with SLP of the smaller type. */
for (i = 0; i < N; i++)
{
out[i*8] = in[i*8];
out[i*8 + 1] = in[i*8 + 1];
out[i*8 + 2] = in[i*8 + 2];
out[i*8 + 3] = in[i*8 + 3];
out[i*8 + 4] = in[i*8 + 4];
out[i*8 + 5] = in[i*8 + 5];
out[i*8 + 6] = in[i*8 + 6];
out[i*8 + 7] = in[i*8 + 7];
ia[i] = 7;
}
/* check results: */
for (i = 0; i < N; i++)
{
if (out[i*8] != in[i*8]
|| out[i*8 + 1] != in[i*8 + 1]
|| out[i*8 + 2] != in[i*8 + 2]
|| out[i*8 + 3] != in[i*8 + 3]
|| out[i*8 + 4] != in[i*8 + 4]
|| out[i*8 + 5] != in[i*8 + 5]
|| out[i*8 + 6] != in[i*8 + 6]
|| out[i*8 + 7] != in[i*8 + 7]
|| ia[i] != 7)
abort ();
}
for (i = 0; i < N*2; i++)
{
out[i*4] = in[i*4];
out[i*4 + 1] = in[i*4 + 1];
out[i*4 + 2] = in[i*4 + 2];
out[i*4 + 3] = in[i*4 + 3];
ib[i] = 12;
}
/* check results: */
for (i = 0; i < N*2; i++)
{
if (out[i*4] != in[i*4]
|| out[i*4 + 1] != in[i*4 + 1]
|| out[i*4 + 2] != in[i*4 + 2]
|| out[i*4 + 3] != in[i*4 + 3]
|| ib[i] != 12)
abort ();
}
for (i = 0; i < N/2; i++)
{
out[i*16] = in[i*16];
out[i*16 + 1] = in[i*16 + 1];
out[i*16 + 2] = in[i*16 + 2];
out[i*16 + 3] = in[i*16 + 3];
out[i*16 + 4] = in[i*16 + 4];
out[i*16 + 5] = in[i*16 + 5];
out[i*16 + 6] = in[i*16 + 6];
out[i*16 + 7] = in[i*16 + 7];
out[i*16 + 8] = in[i*16 + 8];
out[i*16 + 9] = in[i*16 + 9];
out[i*16 + 10] = in[i*16 + 10];
out[i*16 + 11] = in[i*16 + 11];
out[i*16 + 12] = in[i*16 + 12];
out[i*16 + 13] = in[i*16 + 13];
out[i*16 + 14] = in[i*16 + 14];
out[i*16 + 15] = in[i*16 + 15];
}
/* check results: */
for (i = 0; i < N/2; i++)
{
if (out[i*16] != in[i*16]
|| out[i*16 + 1] != in[i*16 + 1]
|| out[i*16 + 2] != in[i*16 + 2]
|| out[i*16 + 3] != in[i*16 + 3]
|| out[i*16 + 4] != in[i*16 + 4]
|| out[i*16 + 5] != in[i*16 + 5]
|| out[i*16 + 6] != in[i*16 + 6]
|| out[i*16 + 7] != in[i*16 + 7]
|| out[i*16 + 8] != in[i*16 + 8]
|| out[i*16 + 9] != in[i*16 + 9]
|| out[i*16 + 10] != in[i*16 + 10]
|| out[i*16 + 11] != in[i*16 + 11]
|| out[i*16 + 12] != in[i*16 + 12]
|| out[i*16 + 13] != in[i*16 + 13]
|| out[i*16 + 14] != in[i*16 + 14]
|| out[i*16 + 15] != in[i*16 + 15])
abort ();
}
return 0;
}
int main (void)
{
check_vect ();
main1 ();
return 0;
}
/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { target { vect_strided } } } } */
/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target { ! { vect_strided } } } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include <stdio.h>
#include "tree-vect.h"
#define N 8
int
main1 ()
{
int i;
unsigned short out[N*8];
unsigned short in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
unsigned int in2[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
unsigned int out2[N*8];
for (i = 0; i < N; i++)
{
out[i*8] = in[i*8] + 5;
out[i*8 + 1] = in[i*8 + 1] + 6;
out[i*8 + 2] = in[i*8 + 2] + 7;
out[i*8 + 3] = in[i*8 + 3] + 8;
out[i*8 + 4] = in[i*8 + 4] + 9;
out[i*8 + 5] = in[i*8 + 5] + 10;
out[i*8 + 6] = in[i*8 + 6] + 11;
out[i*8 + 7] = in[i*8 + 7] + 12;
}
/* check results: */
for (i = 0; i < N; i++)
{
if (out[i*8] != in[i*8] + 5
|| out[i*8 + 1] != in[i*8 + 1] + 6
|| out[i*8 + 2] != in[i*8 + 2] + 7
|| out[i*8 + 3] != in[i*8 + 3] + 8
|| out[i*8 + 4] != in[i*8 + 4] + 9
|| out[i*8 + 5] != in[i*8 + 5] + 10
|| out[i*8 + 6] != in[i*8 + 6] + 11
|| out[i*8 + 7] != in[i*8 + 7] + 12)
abort ();
}
for (i = 0; i < N*2; i++)
{
out[i*4] = in[i*4] + 2;
out[i*4 + 1] = in[i*4 + 1] + 2;
out[i*4 + 2] = in[i*4 + 2] + 1;
out[i*4 + 3] = in[i*4 + 3] + 3;
}
/* check results: */
for (i = 0; i < N*2; i++)
{
if (out[i*4] != in[i*4] + 2
|| out[i*4 + 1] != in[i*4 + 1] + 2
|| out[i*4 + 2] != in[i*4 + 2] + 1
|| out[i*4 + 3] != in[i*4 + 3] + 3)
abort ();
}
for (i = 0; i < N/2; i++)
{
out2[i*16] = in2[i*16] * 2;
out2[i*16 + 1] = in2[i*16 + 1] * 3;
out2[i*16 + 2] = in2[i*16 + 2] * 4;
out2[i*16 + 3] = in2[i*16 + 3] * 3;
out2[i*16 + 4] = in2[i*16 + 4] * 2;
out2[i*16 + 5] = in2[i*16 + 5] * 3;
out2[i*16 + 6] = in2[i*16 + 6] * 2;
out2[i*16 + 7] = in2[i*16 + 7] * 4;
out2[i*16 + 8] = in2[i*16 + 8] * 2;
out2[i*16 + 9] = in2[i*16 + 9] * 5;
out2[i*16 + 10] = in2[i*16 + 10] * 2;
out2[i*16 + 11] = in2[i*16 + 11] * 3;
out2[i*16 + 12] = in2[i*16 + 12] * 4;
out2[i*16 + 13] = in2[i*16 + 13] * 4;
out2[i*16 + 14] = in2[i*16 + 14] * 3;
out2[i*16 + 15] = in2[i*16 + 15] * 2;
}
/* check results: */
for (i = 0; i < N/2; i++)
{
if (out2[i*16] != in2[i*16] * 2
|| out2[i*16 + 1] != in2[i*16 + 1] * 3
|| out2[i*16 + 2] != in2[i*16 + 2] * 4
|| out2[i*16 + 3] != in2[i*16 + 3] * 3
|| out2[i*16 + 4] != in2[i*16 + 4] * 2
|| out2[i*16 + 5] != in2[i*16 + 5] * 3
|| out2[i*16 + 6] != in2[i*16 + 6] * 2
|| out2[i*16 + 7] != in2[i*16 + 7] * 4
|| out2[i*16 + 8] != in2[i*16 + 8] * 2
|| out2[i*16 + 9] != in2[i*16 + 9] * 5
|| out2[i*16 + 10] != in2[i*16 + 10] * 2
|| out2[i*16 + 11] != in2[i*16 + 11] * 3
|| out2[i*16 + 12] != in2[i*16 + 12] * 4
|| out2[i*16 + 13] != in2[i*16 + 13] * 4
|| out2[i*16 + 14] != in2[i*16 + 14] * 3
|| out2[i*16 + 15] != in2[i*16 + 15] * 2)
abort ();
}
return 0;
}
int main (void)
{
check_vect ();
main1 ();
return 0;
}
/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" {target vect_int_mult} } } */
/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" {target { ! { vect_int_mult } } } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" {target vect_int_mult } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" {target { ! { vect_int_mult } } } } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include <stdio.h>
#include "tree-vect.h"
#define N 8
int
main1 ()
{
int i;
unsigned int out[N*8], ia[N*2];
unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
unsigned short in2[N*16] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
unsigned short sa[N], out2[N*16];
for (i = 0; i < N; i++)
{
out[i*8] = in[i*8] + 5;
out[i*8 + 1] = in[i*8 + 1] + 6;
out[i*8 + 2] = in[i*8 + 2] + 7;
out[i*8 + 3] = in[i*8 + 3] + 8;
out[i*8 + 4] = in[i*8 + 4] + 9;
out[i*8 + 5] = in[i*8 + 5] + 10;
out[i*8 + 6] = in[i*8 + 6] + 11;
out[i*8 + 7] = in[i*8 + 7] + 12;
ia[i] = in[i];
}
/* check results: */
for (i = 0; i < N; i++)
{
if (out[i*8] != in[i*8] + 5
|| out[i*8 + 1] != in[i*8 + 1] + 6
|| out[i*8 + 2] != in[i*8 + 2] + 7
|| out[i*8 + 3] != in[i*8 + 3] + 8
|| out[i*8 + 4] != in[i*8 + 4] + 9
|| out[i*8 + 5] != in[i*8 + 5] + 10
|| out[i*8 + 6] != in[i*8 + 6] + 11
|| out[i*8 + 7] != in[i*8 + 7] + 12
|| ia[i] != in[i])
abort ();
}
for (i = 0; i < N*2; i++)
{
out[i*4] = in[i*4] + 1;
out[i*4 + 1] = in[i*4 + 1] + 2;
out[i*4 + 2] = in[i*4 + 2] + 3;
out[i*4 + 3] = in[i*4 + 3] + 4;
ia[i] = in[i];
}
/* check results: */
for (i = 0; i < N*2; i++)
{
if (out[i*4] != in[i*4] + 1
|| out[i*4 + 1] != in[i*4 + 1] + 2
|| out[i*4 + 2] != in[i*4 + 2] + 3
|| out[i*4 + 3] != in[i*4 + 3] + 4
|| ia[i] != in[i])
abort ();
}
for (i = 0; i < N; i++)
{
out2[i*16] = in2[i*16] * 2;
out2[i*16 + 1] = in2[i*16 + 1] * 3;
out2[i*16 + 2] = in2[i*16 + 2] * 4;
out2[i*16 + 3] = in2[i*16 + 3] * 3;
out2[i*16 + 4] = in2[i*16 + 4] * 2;
out2[i*16 + 5] = in2[i*16 + 5] * 3;
out2[i*16 + 6] = in2[i*16 + 6] * 2;
out2[i*16 + 7] = in2[i*16 + 7] * 4;
out2[i*16 + 8] = in2[i*16 + 8] * 2;
out2[i*16 + 9] = in2[i*16 + 9] * 5;
out2[i*16 + 10] = in2[i*16 + 10] * 2;
out2[i*16 + 11] = in2[i*16 + 11] * 3;
out2[i*16 + 12] = in2[i*16 + 12] * 4;
out2[i*16 + 13] = in2[i*16 + 13] * 4;
out2[i*16 + 14] = in2[i*16 + 14] * 3;
out2[i*16 + 15] = in2[i*16 + 15] * 2;
}
/* check results: */
for (i = 0; i < N; i++)
{
if (out2[i*16] != in2[i*16] * 2
|| out2[i*16 + 1] != in2[i*16 + 1] * 3
|| out2[i*16 + 2] != in2[i*16 + 2] * 4
|| out2[i*16 + 3] != in2[i*16 + 3] * 3
|| out2[i*16 + 4] != in2[i*16 + 4] * 2
|| out2[i*16 + 5] != in2[i*16 + 5] * 3
|| out2[i*16 + 6] != in2[i*16 + 6] * 2
|| out2[i*16 + 7] != in2[i*16 + 7] * 4
|| out2[i*16 + 8] != in2[i*16 + 8] * 2
|| out2[i*16 + 9] != in2[i*16 + 9] * 5
|| out2[i*16 + 10] != in2[i*16 + 10] * 2
|| out2[i*16 + 11] != in2[i*16 + 11] * 3
|| out2[i*16 + 12] != in2[i*16 + 12] * 4
|| out2[i*16 + 13] != in2[i*16 + 13] * 4
|| out2[i*16 + 14] != in2[i*16 + 14] * 3
|| out2[i*16 + 15] != in2[i*16 + 15] * 2)
abort ();
}
return 0;
}
int main (void)
{
check_vect ();
main1 ();
return 0;
}
/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" {target { vect_strided && vect_int_mult } } } }*/
/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" {target { ! { vect_strided && vect_int_mult } } } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include "tree-vect.h"
#define N 32
int main1 ()
{
int i;
int ib[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
float fa[N];
/* int -> float */
for (i = 0; i < N/4; i++)
{
fa[4*i] = (float) ib[4*i];
fa[4*i + 1] = (float) ib[4*i + 1];
fa[4*i + 2] = (float) ib[4*i + 2];
fa[4*i + 3] = (float) ib[4*i + 3];
}
/* check results: */
for (i = 0; i < N/4; i++)
{
if (fa[4*i] != (float) ib[4*i]
|| fa[4*i + 1] != (float) ib[4*i + 1]
|| fa[4*i + 2] != (float) ib[4*i + 2]
|| fa[4*i + 3] != (float) ib[4*i + 3])
abort ();
}
return 0;
}
int main (void)
{
check_vect ();
return main1 ();
}
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target powerpc*-*-* i?86-*-* x86_64-*-* } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target powerpc*-*-* i?86-*-* x86_64-*-* } } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include "tree-vect.h"
#define N 64
short X[N] __attribute__ ((__aligned__(16)));
short Y[N] __attribute__ ((__aligned__(16)));
int result[N];
/* short->int widening-mult */
int
foo1(int len) {
int i;
for (i=0; i<len/2; i++) {
result[2*i] = X[2*i] * Y[2*i];
result[2*i+1] = X[2*i+1] * Y[2*i+1];
}
}
int main (void)
{
int i;
check_vect ();
for (i=0; i<N; i++) {
X[i] = i;
Y[i] = 64-i;
}
foo1 (N);
for (i=0; i<N; i++) {
if (result[i] != X[i] * Y[i])
abort ();
}
return 0;
}
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_strided && vect_widen_mult_hi_to_si } } } }*/
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
......@@ -10,9 +10,9 @@ struct S
unsigned short b;
};
struct S result[N] = {12, 13, 13, 14, 14, 15, 15, 16, 16, 17, 17, 18,
18, 19, 19, 20, 20, 21, 21, 22, 22, 23, 23, 24,
24, 25, 25, 26, 26, 27, 27, 28};
struct S result[N] = {20, 13, 22, 14, 24, 15, 26, 16, 28, 17, 30, 18,
32, 19, 34, 20, 36, 21, 38, 22, 40, 23, 42, 24,
44, 25, 46, 26, 48, 27, 50, 28};
struct S X[N] = {10, 10, 11, 11, 12, 12, 13, 13, 14, 14, 15, 15, 16,
16, 17, 17, 18, 18, 19, 19, 20, 20, 21, 21, 22, 22,
23, 23, 24, 24, 25, 25};
......@@ -25,7 +25,7 @@ foo (struct S * in, struct S * out)
for (i = 0; i < N; i++)
{
out[i].a = in[i].a + 2;
out[i].a = in[i].a * 2;
out[i].b = in[i].b + 3;
}
}
......@@ -42,10 +42,10 @@ main (void)
/* check results: */
for (i = 0; i < N; i++)
{
if (Y[i].a != result[i].a)
if (Y[i].a != result[i].a)
abort ();
if (Y[i].b != result[i].b)
if (Y[i].b != result[i].b)
abort ();
}
......
/* { dg-require-effective-target vect_int } */
#include <stdarg.h>
#include "tree-vect.h"
#define N 16
struct S
{
unsigned short a;
unsigned short b;
};
struct S result[N] = {12, 13, 13, 14, 14, 15, 15, 16, 16, 17, 17, 18,
18, 19, 19, 20, 20, 21, 21, 22, 22, 23, 23, 24,
24, 25, 25, 26, 26, 27, 27, 28};
struct S X[N] = {10, 10, 11, 11, 12, 12, 13, 13, 14, 14, 15, 15, 16,
16, 17, 17, 18, 18, 19, 19, 20, 20, 21, 21, 22, 22,
23, 23, 24, 24, 25, 25};
struct S Y[N] = {};
__attribute__ ((noinline)) void
foo (struct S * in, struct S * out)
{
int i;
for (i = 0; i < N; i++)
{
out[i].a = in[i].a + 2;
out[i].b = in[i].b + 3;
}
}
int
main (void)
{
int i;
check_vect ();
foo (X, Y);
/* check results: */
for (i = 0; i < N; i++)
{
if (Y[i].a != result[i].a)
abort ();
if (Y[i].b != result[i].b)
abort ();
}
return 0;
}
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
......@@ -108,6 +108,8 @@ dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/pr*.\[cS\]]] \
"" $DEFAULT_VECTCFLAGS
dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/vect-*.\[cS\]]] \
"" $DEFAULT_VECTCFLAGS
dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/slp-*.\[cS\]]] \
"" $DEFAULT_VECTCFLAGS
#### Tests with special options
global SAVED_DEFAULT_VECTCFLAGS
......@@ -122,25 +124,25 @@ dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/no-vfa-*.\[cS\]]] \
# -ffast-math tests
set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
lappend DEFAULT_VECTCFLAGS "-ffast-math"
dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/fast-math-vect*.\[cS\]]] \
dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/fast-math-*.\[cS\]]] \
"" $DEFAULT_VECTCFLAGS
# -fno-math-errno tests
set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
lappend DEFAULT_VECTCFLAGS "-fno-math-errno"
dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/no-math-errno-vect*.\[cS\]]] \
dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/no-math-errno-*.\[cS\]]] \
"" $DEFAULT_VECTCFLAGS
# -fwrapv tests
set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
lappend DEFAULT_VECTCFLAGS "-fwrapv"
dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/wrapv-vect*.\[cS\]]] \
dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/wrapv-*.\[cS\]]] \
"" $DEFAULT_VECTCFLAGS
# -ftrapv tests
set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
lappend DEFAULT_VECTCFLAGS "-ftrapv"
dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/trapv-vect*.\[cS\]]] \
dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/trapv-*.\[cS\]]] \
"" $DEFAULT_VECTCFLAGS
# -fdump-tree-dceloop-details tests
......@@ -197,12 +199,24 @@ lappend DEFAULT_VECTCFLAGS "-fno-tree-scev-cprop" "-fno-tree-reassoc"
dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/no-scevccp-noreassoc-*.\[cS\]]] \
"" $DEFAULT_VECTCFLAGS
# -fno-tree-scev-cprop
set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
lappend DEFAULT_VECTCFLAGS "-fno-tree-scev-cprop"
dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/no-scevccp-slp-*.\[cS\]]] \
"" $DEFAULT_VECTCFLAGS
# -fno-tree-dominator-opts
set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
lappend DEFAULT_VECTCFLAGS "-fno-tree-dominator-opts"
dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/no-tree-dom-*.\[cS\]]] \
"" $DEFAULT_VECTCFLAGS
# -fno-tree-pre
set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
lappend DEFAULT_VECTCFLAGS "-fno-tree-pre"
dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/no-tree-pre-*.\[cS\]]] \
"" $DEFAULT_VECTCFLAGS
# With -Os
set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
lappend DEFAULT_VECTCFLAGS "-Os"
......
......@@ -2043,7 +2043,7 @@ proc check_effective_target_vect_no_align { } {
set et_vect_no_align_saved 0
if { [istarget mipsisa64*-*-*]
|| [istarget sparc*-*-*]
|| [istarget ia64-*-*] } {
|| [istarget ia64-*-*] } {
set et_vect_no_align_saved 1
}
}
......@@ -2255,6 +2255,24 @@ proc check_effective_target_vect_interleave { } {
return $et_vect_interleave_saved
}
# Return 1 if the target supports vector interleaving and extract even/odd, 0 otherwise.
proc check_effective_target_vect_strided { } {
global et_vect_strided_saved
if [info exists et_vect_strided_saved] {
verbose "check_effective_target_vect_strided: using cached result" 2
} else {
set et_vect_strided_saved 0
if { [check_effective_target_vect_interleave]
&& [check_effective_target_vect_extract_even_odd] } {
set et_vect_strided_saved 1
}
}
verbose "check_effective_target_vect_strided: returning $et_vect_strided_saved" 2
return $et_vect_strided_saved
}
# Return 1 if the target supports section-anchors
proc check_effective_target_section_anchors { } {
......
......@@ -1359,6 +1359,7 @@ new_stmt_vec_info (tree stmt, loop_vec_info loop_vinfo)
STMT_VINFO_SAME_ALIGN_REFS (res) = VEC_alloc (dr_p, heap, 5);
STMT_VINFO_INSIDE_OF_LOOP_COST (res) = 0;
STMT_VINFO_OUTSIDE_OF_LOOP_COST (res) = 0;
STMT_SLP_TYPE (res) = 0;
DR_GROUP_FIRST_DR (res) = NULL_TREE;
DR_GROUP_NEXT_DR (res) = NULL_TREE;
DR_GROUP_SIZE (res) = 0;
......@@ -1478,7 +1479,9 @@ new_loop_vec_info (struct loop *loop)
VEC_alloc (tree, heap, PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIGNMENT_CHECKS));
LOOP_VINFO_MAY_ALIAS_DDRS (res) =
VEC_alloc (ddr_p, heap, PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS));
LOOP_VINFO_STRIDED_STORES (res) = VEC_alloc (tree, heap, 10);
LOOP_VINFO_SLP_INSTANCES (res) = VEC_alloc (slp_instance, heap, 10);
LOOP_VINFO_SLP_UNROLLING_FACTOR (res) = 1;
return res;
}
......@@ -1497,6 +1500,8 @@ destroy_loop_vec_info (loop_vec_info loop_vinfo, bool clean_stmts)
int nbbs;
block_stmt_iterator si;
int j;
VEC (slp_instance, heap) *slp_instances;
slp_instance instance;
if (!loop_vinfo)
return;
......@@ -1571,6 +1576,10 @@ destroy_loop_vec_info (loop_vec_info loop_vinfo, bool clean_stmts)
free_dependence_relations (LOOP_VINFO_DDRS (loop_vinfo));
VEC_free (tree, heap, LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo));
VEC_free (ddr_p, heap, LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo));
slp_instances = LOOP_VINFO_SLP_INSTANCES (loop_vinfo);
for (j = 0; VEC_iterate (slp_instance, slp_instances, j, instance); j++)
vect_free_slp_tree (SLP_INSTANCE_TREE (instance));
VEC_free (slp_instance, heap, LOOP_VINFO_SLP_INSTANCES (loop_vinfo));
free (loop_vinfo);
loop->aux = NULL;
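The loop above releases each instance's tree through `vect_free_slp_tree`, whose body is not shown in this part of the diff. The following is a minimal sketch of how such a recursive (post-order) release could look. It assumes a simplified node type in which plain heap arrays stand in for GCC's `VEC`; the `sketch_*` names and the `sketch_nodes_freed` counter are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct _slp_tree: the VEC fields
   are modelled as plain heap arrays (may be NULL).  */
struct sketch_slp_node
{
  struct sketch_slp_node *left;   /* first operand subtree, or NULL */
  struct sketch_slp_node *right;  /* second operand subtree, or NULL */
  void **stmts;                   /* scalar stmts of this group */
  void **vec_stmts;               /* vector stmts created for it */
};

/* Illustration only: counts how many nodes have been released.  */
static unsigned int sketch_nodes_freed = 0;

static struct sketch_slp_node *
sketch_new_slp_node (void)
{
  struct sketch_slp_node *node = calloc (1, sizeof *node);
  assert (node != NULL);  /* sketch only: no real out-of-memory handling */
  return node;
}

/* Post-order release: free both operand subtrees first, then the node's
   own arrays, then the node itself.  free (NULL) is a no-op, so leaves
   and empty arrays need no special case.  */
static void
sketch_free_slp_tree (struct sketch_slp_node *node)
{
  if (node == NULL)
    return;
  sketch_free_slp_tree (node->left);
  sketch_free_slp_tree (node->right);
  free (node->stmts);
  free (node->vec_stmts);
  free (node);
  sketch_nodes_freed++;
}
```

Freeing children before the parent matters because the parent holds the only pointers to them; releasing the parent first would leak both subtrees.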
......
......@@ -60,7 +60,7 @@ enum dr_alignment_support {
/* Define type of def-use cross-iteration cycle. */
enum vect_def_type {
-vect_constant_def,
+vect_constant_def = 1,
vect_invariant_def,
vect_loop_def,
vect_induction_def,
......@@ -77,11 +77,80 @@ enum verbosity_levels {
REPORT_DR_DETAILS,
REPORT_BAD_FORM_LOOPS,
REPORT_OUTER_LOOPS,
REPORT_SLP,
REPORT_DETAILS,
/* New verbosity levels should be added before this one. */
MAX_VERBOSITY_LEVEL
};
/************************************************************************
SLP
************************************************************************/
/* A computation tree of an SLP instance. Each node corresponds to a group of
stmts to be packed in a SIMD stmt. */
typedef struct _slp_tree {
/* Only binary and unary operations are supported. LEFT child corresponds to
the first operand and RIGHT child to the second if the operation is
binary. */
struct _slp_tree *left;
struct _slp_tree *right;
/* A group of scalar stmts to be vectorized together. */
VEC (tree, heap) *stmts;
/* Vectorized stmt/s. */
VEC (tree, heap) *vec_stmts;
/* Number of vector stmts that are created to replace the group of scalar
stmts. It is calculated during the transformation phase as the number of
scalar elements in one scalar iteration (GROUP_SIZE) multiplied by VF
divided by vector size. */
unsigned int vec_stmts_size;
/* Vectorization costs associated with SLP node. */
struct
{
int outside_of_loop; /* Statements generated outside loop. */
int inside_of_loop; /* Statements generated inside loop. */
} cost;
} *slp_tree;
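The comment on `vec_stmts_size` gives the count of vector stmts as GROUP_SIZE multiplied by VF and divided by the vector size. A small worked sketch, reading "vector size" as the number of scalar elements per vector (`nunits`); the helper name is hypothetical and it assumes VF was chosen so the division is exact.

```c
#include <assert.h>

/* Hypothetical helper: number of vector stmts that replace one SLP group,
   GROUP_SIZE * VF / NUNITS, where NUNITS is the number of scalar elements
   held by one vector.  */
static unsigned int
sketch_slp_number_of_vec_stmts (unsigned int group_size, unsigned int vf,
                                unsigned int nunits)
{
  assert (nunits > 0);
  /* Assumed: VF makes the product a whole number of vectors.  */
  assert ((group_size * vf) % nunits == 0);
  return group_size * vf / nunits;
}
```

For example, a group of 2 stores with VF = 4 and 4 elements per vector covers 8 scalar elements, so `sketch_slp_number_of_vec_stmts (2, 4, 4)` yields 2 vector stmts.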
/* SLP instance is a sequence of stmts in a loop that can be packed into
SIMD stmts. */
typedef struct _slp_instance {
/* The root of SLP tree. */
slp_tree root;
/* Size of groups of scalar stmts that will be replaced by SIMD stmt/s. */
unsigned int group_size;
/* The unrolling factor required to vectorize this SLP instance. */
unsigned int unrolling_factor;
/* Vectorization costs associated with SLP instance. */
struct
{
int outside_of_loop; /* Statements generated outside loop. */
int inside_of_loop; /* Statements generated inside loop. */
} cost;
} *slp_instance;
DEF_VEC_P(slp_instance);
DEF_VEC_ALLOC_P(slp_instance, heap);
/* Access Functions. */
#define SLP_INSTANCE_TREE(S) (S)->root
#define SLP_INSTANCE_GROUP_SIZE(S) (S)->group_size
#define SLP_INSTANCE_UNROLLING_FACTOR(S) (S)->unrolling_factor
#define SLP_INSTANCE_OUTSIDE_OF_LOOP_COST(S) (S)->cost.outside_of_loop
#define SLP_INSTANCE_INSIDE_OF_LOOP_COST(S) (S)->cost.inside_of_loop
#define SLP_TREE_LEFT(S) (S)->left
#define SLP_TREE_RIGHT(S) (S)->right
#define SLP_TREE_SCALAR_STMTS(S) (S)->stmts
#define SLP_TREE_VEC_STMTS(S) (S)->vec_stmts
#define SLP_TREE_NUMBER_OF_VEC_STMTS(S) (S)->vec_stmts_size
#define SLP_TREE_OUTSIDE_OF_LOOP_COST(S) (S)->cost.outside_of_loop
#define SLP_TREE_INSIDE_OF_LOOP_COST(S) (S)->cost.inside_of_loop
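`SLP_INSTANCE_UNROLLING_FACTOR` records how many scalar iterations must be packed together before the group fills whole vectors. A sketch, assuming the factor is lcm (nunits, group_size) / group_size; this rule is an assumption, chosen because it agrees with the numeric example in the `slp_vect_type` comment (4 elements per vector, group size 2, unroll by 2). The helper names are hypothetical.

```c
#include <assert.h>

/* Euclid's algorithm.  */
static unsigned int
sketch_gcd (unsigned int a, unsigned int b)
{
  while (b != 0)
    {
      unsigned int t = a % b;
      a = b;
      b = t;
    }
  return a;
}

/* Assumed rule: unroll until the packed elements fill a whole number of
   vectors, i.e. lcm (nunits, group_size) / group_size iterations.  */
static unsigned int
sketch_slp_unrolling_factor (unsigned int nunits, unsigned int group_size)
{
  unsigned int lcm;
  assert (nunits > 0 && group_size > 0);
  lcm = nunits / sketch_gcd (nunits, group_size) * group_size;
  return lcm / group_size;
}
```

A factor of 1 means the group alone fills whole vectors (intra-iteration parallelism is enough); larger factors mean parallelism has to be borrowed from subsequent iterations.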
/*-----------------------------------------------------------------*/
/* Info on vectorized loops. */
/*-----------------------------------------------------------------*/
......@@ -141,6 +210,18 @@ typedef struct _loop_vec_info {
/* The loop location in the source. */
LOC loop_line_number;
/* All interleaving chains of stores in the loop, represented by the first
stmt in the chain. */
VEC(tree, heap) *strided_stores;
/* All SLP instances in the loop. This is a subset of the set of STRIDED_STORES
of the loop. */
VEC(slp_instance, heap) *slp_instances;
/* The unrolling factor needed to SLP the loop. If pure SLP is
applied to the loop, i.e., no unrolling is needed, this is 1. */
unsigned slp_unrolling_factor;
} *loop_vec_info;
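The loop-level `slp_unrolling_factor` has to satisfy every SLP instance in the loop. One plausible way to combine the per-instance factors is sketched below, assuming the maximum is taken; this is an assumption, not something this diff shows. When every factor is a power of two (as happens when the element count per vector is a power of two), the maximum coincides with the least common multiple. The helper name is hypothetical; it starts from 1, matching the initial value set in `new_loop_vec_info`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: loop-level SLP unrolling factor as the maximum of the
   per-instance factors, defaulting to 1 when there are no instances.  */
static unsigned int
sketch_loop_slp_unrolling_factor (const unsigned int *instance_factors,
                                  size_t n_instances)
{
  unsigned int factor = 1;
  size_t i;

  assert (n_instances == 0 || instance_factors != NULL);
  for (i = 0; i < n_instances; i++)
    if (instance_factors[i] > factor)
      factor = instance_factors[i];
  return factor;
}
```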
/* Access Functions. */
......@@ -159,6 +240,9 @@ typedef struct _loop_vec_info {
#define LOOP_VINFO_MAY_MISALIGN_STMTS(L) (L)->may_misalign_stmts
#define LOOP_VINFO_LOC(L) (L)->loop_line_number
#define LOOP_VINFO_MAY_ALIAS_DDRS(L) (L)->may_alias_ddrs
#define LOOP_VINFO_STRIDED_STORES(L) (L)->strided_stores
#define LOOP_VINFO_SLP_INSTANCES(L) (L)->slp_instances
#define LOOP_VINFO_SLP_UNROLLING_FACTOR(L) (L)->slp_unrolling_factor
#define NITERS_KNOWN_P(n) \
(host_integerp ((n),0) \
......@@ -216,6 +300,29 @@ enum vect_relevant {
vect_used_in_loop
};
/* The type of vectorization that can be applied to the stmt: regular loop-based
vectorization; pure SLP - the stmt is a part of SLP instances and does not
have uses outside SLP instances; or hybrid SLP and loop-based - the stmt is
a part of SLP instance and also must be loop-based vectorized, since it has
uses outside SLP sequences.
In the loop context the meanings of pure and hybrid SLP are slightly
different. By saying that pure SLP is applied to the loop, we mean that we
exploit only intra-iteration parallelism in the loop; i.e., the loop can be
vectorized without doing any conceptual unrolling, because we don't pack
together stmts from different iterations, only within a single iteration.
Loop hybrid SLP means that we exploit both intra-iteration and
inter-iteration parallelism (e.g., number of elements in the vector is 4
and the slp-group-size is 2, in which case we don't have enough parallelism
within an iteration, so we obtain the rest of the parallelism from subsequent
iterations by unrolling the loop by 2). */
enum slp_vect_type {
loop_vect = 0,
pure_slp,
hybrid
};
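The comment above distinguishes three kinds of stmt. A minimal sketch of that classification, assuming two hypothetical inputs: whether the stmt belongs to some SLP instance, and whether its result has uses outside the SLP instances. The enum is copied locally so the sketch compiles on its own; the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copy of the enum introduced by this patch.  */
enum slp_vect_type { loop_vect = 0, pure_slp, hybrid };

/* Hypothetical classifier following the comment: a stmt outside every
   SLP instance is loop-vectorized; an SLP stmt whose result is also used
   outside the SLP instances must be loop-vectorized as well (hybrid);
   otherwise it is pure SLP.  */
static enum slp_vect_type
sketch_classify_stmt (bool in_slp_instance, bool used_outside_slp)
{
  if (!in_slp_instance)
    return loop_vect;
  return used_outside_slp ? hybrid : pure_slp;
}
```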
typedef struct data_reference *dr_p;
DEF_VEC_P(dr_p);
DEF_VEC_ALLOC_P(dr_p,heap);
......@@ -309,6 +416,9 @@ typedef struct _stmt_vec_info {
int outside_of_loop; /* Statements generated outside loop. */
int inside_of_loop; /* Statements generated inside loop. */
} cost;
/* Whether the stmt is SLPed, loop-based vectorized, or both. */
enum slp_vect_type slp_type;
} *stmt_vec_info;
/* Access Functions. */
......@@ -338,6 +448,7 @@ typedef struct _stmt_vec_info {
#define STMT_VINFO_DR_GROUP_GAP(S) (S)->gap
#define STMT_VINFO_DR_GROUP_SAME_DR_STMT(S)(S)->same_dr_stmt
#define STMT_VINFO_DR_GROUP_READ_WRITE_DEPENDENCE(S) (S)->read_write_dep
#define STMT_VINFO_STRIDED_ACCESS(S) ((S)->first_dr != NULL)
#define DR_GROUP_FIRST_DR(S) (S)->first_dr
#define DR_GROUP_NEXT_DR(S) (S)->next_dr
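`STMT_VINFO_STRIDED_ACCESS` treats a stmt as strided exactly when it has a chain head (`first_dr`), and `DR_GROUP_NEXT_DR` links the members of an interleaving chain. A simplified sketch of that linked structure, assuming every member (including the first) points at the chain's first stmt; the struct and function names are hypothetical. Note that the real `DR_GROUP_SIZE` may differ from the member count when the chain has gaps.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the chain fields of stmt_vec_info.  */
struct sketch_stmt
{
  struct sketch_stmt *first_dr;  /* chain head, or NULL if not in a chain */
  struct sketch_stmt *next_dr;   /* next member, or NULL for the last */
};

/* Mirrors STMT_VINFO_STRIDED_ACCESS: nonzero iff S belongs to a chain.  */
static int
sketch_strided_access_p (const struct sketch_stmt *s)
{
  assert (s != NULL);
  return s->first_dr != NULL;
}

/* Count the members of the chain containing S by walking from its head.  */
static unsigned int
sketch_chain_length (const struct sketch_stmt *s)
{
  const struct sketch_stmt *p;
  unsigned int n = 0;

  for (p = s->first_dr; p != NULL; p = p->next_dr)
    n++;
  return n;
}
```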
......@@ -351,6 +462,10 @@ typedef struct _stmt_vec_info {
#define STMT_VINFO_OUTSIDE_OF_LOOP_COST(S) (S)->cost.outside_of_loop
#define STMT_VINFO_INSIDE_OF_LOOP_COST(S) (S)->cost.inside_of_loop
#define HYBRID_SLP_STMT(S) ((S)->slp_type == hybrid)
#define PURE_SLP_STMT(S) ((S)->slp_type == pure_slp)
#define STMT_SLP_TYPE(S) (S)->slp_type
/* These are some defines for the initial implementation of the vectorizer's
cost model. These will later be target specific hooks. */
......@@ -524,6 +639,7 @@ extern stmt_vec_info new_stmt_vec_info (tree stmt, loop_vec_info);
/** In tree-vect-analyze.c **/
/* Driver for analysis stage. */
extern loop_vec_info vect_analyze_loop (struct loop *);
extern void vect_free_slp_tree (slp_tree);
/** In tree-vect-patterns.c **/
......@@ -536,14 +652,16 @@ void vect_pattern_recog (loop_vec_info);
/** In tree-vect-transform.c **/
-extern bool vectorizable_load (tree, block_stmt_iterator *, tree *);
-extern bool vectorizable_store (tree, block_stmt_iterator *, tree *);
-extern bool vectorizable_operation (tree, block_stmt_iterator *, tree *);
+extern bool vectorizable_load (tree, block_stmt_iterator *, tree *, slp_tree);
+extern bool vectorizable_store (tree, block_stmt_iterator *, tree *, slp_tree);
+extern bool vectorizable_operation (tree, block_stmt_iterator *, tree *,
+ slp_tree);
extern bool vectorizable_type_promotion (tree, block_stmt_iterator *, tree *);
extern bool vectorizable_type_demotion (tree, block_stmt_iterator *, tree *);
extern bool vectorizable_conversion (tree, block_stmt_iterator *,
- tree *);
-extern bool vectorizable_assignment (tree, block_stmt_iterator *, tree *);
+ tree *, slp_tree);
+extern bool vectorizable_assignment (tree, block_stmt_iterator *, tree *,
+ slp_tree);
extern tree vectorizable_function (tree, tree, tree);
extern bool vectorizable_call (tree, block_stmt_iterator *, tree *);
extern bool vectorizable_condition (tree, block_stmt_iterator *, tree *);
......@@ -551,6 +669,11 @@ extern bool vectorizable_live_operation (tree, block_stmt_iterator *, tree *);
extern bool vectorizable_reduction (tree, block_stmt_iterator *, tree *);
extern bool vectorizable_induction (tree, block_stmt_iterator *, tree *);
extern int vect_estimate_min_profitable_iters (loop_vec_info);
extern void vect_model_simple_cost (stmt_vec_info, int, enum vect_def_type *,
slp_tree);
extern void vect_model_store_cost (stmt_vec_info, int, enum vect_def_type,
slp_tree);
extern void vect_model_load_cost (stmt_vec_info, int, slp_tree);
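The cost-model functions now take an `slp_tree` argument, and both `_stmt_vec_info` and `_slp_tree` carry an inside/outside-of-loop cost pair. A sketch of one plausible convention, assuming (not confirmed by this part of the diff) that the cost is charged to the SLP node when one is passed and to the stmt's own info otherwise. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cost record with the same two counters as the patch's
   cost fields.  */
struct sketch_cost
{
  int outside_of_loop;  /* stmts generated outside the loop */
  int inside_of_loop;   /* stmts generated inside the loop */
};

/* Assumed convention: charge the SLP node if present, otherwise the
   scalar stmt (plain loop-based vectorization).  */
static void
sketch_record_cost (struct sketch_cost *stmt_cost,
                    struct sketch_cost *slp_node_cost,
                    int inside, int outside)
{
  struct sketch_cost *target = slp_node_cost ? slp_node_cost : stmt_cost;

  assert (target != NULL);
  target->inside_of_loop = inside;
  target->outside_of_loop = outside;
}
```

Keeping SLP costs on the node rather than on each scalar stmt avoids counting a packed group once per member stmt.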
/* Driver for transformation stage. */
extern void vect_transform_loop (loop_vec_info);
......