Commit 018b2744 by Richard Sandiford Committed by Richard Sandiford

Handle more SLP constant and extern definitions for variable VF

This patch adds support for vectorising SLP definitions that are
constant or external (i.e. from outside the loop) when the vectorisation
factor isn't known at compile time.  It can only handle cases where the
number of SLP statements is a power of 2.

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-vect-slp.c: Include gimple-fold.h and internal-fn.h
	(can_duplicate_and_interleave_p): New function.
	(vect_get_and_check_slp_defs): Take the vector of statements
	rather than just the current one.  Remove excess parentheses.
	Restriction rejectinon of vect_constant_def and vect_external_def
	for variable-length vectors to boolean types, or types for which
	can_duplicate_and_interleave_p is false.
	(vect_build_slp_tree_2): Update call to vect_get_and_check_slp_defs.
	(duplicate_and_interleave): New function.
	(vect_get_constant_vectors): Use gimple_build_vector for
	constant-length vectors and suitable variable-length constant
	vectors.  Use duplicate_and_interleave for other variable-length
	vectors.  Don't defer the update when inserting new statements.

gcc/testsuite/
	* gcc.dg/vect/no-scevccp-slp-30.c: Don't XFAIL for vect_variable_length
	&& vect_load_lanes
	* gcc.dg/vect/slp-1.c: Likewise.
	* gcc.dg/vect/slp-10.c: Likewise.
	* gcc.dg/vect/slp-12b.c: Likewise.
	* gcc.dg/vect/slp-12c.c: Likewise.
	* gcc.dg/vect/slp-17.c: Likewise.
	* gcc.dg/vect/slp-19b.c: Likewise.
	* gcc.dg/vect/slp-20.c: Likewise.
	* gcc.dg/vect/slp-21.c: Likewise.
	* gcc.dg/vect/slp-22.c: Likewise.
	* gcc.dg/vect/slp-23.c: Likewise.
	* gcc.dg/vect/slp-24-big-array.c: Likewise.
	* gcc.dg/vect/slp-24.c: Likewise.
	* gcc.dg/vect/slp-28.c: Likewise.
	* gcc.dg/vect/slp-39.c: Likewise.
	* gcc.dg/vect/slp-6.c: Likewise.
	* gcc.dg/vect/slp-7.c: Likewise.
	* gcc.dg/vect/slp-cond-1.c: Likewise.
	* gcc.dg/vect/slp-cond-2-big-array.c: Likewise.
	* gcc.dg/vect/slp-cond-2.c: Likewise.
	* gcc.dg/vect/slp-multitypes-1.c: Likewise.
	* gcc.dg/vect/slp-multitypes-8.c: Likewise.
	* gcc.dg/vect/slp-multitypes-9.c: Likewise.
	* gcc.dg/vect/slp-multitypes-10.c: Likewise.
	* gcc.dg/vect/slp-multitypes-12.c: Likewise.
	* gcc.dg/vect/slp-perm-6.c: Likewise.
	* gcc.dg/vect/slp-widen-mult-half.c: Likewise.
	* gcc.dg/vect/vect-live-slp-1.c: Likewise.
	* gcc.dg/vect/vect-live-slp-2.c: Likewise.
	* gcc.dg/vect/pr33953.c: Don't XFAIL for vect_variable_length.
	* gcc.dg/vect/slp-12a.c: Likewise.
	* gcc.dg/vect/slp-14.c: Likewise.
	* gcc.dg/vect/slp-15.c: Likewise.
	* gcc.dg/vect/slp-multitypes-2.c: Likewise.
	* gcc.dg/vect/slp-multitypes-4.c: Likewise.
	* gcc.dg/vect/slp-multitypes-5.c: Likewise.
	* gcc.target/aarch64/sve/slp_1.c: New test.
	* gcc.target/aarch64/sve/slp_1_run.c: Likewise.
	* gcc.target/aarch64/sve/slp_2.c: Likewise.
	* gcc.target/aarch64/sve/slp_2_run.c: Likewise.
	* gcc.target/aarch64/sve/slp_3.c: Likewise.
	* gcc.target/aarch64/sve/slp_3_run.c: Likewise.
	* gcc.target/aarch64/sve/slp_4.c: Likewise.
	* gcc.target/aarch64/sve/slp_4_run.c: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r256622
parent 3ea518f6
...@@ -2,6 +2,24 @@ ...@@ -2,6 +2,24 @@
Alan Hayward <alan.hayward@arm.com> Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com> David Sherwood <david.sherwood@arm.com>
* tree-vect-slp.c: Include gimple-fold.h and internal-fn.h
(can_duplicate_and_interleave_p): New function.
(vect_get_and_check_slp_defs): Take the vector of statements
rather than just the current one. Remove excess parentheses.
Restriction rejectinon of vect_constant_def and vect_external_def
for variable-length vectors to boolean types, or types for which
can_duplicate_and_interleave_p is false.
(vect_build_slp_tree_2): Update call to vect_get_and_check_slp_defs.
(duplicate_and_interleave): New function.
(vect_get_constant_vectors): Use gimple_build_vector for
constant-length vectors and suitable variable-length constant
vectors. Use duplicate_and_interleave for other variable-length
vectors. Don't defer the update when inserting new statements.
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
* tree-vect-loop.c (vect_estimate_min_profitable_iters): Make sure * tree-vect-loop.c (vect_estimate_min_profitable_iters): Make sure
min_profitable_iters doesn't go negative. min_profitable_iters doesn't go negative.
......
...@@ -2,6 +2,56 @@ ...@@ -2,6 +2,56 @@
Alan Hayward <alan.hayward@arm.com> Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com> David Sherwood <david.sherwood@arm.com>
* gcc.dg/vect/no-scevccp-slp-30.c: Don't XFAIL for vect_variable_length
&& vect_load_lanes
* gcc.dg/vect/slp-1.c: Likewise.
* gcc.dg/vect/slp-10.c: Likewise.
* gcc.dg/vect/slp-12b.c: Likewise.
* gcc.dg/vect/slp-12c.c: Likewise.
* gcc.dg/vect/slp-17.c: Likewise.
* gcc.dg/vect/slp-19b.c: Likewise.
* gcc.dg/vect/slp-20.c: Likewise.
* gcc.dg/vect/slp-21.c: Likewise.
* gcc.dg/vect/slp-22.c: Likewise.
* gcc.dg/vect/slp-23.c: Likewise.
* gcc.dg/vect/slp-24-big-array.c: Likewise.
* gcc.dg/vect/slp-24.c: Likewise.
* gcc.dg/vect/slp-28.c: Likewise.
* gcc.dg/vect/slp-39.c: Likewise.
* gcc.dg/vect/slp-6.c: Likewise.
* gcc.dg/vect/slp-7.c: Likewise.
* gcc.dg/vect/slp-cond-1.c: Likewise.
* gcc.dg/vect/slp-cond-2-big-array.c: Likewise.
* gcc.dg/vect/slp-cond-2.c: Likewise.
* gcc.dg/vect/slp-multitypes-1.c: Likewise.
* gcc.dg/vect/slp-multitypes-8.c: Likewise.
* gcc.dg/vect/slp-multitypes-9.c: Likewise.
* gcc.dg/vect/slp-multitypes-10.c: Likewise.
* gcc.dg/vect/slp-multitypes-12.c: Likewise.
* gcc.dg/vect/slp-perm-6.c: Likewise.
* gcc.dg/vect/slp-widen-mult-half.c: Likewise.
* gcc.dg/vect/vect-live-slp-1.c: Likewise.
* gcc.dg/vect/vect-live-slp-2.c: Likewise.
* gcc.dg/vect/pr33953.c: Don't XFAIL for vect_variable_length.
* gcc.dg/vect/slp-12a.c: Likewise.
* gcc.dg/vect/slp-14.c: Likewise.
* gcc.dg/vect/slp-15.c: Likewise.
* gcc.dg/vect/slp-multitypes-2.c: Likewise.
* gcc.dg/vect/slp-multitypes-4.c: Likewise.
* gcc.dg/vect/slp-multitypes-5.c: Likewise.
* gcc.target/aarch64/sve/slp_1.c: New test.
* gcc.target/aarch64/sve/slp_1_run.c: Likewise.
* gcc.target/aarch64/sve/slp_2.c: Likewise.
* gcc.target/aarch64/sve/slp_2_run.c: Likewise.
* gcc.target/aarch64/sve/slp_3.c: Likewise.
* gcc.target/aarch64/sve/slp_3_run.c: Likewise.
* gcc.target/aarch64/sve/slp_4.c: Likewise.
* gcc.target/aarch64/sve/slp_4_run.c: Likewise.
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
* gcc.dg/vect/vect-ooo-group-1.c: New test. * gcc.dg/vect/vect-ooo-group-1.c: New test.
* gcc.target/aarch64/sve/mask_struct_load_1.c: Likewise. * gcc.target/aarch64/sve/mask_struct_load_1.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_1_run.c: Likewise. * gcc.target/aarch64/sve/mask_struct_load_1_run.c: Likewise.
......
...@@ -52,5 +52,5 @@ int main (void) ...@@ -52,5 +52,5 @@ int main (void)
} }
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail vect_variable_length } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" } } */
...@@ -29,6 +29,6 @@ void blockmove_NtoN_blend_noremap32 (const UINT32 *srcdata, int srcwidth, ...@@ -29,6 +29,6 @@ void blockmove_NtoN_blend_noremap32 (const UINT32 *srcdata, int srcwidth,
} }
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { vect_no_align && { ! vect_hw_misalign } } } } } */ /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { vect_no_align && { ! vect_hw_misalign } } } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail { { vect_no_align && { ! vect_hw_misalign } } || vect_variable_length } } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail { vect_no_align && { ! vect_hw_misalign } } } } } */
...@@ -118,5 +118,5 @@ int main (void) ...@@ -118,5 +118,5 @@ int main (void)
} }
/* { dg-final { scan-tree-dump-times "vectorized 4 loops" 1 "vect" } } */ /* { dg-final { scan-tree-dump-times "vectorized 4 loops" 1 "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { xfail vect_variable_length } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" } } */
...@@ -107,7 +107,7 @@ int main (void) ...@@ -107,7 +107,7 @@ int main (void)
/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" {target {vect_uintfloat_cvt && vect_int_mult} } } } */ /* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" {target {vect_uintfloat_cvt && vect_int_mult} } } } */
/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" {target {{! { vect_uintfloat_cvt}} && vect_int_mult} } } } */ /* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" {target {{! { vect_uintfloat_cvt}} && vect_int_mult} } } } */
/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" {target {{! { vect_uintfloat_cvt}} && { ! {vect_int_mult}}} } } } */ /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" {target {{! { vect_uintfloat_cvt}} && { ! {vect_int_mult}}} } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" {target { vect_uintfloat_cvt && vect_int_mult } xfail { vect_variable_length && vect_load_lanes } } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" {target { vect_uintfloat_cvt && vect_int_mult }} } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" {target {{! { vect_uintfloat_cvt}} && vect_int_mult} } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" {target {{! { vect_uintfloat_cvt}} && vect_int_mult} } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" {target {{! { vect_uintfloat_cvt}} && { ! {vect_int_mult}}} } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" {target {{! { vect_uintfloat_cvt}} && { ! {vect_int_mult}}} } } } */
...@@ -75,5 +75,5 @@ int main (void) ...@@ -75,5 +75,5 @@ int main (void)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_strided8 && vect_int_mult } } } } */ /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_strided8 && vect_int_mult } } } } */
/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { vect_strided8 && vect_int_mult } } } } } */ /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { vect_strided8 && vect_int_mult } } } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { vect_strided8 && vect_int_mult } xfail vect_variable_length } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { vect_strided8 && vect_int_mult } } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! { vect_strided8 && vect_int_mult } } } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! { vect_strided8 && vect_int_mult } } } } } */
...@@ -46,6 +46,6 @@ int main (void) ...@@ -46,6 +46,6 @@ int main (void)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_strided2 && vect_int_mult } } } } */ /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_strided2 && vect_int_mult } } } } */
/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { vect_strided2 && vect_int_mult } } } } } */ /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { vect_strided2 && vect_int_mult } } } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { vect_strided2 && vect_int_mult } xfail vect_variable_length } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { vect_strided2 && vect_int_mult } } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! { vect_strided2 && vect_int_mult } } } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! { vect_strided2 && vect_int_mult } } } } } */
...@@ -48,5 +48,5 @@ int main (void) ...@@ -48,5 +48,5 @@ int main (void)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_int_mult } } } } */ /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_int_mult } } } } */
/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! vect_int_mult } } } } */ /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! vect_int_mult } } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_int_mult xfail { vect_variable_length && vect_load_lanes } } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_int_mult } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! vect_int_mult } } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! vect_int_mult } } } } */
...@@ -111,5 +111,5 @@ int main (void) ...@@ -111,5 +111,5 @@ int main (void)
} }
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_int_mult } } } */ /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_int_mult } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_int_mult xfail vect_variable_length } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_int_mult } } } */
...@@ -112,6 +112,6 @@ int main (void) ...@@ -112,6 +112,6 @@ int main (void)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" {target vect_int_mult } } } */ /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" {target vect_int_mult } } } */
/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" {target { ! { vect_int_mult } } } } } */ /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" {target { ! { vect_int_mult } } } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" {target vect_int_mult xfail vect_variable_length } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" {target vect_int_mult } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" {target { ! { vect_int_mult } } } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" {target { ! { vect_int_mult } } } } } */
...@@ -51,5 +51,5 @@ int main (void) ...@@ -51,5 +51,5 @@ int main (void)
} }
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { xfail { vect_variable_length && vect_load_lanes } } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" } } */
...@@ -53,5 +53,5 @@ int main (void) ...@@ -53,5 +53,5 @@ int main (void)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided4 } } } */ /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided4 } } } */
/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! vect_strided4 } } } } */ /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! vect_strided4 } } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_strided4 xfail vect_variable_length } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_strided4 } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! vect_strided4 } } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! vect_strided4 } } } } */
...@@ -110,5 +110,5 @@ int main (void) ...@@ -110,5 +110,5 @@ int main (void)
} }
/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */ /* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { xfail vect_variable_length } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" } } */
...@@ -201,6 +201,6 @@ int main (void) ...@@ -201,6 +201,6 @@ int main (void)
/* { dg-final { scan-tree-dump-times "vectorized 4 loops" 1 "vect" { target { vect_strided4 || vect_extract_even_odd } } } } */ /* { dg-final { scan-tree-dump-times "vectorized 4 loops" 1 "vect" { target { vect_strided4 || vect_extract_even_odd } } } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { ! { vect_strided4 || vect_extract_even_odd } } } } } */ /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { ! { vect_strided4 || vect_extract_even_odd } } } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_strided4 xfail vect_variable_length } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_strided4 } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! { vect_strided4 } } } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! { vect_strided4 } } } } } */
...@@ -129,5 +129,5 @@ int main (void) ...@@ -129,5 +129,5 @@ int main (void)
} }
/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */ /* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 6 "vect" { xfail vect_variable_length } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 6 "vect" } } */
...@@ -109,6 +109,6 @@ int main (void) ...@@ -109,6 +109,6 @@ int main (void)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { ! { vect_strided8 || vect_no_align } } } } } */ /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { ! { vect_strided8 || vect_no_align } } } } } */
/* We fail to vectorize the second loop with variable-length SVE but /* We fail to vectorize the second loop with variable-length SVE but
fall back to 128-bit vectors, which does use SLP. */ fall back to 128-bit vectors, which does use SLP. */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { ! vect_perm } xfail aarch64_sve } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { ! vect_perm } } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_perm } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_perm } } } */
...@@ -91,4 +91,4 @@ int main (void) ...@@ -91,4 +91,4 @@ int main (void)
} }
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { vect_no_align && ilp32 } } } } */ /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { vect_no_align && ilp32 } } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { xfail { { vect_no_align && ilp32 } || vect_variable_length } } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { xfail { vect_no_align && ilp32 } } } } */
...@@ -77,4 +77,4 @@ int main (void) ...@@ -77,4 +77,4 @@ int main (void)
} }
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { vect_no_align && ilp32 } } } } */ /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { vect_no_align && ilp32 } } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { xfail { { vect_no_align && ilp32 } || vect_variable_length } } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { xfail { vect_no_align && ilp32 } } } } */
...@@ -89,5 +89,5 @@ int main (void) ...@@ -89,5 +89,5 @@ int main (void)
} }
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail vect_variable_length } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" } } */
...@@ -21,4 +21,4 @@ void bar (double w) ...@@ -21,4 +21,4 @@ void bar (double w)
} }
} }
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { xfail vect_variable_length } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" } } */
...@@ -116,6 +116,6 @@ int main (void) ...@@ -116,6 +116,6 @@ int main (void)
/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" {target vect_int_mult} } } */ /* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" {target vect_int_mult} } } */
/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" {target { ! { vect_int_mult } } } } } */ /* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" {target { ! { vect_int_mult } } } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" {target vect_int_mult xfail { vect_variable_length && vect_load_lanes } } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" {target vect_int_mult } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" {target { ! { vect_int_mult } } } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" {target { ! { vect_int_mult } } } } } */
...@@ -122,6 +122,6 @@ int main (void) ...@@ -122,6 +122,6 @@ int main (void)
/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { target vect_short_mult } } }*/ /* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { target vect_short_mult } } }*/
/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target { ! { vect_short_mult } } } } }*/ /* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target { ! { vect_short_mult } } } } }*/
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { target vect_short_mult xfail vect_variable_length } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { target vect_short_mult } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { ! { vect_short_mult } } } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { ! { vect_short_mult } } } } } */
...@@ -122,4 +122,4 @@ main () ...@@ -122,4 +122,4 @@ main ()
return 0; return 0;
} }
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { xfail { vect_variable_length && vect_load_lanes } } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" } } */
...@@ -125,4 +125,4 @@ main () ...@@ -125,4 +125,4 @@ main ()
return 0; return 0;
} }
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { xfail vect_variable_length } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" } } */
...@@ -125,4 +125,4 @@ main () ...@@ -125,4 +125,4 @@ main ()
return 0; return 0;
} }
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { xfail vect_variable_length } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" } } */
...@@ -52,5 +52,5 @@ int main (void) ...@@ -52,5 +52,5 @@ int main (void)
} }
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { xfail vect_variable_length } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" } } */
...@@ -46,5 +46,5 @@ int main (void) ...@@ -46,5 +46,5 @@ int main (void)
} }
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_pack_trunc } } } */ /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_pack_trunc } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_pack_trunc xfail vect_variable_length } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_pack_trunc } } } */
...@@ -62,5 +62,5 @@ int main (void) ...@@ -62,5 +62,5 @@ int main (void)
} }
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { xfail { vect_variable_length && vect_load_lanes } } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" } } */
...@@ -77,5 +77,5 @@ int main (void) ...@@ -77,5 +77,5 @@ int main (void)
} }
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { xfail vect_variable_length } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" } } */
...@@ -52,5 +52,5 @@ int main (void) ...@@ -52,5 +52,5 @@ int main (void)
} }
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_unpack } } } */ /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_unpack } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_unpack xfail vect_variable_length } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_unpack } } } */
...@@ -52,5 +52,5 @@ int main (void) ...@@ -52,5 +52,5 @@ int main (void)
} }
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_pack_trunc } } } */ /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_pack_trunc } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_pack_trunc xfail vect_variable_length } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_pack_trunc } } } */
...@@ -40,5 +40,5 @@ int main (void) ...@@ -40,5 +40,5 @@ int main (void)
} }
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_unpack } } } */ /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_unpack } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_unpack xfail vect_variable_length } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_unpack } } } */
...@@ -40,5 +40,5 @@ int main (void) ...@@ -40,5 +40,5 @@ int main (void)
} }
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_pack_trunc } } } */ /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_pack_trunc } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_pack_trunc xfail vect_variable_length } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_pack_trunc } } } */
...@@ -104,7 +104,7 @@ int main (int argc, const char* argv[]) ...@@ -104,7 +104,7 @@ int main (int argc, const char* argv[])
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_perm } } } */ /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_perm } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { vect_perm3_int && { ! vect_load_lanes } } } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { vect_perm3_int && { ! vect_load_lanes } } } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_load_lanes xfail { vect_variable_length && vect_load_lanes } } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_load_lanes } } } */
/* { dg-final { scan-tree-dump "note: Built SLP cancelled: can use load/store-lanes" "vect" { target { vect_perm3_int && vect_load_lanes } } } } */ /* { dg-final { scan-tree-dump "note: Built SLP cancelled: can use load/store-lanes" "vect" { target { vect_perm3_int && vect_load_lanes } } } } */
/* { dg-final { scan-tree-dump "LOAD_LANES" "vect" { target vect_load_lanes } } } */ /* { dg-final { scan-tree-dump "LOAD_LANES" "vect" { target vect_load_lanes } } } */
/* { dg-final { scan-tree-dump "STORE_LANES" "vect" { target vect_load_lanes } } } */ /* { dg-final { scan-tree-dump "STORE_LANES" "vect" { target vect_load_lanes } } } */
...@@ -46,7 +46,7 @@ int main (void) ...@@ -46,7 +46,7 @@ int main (void)
} }
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_widen_mult_hi_to_si } } } */ /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_widen_mult_hi_to_si } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_widen_mult_hi_to_si xfail vect_variable_length } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_widen_mult_hi_to_si } } } */
/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected" 2 "vect" { target vect_widen_mult_hi_to_si_pattern } } } */ /* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected" 2 "vect" { target vect_widen_mult_hi_to_si_pattern } } } */
/* { dg-final { scan-tree-dump-times "pattern recognized" 2 "vect" { target vect_widen_mult_hi_to_si_pattern } } } */ /* { dg-final { scan-tree-dump-times "pattern recognized" 2 "vect" { target vect_widen_mult_hi_to_si_pattern } } } */
...@@ -68,5 +68,5 @@ main (void) ...@@ -68,5 +68,5 @@ main (void)
} }
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */ /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { xfail vect_variable_length } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" } } */
/* { dg-final { scan-tree-dump-times "vec_stmt_relevant_p: stmt live but not relevant" 4 "vect" } } */ /* { dg-final { scan-tree-dump-times "vec_stmt_relevant_p: stmt live but not relevant" 4 "vect" } } */
...@@ -62,5 +62,5 @@ main (void) ...@@ -62,5 +62,5 @@ main (void)
} }
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */ /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { xfail vect_variable_length } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" } } */
/* { dg-final { scan-tree-dump-times "vec_stmt_relevant_p: stmt live but not relevant" 2 "vect" } } */ /* { dg-final { scan-tree-dump-times "vec_stmt_relevant_p: stmt live but not relevant" 2 "vect" } } */
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=scalable" } */
#include <stdint.h>
#define VEC_PERM(TYPE) \
TYPE __attribute__ ((noinline, noclone)) \
vec_slp_##TYPE (TYPE *restrict a, TYPE b, TYPE c, int n) \
{ \
for (int i = 0; i < n; ++i) \
{ \
a[i * 2] += b; \
a[i * 2 + 1] += c; \
} \
}
#define TEST_ALL(T) \
T (int8_t) \
T (uint8_t) \
T (int16_t) \
T (uint16_t) \
T (int32_t) \
T (uint32_t) \
T (int64_t) \
T (uint64_t) \
T (_Float16) \
T (float) \
T (double)
TEST_ALL (VEC_PERM)
/* We should use one DUP for each of the 8-, 16- and 32-bit types,
although we currently use LD1RW for _Float16. We should use two
DUPs for each of the three 64-bit types. */
/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.h, [hw]} 2 } } */
/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.s, [sw]} 2 } } */
/* { dg-final { scan-assembler-times {\tld1rw\tz[0-9]+\.s, } 1 } } */
/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, [dx]} 9 } } */
/* { dg-final { scan-assembler-times {\tzip1\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 3 } } */
/* { dg-final { scan-assembler-not {\tzip2\t} } } */
/* { dg-do run { target aarch64_sve_hw } } */
/* { dg-options "-O2 -ftree-vectorize" } */
#include "slp_1.c"
#define N (103 * 2)
#define HARNESS(TYPE) \
{ \
TYPE a[N], b[2] = { 3, 11 }; \
for (unsigned int i = 0; i < N; ++i) \
{ \
a[i] = i * 2 + i % 5; \
asm volatile ("" ::: "memory"); \
} \
vec_slp_##TYPE (a, b[0], b[1], N / 2); \
for (unsigned int i = 0; i < N; ++i) \
{ \
TYPE orig = i * 2 + i % 5; \
TYPE expected = orig + b[i % 2]; \
if (a[i] != expected) \
__builtin_abort (); \
} \
}
int __attribute__ ((optimize (1)))
main (void)
{
TEST_ALL (HARNESS)
}
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=scalable" } */
#include <stdint.h>
#define VEC_PERM(TYPE) \
TYPE __attribute__ ((noinline, noclone)) \
vec_slp_##TYPE (TYPE *restrict a, int n) \
{ \
for (int i = 0; i < n; ++i) \
{ \
a[i * 2] += 10; \
a[i * 2 + 1] += 17; \
} \
}
#define TEST_ALL(T) \
T (int8_t) \
T (uint8_t) \
T (int16_t) \
T (uint16_t) \
T (int32_t) \
T (uint32_t) \
T (int64_t) \
T (uint64_t) \
T (_Float16) \
T (float) \
T (double)
TEST_ALL (VEC_PERM)
/* { dg-final { scan-assembler-times {\tld1rh\tz[0-9]+\.h, } 2 } } */
/* { dg-final { scan-assembler-times {\tld1rw\tz[0-9]+\.s, } 3 } } */
/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 3 } } */
/* { dg-final { scan-assembler-times {\tld1rqb\tz[0-9]+\.b, } 3 } } */
/* { dg-final { scan-assembler-not {\tzip1\t} } } */
/* { dg-final { scan-assembler-not {\tzip2\t} } } */
/* { dg-do run { target aarch64_sve_hw } } */
/* { dg-options "-O2 -ftree-vectorize" } */
#include "slp_2.c"
#define N (103 * 2)
#define HARNESS(TYPE) \
{ \
TYPE a[N], b[2] = { 10, 17 }; \
for (unsigned int i = 0; i < N; ++i) \
{ \
a[i] = i * 2 + i % 5; \
asm volatile ("" ::: "memory"); \
} \
vec_slp_##TYPE (a, N / 2); \
for (unsigned int i = 0; i < N; ++i) \
{ \
TYPE orig = i * 2 + i % 5; \
TYPE expected = orig + b[i % 2]; \
if (a[i] != expected) \
__builtin_abort (); \
} \
}
int __attribute__ ((optimize (1)))
main (void)
{
TEST_ALL (HARNESS)
}
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=scalable" } */
#include <stdint.h>
#define VEC_PERM(TYPE) \
TYPE __attribute__ ((noinline, noclone)) \
vec_slp_##TYPE (TYPE *restrict a, int n) \
{ \
for (int i = 0; i < n; ++i) \
{ \
a[i * 4] += 41; \
a[i * 4 + 1] += 25; \
a[i * 4 + 2] += 31; \
a[i * 4 + 3] += 62; \
} \
}
#define TEST_ALL(T) \
T (int8_t) \
T (uint8_t) \
T (int16_t) \
T (uint16_t) \
T (int32_t) \
T (uint32_t) \
T (int64_t) \
T (uint64_t) \
T (_Float16) \
T (float) \
T (double)
TEST_ALL (VEC_PERM)
/* 1 for each 8-bit type. */
/* { dg-final { scan-assembler-times {\tld1rw\tz[0-9]+\.s, } 2 } } */
/* 1 for each 16-bit type and 4 for double. */
/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 7 } } */
/* 1 for each 32-bit type. */
/* { dg-final { scan-assembler-times {\tld1rqb\tz[0-9]+\.b, } 3 } } */
/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #41\n} 2 } } */
/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #25\n} 2 } } */
/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #31\n} 2 } } */
/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #62\n} 2 } } */
/* The 64-bit types need:
ZIP1 ZIP1 (2 ZIP2s optimized away)
ZIP1 ZIP2. */
/* { dg-final { scan-assembler-times {\tzip1\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 9 } } */
/* { dg-final { scan-assembler-times {\tzip2\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 3 } } */
/* { dg-do run { target aarch64_sve_hw } } */
/* { dg-options "-O2 -ftree-vectorize" } */
#include "slp_3.c"
#define N (77 * 4)
#define HARNESS(TYPE) \
{ \
TYPE a[N], b[4] = { 41, 25, 31, 62 }; \
for (unsigned int i = 0; i < N; ++i) \
{ \
a[i] = i * 2 + i % 5; \
asm volatile ("" ::: "memory"); \
} \
vec_slp_##TYPE (a, N / 4); \
for (unsigned int i = 0; i < N; ++i) \
{ \
TYPE orig = i * 2 + i % 5; \
TYPE expected = orig + b[i % 4]; \
if (a[i] != expected) \
__builtin_abort (); \
} \
}
int __attribute__ ((optimize (1)))
main (void)
{
TEST_ALL (HARNESS)
}
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=scalable" } */
#include <stdint.h>
#define VEC_PERM(TYPE) \
TYPE __attribute__ ((noinline, noclone)) \
vec_slp_##TYPE (TYPE *restrict a, int n) \
{ \
for (int i = 0; i < n; ++i) \
{ \
a[i * 8] += 99; \
a[i * 8 + 1] += 11; \
a[i * 8 + 2] += 17; \
a[i * 8 + 3] += 80; \
a[i * 8 + 4] += 63; \
a[i * 8 + 5] += 37; \
a[i * 8 + 6] += 24; \
a[i * 8 + 7] += 81; \
} \
}
#define TEST_ALL(T) \
T (int8_t) \
T (uint8_t) \
T (int16_t) \
T (uint16_t) \
T (int32_t) \
T (uint32_t) \
T (int64_t) \
T (uint64_t) \
T (_Float16) \
T (float) \
T (double)
TEST_ALL (VEC_PERM)
/* 1 for each 8-bit type, 4 for each 32-bit type and 8 for double. */
/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 22 } } */
/* 1 for each 16-bit type. */
/* { dg-final { scan-assembler-times {\tld1rqb\tz[0-9]\.b, } 3 } } */
/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #99\n} 2 } } */
/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #11\n} 2 } } */
/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #17\n} 2 } } */
/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #80\n} 2 } } */
/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #63\n} 2 } } */
/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #37\n} 2 } } */
/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #24\n} 2 } } */
/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #81\n} 2 } } */
/* The 32-bit types need:
ZIP1 ZIP1 (2 ZIP2s optimized away)
ZIP1 ZIP2
and the 64-bit types need:
ZIP1 ZIP1 ZIP1 ZIP1 (4 ZIP2s optimized away)
ZIP1 ZIP2 ZIP1 ZIP2
ZIP1 ZIP2 ZIP1 ZIP2. */
/* { dg-final { scan-assembler-times {\tzip1\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 33 } } */
/* { dg-final { scan-assembler-times {\tzip2\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 15 } } */
/* { dg-do run { target aarch64_sve_hw } } */
/* { dg-options "-O2 -ftree-vectorize" } */
#include "slp_4.c"
#define N (59 * 8)
#define HARNESS(TYPE) \
{ \
TYPE a[N], b[8] = { 99, 11, 17, 80, 63, 37, 24, 81 }; \
for (unsigned int i = 0; i < N; ++i) \
{ \
a[i] = i * 2 + i % 5; \
asm volatile ("" ::: "memory"); \
} \
vec_slp_##TYPE (a, N / 8); \
for (unsigned int i = 0; i < N; ++i) \
{ \
TYPE orig = i * 2 + i % 5; \
TYPE expected = orig + b[i % 8]; \
if (a[i] != expected) \
__builtin_abort (); \
} \
}
int __attribute__ ((optimize (1)))
main (void)
{
TEST_ALL (HARNESS)
}
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment