Commit a3a821c9 by Kelvin Nilsen

rs6000-p8swap.c (rs6000_sum_of_two_registers_p): New function.

gcc/ChangeLog:

2018-01-10  Kelvin Nilsen  <kelvin@gcc.gnu.org>

	* config/rs6000/rs6000-p8swap.c (rs6000_sum_of_two_registers_p):
	New function.
	(rs6000_quadword_masked_address_p): Likewise.
	(quad_aligned_load_p): Likewise.
	(quad_aligned_store_p): Likewise.
	(const_load_sequence_p): Add comment to describe the outer-most loop.
	(mimic_memory_attributes_and_flags): New function.
	(rs6000_gen_stvx): Likewise.
	(replace_swapped_aligned_store): Likewise.
	(rs6000_gen_lvx): Likewise.
	(replace_swapped_aligned_load): Likewise.
	(replace_swapped_load_constant): Capitalize argument name in
	comment describing this function.
	(rs6000_analyze_swaps): Add a third pass to search for vector loads
	and stores that access quad-word aligned addresses and replace
	with stvx or lvx instructions when appropriate.
	* config/rs6000/rs6000-protos.h (rs6000_sum_of_two_registers_p):
	New function prototype.
	(rs6000_quadword_masked_address_p): Likewise.
	(rs6000_gen_lvx): Likewise.
	(rs6000_gen_stvx): Likewise.
	* config/rs6000/vsx.md (*vsx_le_perm_load_<mode>): For modes
	VSX_D (V2DF, V2DI), modify this split to select lvx instruction
	when memory address is aligned.
	(*vsx_le_perm_load_<mode>): For modes VSX_W (V4SF, V4SI), modify
	this split to select lvx instruction when memory address is aligned.
	(*vsx_le_perm_load_v8hi): Modify this split to select lvx
	instruction when memory address is aligned.
	(*vsx_le_perm_load_v16qi): Likewise.
	(four unnamed splitters): Modify to select the stvx instruction
	when memory is aligned.
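
The entries above depend on recognizing quad-word (16-byte) aligned addresses. As a rough illustration only, the portable C sketch below shows what that property means at the source level. The helper names (is_quadword_aligned, quadword_mask, bar_sketch, v16qi) are hypothetical and are not the functions added to rs6000-p8swap.c, which work on RTL rather than on run-time pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a GCC function: nonzero when ADDR is a
   multiple of 16, i.e. quad-word aligned.  */
int
is_quadword_aligned (const void *addr)
{
  return ((uintptr_t) addr & 15) == 0;
}

/* Hypothetical helper: clear the low four address bits, which is what
   an address of the form (addr & -16) expresses.  */
uintptr_t
quadword_mask (uintptr_t addr)
{
  return addr & ~(uintptr_t) 15;
}

/* 16-byte vector type via the GCC vector extension; the name v16qi
   is chosen here for illustration.  */
typedef unsigned char v16qi __attribute__ ((vector_size (16)));

/* Mirrors the shape of struct bar in the tests: padding after
   a_field keeps a_vector on a 16-byte boundary.  */
struct bar_sketch
{
  char a_field;
  v16qi a_vector;
};

/* Usage sketch exercising the helpers above.  */
void
demo_quadword_alignment (void)
{
  static struct bar_sketch s;
  assert (quadword_mask (0x100f) == 0x1000);
  assert (offsetof (struct bar_sketch, a_vector) % 16 == 0);
  assert (is_quadword_aligned (&s.a_vector));
  assert (!is_quadword_aligned ((const char *) &s.a_vector + 1));
}
```

When the compiler can prove this property for a vector load or store, the aligned lvx or stvx form is a candidate replacement for the lxvd2x or stxvd2x plus swap sequence.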

gcc/testsuite/ChangeLog:

2018-01-10  Kelvin Nilsen  <kelvin@gcc.gnu.org>

	* gcc.target/powerpc/pr48857.c: Modify dejagnu directives to look
	for lvx and stvx instead of lxvd2x and stxvd2x and require
	little-endian target.  Add comments.
	* gcc.target/powerpc/swaps-p8-28.c: Add functions for more
	comprehensive testing.
	* gcc.target/powerpc/swaps-p8-29.c: Likewise.
	* gcc.target/powerpc/swaps-p8-30.c: Likewise.
	* gcc.target/powerpc/swaps-p8-31.c: Likewise.
	* gcc.target/powerpc/swaps-p8-32.c: Likewise.
	* gcc.target/powerpc/swaps-p8-33.c: Likewise.
	* gcc.target/powerpc/swaps-p8-34.c: Likewise.
	* gcc.target/powerpc/swaps-p8-35.c: Likewise.
	* gcc.target/powerpc/swaps-p8-36.c: Likewise.
	* gcc.target/powerpc/swaps-p8-37.c: Likewise.
	* gcc.target/powerpc/swaps-p8-38.c: Likewise.
	* gcc.target/powerpc/swaps-p8-39.c: Likewise.
	* gcc.target/powerpc/swaps-p8-40.c: Likewise.
	* gcc.target/powerpc/swaps-p8-41.c: Likewise.
	* gcc.target/powerpc/swaps-p8-42.c: Likewise.
	* gcc.target/powerpc/swaps-p8-43.c: Likewise.
	* gcc.target/powerpc/swaps-p8-44.c: Likewise.
	* gcc.target/powerpc/swaps-p8-45.c: Likewise.
	* gcc.target/powerpc/vec-extract-2.c: Add comment and remove
	scan-assembler-not directives that forbid lvx and xxpermdi.
	* gcc.target/powerpc/vec-extract-3.c: Likewise.
	* gcc.target/powerpc/vec-extract-5.c: Likewise.
	* gcc.target/powerpc/vec-extract-6.c: Likewise.
	* gcc.target/powerpc/vec-extract-7.c: Likewise.
	* gcc.target/powerpc/vec-extract-8.c: Likewise.
	* gcc.target/powerpc/vec-extract-9.c: Likewise.
	* gcc.target/powerpc/vsx-vector-6-le.c: Change
	scan-assembler-times directives to reflect different numbers of
	expected xxlnor, xxlor, xvcmpgtdp, and xxland instructions.

libcpp/ChangeLog:

2018-01-10  Kelvin Nilsen  <kelvin@gcc.gnu.org>

	* lex.c (search_line_fast): Remove illegal coercion of an
	unaligned pointer value to vector pointer type and replace with
	use of __builtin_vec_vsx_ld () built-in function, which operates
	on unaligned pointer values.
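
The libcpp change above stops casting an unaligned pointer to a vector pointer type. The commit itself uses the PowerPC built-in named in the entry. The sketch below is only a portable analogue of the same idea, assuming nothing beyond standard C: copy the bytes with memcpy instead of dereferencing a cast pointer. The block16 type and the function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical 16-byte block standing in for a vector value.  */
typedef struct
{
  unsigned char bytes[16];
} block16;

/* Load 16 bytes from P, which may have any alignment.  memcpy makes
   no alignment assumption, unlike dereferencing a pointer that was
   cast to a type with stricter alignment.  */
block16
load_unaligned_block (const unsigned char *p)
{
  block16 result;
  memcpy (&result, p, sizeof result);
  return result;
}

/* Usage sketch: load 16 bytes starting at an odd offset.  */
void
demo_unaligned_load (void)
{
  unsigned char buffer[32];
  for (int i = 0; i < 32; i++)
    buffer[i] = (unsigned char) i;
  block16 b = load_unaligned_block (buffer + 3);
  assert (b.bytes[0] == 3 && b.bytes[15] == 18);
}
```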

From-SVN: r256656
parent ffad1c54
2018-01-10 Kelvin Nilsen <kelvin@gcc.gnu.org>
* config/rs6000/rs6000-p8swap.c (rs6000_sum_of_two_registers_p):
New function.
(rs6000_quadword_masked_address_p): Likewise.
(quad_aligned_load_p): Likewise.
(quad_aligned_store_p): Likewise.
(const_load_sequence_p): Add comment to describe the outer-most loop.
(mimic_memory_attributes_and_flags): New function.
(rs6000_gen_stvx): Likewise.
(replace_swapped_aligned_store): Likewise.
(rs6000_gen_lvx): Likewise.
(replace_swapped_aligned_load): Likewise.
(replace_swapped_load_constant): Capitalize argument name in
comment describing this function.
(rs6000_analyze_swaps): Add a third pass to search for vector loads
and stores that access quad-word aligned addresses and replace
with stvx or lvx instructions when appropriate.
* config/rs6000/rs6000-protos.h (rs6000_sum_of_two_registers_p):
New function prototype.
(rs6000_quadword_masked_address_p): Likewise.
(rs6000_gen_lvx): Likewise.
(rs6000_gen_stvx): Likewise.
* config/rs6000/vsx.md (*vsx_le_perm_load_<mode>): For modes
VSX_D (V2DF, V2DI), modify this split to select lvx instruction
when memory address is aligned.
(*vsx_le_perm_load_<mode>): For modes VSX_W (V4SF, V4SI), modify
this split to select lvx instruction when memory address is aligned.
(*vsx_le_perm_load_v8hi): Modify this split to select lvx
instruction when memory address is aligned.
(*vsx_le_perm_load_v16qi): Likewise.
(four unnamed splitters): Modify to select the stvx instruction
when memory is aligned.
2018-01-13 Jan Hubicka <hubicka@ucw.cz>
* predict.c (determine_unlikely_bbs): Handle correctly BBs
......
@@ -254,5 +254,9 @@ namespace gcc { class context; }
class rtl_opt_pass;
extern rtl_opt_pass *make_pass_analyze_swaps (gcc::context *);
extern bool rs6000_sum_of_two_registers_p (const_rtx expr);
extern bool rs6000_quadword_masked_address_p (const_rtx exp);
extern rtx rs6000_gen_lvx (enum machine_mode, rtx, rtx);
extern rtx rs6000_gen_stvx (enum machine_mode, rtx, rtx);
#endif /* rs6000-protos.h */
2018-01-10 Kelvin Nilsen <kelvin@gcc.gnu.org>
* gcc.target/powerpc/pr48857.c: Modify dejagnu directives to look
for lvx and stvx instead of lxvd2x and stxvd2x and require
little-endian target. Add comments.
* gcc.target/powerpc/swaps-p8-28.c: Add functions for more
comprehensive testing.
* gcc.target/powerpc/swaps-p8-29.c: Likewise.
* gcc.target/powerpc/swaps-p8-30.c: Likewise.
* gcc.target/powerpc/swaps-p8-31.c: Likewise.
* gcc.target/powerpc/swaps-p8-32.c: Likewise.
* gcc.target/powerpc/swaps-p8-33.c: Likewise.
* gcc.target/powerpc/swaps-p8-34.c: Likewise.
* gcc.target/powerpc/swaps-p8-35.c: Likewise.
* gcc.target/powerpc/swaps-p8-36.c: Likewise.
* gcc.target/powerpc/swaps-p8-37.c: Likewise.
* gcc.target/powerpc/swaps-p8-38.c: Likewise.
* gcc.target/powerpc/swaps-p8-39.c: Likewise.
* gcc.target/powerpc/swaps-p8-40.c: Likewise.
* gcc.target/powerpc/swaps-p8-41.c: Likewise.
* gcc.target/powerpc/swaps-p8-42.c: Likewise.
* gcc.target/powerpc/swaps-p8-43.c: Likewise.
* gcc.target/powerpc/swaps-p8-44.c: Likewise.
* gcc.target/powerpc/swaps-p8-45.c: Likewise.
* gcc.target/powerpc/vec-extract-2.c: Add comment and remove
scan-assembler-not directives that forbid lvx and xxpermdi.
* gcc.target/powerpc/vec-extract-3.c: Likewise.
* gcc.target/powerpc/vec-extract-5.c: Likewise.
* gcc.target/powerpc/vec-extract-6.c: Likewise.
* gcc.target/powerpc/vec-extract-7.c: Likewise.
* gcc.target/powerpc/vec-extract-8.c: Likewise.
* gcc.target/powerpc/vec-extract-9.c: Likewise.
* gcc.target/powerpc/vsx-vector-6-le.c: Change
scan-assembler-times directives to reflect different numbers of
expected xxlnor, xxlor, xvcmpgtdp, and xxland instructions.
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
......
-/* { dg-do compile { target { powerpc*-*-* } } } */
/* Expected instruction selection as characterized by
scan-assembler-times directives below is only relevant to
little-endian targets. */
/* { dg-do compile { target { powerpc64le-*-* } } } */
/* { dg-skip-if "" { powerpc*-*-darwin* } } */
/* { dg-require-effective-target powerpc_vsx_ok } */
/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
/* { dg-options "-O2 -mcpu=power7 -mabi=altivec" } */
-/* { dg-final { scan-assembler-times "lxvd2x" 1 } } */
-/* { dg-final { scan-assembler-times "stxvd2x" 1 } } */
/* { dg-final { scan-assembler-times "lvx" 1 } } */
/* { dg-final { scan-assembler-times "stvx" 1 } } */
/* { dg-final { scan-assembler-not "ld" } } */
/* { dg-final { scan-assembler-not "lwz" } } */
/* { dg-final { scan-assembler-not "stw" } } */
@@ -15,12 +18,19 @@ typedef vector long long v2di_type;
v2di_type
return_v2di (v2di_type *ptr)
{
-return *ptr; /* should generate lxvd2x 34,0,3. */
/* As of pr48857, should generate lxvd2x 34,0,3
followed by xxpermdi 34,34,34,2. Subsequent optimization
recognizes that ptr refers to an aligned vector and replaces
this with lvx 2,0,3. */
return *ptr;
}
void
pass_v2di (v2di_type arg, v2di_type *ptr)
{
-*ptr = arg; /* should generate stxvd2x 34,0,{3,5}. */
/* As of pr48857, should generate xxpermdi 34,34,34,2 followed by
stxvd2x 34,0,5. Subsequent optimization recognizes that ptr
refers to an aligned vector and replaces this with stvx 2,0,5. */
*ptr = arg;
}
@@ -12,10 +12,100 @@ vector char y = { 0, 1, 2, 3,
8, 9, 10, 11,
12, 13, 14, 15 };
vector char x, z;
vector char
foo (void)
{
-return y;
return y; /* Remove 1 swap and use lvx. */
}
vector char
foo1 (void)
{
x = y; /* Remove 2 redundant swaps here. */
return x; /* Remove 1 swap and use lvx. */
}
void __attribute__ ((noinline))
fill_local (vector char *vp)
{
*vp = x; /* Remove 2 redundant swaps here. */
}
/* Test aligned load from local. */
vector char
foo2 (void)
{
vector char v;
/* Need to be clever here because v will normally reside in a
register rather than memory. */
fill_local (&v);
return v; /* Remove 1 swap and use lvx. */
}
/* Test aligned load from pointer. */
vector char
foo3 (vector char *arg)
{
return *arg; /* Remove 1 swap and use lvx. */
}
/* In this structure, the compiler should insert padding to assure
that a_vector is properly aligned. */
struct bar {
char a_field;
vector char a_vector;
};
vector char
foo4 (struct bar *bp)
{
return bp->a_vector; /* Remove 1 swap and use lvx. */
}
/* Test aligned store to global. */
void
baz (vector char arg)
{
x = arg; /* Remove 1 swap and use stvx. */
}
void __attribute__ ((noinline))
copy_local (vector char *arg)
{
x = *arg; /* Remove 2 redundant swaps. */
}
/* Test aligned store to local. */
void
baz1 (vector char arg)
{
vector char v;
/* Need cleverness, because v will normally reside in a register
rather than memory. */
v = arg; /* Aligned store to local: remove 1
swap and use stvx. */
copy_local (&v);
}
/* Test aligned store to pointer. */
void
baz2 (vector char *arg1, vector char arg2)
{
/* Assume arg2 resides in register. */
*arg1 = arg2; /* Remove 1 swap and use stvx. */
}
void
baz3 (struct bar *bp, vector char v)
{
/* Assume v resides in register. */
bp->a_vector = v; /* Remove 1 swap and use stvx. */
}
int
@@ -24,6 +114,47 @@ main (int argc, char *argv[])
vector char fetched_value = foo ();
if (fetched_value[0] != 0 || fetched_value[15] != 15)
abort ();
-else
-return 0;
fetched_value = foo1 ();
if (fetched_value[1] != 1 || fetched_value[14] != 14)
abort ();
fetched_value = foo2 ();
if (fetched_value[2] != 2 || fetched_value[13] != 13)
abort ();
fetched_value = foo3 (&x);
if (fetched_value[3] != 3 || fetched_value[12] != 12)
abort ();
struct bar a_struct;
a_struct.a_vector = x; /* Remove 2 redundant swaps. */
fetched_value = foo4 (&a_struct);
if (fetched_value[4] != 4 || fetched_value[11] != 11)
abort ();
for (int i = 0; i < 16; i++)
z[i] = 15 - i;
baz (z);
if (x[0] != 15 || x[15] != 0)
abort ();
vector char source = { 8, 7, 6, 5, 4, 3, 2, 1,
0, 9, 10, 11, 12, 13, 14, 15 };
baz1 (source);
if (x[3] != 5 || x[8] != 0)
abort ();
vector char dest;
baz2 (&dest, source);
if (dest[4] != 4 || dest[1] != 7)
abort ();
baz3 (&a_struct, source);
if (a_struct.a_vector[7] != 1 || a_struct.a_vector[15] != 15)
abort ();
return 0;
}
@@ -12,10 +12,100 @@ const vector char y = { 0, 1, 2, 3,
8, 9, 10, 11,
12, 13, 14, 15 };
vector char x, z;
vector char
foo (void)
{
-return y;
return y; /* Remove 1 swap and use lvx. */
}
vector char
foo1 (void)
{
x = y; /* Remove 2 redundant swaps here. */
return x; /* Remove 1 swap and use lvx. */
}
void __attribute__ ((noinline))
fill_local (vector char *vp)
{
*vp = x; /* Remove 2 redundant swaps here. */
}
/* Test aligned load from local. */
vector char
foo2 (void)
{
vector char v;
/* Need to be clever here because v will normally reside in a
register rather than memory. */
fill_local (&v);
return v; /* Remove 1 swap and use lvx. */
}
/* Test aligned load from pointer. */
vector char
foo3 (vector char *arg)
{
return *arg; /* Remove 1 swap and use lvx. */
}
/* In this structure, the compiler should insert padding to assure
that a_vector is properly aligned. */
struct bar {
char a_field;
vector char a_vector;
};
vector char
foo4 (struct bar *bp)
{
return bp->a_vector; /* Remove 1 swap and use lvx. */
}
/* Test aligned store to global. */
void
baz (vector char arg)
{
x = arg; /* Remove 1 swap and use stvx. */
}
void __attribute__ ((noinline))
copy_local (vector char *arg)
{
x = *arg; /* Remove 2 redundant swaps. */
}
/* Test aligned store to local. */
void
baz1 (vector char arg)
{
vector char v;
/* Need cleverness, because v will normally reside in a register
rather than memory. */
v = arg; /* Aligned store to local: remove 1
swap and use stvx. */
copy_local (&v);
}
/* Test aligned store to pointer. */
void
baz2 (vector char *arg1, vector char arg2)
{
/* Assume arg2 resides in register. */
*arg1 = arg2; /* Remove 1 swap and use stvx. */
}
void
baz3 (struct bar *bp, vector char v)
{
/* Assume v resides in register. */
bp->a_vector = v; /* Remove 1 swap and use stvx. */
}
int
@@ -24,6 +114,47 @@ main (int argc, char *argv[])
vector char fetched_value = foo ();
if (fetched_value[0] != 0 || fetched_value[15] != 15)
abort ();
-else
-return 0;
fetched_value = foo1 ();
if (fetched_value[1] != 1 || fetched_value[14] != 14)
abort ();
fetched_value = foo2 ();
if (fetched_value[2] != 2 || fetched_value[13] != 13)
abort ();
fetched_value = foo3 (&x);
if (fetched_value[3] != 3 || fetched_value[12] != 12)
abort ();
struct bar a_struct;
a_struct.a_vector = x; /* Remove 2 redundant swaps. */
fetched_value = foo4 (&a_struct);
if (fetched_value[4] != 4 || fetched_value[11] != 11)
abort ();
for (int i = 0; i < 16; i++)
z[i] = 15 - i;
baz (z);
if (x[0] != 15 || x[15] != 0)
abort ();
vector char source = { 8, 7, 6, 5, 4, 3, 2, 1,
0, 9, 10, 11, 12, 13, 14, 15 };
baz1 (source);
if (x[3] != 5 || x[8] != 0)
abort ();
vector char dest;
baz2 (&dest, source);
if (dest[4] != 4 || dest[1] != 7)
abort ();
baz3 (&a_struct, source);
if (a_struct.a_vector[7] != 1 || a_struct.a_vector[15] != 15)
abort ();
return 0;
}
@@ -2,8 +2,12 @@
/* { dg-require-effective-target powerpc_p8vector_ok } */
/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
/* { dg-options "-mcpu=power8 -O3 " } */
-/* { dg-final { scan-assembler-not "xxpermdi" } } */
-/* { dg-final { scan-assembler-not "xxswapd" } } */
/* Previous versions of this test required that the assembler does not
contain xxpermdi or xxswapd. However, with the more sophisticated
code generation used today, it is now possible that xxpermdi (aka
xxswapd) show up without being part of a lxvd2x or stxvd2x
sequence. */
#include <altivec.h>
@@ -14,10 +18,100 @@ const vector char y = { 0, 1, 2, 3,
8, 9, 10, 11,
12, 13, 14, 15 };
vector char x, z;
vector char
foo (void)
{
-return y;
return y; /* Remove 1 swap and use lvx. */
}
vector char
foo1 (void)
{
x = y; /* Remove 2 redundant swaps here. */
return x; /* Remove 1 swap and use lvx. */
}
void __attribute__ ((noinline))
fill_local (vector char *vp)
{
*vp = x; /* Remove 2 redundant swaps here. */
}
/* Test aligned load from local. */
vector char
foo2 (void)
{
vector char v;
/* Need to be clever here because v will normally reside in a
register rather than memory. */
fill_local (&v);
return v; /* Remove 1 swap and use lvx. */
}
/* Test aligned load from pointer. */
vector char
foo3 (vector char *arg)
{
return *arg; /* Remove 1 swap and use lvx. */
}
/* In this structure, the compiler should insert padding to assure
that a_vector is properly aligned. */
struct bar {
char a_field;
vector char a_vector;
};
vector char
foo4 (struct bar *bp)
{
return bp->a_vector; /* Remove 1 swap and use lvx. */
}
/* Test aligned store to global. */
void
baz (vector char arg)
{
x = arg; /* Remove 1 swap and use stvx. */
}
void __attribute__ ((noinline))
copy_local (vector char *arg)
{
x = *arg; /* Remove 2 redundant swaps. */
}
/* Test aligned store to local. */
void
baz1 (vector char arg)
{
vector char v;
/* Need cleverness, because v will normally reside in a register
rather than memory. */
v = arg; /* Aligned store to local: remove 1
swap and use stvx. */
copy_local (&v);
}
/* Test aligned store to pointer. */
void
baz2 (vector char *arg1, vector char arg2)
{
/* Assume arg2 resides in register. */
*arg1 = arg2; /* Remove 1 swap and use stvx. */
}
void
baz3 (struct bar *bp, vector char v)
{
/* Assume v resides in register. */
bp->a_vector = v; /* Remove 1 swap and use stvx. */
}
int
@@ -26,6 +120,47 @@ main (int argc, char *argv[])
vector char fetched_value = foo ();
if (fetched_value[0] != 0 || fetched_value[15] != 15)
abort ();
-else
-return 0;
fetched_value = foo1 ();
if (fetched_value[1] != 1 || fetched_value[14] != 14)
abort ();
fetched_value = foo2 ();
if (fetched_value[2] != 2 || fetched_value[13] != 13)
abort ();
fetched_value = foo3 (&x);
if (fetched_value[3] != 3 || fetched_value[12] != 12)
abort ();
struct bar a_struct;
a_struct.a_vector = x; /* Remove 2 redundant swaps. */
fetched_value = foo4 (&a_struct);
if (fetched_value[4] != 4 || fetched_value[11] != 11)
abort ();
for (int i = 0; i < 16; i++)
z[i] = 15 - i;
baz (z);
if (x[0] != 15 || x[15] != 0)
abort ();
vector char source = { 8, 7, 6, 5, 4, 3, 2, 1,
0, 9, 10, 11, 12, 13, 14, 15 };
baz1 (source);
if (x[3] != 5 || x[8] != 0)
abort ();
vector char dest;
baz2 (&dest, source);
if (dest[4] != 4 || dest[1] != 7)
abort ();
baz3 (&a_struct, source);
if (a_struct.a_vector[7] != 1 || a_struct.a_vector[15] != 15)
abort ();
return 0;
}
@@ -7,21 +7,150 @@
extern void abort (void);
-vector short y = { 0, 1, 2, 3,
-4, 5, 6, 7 };
vector short x;
vector short y = { 0, 1, 2, 3, 4, 5, 6, 7 };
vector short z;
vector short
foo (void)
{
-return y;
return y; /* Remove 1 swap and use lvx. */
}
vector short
foo1 (void)
{
x = y; /* Remove 2 redundant swaps here. */
return x; /* Remove 1 swap and use lvx. */
}
void __attribute__ ((noinline))
fill_local (vector short *vp)
{
*vp = x; /* Remove 2 redundant swaps here. */
}
/* Test aligned load from local. */
vector short
foo2 (void)
{
vector short v;
/* Need to be clever here because v will normally reside in a
register rather than memory. */
fill_local (&v);
return v; /* Remove 1 swap and use lvx. */
}
/* Test aligned load from pointer. */
vector short
foo3 (vector short *arg)
{
return *arg; /* Remove 1 swap and use lvx. */
}
/* In this structure, the compiler should insert padding to assure
that a_vector is properly aligned. */
struct bar {
short a_field;
vector short a_vector;
};
vector short
foo4 (struct bar *bp)
{
return bp->a_vector; /* Remove 1 swap and use lvx. */
}
/* Test aligned store to global. */
void
baz (vector short arg)
{
x = arg; /* Remove 1 swap and use stvx. */
}
void __attribute__ ((noinline))
copy_local (vector short *arg)
{
x = *arg; /* Remove 2 redundant swaps. */
}
/* Test aligned store to local. */
void
baz1 (vector short arg)
{
vector short v;
/* Need cleverness, because v will normally reside in a register
rather than memory. */
v = arg; /* Aligned store to local: remove 1
swap and use stvx. */
copy_local (&v);
}
/* Test aligned store to pointer. */
void
baz2 (vector short *arg1, vector short arg2)
{
/* Assume arg2 resides in register. */
*arg1 = arg2; /* Remove 1 swap and use stvx. */
}
void
baz3 (struct bar *bp, vector short v)
{
/* Assume v resides in register. */
bp->a_vector = v; /* Remove 1 swap and use stvx. */
}
int
-main (int argc, char *argv[])
main (int argc, short *argv[])
{
vector short fetched_value = foo ();
if (fetched_value[0] != 0 || fetched_value[7] != 7)
abort ();
-else
-return 0;
fetched_value = foo1 ();
if (fetched_value[1] != 1 || fetched_value[6] != 6)
abort ();
fetched_value = foo2 ();
if (fetched_value[2] != 2 || fetched_value[5] != 5)
abort ();
fetched_value = foo3 (&x);
if (fetched_value[3] != 3 || fetched_value[4] != 4)
abort ();
struct bar a_struct;
a_struct.a_vector = x; /* Remove 2 redundant swaps. */
fetched_value = foo4 (&a_struct);
if (fetched_value[4] != 4 || fetched_value[3] != 3)
abort ();
for (int i = 0; i < 8; i++)
z[i] = 7 - i;
baz (z);
if (x[0] != 7 || x[7] != 0)
abort ();
vector short source = { 8, 7, 6, 5, 4, 3, 2, 1 };
baz1 (source);
if (x[3] != 5 || x[7] != 1)
abort ();
vector short dest;
baz2 (&dest, source);
if (dest[4] != 4 || dest[1] != 7)
abort ();
baz3 (&a_struct, source);
if (a_struct.a_vector[7] != 1 || a_struct.a_vector[5] != 3)
abort ();
return 0;
}
@@ -7,21 +7,150 @@
extern void abort (void);
-const vector short y = { 0, 1, 2, 3,
-4, 5, 6, 7 };
vector short x;
const vector short y = { 0, 1, 2, 3, 4, 5, 6, 7 };
vector short z;
vector short
foo (void)
{
-return y;
return y; /* Remove 1 swap and use lvx. */
}
vector short
foo1 (void)
{
x = y; /* Remove 2 redundant swaps here. */
return x; /* Remove 1 swap and use lvx. */
}
void __attribute__ ((noinline))
fill_local (vector short *vp)
{
*vp = x; /* Remove 2 redundant swaps here. */
}
/* Test aligned load from local. */
vector short
foo2 (void)
{
vector short v;
/* Need to be clever here because v will normally reside in a
register rather than memory. */
fill_local (&v);
return v; /* Remove 1 swap and use lvx. */
}
/* Test aligned load from pointer. */
vector short
foo3 (vector short *arg)
{
return *arg; /* Remove 1 swap and use lvx. */
}
/* In this structure, the compiler should insert padding to assure
that a_vector is properly aligned. */
struct bar {
short a_field;
vector short a_vector;
};
vector short
foo4 (struct bar *bp)
{
return bp->a_vector; /* Remove 1 swap and use lvx. */
}
/* Test aligned store to global. */
void
baz (vector short arg)
{
x = arg; /* Remove 1 swap and use stvx. */
}
void __attribute__ ((noinline))
copy_local (vector short *arg)
{
x = *arg; /* Remove 2 redundant swaps. */
}
/* Test aligned store to local. */
void
baz1 (vector short arg)
{
vector short v;
/* Need cleverness, because v will normally reside in a register
rather than memory. */
v = arg; /* Aligned store to local: remove 1
swap and use stvx. */
copy_local (&v);
}
/* Test aligned store to pointer. */
void
baz2 (vector short *arg1, vector short arg2)
{
/* Assume arg2 resides in register. */
*arg1 = arg2; /* Remove 1 swap and use stvx. */
}
void
baz3 (struct bar *bp, vector short v)
{
/* Assume v resides in register. */
bp->a_vector = v; /* Remove 1 swap and use stvx. */
}
int
-main (int argc, char *argv[])
main (int argc, short *argv[])
{
vector short fetched_value = foo ();
if (fetched_value[0] != 0 || fetched_value[7] != 7)
abort ();
-else
-return 0;
fetched_value = foo1 ();
if (fetched_value[1] != 1 || fetched_value[6] != 6)
abort ();
fetched_value = foo2 ();
if (fetched_value[2] != 2 || fetched_value[5] != 5)
abort ();
fetched_value = foo3 (&x);
if (fetched_value[3] != 3 || fetched_value[4] != 4)
abort ();
struct bar a_struct;
a_struct.a_vector = x; /* Remove 2 redundant swaps. */
fetched_value = foo4 (&a_struct);
if (fetched_value[4] != 4 || fetched_value[3] != 3)
abort ();
for (int i = 0; i < 8; i++)
z[i] = 7 - i;
baz (z);
if (x[0] != 7 || x[7] != 0)
abort ();
vector short source = { 8, 7, 6, 5, 4, 3, 2, 1 };
baz1 (source);
if (x[3] != 5 || x[7] != 1)
abort ();
vector short dest;
baz2 (&dest, source);
if (dest[4] != 4 || dest[1] != 7)
abort ();
baz3 (&a_struct, source);
if (a_struct.a_vector[7] != 1 || a_struct.a_vector[5] != 3)
abort ();
return 0;
}
@@ -2,28 +2,161 @@
/* { dg-require-effective-target powerpc_p8vector_ok } */
/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
/* { dg-options "-mcpu=power8 -O3 " } */
-/* { dg-final { scan-assembler-not "xxpermdi" } } */
-/* { dg-final { scan-assembler-not "xxswapd" } } */
/* Previous versions of this test required that the assembler does not
contain xxpermdi or xxswapd. However, with the more sophisticated
code generation used today, it is now possible that xxpermdi (aka
xxswapd) show up without being part of a lxvd2x or stxvd2x
sequence. */
#include <altivec.h>
extern void abort (void);
-const vector short y = { 0, 1, 2, 3,
-4, 5, 6, 7 };
vector short x;
const vector short y = { 0, 1, 2, 3, 4, 5, 6, 7 };
vector short z;
vector short
foo (void)
{
-return y;
return y; /* Remove 1 swap and use lvx. */
}
vector short
foo1 (void)
{
x = y; /* Remove 2 redundant swaps here. */
return x; /* Remove 1 swap and use lvx. */
}
void __attribute__ ((noinline))
fill_local (vector short *vp)
{
*vp = x; /* Remove 2 redundant swaps here. */
}
/* Test aligned load from local. */
vector short
foo2 (void)
{
vector short v;
/* Need to be clever here because v will normally reside in a
register rather than memory. */
fill_local (&v);
return v; /* Remove 1 swap and use lvx. */
}
/* Test aligned load from pointer. */
vector short
foo3 (vector short *arg)
{
return *arg; /* Remove 1 swap and use lvx. */
}
/* In this structure, the compiler should insert padding to assure
that a_vector is properly aligned. */
struct bar {
short a_field;
vector short a_vector;
};
vector short
foo4 (struct bar *bp)
{
return bp->a_vector; /* Remove 1 swap and use lvx. */
}
/* Test aligned store to global. */
void
baz (vector short arg)
{
x = arg; /* Remove 1 swap and use stvx. */
}
void __attribute__ ((noinline))
copy_local (vector short *arg)
{
x = *arg; /* Remove 2 redundant swaps. */
}
/* Test aligned store to local. */
void
baz1 (vector short arg)
{
vector short v;
/* Need cleverness, because v will normally reside in a register
rather than memory. */
v = arg; /* Aligned store to local: remove 1
swap and use stvx. */
copy_local (&v);
}
/* Test aligned store to pointer. */
void
baz2 (vector short *arg1, vector short arg2)
{
/* Assume arg2 resides in register. */
*arg1 = arg2; /* Remove 1 swap and use stvx. */
}
void
baz3 (struct bar *bp, vector short v)
{
/* Assume v resides in register. */
bp->a_vector = v; /* Remove 1 swap and use stvx. */
}
int
-main (int argc, char *argv[])
main (int argc, short *argv[])
{
vector short fetched_value = foo ();
-if (fetched_value[0] != 0 || fetched_value[15] != 15)
if (fetched_value[0] != 0 || fetched_value[7] != 7)
abort ();
fetched_value = foo1 ();
if (fetched_value[1] != 1 || fetched_value[6] != 6)
abort ();
-else
-return 0;
fetched_value = foo2 ();
if (fetched_value[2] != 2 || fetched_value[5] != 5)
abort ();
fetched_value = foo3 (&x);
if (fetched_value[3] != 3 || fetched_value[4] != 4)
abort ();
struct bar a_struct;
a_struct.a_vector = x; /* Remove 2 redundant swaps. */
fetched_value = foo4 (&a_struct);
if (fetched_value[4] != 4 || fetched_value[3] != 3)
abort ();
for (int i = 0; i < 8; i++)
z[i] = 7 - i;
baz (z);
if (x[0] != 7 || x[7] != 0)
abort ();
vector short source = { 8, 7, 6, 5, 4, 3, 2, 1 };
baz1 (source);
if (x[3] != 5 || x[7] != 1)
abort ();
vector short dest;
baz2 (&dest, source);
if (dest[4] != 4 || dest[1] != 7)
abort ();
baz3 (&a_struct, source);
if (a_struct.a_vector[7] != 1 || a_struct.a_vector[5] != 3)
abort ();
return 0;
}
@@ -7,20 +7,152 @@
extern void abort (void);
vector int x;
vector int y = { 0, 1, 2, 3 };
vector int z;
vector int
foo (void)
{
-return y;
return y; /* Remove 1 swap and use lvx. */
}
vector int
foo1 (void)
{
x = y; /* Remove 2 redundant swaps here. */
return x; /* Remove 1 swap and use lvx. */
}
void __attribute__ ((noinline))
fill_local (vector int *vp)
{
*vp = x; /* Remove 2 redundant swaps here. */
}
/* Test aligned load from local. */
vector int
foo2 (void)
{
vector int v;
/* Need to be clever here because v will normally reside in a
register rather than memory. */
fill_local (&v);
return v; /* Remove 1 swap and use lvx. */
}
/* Test aligned load from pointer. */
vector int
foo3 (vector int *arg)
{
return *arg; /* Remove 1 swap and use lvx. */
}
/* In this structure, the compiler should insert padding to assure
that a_vector is properly aligned. */
struct bar {
short a_field;
vector int a_vector;
};
vector int
foo4 (struct bar *bp)
{
return bp->a_vector; /* Remove 1 swap and use lvx. */
}
/* Test aligned store to global. */
void
baz (vector int arg)
{
x = arg; /* Remove 1 swap and use stvx. */
}
void __attribute__ ((noinline))
copy_local (vector int *arg)
{
x = *arg; /* Remove 2 redundant swaps. */
}
/* Test aligned store to local. */
void
baz1 (vector int arg)
{
vector int v;
/* Need cleverness, because v will normally reside in a register
rather than memory. */
v = arg; /* Aligned store to local: remove 1
swap and use stvx. */
copy_local (&v);
}
/* Test aligned store to pointer. */
void
baz2 (vector int *arg1, vector int arg2)
{
/* Assume arg2 resides in register. */
*arg1 = arg2; /* Remove 1 swap and use stvx. */
}
void
baz3 (struct bar *bp, vector int v)
{
/* Assume v resides in register. */
bp->a_vector = v; /* Remove 1 swap and use stvx. */
} }
int int
main (int argc, char *argv[]) main (int argc, int *argv[])
{ {
vector int fetched_value = foo (); vector int fetched_value = foo ();
if (fetched_value[0] != 0 || fetched_value[3] != 3) if (fetched_value[0] != 0 || fetched_value[3] != 3)
abort (); abort ();
else
return 0; fetched_value = foo1 ();
if (fetched_value[1] != 1 || fetched_value[2] != 2)
abort ();
fetched_value = foo2 ();
if (fetched_value[2] != 2 || fetched_value[1] != 1)
abort ();
fetched_value = foo3 (&x);
if (fetched_value[3] != 3 || fetched_value[0] != 0)
abort ();
struct bar a_struct;
a_struct.a_vector = x; /* Remove 2 redundant swaps. */
fetched_value = foo4 (&a_struct);
if (fetched_value[2] != 2 || fetched_value[3] != 3)
abort ();
z[0] = 7;
z[1] = 6;
z[2] = 5;
z[3] = 4;
baz (z);
if (x[0] != 7 || x[3] != 4)
abort ();
vector int source = { 8, 7, 6, 5 };
baz1 (source);
if (x[2] != 6 || x[1] != 7)
abort ();
vector int dest;
baz2 (&dest, source);
if (dest[0] != 8 || dest[1] != 7)
abort ();
baz3 (&a_struct, source);
if (a_struct.a_vector[3] != 5 || a_struct.a_vector[0] != 8)
abort ();
return 0;
} }
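Note on `struct bar`: the test comment says the compiler should insert padding so that `a_vector` is properly aligned. The sketch below illustrates that layout rule in portable C11. It is not part of the commit: `struct bar_model` and the helper names are hypothetical, and a 16-byte-aligned `int[4]` is assumed as a stand-in for `vector int` so the example builds without AltiVec support.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the test's `struct bar`: a 16-byte-aligned
   int[4] plays the role of `vector int` (an assumption made so that the
   sketch compiles on targets without <altivec.h>).  */
struct bar_model {
  short a_field;
  _Alignas(16) int a_vector[4];
};

/* Nonzero when a_vector starts on a quad-word (16-byte) boundary
   relative to the start of the structure.  */
static int a_vector_is_quadword_aligned(void)
{
  return offsetof(struct bar_model, a_vector) % 16 == 0;
}

/* Number of padding bytes the compiler placed after a_field.  */
static size_t padding_after_a_field(void)
{
  return offsetof(struct bar_model, a_vector) - sizeof(short);
}

/* Checks the layout properties that the foo4/baz3 tests rely on.  */
static void check_bar_layout(void)
{
  assert(a_vector_is_quadword_aligned());
  assert(padding_after_a_field() == 16 - sizeof(short));
}
```

Because the member offset is a multiple of 16 and the structure itself is 16-byte aligned, `bp->a_vector` is a quad-word aligned address whenever `bp` points to a properly allocated object, which is the case the `foo4` and `baz3` tests exercise.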
@@ -7,20 +7,152 @@
extern void abort (void); extern void abort (void);
vector int x;
const vector int y = { 0, 1, 2, 3 }; const vector int y = { 0, 1, 2, 3 };
vector int z;
vector int vector int
foo (void) foo (void)
{ {
return y; return y; /* Remove 1 swap and use lvx. */
}
vector int
foo1 (void)
{
x = y; /* Remove 2 redundant swaps here. */
return x; /* Remove 1 swap and use lvx. */
}
void __attribute__ ((noinline))
fill_local (vector int *vp)
{
*vp = x; /* Remove 2 redundant swaps here. */
}
/* Test aligned load from local. */
vector int
foo2 (void)
{
vector int v;
/* Need to be clever here because v will normally reside in a
register rather than memory. */
fill_local (&v);
return v; /* Remove 1 swap and use lvx. */
}
/* Test aligned load from pointer. */
vector int
foo3 (vector int *arg)
{
return *arg; /* Remove 1 swap and use lvx. */
}
/* In this structure, the compiler should insert padding to assure
that a_vector is properly aligned. */
struct bar {
short a_field;
vector int a_vector;
};
vector int
foo4 (struct bar *bp)
{
return bp->a_vector; /* Remove 1 swap and use lvx. */
}
/* Test aligned store to global. */
void
baz (vector int arg)
{
x = arg; /* Remove 1 swap and use stvx. */
}
void __attribute__ ((noinline))
copy_local (vector int *arg)
{
x = *arg; /* Remove 2 redundant swaps. */
}
/* Test aligned store to local. */
void
baz1 (vector int arg)
{
vector int v;
/* Need cleverness, because v will normally reside in a register
rather than memory. */
v = arg; /* Aligned store to local: remove 1
swap and use stvx. */
copy_local (&v);
}
/* Test aligned store to pointer. */
void
baz2 (vector int *arg1, vector int arg2)
{
/* Assume arg2 resides in register. */
*arg1 = arg2; /* Remove 1 swap and use stvx. */
}
void
baz3 (struct bar *bp, vector int v)
{
/* Assume v resides in register. */
bp->a_vector = v; /* Remove 1 swap and use stvx. */
} }
int int
main (int argc, char *argv[]) main (int argc, int *argv[])
{ {
vector int fetched_value = foo (); vector int fetched_value = foo ();
if (fetched_value[0] != 0 || fetched_value[3] != 3) if (fetched_value[0] != 0 || fetched_value[3] != 3)
abort (); abort ();
else
return 0; fetched_value = foo1 ();
if (fetched_value[1] != 1 || fetched_value[2] != 2)
abort ();
fetched_value = foo2 ();
if (fetched_value[2] != 2 || fetched_value[1] != 1)
abort ();
fetched_value = foo3 (&x);
if (fetched_value[3] != 3 || fetched_value[0] != 0)
abort ();
struct bar a_struct;
a_struct.a_vector = x; /* Remove 2 redundant swaps. */
fetched_value = foo4 (&a_struct);
if (fetched_value[2] != 2 || fetched_value[3] != 3)
abort ();
z[0] = 7;
z[1] = 6;
z[2] = 5;
z[3] = 4;
baz (z);
if (x[0] != 7 || x[3] != 4)
abort ();
vector int source = { 8, 7, 6, 5 };
baz1 (source);
if (x[2] != 6 || x[1] != 7)
abort ();
vector int dest;
baz2 (&dest, source);
if (dest[0] != 8 || dest[1] != 7)
abort ();
baz3 (&a_struct, source);
if (a_struct.a_vector[3] != 5 || a_struct.a_vector[0] != 8)
abort ();
return 0;
} }
@@ -2,27 +2,163 @@
/* { dg-require-effective-target powerpc_p8vector_ok } */
/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
/* { dg-options "-mcpu=power8 -O3 " } */
/* { dg-final { scan-assembler-not "xxpermdi" } } */
/* { dg-final { scan-assembler-not "xxswapd" } } */
/* Previous versions of this test required that the assembler does not
contain xxpermdi or xxswapd. However, with the more sophisticated
code generation used today, it is now possible that xxpermdi (aka
xxswapd) show up without being part of a lxvd2x or stxvd2x
sequence. */
#include <altivec.h>
extern void abort (void);
vector int x;
const vector int y = { 0, 1, 2, 3 }; const vector int y = { 0, 1, 2, 3 };
vector int z;
vector int vector int
foo (void) foo (void)
{ {
return y; return y; /* Remove 1 swap and use lvx. */
}
vector int
foo1 (void)
{
x = y; /* Remove 2 redundant swaps here. */
return x; /* Remove 1 swap and use lvx. */
}
void __attribute__ ((noinline))
fill_local (vector int *vp)
{
*vp = x; /* Remove 2 redundant swaps here. */
}
/* Test aligned load from local. */
vector int
foo2 (void)
{
vector int v;
/* Need to be clever here because v will normally reside in a
register rather than memory. */
fill_local (&v);
return v; /* Remove 1 swap and use lvx. */
}
/* Test aligned load from pointer. */
vector int
foo3 (vector int *arg)
{
return *arg; /* Remove 1 swap and use lvx. */
}
/* In this structure, the compiler should insert padding to assure
that a_vector is properly aligned. */
struct bar {
short a_field;
vector int a_vector;
};
vector int
foo4 (struct bar *bp)
{
return bp->a_vector; /* Remove 1 swap and use lvx. */
}
/* Test aligned store to global. */
void
baz (vector int arg)
{
x = arg; /* Remove 1 swap and use stvx. */
}
void __attribute__ ((noinline))
copy_local (vector int *arg)
{
x = *arg; /* Remove 2 redundant swaps. */
}
/* Test aligned store to local. */
void
baz1 (vector int arg)
{
vector int v;
/* Need cleverness, because v will normally reside in a register
rather than memory. */
v = arg; /* Aligned store to local: remove 1
swap and use stvx. */
copy_local (&v);
}
/* Test aligned store to pointer. */
void
baz2 (vector int *arg1, vector int arg2)
{
/* Assume arg2 resides in register. */
*arg1 = arg2; /* Remove 1 swap and use stvx. */
}
void
baz3 (struct bar *bp, vector int v)
{
/* Assume v resides in register. */
bp->a_vector = v; /* Remove 1 swap and use stvx. */
} }
int int
main (int argc, char *argv[]) main (int argc, int *argv[])
{ {
vector int fetched_value = foo (); vector int fetched_value = foo ();
if (fetched_value[0] != 0 || fetched_value[3] != 3) if (fetched_value[0] != 0 || fetched_value[3] != 3)
abort (); abort ();
else
return 0; fetched_value = foo1 ();
if (fetched_value[1] != 1 || fetched_value[2] != 2)
abort ();
fetched_value = foo2 ();
if (fetched_value[2] != 2 || fetched_value[1] != 1)
abort ();
fetched_value = foo3 (&x);
if (fetched_value[3] != 3 || fetched_value[0] != 0)
abort ();
struct bar a_struct;
a_struct.a_vector = x; /* Remove 2 redundant swaps. */
fetched_value = foo4 (&a_struct);
if (fetched_value[2] != 2 || fetched_value[3] != 3)
abort ();
z[0] = 7;
z[1] = 6;
z[2] = 5;
z[3] = 4;
baz (z);
if (x[0] != 7 || x[3] != 4)
abort ();
vector int source = { 8, 7, 6, 5 };
baz1 (source);
if (x[3] != 6 || x[2] != 7)
abort ();
vector int dest;
baz2 (&dest, source);
if (dest[0] != 8 || dest[1] != 7)
abort ();
baz3 (&a_struct, source);
if (a_struct.a_vector[3] != 5 || a_struct.a_vector[0] != 8)
abort ();
return 0;
} }
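Note on "Remove 1 swap and use lvx": the comments in these tests rely on the fact that, on little-endian POWER8, an `lxvd2x` followed by `xxswapd` produces the same register contents as a single `lvx` when the address is quad-word aligned. The sketch below is a simplified byte-level model of that equivalence, not code from the commit. The type `vec128` and the `model_*` functions are hypothetical names, and the model only tracks which memory doubleword lands in which half of the register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit register model: dw[0] holds the doubleword that
   belongs to the lower memory address in the expected element order.  */
typedef struct { uint64_t dw[2]; } vec128;

/* Model of lxvd2x on little-endian: the two doublewords arrive swapped.  */
static vec128 model_lxvd2x(const unsigned char *mem, size_t addr)
{
  vec128 v;
  memcpy(&v.dw[1], mem + addr, 8);
  memcpy(&v.dw[0], mem + addr + 8, 8);
  return v;
}

/* Model of xxswapd: exchange the two doublewords.  */
static vec128 model_xxswapd(vec128 v)
{
  vec128 r = { { v.dw[1], v.dw[0] } };
  return r;
}

/* Model of lvx: the low four address bits are ignored and the
   doublewords arrive in the expected order.  */
static vec128 model_lvx(const unsigned char *mem, size_t addr)
{
  size_t ea = addr & ~(size_t) 15;
  vec128 v;
  memcpy(&v.dw[0], mem + ea, 8);
  memcpy(&v.dw[1], mem + ea + 8, 8);
  return v;
}

static int vec_equal(vec128 a, vec128 b)
{
  return a.dw[0] == b.dw[0] && a.dw[1] == b.dw[1];
}
```

In this model, `model_lvx(mem, 16)` equals `model_xxswapd(model_lxvd2x(mem, 16))`, but the two differ at an unaligned offset such as 20 because `lvx` truncates the address. That is why the replacement is only valid when the access is known to be quad-word aligned.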
@@ -7,20 +7,152 @@
extern void abort (void); extern void abort (void);
vector float y = { 0.0f, 0.1f, 0.2f, 0.3f }; vector float x;
vector float y = { 0.0F, 0.1F, 0.2F, 0.3F };
vector float z;
vector float vector float
foo (void) foo (void)
{ {
return y; return y; /* Remove 1 swap and use lvx. */
}
vector float
foo1 (void)
{
x = y; /* Remove 2 redundant swaps here. */
return x; /* Remove 1 swap and use lvx. */
}
void __attribute__ ((noinline))
fill_local (vector float *vp)
{
*vp = x; /* Remove 2 redundant swaps here. */
}
/* Test aligned load from local. */
vector float
foo2 (void)
{
vector float v;
/* Need to be clever here because v will normally reside in a
register rather than memory. */
fill_local (&v);
return v; /* Remove 1 swap and use lvx. */
}
/* Test aligned load from pointer. */
vector float
foo3 (vector float *arg)
{
return *arg; /* Remove 1 swap and use lvx. */
}
/* In this structure, the compiler should insert padding to assure
that a_vector is properly aligned. */
struct bar {
short a_field;
vector float a_vector;
};
vector float
foo4 (struct bar *bp)
{
return bp->a_vector; /* Remove 1 swap and use lvx. */
}
/* Test aligned store to global. */
void
baz (vector float arg)
{
x = arg; /* Remove 1 swap and use stvx. */
}
void __attribute__ ((noinline))
copy_local (vector float *arg)
{
x = *arg; /* Remove 2 redundant swaps. */
}
/* Test aligned store to local. */
void
baz1 (vector float arg)
{
vector float v;
/* Need cleverness, because v will normally reside in a register
rather than memory. */
v = arg; /* Aligned store to local: remove 1
swap and use stvx. */
copy_local (&v);
}
/* Test aligned store to pointer. */
void
baz2 (vector float *arg1, vector float arg2)
{
/* Assume arg2 resides in register. */
*arg1 = arg2; /* Remove 1 swap and use stvx. */
}
void
baz3 (struct bar *bp, vector float v)
{
/* Assume v resides in register. */
bp->a_vector = v; /* Remove 1 swap and use stvx. */
} }
int int
main (int argc, char *argv[]) main (float argc, float *argv[])
{ {
vector float fetched_value = foo (); vector float fetched_value = foo ();
if (fetched_value[0] != 0.0f || fetched_value[3] != 0.3f) if (fetched_value[0] != 0.0F || fetched_value[3] != 0.3F)
abort ();
fetched_value = foo1 ();
if (fetched_value[1] != 0.1F || fetched_value[2] != 0.2F)
abort (); abort ();
else
return 0; fetched_value = foo2 ();
if (fetched_value[2] != 0.2F || fetched_value[1] != 0.1F)
abort ();
fetched_value = foo3 (&x);
if (fetched_value[3] != 0.3F || fetched_value[0] != 0.0F)
abort ();
struct bar a_struct;
a_struct.a_vector = x; /* Remove 2 redundant swaps. */
fetched_value = foo4 (&a_struct);
if (fetched_value[2] != 0.2F || fetched_value[3] != 0.3F)
abort ();
z[0] = 0.7F;
z[1] = 0.6F;
z[2] = 0.5F;
z[3] = 0.4F;
baz (z);
if (x[0] != 0.7F || x[3] != 0.4F)
abort ();
vector float source = { 0.8F, 0.7F, 0.6F, 0.5F };
baz1 (source);
if (x[2] != 0.6F || x[1] != 0.7F)
abort ();
vector float dest;
baz2 (&dest, source);
if (dest[0] != 0.8F || dest[1] != 0.7F)
abort ();
baz3 (&a_struct, source);
if (a_struct.a_vector[3] != 0.5F || a_struct.a_vector[0] != 0.8F)
abort ();
return 0;
} }
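Note on "Remove 2 redundant swaps": for a plain vector copy such as `x = y`, the swap that follows the load and the swap that precedes the store cancel each other, so both can be dropped. The sketch below models this at the byte level. It is an illustration only: `vec128`, the `model_*` functions, and the two copy helpers are hypothetical names, not functions from rs6000-p8swap.c.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit register model with two doubleword halves.  */
typedef struct { uint64_t dw[2]; } vec128;

/* Model of lxvd2x on little-endian: doublewords arrive swapped.  */
static vec128 model_lxvd2x(const unsigned char *src)
{
  vec128 v;
  memcpy(&v.dw[1], src, 8);
  memcpy(&v.dw[0], src + 8, 8);
  return v;
}

/* Model of stxvd2x on little-endian: doublewords are written swapped.  */
static void model_stxvd2x(unsigned char *dst, vec128 v)
{
  memcpy(dst, &v.dw[1], 8);
  memcpy(dst + 8, &v.dw[0], 8);
}

/* Model of xxswapd: exchange the two doublewords.  */
static vec128 model_xxswapd(vec128 v)
{
  vec128 r = { { v.dw[1], v.dw[0] } };
  return r;
}

/* Copy as generated before swap optimization: load, swap, swap, store.  */
static void copy_with_swaps(unsigned char *dst, const unsigned char *src)
{
  vec128 v = model_xxswapd(model_lxvd2x(src));
  model_stxvd2x(dst, model_xxswapd(v));
}

/* Copy after removing the two redundant swaps: load, store.  */
static void copy_without_swaps(unsigned char *dst, const unsigned char *src)
{
  model_stxvd2x(dst, model_lxvd2x(src));
}
```

Both helpers leave `dst` byte-for-byte identical to `src`, which is the property the "2 redundant swaps" comments depend on.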
@@ -7,20 +7,152 @@
extern void abort (void); extern void abort (void);
const vector float y = { 0.0f, 0.1f, 0.2f, 0.3f }; vector float x;
const vector float y = { 0.0F, 0.1F, 0.2F, 0.3F };
vector float z;
vector float vector float
foo (void) foo (void)
{ {
return y; return y; /* Remove 1 swap and use lvx. */
}
vector float
foo1 (void)
{
x = y; /* Remove 2 redundant swaps here. */
return x; /* Remove 1 swap and use lvx. */
}
void __attribute__ ((noinline))
fill_local (vector float *vp)
{
*vp = x; /* Remove 2 redundant swaps here. */
}
/* Test aligned load from local. */
vector float
foo2 (void)
{
vector float v;
/* Need to be clever here because v will normally reside in a
register rather than memory. */
fill_local (&v);
return v; /* Remove 1 swap and use lvx. */
}
/* Test aligned load from pointer. */
vector float
foo3 (vector float *arg)
{
return *arg; /* Remove 1 swap and use lvx. */
}
/* In this structure, the compiler should insert padding to assure
that a_vector is properly aligned. */
struct bar {
short a_field;
vector float a_vector;
};
vector float
foo4 (struct bar *bp)
{
return bp->a_vector; /* Remove 1 swap and use lvx. */
}
/* Test aligned store to global. */
void
baz (vector float arg)
{
x = arg; /* Remove 1 swap and use stvx. */
}
void __attribute__ ((noinline))
copy_local (vector float *arg)
{
x = *arg; /* Remove 2 redundant swaps. */
}
/* Test aligned store to local. */
void
baz1 (vector float arg)
{
vector float v;
/* Need cleverness, because v will normally reside in a register
rather than memory. */
v = arg; /* Aligned store to local: remove 1
swap and use stvx. */
copy_local (&v);
}
/* Test aligned store to pointer. */
void
baz2 (vector float *arg1, vector float arg2)
{
/* Assume arg2 resides in register. */
*arg1 = arg2; /* Remove 1 swap and use stvx. */
}
void
baz3 (struct bar *bp, vector float v)
{
/* Assume v resides in register. */
bp->a_vector = v; /* Remove 1 swap and use stvx. */
} }
int int
main (int argc, char *argv[]) main (float argc, float *argv[])
{ {
vector float fetched_value = foo (); vector float fetched_value = foo ();
if (fetched_value[0] != 0.0f || fetched_value[3] != 0.3f) if (fetched_value[0] != 0.0F || fetched_value[3] != 0.3F)
abort ();
fetched_value = foo1 ();
if (fetched_value[1] != 0.1F || fetched_value[2] != 0.2F)
abort (); abort ();
else
return 0; fetched_value = foo2 ();
if (fetched_value[2] != 0.2F || fetched_value[1] != 0.1F)
abort ();
fetched_value = foo3 (&x);
if (fetched_value[3] != 0.3F || fetched_value[0] != 0.0F)
abort ();
struct bar a_struct;
a_struct.a_vector = x; /* Remove 2 redundant swaps. */
fetched_value = foo4 (&a_struct);
if (fetched_value[2] != 0.2F || fetched_value[3] != 0.3F)
abort ();
z[0] = 0.7F;
z[1] = 0.6F;
z[2] = 0.5F;
z[3] = 0.4F;
baz (z);
if (x[0] != 0.7F || x[3] != 0.4F)
abort ();
vector float source = { 0.8F, 0.7F, 0.6F, 0.5F };
baz1 (source);
if (x[2] != 0.6F || x[1] != 0.7F)
abort ();
vector float dest;
baz2 (&dest, source);
if (dest[0] != 0.8F || dest[1] != 0.7F)
abort ();
baz3 (&a_struct, source);
if (a_struct.a_vector[3] != 0.5F || a_struct.a_vector[0] != 0.8F)
abort ();
return 0;
} }
@@ -2,27 +2,163 @@
/* { dg-require-effective-target powerpc_p8vector_ok } */
/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
/* { dg-options "-mcpu=power8 -O3 " } */
/* { dg-final { scan-assembler-not "xxpermdi" } } */
/* { dg-final { scan-assembler-not "xxswapd" } } */
/* Previous versions of this test required that the assembler does not
contain xxpermdi or xxswapd. However, with the more sophisticated
code generation used today, it is now possible that xxpermdi (aka
xxswapd) show up without being part of a lxvd2x or stxvd2x
sequence. */
#include <altivec.h>
extern void abort (void);
const vector float y = { 0.0f, 0.1f, 0.2f, 0.3f }; vector float x;
const vector float y = { 0.0F, 0.1F, 0.2F, 0.3F };
vector float z;
vector float vector float
foo (void) foo (void)
{ {
return y; return y; /* Remove 1 swap and use lvx. */
}
vector float
foo1 (void)
{
x = y; /* Remove 2 redundant swaps here. */
return x; /* Remove 1 swap and use lvx. */
}
void __attribute__ ((noinline))
fill_local (vector float *vp)
{
*vp = x; /* Remove 2 redundant swaps here. */
}
/* Test aligned load from local. */
vector float
foo2 (void)
{
vector float v;
/* Need to be clever here because v will normally reside in a
register rather than memory. */
fill_local (&v);
return v; /* Remove 1 swap and use lvx. */
}
/* Test aligned load from pointer. */
vector float
foo3 (vector float *arg)
{
return *arg; /* Remove 1 swap and use lvx. */
}
/* In this structure, the compiler should insert padding to assure
that a_vector is properly aligned. */
struct bar {
short a_field;
vector float a_vector;
};
vector float
foo4 (struct bar *bp)
{
return bp->a_vector; /* Remove 1 swap and use lvx. */
}
/* Test aligned store to global. */
void
baz (vector float arg)
{
x = arg; /* Remove 1 swap and use stvx. */
}
void __attribute__ ((noinline))
copy_local (vector float *arg)
{
x = *arg; /* Remove 2 redundant swaps. */
}
/* Test aligned store to local. */
void
baz1 (vector float arg)
{
vector float v;
/* Need cleverness, because v will normally reside in a register
rather than memory. */
v = arg; /* Aligned store to local: remove 1
swap and use stvx. */
copy_local (&v);
}
/* Test aligned store to pointer. */
void
baz2 (vector float *arg1, vector float arg2)
{
/* Assume arg2 resides in register. */
*arg1 = arg2; /* Remove 1 swap and use stvx. */
}
void
baz3 (struct bar *bp, vector float v)
{
/* Assume v resides in register. */
bp->a_vector = v; /* Remove 1 swap and use stvx. */
} }
int int
main (int argc, char *argv[]) main (float argc, float *argv[])
{ {
vector float fetched_value = foo (); vector float fetched_value = foo ();
if (fetched_value[0] != 0.0f || fetched_value[3] != 0.3) if (fetched_value[0] != 0.0F || fetched_value[3] != 0.3F)
abort ();
fetched_value = foo1 ();
if (fetched_value[1] != 0.1F || fetched_value[2] != 0.2F)
abort (); abort ();
else
return 0; fetched_value = foo2 ();
if (fetched_value[2] != 0.2F || fetched_value[1] != 0.1F)
abort ();
fetched_value = foo3 (&x);
if (fetched_value[3] != 0.3F || fetched_value[0] != 0.0F)
abort ();
struct bar a_struct;
a_struct.a_vector = x; /* Remove 2 redundant swaps. */
fetched_value = foo4 (&a_struct);
if (fetched_value[2] != 0.2F || fetched_value[3] != 0.3F)
abort ();
z[0] = 0.7F;
z[1] = 0.6F;
z[2] = 0.5F;
z[3] = 0.4F;
baz (z);
if (x[0] != 0.7F || x[3] != 0.4F)
abort ();
vector float source = { 0.8F, 0.7F, 0.6F, 0.5F };
baz1 (source);
if (x[3] != 0.6F || x[2] != 0.7F)
abort ();
vector float dest;
baz2 (&dest, source);
if (dest[0] != 0.8F || dest[1] != 0.7F)
abort ();
baz3 (&a_struct, source);
if (a_struct.a_vector[3] != 0.5F || a_struct.a_vector[0] != 0.8F)
abort ();
return 0;
} }
@@ -7,20 +7,150 @@
extern void abort (void); extern void abort (void);
vector long long int y = { 0, 1 }; vector long long x;
vector long long y = { 1024, 2048 };
vector long long z;
vector long long int vector long long
foo (void) foo (void)
{ {
return y; return y; /* Remove 1 swap and use lvx. */
}
vector long long
foo1 (void)
{
x = y; /* Remove 2 redundant swaps here. */
return x; /* Remove 1 swap and use lvx. */
}
void __attribute__ ((noinline))
fill_local (vector long long *vp)
{
*vp = x; /* Remove 2 redundant swaps here. */
}
/* Test aligned load from local. */
vector long long
foo2 (void)
{
vector long long v;
/* Need to be clever here because v will normally reside in a
register rather than memory. */
fill_local (&v);
return v; /* Remove 1 swap and use lvx. */
}
/* Test aligned load from pointer. */
vector long long
foo3 (vector long long *arg)
{
return *arg; /* Remove 1 swap and use lvx. */
}
/* In this structure, the compiler should insert padding to assure
that a_vector is properly aligned. */
struct bar {
short a_field;
vector long long a_vector;
};
vector long long
foo4 (struct bar *bp)
{
return bp->a_vector; /* Remove 1 swap and use lvx. */
}
/* Test aligned store to global. */
void
baz (vector long long arg)
{
x = arg; /* Remove 1 swap and use stvx. */
}
void __attribute__ ((noinline))
copy_local (vector long long *arg)
{
x = *arg; /* Remove 2 redundant swaps. */
}
/* Test aligned store to local. */
void
baz1 (vector long long arg)
{
vector long long v;
/* Need cleverness, because v will normally reside in a register
rather than memory. */
v = arg; /* Aligned store to local: remove 1
swap and use stvx. */
copy_local (&v);
}
/* Test aligned store to pointer. */
void
baz2 (vector long long *arg1, vector long long arg2)
{
/* Assume arg2 resides in register. */
*arg1 = arg2; /* Remove 1 swap and use stvx. */
}
void
baz3 (struct bar *bp, vector long long v)
{
/* Assume v resides in register. */
bp->a_vector = v; /* Remove 1 swap and use stvx. */
} }
int int
main (int argc, int *argv[]) main (long long argc, long long *argv[])
{ {
vector long long int fetched_value = foo (); vector long long fetched_value = foo ();
if (fetched_value[0] != 0 || fetched_value[1] != 1) if (fetched_value[0] != 1024 || fetched_value[1] != 2048)
abort ();
fetched_value = foo1 ();
if (fetched_value[1] != 2048 || fetched_value[0] != 1024)
abort ();
fetched_value = foo2 ();
if (fetched_value[0] != 1024 || fetched_value[1] != 2048)
abort ();
fetched_value = foo3 (&x);
if (fetched_value[1] != 2048 || fetched_value[0] != 1024)
abort (); abort ();
else
return 0; struct bar a_struct;
a_struct.a_vector = x; /* Remove 2 redundant swaps. */
fetched_value = foo4 (&a_struct);
if (fetched_value[1] != 2048 || fetched_value[0] != 1024)
abort ();
z[0] = 7096;
z[1] = 6048;
baz (z);
if (x[0] != 7096 || x[1] != 6048)
abort ();
vector long long source = { 8192, 7096};
baz1 (source);
if (x[0] != 8192 || x[1] != 7096)
abort ();
vector long long dest;
baz2 (&dest, source);
if (dest[0] != 8192 || dest[1] != 7096)
abort ();
baz3 (&a_struct, source);
if (a_struct.a_vector[1] != 7096 || a_struct.a_vector[0] != 8192)
abort ();
return 0;
} }
@@ -7,20 +7,150 @@
extern void abort (void); extern void abort (void);
const vector long long int y = { 0, 1 }; vector long long x;
const vector long long y = { 1024, 2048 };
vector long long z;
vector long long int vector long long
foo (void) foo (void)
{ {
return y; return y; /* Remove 1 swap and use lvx. */
}
vector long long
foo1 (void)
{
x = y; /* Remove 2 redundant swaps here. */
return x; /* Remove 1 swap and use lvx. */
}
void __attribute__ ((noinline))
fill_local (vector long long *vp)
{
*vp = x; /* Remove 2 redundant swaps here. */
}
/* Test aligned load from local. */
vector long long
foo2 (void)
{
vector long long v;
/* Need to be clever here because v will normally reside in a
register rather than memory. */
fill_local (&v);
return v; /* Remove 1 swap and use lvx. */
}
/* Test aligned load from pointer. */
vector long long
foo3 (vector long long *arg)
{
return *arg; /* Remove 1 swap and use lvx. */
}
/* In this structure, the compiler should insert padding to assure
that a_vector is properly aligned. */
struct bar {
short a_field;
vector long long a_vector;
};
vector long long
foo4 (struct bar *bp)
{
return bp->a_vector; /* Remove 1 swap and use lvx. */
}
/* Test aligned store to global. */
void
baz (vector long long arg)
{
x = arg; /* Remove 1 swap and use stvx. */
}
void __attribute__ ((noinline))
copy_local (vector long long *arg)
{
x = *arg; /* Remove 2 redundant swaps. */
}
/* Test aligned store to local. */
void
baz1 (vector long long arg)
{
vector long long v;
/* Need cleverness, because v will normally reside in a register
rather than memory. */
v = arg; /* Aligned store to local: remove 1
swap and use stvx. */
copy_local (&v);
}
/* Test aligned store to pointer. */
void
baz2 (vector long long *arg1, vector long long arg2)
{
/* Assume arg2 resides in register. */
*arg1 = arg2; /* Remove 1 swap and use stvx. */
}
void
baz3 (struct bar *bp, vector long long v)
{
/* Assume v resides in register. */
bp->a_vector = v; /* Remove 1 swap and use stvx. */
} }
int int
main (int argc, char *argv[]) main (long long argc, long long *argv[])
{ {
vector long long int fetched_value = foo (); vector long long fetched_value = foo ();
if (fetched_value[0] != 0 || fetched_value[1] != 1) if (fetched_value[0] != 1024 || fetched_value[1] != 2048)
abort ();
fetched_value = foo1 ();
if (fetched_value[1] != 2048 || fetched_value[0] != 1024)
abort ();
fetched_value = foo2 ();
if (fetched_value[0] != 1024 || fetched_value[1] != 2048)
abort ();
fetched_value = foo3 (&x);
if (fetched_value[1] != 2048 || fetched_value[0] != 1024)
abort (); abort ();
else
return 0; struct bar a_struct;
a_struct.a_vector = x; /* Remove 2 redundant swaps. */
fetched_value = foo4 (&a_struct);
if (fetched_value[1] != 2048 || fetched_value[0] != 1024)
abort ();
z[0] = 7096;
z[1] = 6048;
baz (z);
if (x[0] != 7096 || x[1] != 6048)
abort ();
vector long long source = { 8192, 7096};
baz1 (source);
if (x[0] != 8192 || x[1] != 7096)
abort ();
vector long long dest;
baz2 (&dest, source);
if (dest[0] != 8192 || dest[1] != 7096)
abort ();
baz3 (&a_struct, source);
if (a_struct.a_vector[1] != 7096 || a_struct.a_vector[0] != 8192)
abort ();
return 0;
} }
@@ -2,27 +2,161 @@
/* { dg-require-effective-target powerpc_p8vector_ok } */
/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
/* { dg-options "-mcpu=power8 -O3 " } */
/* { dg-final { scan-assembler-not "xxpermdi" } } */
/* { dg-final { scan-assembler-not "xxswapd" } } */
/* Previous versions of this test required that the assembler does not
contain xxpermdi or xxswapd. However, with the more sophisticated
code generation used today, it is now possible that xxpermdi (aka
xxswapd) show up without being part of a lxvd2x or stxvd2x
sequence. */
#include <altivec.h>
extern void abort (void);
const vector long long int y = { 0, 1 }; vector long long x;
const vector long long y = { 1024, 2048 };
vector long long z;
vector long long int vector long long
foo (void) foo (void)
{ {
return y; return y; /* Remove 1 swap and use lvx. */
}
vector long long
foo1 (void)
{
x = y; /* Remove 2 redundant swaps here. */
return x; /* Remove 1 swap and use lvx. */
}
void __attribute__ ((noinline))
fill_local (vector long long *vp)
{
*vp = x; /* Remove 2 redundant swaps here. */
}
/* Test aligned load from local. */
vector long long
foo2 (void)
{
vector long long v;
/* Need to be clever here because v will normally reside in a
register rather than memory. */
fill_local (&v);
return v; /* Remove 1 swap and use lvx. */
}
/* Test aligned load from pointer. */
vector long long
foo3 (vector long long *arg)
{
return *arg; /* Remove 1 swap and use lvx. */
}
/* In this structure, the compiler should insert padding to assure
that a_vector is properly aligned. */
struct bar {
short a_field;
vector long long a_vector;
};
vector long long
foo4 (struct bar *bp)
{
return bp->a_vector; /* Remove 1 swap and use lvx. */
}
/* Test aligned store to global. */
void
baz (vector long long arg)
{
x = arg; /* Remove 1 swap and use stvx. */
}
void __attribute__ ((noinline))
copy_local (vector long long *arg)
{
x = *arg; /* Remove 2 redundant swaps. */
}
/* Test aligned store to local. */
void
baz1 (vector long long arg)
{
vector long long v;
/* Need cleverness, because v will normally reside in a register
rather than memory. */
v = arg; /* Aligned store to local: remove 1
swap and use stvx. */
copy_local (&v);
}
/* Test aligned store to pointer. */
void
baz2 (vector long long *arg1, vector long long arg2)
{
/* Assume arg2 resides in register. */
*arg1 = arg2; /* Remove 1 swap and use stvx. */
}
void
baz3 (struct bar *bp, vector long long v)
{
/* Assume v resides in register. */
bp->a_vector = v; /* Remove 1 swap and use stvx. */
} }
int int
main (int argc, char *argv[]) main (long long argc, long long *argv[])
{ {
vector long long int fetched_value = foo (); vector long long fetched_value = foo ();
if (fetched_value[0] != 0 || fetched_value[1] != 1) if (fetched_value[0] != 1024 || fetched_value[1] != 2048)
abort ();
fetched_value = foo1 ();
if (fetched_value[1] != 2048 || fetched_value[0] != 1024)
abort ();
fetched_value = foo2 ();
if (fetched_value[0] != 1024 || fetched_value[1] != 2048)
abort ();
fetched_value = foo3 (&x);
if (fetched_value[1] != 2048 || fetched_value[0] != 1024)
abort (); abort ();
else
return 0; struct bar a_struct;
a_struct.a_vector = x; /* Remove 2 redundant swaps. */
fetched_value = foo4 (&a_struct);
if (fetched_value[1] != 2048 || fetched_value[0] != 1024)
abort ();
z[0] = 7096;
z[1] = 6048;
baz (z);
if (x[0] != 7096 || x[1] != 6048)
abort ();
vector long long source = { 8192, 7096};
baz1 (source);
if (x[0] != 8192 || x[1] != 7096)
abort ();
vector long long dest;
baz2 (&dest, source);
if (dest[0] != 8192 || dest[1] != 7096)
abort ();
baz3 (&a_struct, source);
if (a_struct.a_vector[1] != 7096 || a_struct.a_vector[0] != 8192)
abort ();
return 0;
} }
@@ -7,20 +7,150 @@
extern void abort (void); extern void abort (void);
vector double y = { 0.0, 0.1 }; vector double x;
vector double y = { 0.1, 0.2 };
vector double z;
vector double vector double
foo (void) foo (void)
{ {
return y; return y; /* Remove 1 swap and use lvx. */
}
vector double
foo1 (void)
{
x = y; /* Remove 2 redundant swaps here. */
return x; /* Remove 1 swap and use lvx. */
}
void __attribute__ ((noinline))
fill_local (vector double *vp)
{
*vp = x; /* Remove 2 redundant swaps here. */
}
/* Test aligned load from local. */
vector double
foo2 (void)
{
vector double v;
/* Need to be clever here because v will normally reside in a
register rather than memory. */
fill_local (&v);
return v; /* Remove 1 swap and use lvx. */
}
/* Test aligned load from pointer. */
vector double
foo3 (vector double *arg)
{
return *arg; /* Remove 1 swap and use lvx. */
}
/* In this structure, the compiler should insert padding to assure
that a_vector is properly aligned. */
struct bar {
short a_field;
vector double a_vector;
};
vector double
foo4 (struct bar *bp)
{
return bp->a_vector; /* Remove 1 swap and use lvx. */
}
/* Test aligned store to global. */
void
baz (vector double arg)
{
x = arg; /* Remove 1 swap and use stvx. */
}
void __attribute__ ((noinline))
copy_local (vector double *arg)
{
x = *arg; /* Remove 2 redundant swaps. */
}
/* Test aligned store to local. */
void
baz1 (vector double arg)
{
vector double v;
/* Need cleverness, because v will normally reside in a register
rather than memory. */
v = arg; /* Aligned store to local: remove 1
swap and use stvx. */
copy_local (&v);
}
/* Test aligned store to pointer. */
void
baz2 (vector double *arg1, vector double arg2)
{
/* Assume arg2 resides in register. */
*arg1 = arg2; /* Remove 1 swap and use stvx. */
}
void
baz3 (struct bar *bp, vector double v)
{
/* Assume v resides in register. */
bp->a_vector = v; /* Remove 1 swap and use stvx. */
} }
int int
main (int argc, char *argv[]) main (double argc, double *argv[])
{ {
vector double fetched_value = foo (); vector double fetched_value = foo ();
if (fetched_value[0] != 0 || fetched_value[1] != 0.1) if (fetched_value[0] != 0.1 || fetched_value[1] != 0.2)
abort ();
fetched_value = foo1 ();
if (fetched_value[1] != 0.2 || fetched_value[0] != 0.1)
abort (); abort ();
else
return 0; fetched_value = foo2 ();
if (fetched_value[0] != 0.1 || fetched_value[1] != 0.2)
abort ();
fetched_value = foo3 (&x);
if (fetched_value[1] != 0.2 || fetched_value[0] != 0.1)
abort ();
struct bar a_struct;
a_struct.a_vector = x; /* Remove 2 redundant swaps. */
fetched_value = foo4 (&a_struct);
if (fetched_value[1] != 0.2 || fetched_value[0] != 0.1)
abort ();
z[0] = 0.7;
z[1] = 0.6;
baz (z);
if (x[0] != 0.7 || x[1] != 0.6)
abort ();
vector double source = { 0.8, 0.7 };
baz1 (source);
if (x[0] != 0.8 || x[1] != 0.7)
abort ();
vector double dest;
baz2 (&dest, source);
if (dest[0] != 0.8 || dest[1] != 0.7)
abort ();
baz3 (&a_struct, source);
if (a_struct.a_vector[1] != 0.7 || a_struct.a_vector[0] != 0.8)
abort ();
return 0;
} }
@@ -7,20 +7,150 @@
extern void abort (void); extern void abort (void);
const vector double y = { 0.0, 0.1 }; vector double x;
const vector double y = { 0.1, 0.2 };
vector double z;
vector double vector double
foo (void) foo (void)
{ {
return y; return y; /* Remove 1 swap and use lvx. */
}
vector double
foo1 (void)
{
x = y; /* Remove 2 redundant swaps here. */
return x; /* Remove 1 swap and use lvx. */
}
void __attribute__ ((noinline))
fill_local (vector double *vp)
{
*vp = x; /* Remove 2 redundant swaps here. */
}
/* Test aligned load from local. */
vector double
foo2 (void)
{
vector double v;
/* Need to be clever here because v will normally reside in a
register rather than memory. */
fill_local (&v);
return v; /* Remove 1 swap and use lvx. */
}
/* Test aligned load from pointer. */
vector double
foo3 (vector double *arg)
{
return *arg; /* Remove 1 swap and use lvx. */
}
/* In this structure, the compiler should insert padding to assure
that a_vector is properly aligned. */
struct bar {
short a_field;
vector double a_vector;
};
vector double
foo4 (struct bar *bp)
{
return bp->a_vector; /* Remove 1 swap and use lvx. */
}
/* Test aligned store to global. */
void
baz (vector double arg)
{
x = arg; /* Remove 1 swap and use stvx. */
}
void __attribute__ ((noinline))
copy_local (vector double *arg)
{
x = *arg; /* Remove 2 redundant swaps. */
}
/* Test aligned store to local. */
void
baz1 (vector double arg)
{
vector double v;
/* Need cleverness, because v will normally reside in a register
rather than memory. */
v = arg; /* Aligned store to local: remove 1
swap and use stvx. */
copy_local (&v);
}
/* Test aligned store to pointer. */
void
baz2 (vector double *arg1, vector double arg2)
{
/* Assume arg2 resides in register. */
*arg1 = arg2; /* Remove 1 swap and use stvx. */
}
void
baz3 (struct bar *bp, vector double v)
{
/* Assume v resides in register. */
bp->a_vector = v; /* Remove 1 swap and use stvx. */
} }
int int
main (int argc, char *argv[]) main (double argc, double *argv[])
{ {
vector double fetched_value = foo (); vector double fetched_value = foo ();
if (fetched_value[0] != 0.0 || fetched_value[1] != 0.1) if (fetched_value[0] != 0.1 || fetched_value[1] != 0.2)
abort ();
fetched_value = foo1 ();
if (fetched_value[1] != 0.2 || fetched_value[0] != 0.1)
abort (); abort ();
else
return 0; fetched_value = foo2 ();
if (fetched_value[0] != 0.1 || fetched_value[1] != 0.2)
abort ();
fetched_value = foo3 (&x);
if (fetched_value[1] != 0.2 || fetched_value[0] != 0.1)
abort ();
struct bar a_struct;
a_struct.a_vector = x; /* Remove 2 redundant swaps. */
fetched_value = foo4 (&a_struct);
if (fetched_value[1] != 0.2 || fetched_value[0] != 0.1)
abort ();
z[0] = 0.7;
z[1] = 0.6;
baz (z);
if (x[0] != 0.7 || x[1] != 0.6)
abort ();
vector double source = { 0.8, 0.7 };
baz1 (source);
if (x[0] != 0.8 || x[1] != 0.7)
abort ();
vector double dest;
baz2 (&dest, source);
if (dest[0] != 0.8 || dest[1] != 0.7)
abort ();
baz3 (&a_struct, source);
if (a_struct.a_vector[1] != 0.7 || a_struct.a_vector[0] != 0.8)
abort ();
return 0;
} }
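The `struct bar` comment in the test above expects the compiler to pad after `a_field` so that `a_vector` is properly aligned. A minimal sketch of how that expectation could be checked, assuming the GCC/Clang `vector_size` extension as a portable stand-in for `vector double` (the names `v2df_sketch`, `bar_sketch`, `a_vector_offset` and `check_bar_layout` are hypothetical and not part of the commit):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for `vector double`: a 16-byte vector type
   built with the GCC/Clang vector_size extension.  */
typedef double v2df_sketch __attribute__ ((vector_size (16)));

/* Same shape as struct bar in the test: a short followed by a vector.  */
struct bar_sketch
{
  short a_field;
  v2df_sketch a_vector;
};

/* Byte offset of a_vector within the structure.  */
size_t
a_vector_offset (void)
{
  return offsetof (struct bar_sketch, a_vector);
}

/* Abort if the member does not sit past a_field on a boundary that
   satisfies the alignment of the vector type.  */
void
check_bar_layout (void)
{
  assert (a_vector_offset () >= sizeof (short));
  assert (a_vector_offset () % _Alignof (v2df_sketch) == 0);
}
```

The sketch compares against `_Alignof` rather than hard-coding 16, since the exact alignment of the stand-in type is decided by the target ABI.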
...@@ -2,27 +2,161 @@ ...@@ -2,27 +2,161 @@
/* { dg-require-effective-target powerpc_p8vector_ok } */ /* { dg-require-effective-target powerpc_p8vector_ok } */
/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */ /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
/* { dg-options "-mcpu=power8 -O3 " } */ /* { dg-options "-mcpu=power8 -O3 " } */
/* { dg-final { scan-assembler-not "xxpermdi" } } */
/* { dg-final { scan-assembler-not "xxswapd" } } */ /* Previous versions of this test required that the assembler does not
contain xxpermdi or xxswapd. However, with the more sophisticated
code generation used today, it is now possible that xxpermdi (aka
xxswapd) show up without being part of a lxvd2x or stxvd2x
sequence. */
#include <altivec.h> #include <altivec.h>
extern void abort (void); extern void abort (void);
const vector double y = { 0.0, 0.1 }; vector double x;
const vector double y = { 0.1, 0.2 };
vector double z;
vector double vector double
foo (void) foo (void)
{ {
return y; return y; /* Remove 1 swap and use lvx. */
}
vector double
foo1 (void)
{
x = y; /* Remove 2 redundant swaps here. */
return x; /* Remove 1 swap and use lvx. */
}
void __attribute__ ((noinline))
fill_local (vector double *vp)
{
*vp = x; /* Remove 2 redundant swaps here. */
}
/* Test aligned load from local. */
vector double
foo2 (void)
{
vector double v;
/* Need to be clever here because v will normally reside in a
register rather than memory. */
fill_local (&v);
return v; /* Remove 1 swap and use lvx. */
}
/* Test aligned load from pointer. */
vector double
foo3 (vector double *arg)
{
return *arg; /* Remove 1 swap and use lvx. */
}
/* In this structure, the compiler should insert padding to assure
that a_vector is properly aligned. */
struct bar {
short a_field;
vector double a_vector;
};
vector double
foo4 (struct bar *bp)
{
return bp->a_vector; /* Remove 1 swap and use lvx. */
}
/* Test aligned store to global. */
void
baz (vector double arg)
{
x = arg; /* Remove 1 swap and use stvx. */
}
void __attribute__ ((noinline))
copy_local (vector double *arg)
{
x = *arg; /* Remove 2 redundant swaps. */
}
/* Test aligned store to local. */
void
baz1 (vector double arg)
{
vector double v;
/* Need cleverness, because v will normally reside in a register
rather than memory. */
v = arg; /* Aligned store to local: remove 1
swap and use stvx. */
copy_local (&v);
}
/* Test aligned store to pointer. */
void
baz2 (vector double *arg1, vector double arg2)
{
/* Assume arg2 resides in register. */
*arg1 = arg2; /* Remove 1 swap and use stvx. */
}
void
baz3 (struct bar *bp, vector double v)
{
/* Assume v resides in register. */
bp->a_vector = v; /* Remove 1 swap and use stvx. */
} }
int int
main (int argc, char *argv[]) main (double argc, double *argv[])
{ {
vector double fetched_value = foo (); vector double fetched_value = foo ();
if (fetched_value[0] != 0.0 || fetched_value[1] != 0.1) if (fetched_value[0] != 0.1 || fetched_value[1] != 0.2)
abort ();
fetched_value = foo1 ();
if (fetched_value[1] != 0.2 || fetched_value[0] != 0.1)
abort (); abort ();
else
return 0; fetched_value = foo2 ();
if (fetched_value[0] != 0.1 || fetched_value[1] != 0.2)
abort ();
fetched_value = foo3 (&x);
if (fetched_value[1] != 0.2 || fetched_value[0] != 0.1)
abort ();
struct bar a_struct;
a_struct.a_vector = x; /* Remove 2 redundant swaps. */
fetched_value = foo4 (&a_struct);
if (fetched_value[1] != 0.2 || fetched_value[0] != 0.1)
abort ();
z[0] = 0.7;
z[1] = 0.6;
baz (z);
if (x[0] != 0.7 || x[1] != 0.6)
abort ();
vector double source = { 0.8, 0.7 };
baz1 (source);
if (x[0] != 0.8 || x[1] != 0.7)
abort ();
vector double dest;
baz2 (&dest, source);
if (dest[0] != 0.8 || dest[1] != 0.7)
abort ();
baz3 (&a_struct, source);
if (a_struct.a_vector[1] != 0.7 || a_struct.a_vector[0] != 0.8)
abort ();
return 0;
} }
...@@ -33,5 +33,7 @@ add_long_1 (vector long *p, long x) ...@@ -33,5 +33,7 @@ add_long_1 (vector long *p, long x)
/* { dg-final { scan-assembler-not "lxvw4x" } } */ /* { dg-final { scan-assembler-not "lxvw4x" } } */
/* { dg-final { scan-assembler-not "lxvx" } } */ /* { dg-final { scan-assembler-not "lxvx" } } */
/* { dg-final { scan-assembler-not "lxv" } } */ /* { dg-final { scan-assembler-not "lxv" } } */
/* { dg-final { scan-assembler-not "lvx" } } */
/* { dg-final { scan-assembler-not "xxpermdi" } } */ /* With recent enhancements to the code generator, it is considered
* legal to implement vec_extract with lvx and xxpermdi. Previous
* versions of this test forbid both instructions. */
...@@ -22,5 +22,7 @@ add_long_n (vector long *p, long x, long n) ...@@ -22,5 +22,7 @@ add_long_n (vector long *p, long x, long n)
/* { dg-final { scan-assembler-not "lxvw4x" } } */ /* { dg-final { scan-assembler-not "lxvw4x" } } */
/* { dg-final { scan-assembler-not "lxvx" } } */ /* { dg-final { scan-assembler-not "lxvx" } } */
/* { dg-final { scan-assembler-not "lxv" } } */ /* { dg-final { scan-assembler-not "lxv" } } */
/* { dg-final { scan-assembler-not "lvx" } } */
/* { dg-final { scan-assembler-not "xxpermdi" } } */ /* With recent enhancements to the code generator, it is considered
* legal to implement vec_extract with lvx and xxpermdi. Previous
* versions of this test forbid both instructions. */
...@@ -64,5 +64,7 @@ add_signed_char_n (vector signed char *p, int n) ...@@ -64,5 +64,7 @@ add_signed_char_n (vector signed char *p, int n)
/* { dg-final { scan-assembler-not "lxvw4x" } } */ /* { dg-final { scan-assembler-not "lxvw4x" } } */
/* { dg-final { scan-assembler-not "lxvx" } } */ /* { dg-final { scan-assembler-not "lxvx" } } */
/* { dg-final { scan-assembler-not "lxv" } } */ /* { dg-final { scan-assembler-not "lxv" } } */
/* { dg-final { scan-assembler-not "lvx" } } */
/* { dg-final { scan-assembler-not "xxpermdi" } } */ /* With recent enhancements to the code generator, it is considered
* legal to implement vec_extract with lvx and xxpermdi. Previous
* versions of this test forbid both instructions. */
...@@ -64,5 +64,7 @@ add_unsigned_char_n (vector unsigned char *p, int n) ...@@ -64,5 +64,7 @@ add_unsigned_char_n (vector unsigned char *p, int n)
/* { dg-final { scan-assembler-not "lxvw4x" } } */ /* { dg-final { scan-assembler-not "lxvw4x" } } */
/* { dg-final { scan-assembler-not "lxvx" } } */ /* { dg-final { scan-assembler-not "lxvx" } } */
/* { dg-final { scan-assembler-not "lxv" } } */ /* { dg-final { scan-assembler-not "lxv" } } */
/* { dg-final { scan-assembler-not "lvx" } } */
/* { dg-final { scan-assembler-not "xxpermdi" } } */ /* With recent enhancements to the code generator, it is considered
* legal to implement vec_extract with lvx and xxpermdi. Previous
* versions of this test forbid both instructions. */
...@@ -40,5 +40,7 @@ add_float_n (vector float *p, long n) ...@@ -40,5 +40,7 @@ add_float_n (vector float *p, long n)
/* { dg-final { scan-assembler-not "lxvw4x" } } */ /* { dg-final { scan-assembler-not "lxvw4x" } } */
/* { dg-final { scan-assembler-not "lxvx" } } */ /* { dg-final { scan-assembler-not "lxvx" } } */
/* { dg-final { scan-assembler-not "lxv" } } */ /* { dg-final { scan-assembler-not "lxv" } } */
/* { dg-final { scan-assembler-not "lvx" } } */
/* { dg-final { scan-assembler-not "xxpermdi" } } */ /* With recent enhancements to the code generator, it is considered
* legal to implement vec_extract with lvx and xxpermdi. Previous
* versions of this test forbid both instructions. */
...@@ -40,5 +40,7 @@ add_int_n (vector int *p, int n) ...@@ -40,5 +40,7 @@ add_int_n (vector int *p, int n)
/* { dg-final { scan-assembler-not "lxvw4x" } } */ /* { dg-final { scan-assembler-not "lxvw4x" } } */
/* { dg-final { scan-assembler-not "lxvx" } } */ /* { dg-final { scan-assembler-not "lxvx" } } */
/* { dg-final { scan-assembler-not "lxv" } } */ /* { dg-final { scan-assembler-not "lxv" } } */
/* { dg-final { scan-assembler-not "lvx" } } */
/* { dg-final { scan-assembler-not "xxpermdi" } } */ /* With recent enhancements to the code generator, it is considered
* legal to implement vec_extract with lvx and xxpermdi. Previous
* versions of this test forbid both instructions. */
...@@ -64,5 +64,7 @@ add_short_n (vector short *p, int n) ...@@ -64,5 +64,7 @@ add_short_n (vector short *p, int n)
/* { dg-final { scan-assembler-not "lxvw4x" } } */ /* { dg-final { scan-assembler-not "lxvw4x" } } */
/* { dg-final { scan-assembler-not "lxvx" } } */ /* { dg-final { scan-assembler-not "lxvx" } } */
/* { dg-final { scan-assembler-not "lxv" } } */ /* { dg-final { scan-assembler-not "lxv" } } */
/* { dg-final { scan-assembler-not "lvx" } } */
/* { dg-final { scan-assembler-not "xxpermdi" } } */ /* With recent enhancements to the code generator, it is considered
* legal to implement vec_extract with lvx and xxpermdi. Previous
* versions of this test forbid both instructions. */
...@@ -7,10 +7,10 @@ ...@@ -7,10 +7,10 @@
/* { dg-final { scan-assembler-times "xvabsdp" 1 } } */ /* { dg-final { scan-assembler-times "xvabsdp" 1 } } */
/* { dg-final { scan-assembler-times "xvadddp" 1 } } */ /* { dg-final { scan-assembler-times "xvadddp" 1 } } */
/* { dg-final { scan-assembler-times "xxlnor" 6 } } */ /* { dg-final { scan-assembler-times "xxlnor" 8 } } */
/* { dg-final { scan-assembler-times "xxlor" 16 } } */ /* { dg-final { scan-assembler-times "xxlor" 30 } } */
/* { dg-final { scan-assembler-times "xvcmpeqdp" 5 } } */ /* { dg-final { scan-assembler-times "xvcmpeqdp" 5 } } */
/* { dg-final { scan-assembler-times "xvcmpgtdp" 7 } } */ /* { dg-final { scan-assembler-times "xvcmpgtdp" 8 } } */
/* { dg-final { scan-assembler-times "xvcmpgedp" 6 } } */ /* { dg-final { scan-assembler-times "xvcmpgedp" 6 } } */
/* { dg-final { scan-assembler-times "xvrdpim" 1 } } */ /* { dg-final { scan-assembler-times "xvrdpim" 1 } } */
/* { dg-final { scan-assembler-times "xvmaddadp" 1 } } */ /* { dg-final { scan-assembler-times "xvmaddadp" 1 } } */
...@@ -26,7 +26,7 @@ ...@@ -26,7 +26,7 @@
/* { dg-final { scan-assembler-times "xvmsubasp" 1 } } */ /* { dg-final { scan-assembler-times "xvmsubasp" 1 } } */
/* { dg-final { scan-assembler-times "xvnmaddasp" 1 } } */ /* { dg-final { scan-assembler-times "xvnmaddasp" 1 } } */
/* { dg-final { scan-assembler-times "vmsumshs" 1 } } */ /* { dg-final { scan-assembler-times "vmsumshs" 1 } } */
/* { dg-final { scan-assembler-times "xxland" 9 } } */ /* { dg-final { scan-assembler-times "xxland" 13 } } */
/* Source code for the test in vsx-vector-6.h */ /* Source code for the test in vsx-vector-6.h */
#include "vsx-vector-6.h" #include "vsx-vector-6.h"
2018-01-10 Kelvin Nilsen <kelvin@gcc.gnu.org>
* lex.c (search_line_fast): Remove illegal coercion of an
unaligned pointer value to vector pointer type and replace with
use of __builtin_vec_vsx_ld () built-in function, which operates
on unaligned pointer values.
2018-01-03 Jakub Jelinek <jakub@redhat.com> 2018-01-03 Jakub Jelinek <jakub@redhat.com>
Update copyright years. Update copyright years.
......
...@@ -568,7 +568,7 @@ search_line_fast (const uchar *s, const uchar *end ATTRIBUTE_UNUSED) ...@@ -568,7 +568,7 @@ search_line_fast (const uchar *s, const uchar *end ATTRIBUTE_UNUSED)
{ {
vc m_nl, m_cr, m_bs, m_qm; vc m_nl, m_cr, m_bs, m_qm;
data = *((const vc *)s); data = __builtin_vec_vsx_ld (0, s);
s += 16; s += 16;
m_nl = (vc) __builtin_vec_cmpeq(data, repl_nl); m_nl = (vc) __builtin_vec_cmpeq(data, repl_nl);
......
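The lex.c hunk above replaces a dereference of `(const vc *)s`, where `s` may be unaligned, with `__builtin_vec_vsx_ld (0, s)`. Presumably this matters more after this commit because `lvx` ignores the low four bits of the address, so a load that the compiler believes to be aligned would read from the wrong place when `s` is not. The builtin is PowerPC-specific, so the sketch below shows the same idea with `memcpy` as a portable analogue. This is a swapped-in technique, not what the commit uses, and `block16_sketch`, `load_unaligned_16` and `is_quadword_aligned` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte block standing in for the `vc` vector type.  */
typedef struct { unsigned char bytes[16]; } block16_sketch;

/* Load 16 bytes from S without assuming any alignment.  Copying
   through memcpy avoids casting S to a more strictly aligned
   pointer type.  */
block16_sketch
load_unaligned_16 (const unsigned char *s)
{
  block16_sketch data;
  assert (s != NULL);
  memcpy (&data, s, sizeof data);
  return data;
}

/* Nonzero when P is quad-word (16-byte) aligned, i.e. when its low
   four address bits are all zero.  */
int
is_quadword_aligned (const void *p)
{
  return ((uintptr_t) p & 15) == 0;
}
```

A caller scanning a buffer, as `search_line_fast` does, could use `load_unaligned_16 (s)` at any byte offset, whereas a dereference through a vector pointer type is only valid when `is_quadword_aligned (s)` holds.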