Commit 5d782a8d by Jakub Jelinek

i386: prefer vpermilpd over vpermpd [PR93395]

In Agner Fog's tables, vpermilp[sd] with an immediate seems to be
much faster than vpermpd with an immediate, and for a good reason:
the former only permutes elements within each 128-bit lane and never moves
anything across lanes, while vpermpd can.  So, functionality-wise, vpermilpd
is a more efficient subset of vpermpd.  We use the same RTL for both,
though (and also for certain broadcasts).
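The subset relationship can be made concrete with a small scalar model of the two 256-bit immediate encodings. This is a sketch for illustration only, not GCC code, and the function names are hypothetical. For vpermpd, each 2-bit field of the immediate may select any of the four source elements. For vpermilpd, each single bit selects the low or high element of the element's own 128-bit lane. A vpermpd immediate can therefore be rewritten as a vpermilpd immediate exactly when no selector leaves its lane.

```c
/* Hypothetical scalar model of vpermpd ymm, ymm, imm8: the 2-bit field
   imm[2*i+1:2*i] selects any source element, so data may cross lanes.  */
static void
model_vpermpd (const double src[4], double dst[4], unsigned imm)
{
  for (int i = 0; i < 4; i++)
    dst[i] = src[(imm >> (2 * i)) & 3];
}

/* Hypothetical scalar model of vpermilpd ymm, ymm, imm8: bit i picks the
   low (0) or high (1) element of the 128-bit lane that holds element i.  */
static void
model_vpermilpd (const double src[4], double dst[4], unsigned imm)
{
  for (int i = 0; i < 4; i++)
    dst[i] = src[(i & ~1) + ((imm >> i) & 1)];
}

/* Return 1 and store the equivalent vpermilpd immediate in *out if every
   vpermpd selector stays within its own 128-bit lane; otherwise return 0.  */
static int
vpermpd_imm_to_vpermilpd (unsigned imm, unsigned *out)
{
  unsigned r = 0;
  for (int i = 0; i < 4; i++)
    {
      unsigned sel = (imm >> (2 * i)) & 3;
      if (sel / 2 != (unsigned) i / 2)
        return 0;               /* Selector crosses a 128-bit lane.  */
      r |= (sel & 1) << i;      /* Keep only the within-lane bit.  */
    }
  *out = r;
  return 1;
}
```

Here are two examples:

- The immediate 177 used by the new test decodes to the selectors {1, 0, 3, 2}, so the conversion succeeds with 5.
- A lane swap such as 0x4E ({2, 3, 0, 1}) is rejected.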

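Now, the problem was that the vpermpd pattern appeared first in sse.md,
followed by the broadcast patterns, followed by the vpermilp[sd] patterns.
Since recog uses the first pattern in the machine description that matches
the RTL, this means that unless we compiled with -mavx -mno-avx2 (where
vpermpd is unavailable), we'd emit vpermpd instead of the more efficient
alternatives.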

The following patch reorders them so that vpermpd comes last.  If we can
match a broadcast, we use it.  Otherwise, if we can match a vpermilp[sd]
that is not a broadcast, we use that.  Only failing both do we fall back to
vpermpd (and, of course, only with -mavx2).
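The effect of the reordering can be illustrated with a toy model of first-match selection. This is a hypothetical sketch, not how genrecog or sse.md is actually written.

- Real patterns match RTL. In this model, each "pattern" is just a predicate over a 4-element V4DF selector.
- The model assumes a simplified ISA gate in which only vpermpd requires AVX2.
- The broadcast predicate is a stand-in for the real broadcast patterns.

```c
/* Hypothetical model: each "pattern" is a predicate over a V4DF selector
   plus an ISA requirement; patterns are tried in array (file) order.  */
enum insn_kind { KIND_NONE, KIND_BROADCAST, KIND_VPERMILPD, KIND_VPERMPD };

struct pattern
{
  enum insn_kind kind;
  int needs_avx2;                       /* 1 if only usable with -mavx2.  */
  int (*matches) (const int sel[4]);
};

/* vpermpd can implement any 4-element permutation.  */
static int
any_perm (const int sel[4])
{
  (void) sel;
  return 1;
}

/* Simplified stand-in for the broadcast patterns.  */
static int
is_broadcast (const int sel[4])
{
  return sel[0] == sel[1] && sel[1] == sel[2] && sel[2] == sel[3];
}

/* vpermilpd: every element stays within its own 128-bit lane.  */
static int
is_in_lane (const int sel[4])
{
  for (int i = 0; i < 4; i++)
    if (sel[i] / 2 != i / 2)
      return 0;
  return 1;
}

/* Return the first enabled pattern that matches, like recog does.  */
static enum insn_kind
first_match (const struct pattern *pats, int n, const int sel[4], int avx2)
{
  for (int i = 0; i < n; i++)
    if ((avx2 || !pats[i].needs_avx2) && pats[i].matches (sel))
      return pats[i].kind;
  return KIND_NONE;
}

/* Order before the patch: vpermpd first.  */
static const struct pattern old_order[3] = {
  { KIND_VPERMPD, 1, any_perm },
  { KIND_BROADCAST, 0, is_broadcast },
  { KIND_VPERMILPD, 0, is_in_lane },
};

/* Order after the patch: vpermpd last.  */
static const struct pattern new_order[3] = {
  { KIND_BROADCAST, 0, is_broadcast },
  { KIND_VPERMILPD, 0, is_in_lane },
  { KIND_VPERMPD, 1, any_perm },
};
```

The behaviour for the in-lane swap {1, 0, 3, 2} is as follows:

- With AVX2 enabled, old_order picks vpermpd.
- Without AVX2, old_order picks vpermilpd.
- new_order picks vpermilpd in both cases.
- Under new_order, only a genuinely cross-lane selector still reaches vpermpd.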

2020-01-24  Jakub Jelinek  <jakub@redhat.com>

	PR target/93395
	* config/i386/sse.md (*avx_vperm_broadcast_v4sf,
	*avx_vperm_broadcast_<mode>,
	<sse2_avx_avx512f>_vpermil<mode><mask_name>,
	*<sse2_avx_avx512f>_vpermilp<mode><mask_name>):
	Move before avx2_perm<mode>/avx512f_perm<mode>.

	* gcc.target/i386/pr93395.c: New test.
	* gcc.target/i386/avx512vl-vpermilpdi-1.c: Remove xfail.
parent 14e5881e
Changes to gcc/testsuite/gcc.target/i386/avx512vl-vpermilpdi-1.c (- removed, + added):

 /* { dg-do compile } */
 /* { dg-options "-mavx512vl -O2" } */
-/* { dg-final { scan-assembler-times "vpermilpd\[ \\t\]+\[^\{\n\]*13\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 { xfail *-*-* } } } */
-/* { dg-final { scan-assembler-times "vpermilpd\[ \\t\]+\[^\{\n\]*13\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times "vpermilpd\[ \\t\]+\[^\{\n\]*13\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vpermilpd\[ \\t\]+\[^\{\n\]*13\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
 /* { dg-final { scan-assembler-times "vpermilpd\[ \\t\]+\[^\{\n\]*3\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
 /* { dg-final { scan-assembler-times "vpermilpd\[ \\t\]+\[^\{\n\]*3\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
......
New file gcc/testsuite/gcc.target/i386/pr93395.c:

/* PR target/93395 */
/* { dg-do compile } */
/* { dg-options "-O2 -mavx512f -masm=att" } */
/* { dg-final { scan-assembler-times "vpermilpd\t.5, %ymm" 3 } } */
/* { dg-final { scan-assembler-times "vpermilpd\t.85, %zmm" 3 } } */
/* { dg-final { scan-assembler-not "vpermpd\t" } } */

#include <immintrin.h>

__m256d
foo1 (__m256d a)
{
  return _mm256_permute4x64_pd (a, 177);
}

__m256d
foo2 (__m256d a)
{
  return _mm256_permute_pd (a, 5);
}

__m256d
foo3 (__m256d a)
{
  return __builtin_shuffle (a, (__v4di) { 1, 0, 3, 2 });
}

__m512d
foo4 (__m512d a)
{
  return _mm512_permutex_pd (a, 177);
}

__m512d
foo5 (__m512d a)
{
  return _mm512_permute_pd (a, 85);
}

__m512d
foo6 (__m512d a)
{
  return __builtin_shuffle (a, (__v8di) { 1, 0, 3, 2, 5, 4, 7, 6 });
}
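The immediates the new test expects (5 for %ymm, 85 for %zmm) can be derived by hand from the element selectors.

- foo1 and foo3 both select {1, 0, 3, 2}.
- foo4 and foo6 select {1, 0, 3, 2, 5, 4, 7, 6}, because _mm512_permutex_pd applies the 177 pattern to each 256-bit half.

The hypothetical helper below (not part of GCC) computes the vpermilpd immediate for a V4DF or V8DF selector. It returns -1 when some element would have to leave its 128-bit lane.

```c
/* Hypothetical helper: given a __builtin_shuffle-style selector for a
   V4DF (n == 4) or V8DF (n == 8) vector, return the vpermilpd immediate
   implementing it, or -1 if some element must cross its 128-bit lane
   (something only vpermpd or another instruction could do).  */
static int
vpermilpd_imm_from_selector (const int *sel, int n)
{
  int imm = 0;
  for (int i = 0; i < n; i++)
    {
      if (sel[i] / 2 != i / 2)
        return -1;                  /* Element leaves its 128-bit lane.  */
      imm |= (sel[i] & 1) << i;     /* Bit i: low (0) or high (1) element.  */
    }
  return imm;
}
```

For {1, 0, 3, 2}, bits 0 and 2 are set, giving 5. For the 8-element selector, bits 0, 2, 4 and 6 are set, giving 85 (0x55). These values match the dg-final scans above.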