i386: prefer vpermilpd over vpermpd [PR93395]
In Agner Fog's tables, vpermilp[sd] with an immediate seems to be much
faster than vpermpd with an immediate, and for a good reason: the former
only permutes elements within 128-bit lanes and never moves anything
across lanes, while vpermpd can.  So, functionality-wise, vpermilpd is a
more efficient subset of vpermpd.  We use the same RTL for both though
(and also for certain broadcasts).

The problem was that the vpermpd pattern appeared first in sse.md,
followed by the broadcast patterns, followed by vpermilp[sd].  This means
that unless -mavx -mno-avx2 was used, we'd emit vpermpd instead of the
more efficient alternatives.

The following patch reorders them so that vpermpd comes last: if we can
match a broadcast, we do; if we can match a vpermilp[sd] that is not a
broadcast, we do; otherwise we fall back (of course only with -mavx2) to
vpermpd.

2020-01-24  Jakub Jelinek  <jakub@redhat.com>

	PR target/93395
	* config/i386/sse.md (*avx_vperm_broadcast_v4sf,
	*avx_vperm_broadcast_<mode>,
	<sse2_avx_avx512f>_vpermil<mode><mask_name>,
	*<sse2_avx_avx512f>_vpermilp<mode><mask_name>): Move before
	avx2_perm<mode>/avx512f_perm<mode>.

	* gcc.target/i386/pr93395.c: New test.
	* gcc.target/i386/avx512vl-vpermilpdi-1.c: Remove xfail.
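To illustrate which V4DF permutations fall into the vpermilpd subset, here is a minimal sketch in plain C. It is not the GCC predicate or anything taken from sse.md; the helper names vpermilpd_imm and vpermpd_imm are hypothetical and only model the documented immediate encodings of the two instructions for four double elements.

```c
#include <assert.h>

/* Hypothetical helper, not GCC code.  perm[i] is the source element
   (0..3) that ends up in destination element i of a 4 x double vector.
   Returns 1 and stores the vpermilpd immediate in *imm if every element
   stays inside its own 128-bit lane (elements 0-1 or elements 2-3);
   returns 0 if the permutation crosses lanes.  */
int vpermilpd_imm(const int perm[4], unsigned *imm)
{
    unsigned bits = 0;
    for (int i = 0; i < 4; i++) {
        assert(perm[i] >= 0 && perm[i] <= 3);
        if (perm[i] / 2 != i / 2)
            return 0;                 /* crosses a lane: vpermilpd cannot do it */
        /* Bit i selects the low (0) or high (1) element of the lane.  */
        bits |= (unsigned)(perm[i] & 1) << i;
    }
    *imm = bits;
    return 1;
}

/* Hypothetical helper, not GCC code.  vpermpd handles any permutation:
   two immediate bits per destination element give the source index.  */
unsigned vpermpd_imm(const int perm[4])
{
    unsigned bits = 0;
    for (int i = 0; i < 4; i++) {
        assert(perm[i] >= 0 && perm[i] <= 3);
        bits |= (unsigned)perm[i] << (2 * i);
    }
    return bits;
}
```

For example, the permutation {1, 0, 3, 2} stays within lanes and maps to vpermilpd immediate 0x5, whereas {2, 3, 0, 1} swaps the two lanes, so only vpermpd (immediate 0x4e) can express it.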
gcc/testsuite/gcc.target/i386/pr93395.c
0 → 100644