Commit 8179efe0 by Richard Sandiford, committed by Richard Sandiford

[AArch64] Prefer LD1RQ for big-endian SVE

This patch deals with cases in which a CONST_VECTOR contains a
repeating bit pattern that is wider than one element but narrower
than 128 bits.  The current code:

* treats the repeating pattern as a single element
* uses the associated LD1R to load and replicate it (such as LD1RD
  for 64-bit patterns)
* uses a subreg to cast the result back to the original vector type
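
For a concrete illustration, consider a loop modeled on the existing
gcc.target/aarch64/sve/slp_*.c tests (the function name and constants
below are made up for this sketch and are not taken from the patch):

  #include <stdint.h>

  /* When SLP-vectorized for SVE, the two additions need the constant
     vector { 41, 25, 41, 25, ... }: a repeating 32-bit pattern made up
     of two 16-bit elements.  With the current code, little-endian
     targets load it with LD1RW; the point of this patch is to make
     big-endian targets use LD1RQH instead.  */
  void
  vec_slp_int16 (int16_t *restrict a, int n)
  {
    for (int i = 0; i < n; ++i)
      {
        a[i * 2] += 41;
        a[i * 2 + 1] += 25;
      }
  }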

The problem is that for big-endian targets, the final cast is
effectively a form of element reverse.  For example, say we're using
LD1RD to load 16-bit elements, with h denoting the high byte of each
element and l the low byte:

                               +-----+-----+-----+-----+-----+----
                         lanes |  0  |  1  |  2  |  3  |  4  | ...
                               +-----+-----+-----+-----+-----+----
     memory              bytes |h0 l0 h1 l1 h2 l2 h3 l3 h0 l0 ....
                               +----------------------------------
                                 V  V  V  V  V  V  V  V
                     ----------+-----------------------+
    register         ....      |           0           |
     after           ----------+-----------------------+  lsb
     LD1RD           .... h3 l3 h0 l0 h1 l1 h2 l2 h3 l3|
                     ----------------------------------+

                     ----+-----+-----+-----+-----+-----+
    expected         ... |  4  |  3  |  2  |  1  |  0  |
    register         ----+-----+-----+-----+-----+-----+  lsb
    contents         .... h0 l0 h3 l3 h2 l2 h1 l1 h0 l0|
                     ----------------------------------+
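
The same reversal can be reproduced on the host with ordinary integer
arithmetic.  The sketch below simply re-encodes the diagram: the byte
values are arbitrary stand-ins for h0..h3 and l0..l3, and "element 0"
means the least significant 16 bits of the 64-bit lane, as in the
diagram:

  #include <stdint.h>
  #include <stdio.h>

  int
  main (void)
  {
    /* Memory image of the repeating pattern h0 l0 h1 l1 h2 l2 h3 l3.  */
    uint8_t mem[8] = { 0xa0, 0x01, 0xa1, 0x02, 0xa2, 0x03, 0xa3, 0x04 };

    /* Big-endian LD1RD: the first byte in memory ends up in the most
       significant byte of each 64-bit lane.  */
    uint64_t lane = 0;
    for (int i = 0; i < 8; ++i)
      lane = (lane << 8) | mem[i];

    /* Viewing the lane as four 16-bit elements gives them in reverse
       order: 0xa304 0xa203 0xa102 0xa001 instead of the expected
       0xa001 0xa102 0xa203 0xa304.  */
    for (int i = 0; i < 4; ++i)
      printf ("element %d: 0x%04x\n", i,
              (unsigned) ((lane >> (16 * i)) & 0xffff));
    return 0;
  }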

A later patch fixes the handling of general subregs to account
for this, but it means that we need to do a REV instruction
after the load.  It seems better to use LD1RQ[BHW] on a 128-bit
pattern instead, since that gets the endianness right without
a separate fixup instruction.
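
In terms of the mode chosen by aarch64_expand_sve_const_vector (see the
diff further down), the change amounts to the following.  This is a
standalone paraphrase for illustration only, not GCC code, and
replicated_load_bits is a made-up helper:

  #include <assert.h>
  #include <stdbool.h>

  /* Width in bits of the integer used to load and replicate the
     repeating pattern, given the element width in bits and the number
     of elements in the pattern.  Big-endian targets always use the
     full 128 bits so that the value can be loaded with LD1RQ and no
     element reverse is needed afterwards.  */
  static unsigned int
  replicated_load_bits (unsigned int elt_bits, unsigned int npatterns,
                        bool big_endian)
  {
    unsigned int pattern_bits = elt_bits * npatterns;
    assert (pattern_bits <= 128);
    return big_endian ? 128 : pattern_bits;
  }

  int
  main (void)
  {
    /* Two 16-bit elements: a 32-bit load (LD1RW) on little-endian,
       a 128-bit load (LD1RQH) on big-endian.  */
    assert (replicated_load_bits (16, 2, false) == 32);
    assert (replicated_load_bits (16, 2, true) == 128);
    return 0;
  }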

2018-02-01  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* config/aarch64/aarch64.c (aarch64_expand_sve_const_vector): Prefer
	the TImode handling for big-endian targets.

gcc/testsuite/
	* gcc.target/aarch64/sve/slp_2.c: Expect LD1RQ to be used instead
	of LD1R[HWD] for multi-element constants on big-endian targets.
	* gcc.target/aarch64/sve/slp_3.c: Likewise.
	* gcc.target/aarch64/sve/slp_4.c: Likewise.

Reviewed-by: James Greenhalgh <james.greenhalgh@arm.com>

From-SVN: r257288

gcc/ChangeLog:
+2018-02-01  Richard Sandiford  <richard.sandiford@linaro.org>
+
+	* config/aarch64/aarch64.c (aarch64_expand_sve_const_vector): Prefer
+	the TImode handling for big-endian targets.
+
 2018-02-01  Richard Sandiford  <richard.sandiford@linaro.org>
 
 	* config/aarch64/aarch64-sve.md (sve_ld1rq): Replace with...
 	(*sve_ld1rq<Vesize>): ... this new pattern.  Handle all element sizes,
 	not just bytes.

gcc/config/aarch64/aarch64.c:
@@ -2824,10 +2824,18 @@ aarch64_expand_sve_const_vector (rtx dest, rtx src)
   /* The constant is a repeating sequence of at least two elements,
      where the repeating elements occupy no more than 128 bits.
      Get an integer representation of the replicated value.  */
-  unsigned int int_bits = GET_MODE_UNIT_BITSIZE (mode) * npatterns;
-  gcc_assert (int_bits <= 128);
-  scalar_int_mode int_mode = int_mode_for_size (int_bits, 0).require ();
+  scalar_int_mode int_mode;
+  if (BYTES_BIG_ENDIAN)
+    /* For now, always use LD1RQ to load the value on big-endian
+       targets, since the handling of smaller integers includes a
+       subreg that is semantically an element reverse.  */
+    int_mode = TImode;
+  else
+    {
+      unsigned int int_bits = GET_MODE_UNIT_BITSIZE (mode) * npatterns;
+      gcc_assert (int_bits <= 128);
+      int_mode = int_mode_for_size (int_bits, 0).require ();
+    }
   rtx int_value = simplify_gen_subreg (int_mode, src, mode, 0);
   if (int_value
       && aarch64_expand_sve_widened_duplicate (dest, int_mode, int_value))

gcc/testsuite/ChangeLog:
+2018-02-01  Richard Sandiford  <richard.sandiford@linaro.org>
+
+	* gcc.target/aarch64/sve/slp_2.c: Expect LD1RQ to be used instead
+	of LD1R[HWD] for multi-element constants on big-endian targets.
+	* gcc.target/aarch64/sve/slp_3.c: Likewise.
+	* gcc.target/aarch64/sve/slp_4.c: Likewise.
+
 2018-02-01  Richard Sandiford  <richard.sandiford@linaro.org>
 
 	* gcc.target/aarch64/sve/slp_2.c: Expect LD1RQD rather than LD1RQB.
 	* gcc.target/aarch64/sve/slp_3.c: Expect LD1RQW rather than LD1RQB.
 	* gcc.target/aarch64/sve/slp_4.c: Expect LD1RQH rather than LD1RQB.

gcc/testsuite/gcc.target/aarch64/sve/slp_2.c:
@@ -29,9 +29,12 @@ vec_slp_##TYPE (TYPE *restrict a, int n) \
 TEST_ALL (VEC_PERM)
-/* { dg-final { scan-assembler-times {\tld1rh\tz[0-9]+\.h, } 2 } } */
-/* { dg-final { scan-assembler-times {\tld1rw\tz[0-9]+\.s, } 3 } } */
-/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 3 } } */
+/* { dg-final { scan-assembler-times {\tld1rh\tz[0-9]+\.h, } 2 { target aarch64_little_endian } } } */
+/* { dg-final { scan-assembler-times {\tld1rqb\tz[0-9]+\.b, } 2 { target aarch64_big_endian } } } */
+/* { dg-final { scan-assembler-times {\tld1rw\tz[0-9]+\.s, } 3 { target aarch64_little_endian } } } */
+/* { dg-final { scan-assembler-times {\tld1rqh\tz[0-9]+\.h, } 3 { target aarch64_big_endian } } } */
+/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 3 { target aarch64_little_endian } } } */
+/* { dg-final { scan-assembler-times {\tld1rqw\tz[0-9]+\.s, } 3 { target aarch64_big_endian } } } */
 /* { dg-final { scan-assembler-times {\tld1rqd\tz[0-9]+\.d, } 3 } } */
 /* { dg-final { scan-assembler-not {\tzip1\t} } } */
 /* { dg-final { scan-assembler-not {\tzip2\t} } } */

gcc/testsuite/gcc.target/aarch64/sve/slp_3.c:
@@ -32,9 +32,12 @@ vec_slp_##TYPE (TYPE *restrict a, int n) \
 TEST_ALL (VEC_PERM)
 /* 1 for each 8-bit type. */
-/* { dg-final { scan-assembler-times {\tld1rw\tz[0-9]+\.s, } 2 } } */
+/* { dg-final { scan-assembler-times {\tld1rw\tz[0-9]+\.s, } 2 { target aarch64_little_endian } } } */
+/* { dg-final { scan-assembler-times {\tld1rqb\tz[0-9]+\.b, } 2 { target aarch64_big_endian } } } */
 /* 1 for each 16-bit type and 4 for double. */
-/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 7 } } */
+/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 7 { target aarch64_little_endian } } } */
+/* { dg-final { scan-assembler-times {\tld1rqh\tz[0-9]+\.h, } 3 { target aarch64_big_endian } } } */
+/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 4 { target aarch64_big_endian } } } */
 /* 1 for each 32-bit type. */
 /* { dg-final { scan-assembler-times {\tld1rqw\tz[0-9]+\.s, } 3 } } */
 /* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #41\n} 2 } } */

gcc/testsuite/gcc.target/aarch64/sve/slp_4.c:
@@ -36,7 +36,9 @@ vec_slp_##TYPE (TYPE *restrict a, int n) \
 TEST_ALL (VEC_PERM)
 /* 1 for each 8-bit type, 4 for each 32-bit type and 8 for double. */
-/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 22 } } */
+/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 22 { target aarch64_little_endian } } } */
+/* { dg-final { scan-assembler-times {\tld1rqb\tz[0-9]+\.b, } 2 { target aarch64_big_endian } } } */
+/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 20 { target aarch64_big_endian } } } */
 /* 1 for each 16-bit type. */
 /* { dg-final { scan-assembler-times {\tld1rqh\tz[0-9]\.h, } 3 } } */
 /* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #99\n} 2 } } */