[AArch64] Fix handling of npatterns>1 constants for partial SVE modes

For partial SVE vectors of element X, we want to treat duplicates of
single X elements in the same way as for full vectors of X.  But if a
constant instead contains a repeating pattern of X elements, the
transition from one value to the next must happen at container
boundaries rather than element boundaries.  E.g. a VNx4HI should in
that case contain the same number of constants as a VNx4SI.

Fixing this means that we need a reinterpret from the container-based
mode to the partial mode; e.g. in the above example we need a
reinterpret from VNx4SI to VNx4HI.  We can't use subregs for that
because they're forbidden by aarch64_can_change_mode_class; we should
handle them in the same way as for big-endian instead.

2019-12-19  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* config/aarch64/aarch64.c (aarch64_simd_valid_immediate): When
	handling partial SVE vectors, use the container mode rather than
	the element mode if the constant isn't a single-element duplicate.
	* config/aarch64/aarch64-sve.md (@aarch64_sve_reinterpret<mode>):
	Check targetm.can_change_mode_class instead of BYTES_BIG_ENDIAN.

gcc/testsuite/
	* gcc.target/aarch64/sve/mixed_size_9.c: New test.

From-SVN: r279580
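The difference between element boundaries and container boundaries can be illustrated with a small byte-layout sketch. This is not code from the commit: `pack_values` and `read_le32` are hypothetical helpers, and the model assumes little-endian byte order, 16-bit elements and 32-bit containers (the VNx4HI/VNx4SI pairing used in the message). It only shows which bytes each choice of slot size produces.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write COUNT 16-bit values into OUT, least
   significant byte first, giving each value SLOT_BYTES bytes.
   SLOT_BYTES == 2 models transitions at element boundaries;
   SLOT_BYTES == 4 models transitions at container boundaries, with
   the unused upper half of each container zero-extended.  */
void
pack_values (const uint16_t *vals, size_t count, size_t slot_bytes,
	     uint8_t *out)
{
  for (size_t i = 0; i < count; ++i)
    {
      uint64_t v = vals[i];
      for (size_t b = 0; b < slot_bytes; ++b)
	out[i * slot_bytes + b] = (uint8_t) (v >> (8 * b));
    }
}

/* Hypothetical helper: read 32-bit container number IDX from BYTES,
   treating the bytes as little-endian.  */
uint32_t
read_le32 (const uint8_t *bytes, size_t idx)
{
  const uint8_t *p = bytes + idx * 4;
  return ((uint32_t) p[0]
	  | ((uint32_t) p[1] << 8)
	  | ((uint32_t) p[2] << 16)
	  | ((uint32_t) p[3] << 24));
}
```

Under these assumptions, packing a duplicated 0x101 with 2-byte slots makes each container read as 0x01010101, while 4-byte slots make it read as 0x00000101; these are the two values contrasted in the aarch64.c comment. Packing { 1, 2, 3, 4 } with 4-byte slots gives the same bytes as a VNx4SI { 1, 2, 3, 4 }, whereas 2-byte slots would put the value 2 into the upper half of the first container.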

Commit b23c6a2c authored Dec 19, 2019 by Richard Sandiford Committed by Richard Sandiford Dec 19, 2019
Showing with 51 additions and 4 deletions

gcc/ChangeLog
+8 -0

gcc/config/aarch64/aarch64-sve.md
+2 -1

gcc/config/aarch64/aarch64.c
+19 -3

gcc/testsuite/ChangeLog
+4 -0

gcc/testsuite/gcc.target/aarch64/sve/mixed_size_9.c
+18 -0

--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
+2019-12-19  Richard Sandiford  <richard.sandiford@arm.com>
+
+	* config/aarch64/aarch64.c (aarch64_simd_valid_immediate): When
+	handling partial SVE vectors, use the container mode rather than
+	the element mode if the constant isn't a single-element duplicate.
+	* config/aarch64/aarch64-sve.md (@aarch64_sve_reinterpret<mode>):
+	Check targetm.can_change_mode_class instead of BYTES_BIG_ENDIAN.
+
 2019-12-19  Andrew Stubbs  <ams@codesourcery.com>

 	* config/gcn/gcn-valu.md (addv64si3<exec_clobber>): Rename to ...
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -694,7 +694,8 @@
 	  UNSPEC_REINTERPRET))]
  "TARGET_SVE"
  {
-    if (!BYTES_BIG_ENDIAN)
+    machine_mode src_mode = GET_MODE (operands[1]);
+    if (targetm.can_change_mode_class (<MODE>mode, src_mode, FP_REGS))
      {
 	emit_move_insn (operands[0], gen_lowpart (<MODE>mode, operands[1]));
 	DONE;

--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -16826,12 +16826,28 @@ aarch64_simd_valid_immediate (rtx op, simd_immediate_info *info,
 	}
    }

-  unsigned int elt_size = GET_MODE_SIZE (elt_mode);
+  /* If all elements in an SVE vector have the same value, we have a free
+     choice between using the element mode and using the container mode.
+     Using the element mode means that unused parts of the vector are
+     duplicates of the used elements, while using the container mode means
+     that the unused parts are an extension of the used elements.  Using the
+     element mode is better for (say) VNx4HI 0x101, since 0x01010101 is valid
+     for its container mode VNx4SI while 0x00000101 isn't.
+
+     If not all elements in an SVE vector have the same value, we need the
+     transition from one element to the next to occur at container boundaries.
+     E.g. a fixed-length VNx4HI containing { 1, 2, 3, 4 } should be treated
+     in the same way as a VNx4SI containing { 1, 2, 3, 4 }.  */
+  scalar_int_mode elt_int_mode;
+  if ((vec_flags & VEC_SVE_DATA) && n_elts > 1)
+    elt_int_mode = aarch64_sve_container_int_mode (mode);
+  else
+    elt_int_mode = int_mode_for_mode (elt_mode).require ();
+
+  unsigned int elt_size = GET_MODE_SIZE (elt_int_mode);
  if (elt_size > 8)
    return false;

-  scalar_int_mode elt_int_mode = int_mode_for_mode (elt_mode).require ();
-
  /* Expand the vector constant out into a byte vector, with the least
     significant byte of the register first.  */
  auto_vec<unsigned char, 16> bytes;

--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
 2019-12-19  Richard Sandiford  <richard.sandiford@arm.com>

+	* gcc.target/aarch64/sve/mixed_size_9.c: New test.
+
+2019-12-19  Richard Sandiford  <richard.sandiford@arm.com>
+
 	* gcc.target/aarch64/sve/mixed_size_8.c: New test.

 2019-12-19  Richard Sandiford  <richard.sandiford@arm.com>

--- a/gcc/testsuite/gcc.target/aarch64/sve/mixed_size_9.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/mixed_size_9.c
+/* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model -msve-vector-bits=256" } */
+/* Originally from gcc.dg/vect/pr88598-4.c.  */
+
+#define N 4
+
+int a[N];
+
+int __attribute__ ((noipa))
+f2 (void)
+{
+  int b[N] = { 0, 31, 0, 31 }, res = 0;
+  for (int i = 0; i < N; ++i)
+    res += a[i] & b[i];
+  return res;
+}
+
+/* { dg-final { scan-assembler-not {\tmov\tz[0-9]\.d, #} } } */
+/* { dg-final { scan-assembler-not {\tstr\tz[0-9],} } } */
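The new test only scans the generated assembly; it does not run. As a rough companion, the sketch below checks the scalar result that any correct vectorization of `f2` would have to preserve. It is an assumption-laden illustration, not part of the commit: `f2_expected` is a hypothetical helper, and the GCC-specific `noipa` attribute is dropped so that the code builds with other compilers.

```c
#define N 4

int a[N];

/* Same body as f2 in mixed_size_9.c, without the noipa attribute.  */
int
f2 (void)
{
  int b[N] = { 0, 31, 0, 31 }, res = 0;
  for (int i = 0; i < N; ++i)
    res += a[i] & b[i];
  return res;
}

/* Hypothetical helper: the mask { 0, 31, 0, 31 } zeroes the even
   elements, so only a[1] and a[3] contribute to the sum.  */
int
f2_expected (void)
{
  return (a[1] & 31) + (a[3] & 31);
}
```

Comparing `f2 ()` against `f2_expected ()` for a few settings of `a` would catch a constant whose values change at the wrong boundaries, since that would apply the mask to the wrong elements.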