Don't use permutes for single-element accesses (PR83753)

After cunrolling the inner loop, the remaining loop in the testcase has a single 32-bit access and a group of 64-bit accesses. We first try to vectorise at 128 bits (VF 4), but decide not to for cost reasons. We then try with 64 bits (VF 2) instead. This means that the group of 64-bit accesses uses a single-element vector, which is deliberately supported as of r251538.

We then try to create "permutes" for these single-element vectors and fall foul of:

	  for (i = 0; i < 6; i++)
	    sel[i] += exact_div (nelt, 2);

in vect_grouped_store_supported, since nelt == 1.

Maybe we shouldn't even be trying to vectorise statements in the single-element case, and instead just copy the scalar statement for each member of the group. But until then, this patch treats non-strided grouped accesses as VMAT_CONTIGUOUS if no permutation is necessary.

2018-01-10  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	PR tree-optimization/83753
	* tree-vect-stmts.c (get_group_load_store_type): Use VMAT_CONTIGUOUS
	for non-strided grouped accesses if the number of elements is 1.

gcc/testsuite/
	PR tree-optimization/83753
	* gcc.dg/torture/pr83753.c: New test.

From-SVN: r256427

Commit 6737facb authored Jan 10, 2018 by Richard Sandiford; committed by Richard Sandiford on Jan 10, 2018.

Showing 4 changed files with 40 additions and 4 deletions:

  gcc/ChangeLog                            +6  -0
  gcc/testsuite/ChangeLog                  +5  -0
  gcc/testsuite/gcc.dg/torture/pr83753.c  +19  -0
  gcc/tree-vect-stmts.c                   +10  -4

--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
+2018-01-10  Richard Sandiford  <richard.sandiford@linaro.org>
+	PR tree-optimization/83753
+	* tree-vect-stmts.c (get_group_load_store_type): Use VMAT_CONTIGUOUS
+	for non-strided grouped accesses if the number of elements is 1.
 2018-01-10  Jan Hubicka  <hubicka@ucw.cz>
 	PR target/81616

--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
+2018-01-10  Richard Sandiford  <richard.sandiford@linaro.org>
+	PR tree-optimization/83753
+	* gcc.dg/torture/pr83753.c: New test.
 2018-01-09  Jan Hubicka  <hubicka@ucw.cz>
 	* gcc.target/i386/avx2-gather-1.c: Add -march.

--- a/gcc/testsuite/gcc.dg/torture/pr83753.c
+++ b/gcc/testsuite/gcc.dg/torture/pr83753.c
+/* { dg-do compile } */
+/* { dg-options "-mcpu=xgene1" { target aarch64*-*-* } } */
+typedef struct {
+  int m1[10];
+  double m2[10][8];
+} blah;
+void
+foo (blah *info) {
+  int i, d;
+  for (d=0; d<10; d++) {
+    info->m1[d] = 0;
+    info->m2[d][0] = 1;
+    for (i=1; i<8; i++)
+      info->m2[d][i] = 2;
+  }
+}
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -1849,10 +1849,16 @@ get_group_load_store_type (gimple *stmt, tree vectype, bool slp,
 	  && (can_overrun_p || !would_overrun_p)
 	  && compare_step_with_zero (stmt) > 0)
 	{
-	  /* First try using LOAD/STORE_LANES.  */
-	  if (vls_type == VLS_LOAD
-	      ? vect_load_lanes_supported (vectype, group_size)
-	      : vect_store_lanes_supported (vectype, group_size))
+	  /* First cope with the degenerate case of a single-element
+	     vector.  */
+	  if (known_eq (TYPE_VECTOR_SUBPARTS (vectype), 1U))
+	    *memory_access_type = VMAT_CONTIGUOUS;
+
+	  /* Otherwise try using LOAD/STORE_LANES.  */
+	  if (*memory_access_type == VMAT_ELEMENTWISE
+	      && (vls_type == VLS_LOAD
+		  ? vect_load_lanes_supported (vectype, group_size)
+		  : vect_store_lanes_supported (vectype, group_size)))
 	    {
 	      *memory_access_type = VMAT_LOAD_STORE_LANES;
 	      overrun_p = would_overrun_p;