Commit 2de001ee by Richard Sandiford

[6/7] Explicitly classify vector loads and stores

This is the main patch in the series.  It adds a new enum and routines
for classifying a vector load or store implementation.

Originally there were three motivations:

      (1) Reduce cut-&-paste

      (2) Make the chosen vectorisation strategy more obvious.  At the
          moment this is derived implicitly from various other bits of
          state (GROUPED, STRIDED, SLP, etc.)

      (3) Decouple the vectorisation strategy from those other bits of state,
          so that there can be a choice of implementation for a given scalar
          statement.  The specific problem here is that we class:

              for (...)
                {
                  ... = a[i * x];
                  ... = a[i * x + 1];
                }

          as "strided and grouped" but:

              for (...)
                {
                  ... = a[i * 7];
                  ... = a[i * 7 + 1];
                }

          as "non-strided and grouped".  Before the patch, "strided and
          grouped" loads would always try to use separate scalar loads
          while "non-strided and grouped" loads would always try to use
          load-and-permute.  But load-and-permute is never supported for
          a group size of 7, so the effect was that the first loop was
          vectorisable and the second wasn't.  It seemed odd that not
          knowing x (but accepting it could be 7) would allow more
          optimisation opportunities than knowing x is 7.

Unfortunately, it looks like we underestimate the cost of separate
scalar accesses on at least aarch64, so I've disabled (3) for now;
see the "if" statement at the end of get_load_store_type.  I think
the patch still does (1) and (2), so that's the justification for
it in its current form.  It also means that (3) is now simply a
case of removing the FIXME code, once the cost model problems have
been sorted out.  (I did wonder about adding a --param, but that
seems overkill.  I hope to get back to this during GCC 7 stage 1.)
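
As a rough illustration of (3) and of the FIXME mentioned above, the decision amounts to something like the following sketch.  This is a simplified stand-alone example, not the actual get_load_store_type logic: the types, the helper permute_supported_p and its power-of-two assumption are all invented for the example.

    /* Illustrative sketch only; simplified stand-ins, not GCC internals.  */
    #include <stdbool.h>

    enum vmat_sketch {
      VMAT_CONTIGUOUS_PERMUTE,     /* load-and-permute */
      VMAT_ELEMENTWISE             /* separate scalar accesses */
    };

    struct access_sketch {
      bool strided_p;              /* stride is loop-invariant but unknown */
      unsigned group_size;         /* number of statements in the group */
    };

    /* Hypothetical target query; assume load-and-permute handles only
       power-of-two group sizes, so a group of 7 is rejected.  */
    static bool
    permute_supported_p (unsigned group_size)
    {
      return group_size != 0 && (group_size & (group_size - 1)) == 0;
    }

    /* Classify a grouped load.  Returns false if it cannot be vectorised.  */
    static bool
    classify_grouped_load (const struct access_sketch *a,
                           enum vmat_sketch *type)
    {
      /* The decoupled choice (3): pick an implementation from what the
         target supports, not from whether the stride is a known constant.  */
      *type = permute_supported_p (a->group_size)
              ? VMAT_CONTIGUOUS_PERMUTE
              : VMAT_ELEMENTWISE;

      /* FIXME-style restriction described above: while separate scalar
         accesses are under-costed, keep the pre-patch behaviour.  */
      if (a->strided_p)
        *type = VMAT_ELEMENTWISE;    /* strided groups stay element-wise */
      else if (*type == VMAT_ELEMENTWISE)
        return false;                /* non-strided groups may not fall back */

      return true;
    }
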

Tested on aarch64-linux-gnu and x86_64-linux-gnu.

gcc/
	* tree-vectorizer.h (vect_memory_access_type): New enum.
	(_stmt_vec_info): Add a memory_access_type field.
	(STMT_VINFO_MEMORY_ACCESS_TYPE): New macro.
	(vect_model_store_cost): Take an access type instead of a boolean.
	(vect_model_load_cost): Likewise.
	* tree-vect-slp.c (vect_analyze_slp_cost_1): Update calls to
	vect_model_store_cost and vect_model_load_cost.
	* tree-vect-stmts.c (vec_load_store_type): New enum.
	(vect_model_store_cost): Take an access type instead of a
	store_lanes_p boolean.  Simplify tests.
	(vect_model_load_cost): Likewise, but for load_lanes_p.
	(get_group_load_store_type, get_load_store_type): New functions.
	(vectorizable_store): Use get_load_store_type.  Record the access
	type in STMT_VINFO_MEMORY_ACCESS_TYPE.
	(vectorizable_load): Likewise.
	(vectorizable_mask_load_store): Likewise.  Replace is_store
	variable with vls_type.

From-SVN: r238038
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -1490,9 +1490,13 @@ vect_analyze_slp_cost_1 (slp_instance instance, slp_tree node,
       stmt_info = vinfo_for_stmt (stmt);
       if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
         {
+          vect_memory_access_type memory_access_type
+            = (STMT_VINFO_STRIDED_P (stmt_info)
+               ? VMAT_STRIDED_SLP
+               : VMAT_CONTIGUOUS);
           if (DR_IS_WRITE (STMT_VINFO_DATA_REF (stmt_info)))
-            vect_model_store_cost (stmt_info, ncopies_for_cost, false,
-                                   vect_uninitialized_def,
+            vect_model_store_cost (stmt_info, ncopies_for_cost,
+                                   memory_access_type, vect_uninitialized_def,
                                    node, prologue_cost_vec, body_cost_vec);
           else
             {
@@ -1515,8 +1519,9 @@ vect_analyze_slp_cost_1 (slp_instance instance, slp_tree node,
                   ncopies_for_cost *= SLP_INSTANCE_UNROLLING_FACTOR (instance);
                 }
               /* Record the cost for the vector loads.  */
-              vect_model_load_cost (stmt_info, ncopies_for_cost, false,
-                                    node, prologue_cost_vec, body_cost_vec);
+              vect_model_load_cost (stmt_info, ncopies_for_cost,
+                                    memory_access_type, node, prologue_cost_vec,
+                                    body_cost_vec);
               return;
             }
         }
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -481,6 +481,33 @@ enum slp_vect_type {
   hybrid
 };
 
+/* Describes how we're going to vectorize an individual load or store,
+   or a group of loads or stores.  */
+enum vect_memory_access_type {
+  /* A simple contiguous access.  */
+  VMAT_CONTIGUOUS,
+
+  /* A simple contiguous access in which the elements need to be permuted
+     after loading or before storing.  Only used for loop vectorization;
+     SLP uses separate permutes.  */
+  VMAT_CONTIGUOUS_PERMUTE,
+
+  /* An access that uses IFN_LOAD_LANES or IFN_STORE_LANES.  */
+  VMAT_LOAD_STORE_LANES,
+
+  /* An access in which each scalar element is loaded or stored
+     individually.  */
+  VMAT_ELEMENTWISE,
+
+  /* A hybrid of VMAT_CONTIGUOUS and VMAT_ELEMENTWISE, used for grouped
+     SLP accesses.  Each unrolled iteration uses a contiguous load
+     or store for the whole group, but the groups from separate iterations
+     are combined in the same way as for VMAT_ELEMENTWISE.  */
+  VMAT_STRIDED_SLP,
+
+  /* The access uses gather loads or scatter stores.  */
+  VMAT_GATHER_SCATTER
+};
+
 typedef struct data_reference *dr_p;
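
As an aside (not part of the patch), the point of replacing the old booleans with this enum is that downstream code can dispatch on a single value.  A minimal stand-alone sketch, using the enum defined above but with invented cost numbers and helper name, might look like:

    /* Stand-alone sketch with invented costs; not vect_model_load_cost.  */
    static unsigned
    sketch_load_cost (enum vect_memory_access_type type, unsigned ncopies)
    {
      switch (type)
        {
        case VMAT_CONTIGUOUS:
          return ncopies;            /* one vector load per copy */
        case VMAT_CONTIGUOUS_PERMUTE:
          return ncopies * 2;        /* vector load plus permute */
        case VMAT_LOAD_STORE_LANES:
          return ncopies;            /* single lane-splitting load */
        case VMAT_ELEMENTWISE:
        case VMAT_STRIDED_SLP:
          return ncopies * 4;        /* scalar loads plus vector construction */
        case VMAT_GATHER_SCATTER:
          return ncopies * 3;        /* gather/scatter is usually costlier */
        }
      return 0;
    }
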
@@ -598,6 +625,10 @@ typedef struct _stmt_vec_info {
   /* True if this is an access with loop-invariant stride.  */
   bool strided_p;
 
+  /* Classifies how the load or store is going to be implemented
+     for loop vectorization.  */
+  vect_memory_access_type memory_access_type;
+
   /* For both loads and stores.  */
   bool simd_lane_access_p;
@@ -655,6 +686,7 @@ STMT_VINFO_BB_VINFO (stmt_vec_info stmt_vinfo)
 #define STMT_VINFO_DATA_REF(S)             (S)->data_ref_info
 #define STMT_VINFO_GATHER_SCATTER_P(S)     (S)->gather_scatter_p
 #define STMT_VINFO_STRIDED_P(S)            (S)->strided_p
+#define STMT_VINFO_MEMORY_ACCESS_TYPE(S)   (S)->memory_access_type
 #define STMT_VINFO_SIMD_LANE_ACCESS_P(S)   (S)->simd_lane_access_p
 #define STMT_VINFO_VEC_REDUCTION_TYPE(S)   (S)->v_reduc_type
@@ -1002,12 +1034,12 @@ extern void free_stmt_vec_info (gimple *stmt);
 extern void vect_model_simple_cost (stmt_vec_info, int, enum vect_def_type *,
                                     stmt_vector_for_cost *,
                                     stmt_vector_for_cost *);
-extern void vect_model_store_cost (stmt_vec_info, int, bool,
+extern void vect_model_store_cost (stmt_vec_info, int, vect_memory_access_type,
                                    enum vect_def_type, slp_tree,
                                    stmt_vector_for_cost *,
                                    stmt_vector_for_cost *);
-extern void vect_model_load_cost (stmt_vec_info, int, bool, slp_tree,
-                                  stmt_vector_for_cost *,
+extern void vect_model_load_cost (stmt_vec_info, int, vect_memory_access_type,
+                                  slp_tree, stmt_vector_for_cost *,
                                   stmt_vector_for_cost *);
 extern unsigned record_stmt_cost (stmt_vector_for_cost *, int,
                                   enum vect_cost_for_stmt, stmt_vec_info,
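
To connect the pieces, the new field is meant to be filled in once during analysis and then consulted during both costing and transformation, as the ChangeLog entries for vectorizable_load and vectorizable_store describe.  The following is again a stand-alone sketch rather than the real code: it reuses the enum and sketch_load_cost helper from the sketches above, and the struct and function names are invented.

    /* Invented stand-ins; in GCC the field lives in _stmt_vec_info and is
       accessed through STMT_VINFO_MEMORY_ACCESS_TYPE.  */
    struct stmt_info_sketch
    {
      enum vect_memory_access_type memory_access_type;
    };

    /* Analysis phase: classify once and record the result (the real code
       calls get_load_store_type and stores into the stmt_vec_info).  */
    static bool
    analyze_load_sketch (struct stmt_info_sketch *info, unsigned ncopies)
    {
      enum vect_memory_access_type type = VMAT_CONTIGUOUS;  /* placeholder */
      info->memory_access_type = type;
      /* Costing can now branch on the recorded classification.  */
      return sketch_load_cost (type, ncopies) > 0;
    }

    /* Transform phase: reuse the recorded classification instead of
       re-deriving it from the GROUPED/STRIDED flags.  */
    static void
    transform_load_sketch (const struct stmt_info_sketch *info)
    {
      switch (info->memory_access_type)
        {
        case VMAT_LOAD_STORE_LANES:
          /* emit an IFN_LOAD_LANES call */
          break;
        case VMAT_ELEMENTWISE:
        case VMAT_STRIDED_SLP:
          /* emit separate scalar loads */
          break;
        default:
          /* contiguous, permuted or gather/scatter cases */
          break;
        }
    }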