06 Jul, 2016 9 commits
    • [6/7] Explicitly classify vector loads and stores · 2de001ee
      This is the main patch in the series.  It adds a new enum and routines
      for classifying a vector load or store implementation.
      
      Originally there were three motivations:
      
            (1) Reduce cut-&-paste
      
      (2) Make the chosen vectorisation strategy more obvious.  At the
          moment this is derived implicitly from various other bits of
          state (GROUPED, STRIDED, SLP, etc.).
      
            (3) Decouple the vectorisation strategy from those other bits of state,
                so that there can be a choice of implementation for a given scalar
                statement.  The specific problem here is that we class:
      
                    for (...)
                      {
                        ... = a[i * x];
                        ... = a[i * x + 1];
                      }
      
                as "strided and grouped" but:
      
                    for (...)
                      {
                        ... = a[i * 7];
                        ... = a[i * 7 + 1];
                      }
      
                as "non-strided and grouped".  Before the patch, "strided and
                grouped" loads would always try to use separate scalar loads
                while "non-strided and grouped" loads would always try to use
                load-and-permute.  But load-and-permute is never supported for
                a group size of 7, so the effect was that the first loop was
                vectorisable and the second wasn't.  It seemed odd that not
                knowing x (but accepting it could be 7) would allow more
                optimisation opportunities than knowing x is 7.
      
      Unfortunately, it looks like we underestimate the cost of separate
      scalar accesses on at least aarch64, so I've disabled (3) for now;
      see the "if" statement at the end of get_load_store_type.  I think
      the patch still does (1) and (2), so that's the justification for
      it in its current form.  It also means that (3) is now simply a
      case of removing the FIXME code, once the cost model problems have
      been sorted out.  (I did wonder about adding a --param, but that
      seems overkill.  I hope to get back to this during GCC 7 stage 1.)
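
As a rough sketch of the kind of classification the new enum provides
(the enumerator names here are an assumption based on the description
above, not copied from the patch):

      /* Sketch: one value per load/store implementation strategy.  */
      enum vect_memory_access_type {
        VMAT_INVARIANT,          /* Accesses that are invariant across the loop.  */
        VMAT_CONTIGUOUS,         /* Plain contiguous vector accesses.  */
        VMAT_CONTIGUOUS_PERMUTE, /* Contiguous accesses plus a permute.  */
        VMAT_LOAD_STORE_LANES,   /* Lane instructions such as LD3/ST3.  */
        VMAT_ELEMENTWISE,        /* Separate scalar accesses per element.  */
        VMAT_STRIDED_SLP,        /* Strided SLP accesses built up from pieces.  */
        VMAT_GATHER_SCATTER      /* Gather loads or scatter stores.  */
      };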
      
      Tested on aarch64-linux-gnu and x86_64-linux-gnu.
      
      gcc/
      	* tree-vectorizer.h (vect_memory_access_type): New enum.
      	(_stmt_vec_info): Add a memory_access_type field.
      	(STMT_VINFO_MEMORY_ACCESS_TYPE): New macro.
      	(vect_model_store_cost): Take an access type instead of a boolean.
      	(vect_model_load_cost): Likewise.
      	* tree-vect-slp.c (vect_analyze_slp_cost_1): Update calls to
      	vect_model_store_cost and vect_model_load_cost.
      	* tree-vect-stmts.c (vec_load_store_type): New enum.
      	(vect_model_store_cost): Take an access type instead of a
      	store_lanes_p boolean.  Simplify tests.
      	(vect_model_load_cost): Likewise, but for load_lanes_p.
      	(get_group_load_store_type, get_load_store_type): New functions.
      	(vectorizable_store): Use get_load_store_type.  Record the access
      	type in STMT_VINFO_MEMORY_ACCESS_TYPE.
      	(vectorizable_load): Likewise.
      	(vectorizable_mask_load_store): Likewise.  Replace is_store
      	variable with vls_type.
      
      From-SVN: r238038
      Richard Sandiford committed
    • [5/7] Move the fix for PR65518 · 4fb8ba9d
This patch moves the fix for PR65518 to the code that checks whether
load-and-permute operations are supported.  If the group size is
greater than the vectorisation factor, it would still be possible
to fall back to elementwise loads (as for strided groups) rather
than failing vectorisation entirely.
      
      Tested on aarch64-linux-gnu and x86_64-linux-gnu.
      
      gcc/
      	* tree-vectorizer.h (vect_grouped_load_supported): Add a
      	single_element_p parameter.
      	* tree-vect-data-refs.c (vect_grouped_load_supported): Likewise.
      	Check the PR65518 case here rather than in vectorizable_load.
	* tree-vect-loop.c (vect_analyze_loop_2): Update call accordingly.
      	* tree-vect-stmts.c (vectorizable_load): Likewise.
      
      From-SVN: r238037
      Richard Sandiford committed
    • [4/7] Add a gather_scatter_info structure · 134c85ca
      This patch just refactors the gather/scatter support so that all
      information is in a single structure, rather than separate variables.
      This reduces the number of arguments to a function added in patch 6.
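
A sketch of what such a structure might hold; the field names are
inferred from the commit message and surrounding description, not
copied from the patch:

      /* Sketch: everything vect_check_gather_scatter needs in order
         to describe a gather load or scatter store.  */
      struct gather_scatter_info {
        tree decl;            /* The gather/scatter built-in to call.  */
        tree base;            /* Loop-invariant base of the address.  */
        tree offset;          /* Per-element offset from the base.  */
        int scale;            /* Scale factor applied to the offset.  */
        enum vect_def_type offset_dt;  /* Definition type of the offset.  */
        tree offset_vectype;  /* Vector type used for the offset.  */
      };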
      
      Tested on aarch64-linux-gnu and x86_64-linux-gnu.
      
      gcc/
      	* tree-vectorizer.h (gather_scatter_info): New structure.
      	(vect_check_gather_scatter): Return a bool rather than a decl.
      	Replace return-by-pointer arguments with a single
      	gather_scatter_info *.
      	* tree-vect-data-refs.c (vect_check_gather_scatter): Likewise.
      	(vect_analyze_data_refs): Update call accordingly.
      	* tree-vect-stmts.c (vect_mark_stmts_to_be_vectorized): Likewise.
      	(vectorizable_mask_load_store): Likewise.  Also record the
      	offset dt and vectype in the gather_scatter_info.
      	(vectorizable_store): Likewise.
      	(vectorizable_load): Likewise.
      
      From-SVN: r238036
      Richard Sandiford committed
    • [3/7] Fix load/store costs for strided groups · 071e8018
      vect_model_store_cost had:
      
            /* Costs of the stores.  */
            if (STMT_VINFO_STRIDED_P (stmt_info)
                && !STMT_VINFO_GROUPED_ACCESS (stmt_info))
              {
                /* N scalar stores plus extracting the elements.  */
                inside_cost += record_stmt_cost (body_cost_vec,
      				       ncopies * TYPE_VECTOR_SUBPARTS (vectype),
      				       scalar_store, stmt_info, 0, vect_body);
      
      But non-SLP strided groups also use individual scalar stores rather than
      vector stores, so I think we should skip this only for SLP groups.
      
      The same applies to vect_model_load_cost.
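
In other words, the condition presumably becomes something like this
(illustrative rather than verbatim):

      /* Cost N scalar stores for all strided accesses except SLP
         groups, which still use vector stores.  */
      if (STMT_VINFO_STRIDED_P (stmt_info)
          && !(slp_node && STMT_VINFO_GROUPED_ACCESS (stmt_info)))
        {
          /* N scalar stores plus extracting the elements.  */
          inside_cost += record_stmt_cost (body_cost_vec,
                          ncopies * TYPE_VECTOR_SUBPARTS (vectype),
                          scalar_store, stmt_info, 0, vect_body);
        }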
      
      Tested on aarch64-linux-gnu and x86_64-linux-gnu.
      
      gcc/
      	* tree-vect-stmts.c (vect_model_store_cost): For non-SLP
      	strided groups, use the cost of N scalar accesses instead
      	of ncopies vector accesses.
      	(vect_model_load_cost): Likewise.
      
      From-SVN: r238035
      Richard Sandiford committed
    • [2/7] Clean up vectorizer load/store costs · 892a981f
      Add a bit more commentary and try to make the structure more obvious.
      The horrendous:
      
            if (grouped_access_p
                && represents_group_p
                && !store_lanes_p
                && !STMT_VINFO_STRIDED_P (stmt_info)
                && !slp_node)
      
      checks go away in patch 6.
      
      Tested on aarch64-linux-gnu and x86_64-linux-gnu.
      
      gcc/
      	* tree-vect-stmts.c (vect_cost_group_size): Delete.
      	(vect_model_store_cost): Avoid calling it.  Use first_stmt_p
      	variable to indicate when once-per-group costs are being used.
      	(vect_model_load_cost): Likewise.  Fix comment and misindented code.
      
      From-SVN: r238034
      Richard Sandiford committed
    • [1/7] Remove unnecessary peeling for gaps check · c01e092f
I recently relaxed the peeling-for-gaps conditions for LD3 but
kept them as-is for load-and-permute.  I don't think the conditions
are needed for load-and-permute either, though.  No current
load-and-permute should load outside the group, so if there is no
gap at the end, the final vector element loaded will correspond to
an element loaded by the original scalar loop.
      
The patch for PR68559 (a missed optimisation PR) increased the peeled
cases from "exact_log2 (groupsize) == -1" to "vf % group_size == 0"
(exact_log2 returns -1 only for values that are not powers of 2), so
before that fix, we didn't peel for gaps if there was no gap at the
end of the group and the group size was a power of 2.
      
      The only current non-power-of-2 load-and-permute size is 3, which
      doesn't require loading more than 3 vectors.
      
      The testcase is based on gcc.dg/vect/pr49038.c.
      
      Tested on aarch64-linux-gnu and x86_64-linux-gnu.
      
      gcc/
      	* tree-vect-stmts.c (vectorizable_load): Remove unnecessary
      	peeling-for-gaps condition.
      
      gcc/testsuite/
      	* gcc.dg/vect/group-no-gaps-1.c: New test.
      
      From-SVN: r238033
      Richard Sandiford committed
    • S/390: Fix vecinit expansion. · a07189f4
The fallback routine in the S/390 vecinit expander did not check
whether each of the initializer elements is a proper general_operand.
Since revision r236582 the expander has also been invoked with
operands such as symbol refs with an odd addend, resulting in
invalid insns.
      
      Fixed by forcing the element into a register in such cases.
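
A sketch of the fix (illustrative; the actual change is in
s390_expand_vec_init, and inner_mode here stands for the vector's
element mode):

      /* If an element is not a valid general_operand, e.g. a symbol
         ref with an odd addend, load it into a register before
         building the vector.  */
      if (!general_operand (elem, inner_mode))
        elem = force_reg (inner_mode, elem);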
      
      gcc/ChangeLog:
      
      2016-07-06  Andreas Krebbel  <krebbel@linux.vnet.ibm.com>
      
      	* config/s390/s390.c (s390_expand_vec_init): Force initializer
      	element to register if it doesn't match general_operand.
      
      From-SVN: r238032
      Andreas Krebbel committed
    • Fix MPX tests on systems with MPX disabled · 8070763a
      I have a Skylake system with MPX in the CPU, but MPX is disabled
      in the kernel configuration.
      
This makes all the MPX tests fail, because they assume that if MPX
appears in CPUID it works.

Check the output of XGETBV too, to detect kernels without MPX support.
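
A sketch of the kind of runtime check involved (the helper name is
made up; MPX requires the BNDREGS and BNDCSR state components, bits 3
and 4 of XCR0, to be enabled by the kernel):

      #include <stdint.h>

      /* Read XCR0 via xgetbv and check that the kernel has enabled
         both MPX state components.  */
      static int
      mpx_os_enabled (void)
      {
        uint32_t eax, edx;
        __asm__ ("xgetbv" : "=a" (eax), "=d" (edx) : "c" (0));
        return (eax & 0x18) == 0x18;  /* BNDREGS | BNDCSR.  */
      }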
      
      gcc/testsuite/:
      
      2016-07-05  Andi Kleen  <ak@linux.intel.com>
      
      	* gcc.target/i386/mpx/mpx-check.h: Check XGETBV output
      	if kernel supports MPX.
      
      From-SVN: r238031
      Andi Kleen committed
    • Daily bump. · 8217ad20
      From-SVN: r238029
      GCC Administrator committed