1. 15 Jan, 2018 4 commits
  2. 14 Jan, 2018 20 commits
    • PR c++/81327 - cast to void* does not suppress -Wclass-memaccess · 656280b0
      gcc/ChangeLog:
      	PR c++/81327
      	* doc/invoke.texi (-Wclass-memaccess): Document suppression by casting.
      
      From-SVN: r256677
      Martin Sebor committed
    • Fix date in log. · ba791a6c
      From-SVN: r256676
      Jerry DeLisle committed
    • Fix date in ChangeLog · 511f5ccf
      From-SVN: r256674
      Jerry DeLisle committed
    • Correct ChangeLog of x86: Add -mfunction-return= · 616ef62f
      From-SVN: r256673
      H.J. Lu committed
    • Correct ChangeLog of x86: Add -mindirect-branch= · dfc358bf
      From-SVN: r256672
      H.J. Lu committed
    • re PR libfortran/83811 (fortran 'e' format broken for single digit exponents) · 33b2b069
      2018-01-18  Jerry DeLisle  <jvdelisle@gcc.gnu.org>
      
              PR libgfortran/83811
              * write.c (select_buffer): Adjust buffer size up by 1.
      
              * gfortran.dg/fmt_e.f90: New test.
      
      From-SVN: r256669
      Jerry DeLisle committed
    • re PR libstdc++/81092 (Missing symbols for new std::wstring constructors) · a61bac1e
      PR libstdc++/81092
      * config/abi/post/ia64-linux-gnu/baseline_symbols.txt: Update.
      
      From-SVN: r256668
      Andreas Schwab committed
    • config.gcc (i[34567]86-*-*): Remove one duplicate gfniintrin.h entry from extra_headers. · 2abaf67e
      	* config.gcc (i[34567]86-*-*): Remove one duplicate gfniintrin.h
      	entry from extra_headers.
      	(x86_64-*-*): Remove two duplicate gfniintrin.h entries from
      	extra_headers, make the list bitwise identical to the i?86-*-* one.
      
      From-SVN: r256667
      Jakub Jelinek committed
    • x86: Disallow -mindirect-branch=/-mfunction-return= with -mcmodel=large · 95d11c17
      Since the thunk function may not be reachable in the large code model,
      -mcmodel=large is incompatible with -mindirect-branch=thunk,
      -mindirect-branch=thunk-extern, -mfunction-return=thunk and
      -mfunction-return=thunk-extern.  Issue an error when they are used with
      -mcmodel=large.
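
      A minimal sketch of a now-rejected combination (illustrative only;
      the new tests below cover the precise diagnostics, which are not
      quoted here):

      /* Compile with: -O2 -mcmodel=large -mindirect-branch=thunk
         This combination is now rejected with an error.  */
      extern void (*func_p) (void);

      void
      foo (void)
      {
        func_p ();	/* Indirect call that would need a thunk.  */
      }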
      
      gcc/
      
      	* config/i386/i386.c (ix86_set_indirect_branch_type): Disallow
      	-mcmodel=large with -mindirect-branch=thunk,
      	-mindirect-branch=thunk-extern, -mfunction-return=thunk and
      	-mfunction-return=thunk-extern.
      	* doc/invoke.texi: Document -mcmodel=large is incompatible with
      	-mindirect-branch=thunk, -mindirect-branch=thunk-extern,
      	-mfunction-return=thunk and -mfunction-return=thunk-extern.
      
      gcc/testsuite/
      
      	* gcc.target/i386/indirect-thunk-10.c: New test.
      	* gcc.target/i386/indirect-thunk-8.c: Likewise.
      	* gcc.target/i386/indirect-thunk-9.c: Likewise.
      	* gcc.target/i386/indirect-thunk-attr-10.c: Likewise.
      	* gcc.target/i386/indirect-thunk-attr-11.c: Likewise.
      	* gcc.target/i386/indirect-thunk-attr-9.c: Likewise.
      	* gcc.target/i386/ret-thunk-17.c: Likewise.
      	* gcc.target/i386/ret-thunk-18.c: Likewise.
      	* gcc.target/i386/ret-thunk-19.c: Likewise.
      	* gcc.target/i386/ret-thunk-20.c: Likewise.
      	* gcc.target/i386/ret-thunk-21.c: Likewise.
      
      From-SVN: r256664
      H.J. Lu committed
    • x86: Add 'V' register operand modifier · 6abe11c1
      Add 'V', a special modifier which prints the name of the full integer
      register without '%'.  For
      
      extern void (*func_p) (void);
      
      void
      foo (void)
      {
        asm ("call __x86_indirect_thunk_%V0" : : "a" (func_p));
      }
      
      it generates:
      
      foo:
      	movq	func_p(%rip), %rax
      	call	__x86_indirect_thunk_rax
      	ret
      
      gcc/
      
      	* config/i386/i386.c (print_reg): Print the name of the full
      	integer register without '%'.
      	(ix86_print_operand): Handle 'V'.
      	* doc/extend.texi: Document 'V' modifier.
      
      gcc/testsuite/
      
      	* gcc.target/i386/indirect-thunk-register-4.c: New test.
      
      From-SVN: r256663
      H.J. Lu committed
    • x86: Add -mindirect-branch-register · d543c04b
      Add -mindirect-branch-register to force indirect branch via register.
      This is implemented by disabling patterns of indirect branch via memory,
      similar to TARGET_X32.
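
      As a hedged illustration (reusing the func_p example from the
      'V'-modifier patch above, not one of the new testcases), with
      -mindirect-branch=thunk -mindirect-branch-register the call below
      must first load the target into a register, so a register thunk
      such as __x86_indirect_thunk_rax is used instead of the memory
      thunk:

      extern void (*func_p) (void);

      void
      foo (void)
      {
        func_p ();
      }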
      
      -mindirect-branch= and -mfunction-return= tests are updated with
      -mno-indirect-branch-register to avoid false test failures when
      -mindirect-branch-register is added to RUNTESTFLAGS for "make check".
      
      gcc/
      
      	* config/i386/constraints.md (Bs): Disallow memory operand for
      	-mindirect-branch-register.
      	(Bw): Likewise.
      	* config/i386/predicates.md (indirect_branch_operand): Likewise.
      	(GOT_memory_operand): Likewise.
      	(call_insn_operand): Likewise.
      	(sibcall_insn_operand): Likewise.
      	(GOT32_symbol_operand): Likewise.
      	* config/i386/i386.md (indirect_jump): Call convert_memory_address
      	for -mindirect-branch-register.
      	(tablejump): Likewise.
      	(*sibcall_memory): Likewise.
      	(*sibcall_value_memory): Likewise.
      	Disallow peepholes of indirect call and jump via memory for
      	-mindirect-branch-register.
      	(*call_pop): Replace m with Bw.
      	(*call_value_pop): Likewise.
      	(*sibcall_pop_memory): Replace m with Bs.
      	* config/i386/i386.opt (mindirect-branch-register): New option.
      	* doc/invoke.texi: Document -mindirect-branch-register option.
      
      gcc/testsuite/
      
      	* gcc.target/i386/indirect-thunk-1.c (dg-options): Add
      	-mno-indirect-branch-register.
      	* gcc.target/i386/indirect-thunk-2.c: Likewise.
      	* gcc.target/i386/indirect-thunk-3.c: Likewise.
      	* gcc.target/i386/indirect-thunk-4.c: Likewise.
      	* gcc.target/i386/indirect-thunk-5.c: Likewise.
      	* gcc.target/i386/indirect-thunk-6.c: Likewise.
      	* gcc.target/i386/indirect-thunk-7.c: Likewise.
      	* gcc.target/i386/indirect-thunk-attr-1.c: Likewise.
      	* gcc.target/i386/indirect-thunk-attr-2.c: Likewise.
      	* gcc.target/i386/indirect-thunk-attr-3.c: Likewise.
      	* gcc.target/i386/indirect-thunk-attr-4.c: Likewise.
      	* gcc.target/i386/indirect-thunk-attr-5.c: Likewise.
      	* gcc.target/i386/indirect-thunk-attr-6.c: Likewise.
      	* gcc.target/i386/indirect-thunk-attr-7.c: Likewise.
      	* gcc.target/i386/indirect-thunk-bnd-1.c: Likewise.
      	* gcc.target/i386/indirect-thunk-bnd-2.c: Likewise.
      	* gcc.target/i386/indirect-thunk-bnd-3.c: Likewise.
      	* gcc.target/i386/indirect-thunk-bnd-4.c: Likewise.
      	* gcc.target/i386/indirect-thunk-extern-1.c: Likewise.
      	* gcc.target/i386/indirect-thunk-extern-2.c: Likewise.
      	* gcc.target/i386/indirect-thunk-extern-3.c: Likewise.
      	* gcc.target/i386/indirect-thunk-extern-4.c: Likewise.
      	* gcc.target/i386/indirect-thunk-extern-5.c: Likewise.
      	* gcc.target/i386/indirect-thunk-extern-6.c: Likewise.
      	* gcc.target/i386/indirect-thunk-extern-7.c: Likewise.
      	* gcc.target/i386/indirect-thunk-inline-1.c: Likewise.
      	* gcc.target/i386/indirect-thunk-inline-2.c: Likewise.
      	* gcc.target/i386/indirect-thunk-inline-3.c: Likewise.
      	* gcc.target/i386/indirect-thunk-inline-4.c: Likewise.
      	* gcc.target/i386/indirect-thunk-inline-5.c: Likewise.
      	* gcc.target/i386/indirect-thunk-inline-6.c: Likewise.
      	* gcc.target/i386/indirect-thunk-inline-7.c: Likewise.
      	* gcc.target/i386/ret-thunk-10.c: Likewise.
      	* gcc.target/i386/ret-thunk-11.c: Likewise.
      	* gcc.target/i386/ret-thunk-12.c: Likewise.
      	* gcc.target/i386/ret-thunk-13.c: Likewise.
      	* gcc.target/i386/ret-thunk-14.c: Likewise.
      	* gcc.target/i386/ret-thunk-15.c: Likewise.
      	* gcc.target/i386/ret-thunk-9.c: Likewise.
      	* gcc.target/i386/indirect-thunk-register-1.c: New test.
      	* gcc.target/i386/indirect-thunk-register-2.c: Likewise.
      	* gcc.target/i386/indirect-thunk-register-3.c: Likewise.
      
      From-SVN: r256662
      H.J. Lu committed
    • x86: Add -mfunction-return= · 45e14019
      Add -mfunction-return= option to convert function return to call and
      return thunks.  The default is 'keep', which keeps function return
      unmodified.  'thunk' converts function return to call and return thunk.
      'thunk-inline' converts function return to inlined call and return thunk.
      'thunk-extern' converts function return to external call and return
      thunk provided in a separate object file.  You can control this behavior
      for a specific function by using the function attribute function_return.
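
      For example (a hedged sketch; the attribute name and values are
      those documented by this patch), keeping the default return for one
      function while compiling with -mfunction-return=thunk:

      __attribute__ ((function_return ("keep")))
      void
      foo (void)
      {
      }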
      
      The function return thunk is the same as the memory thunk for
      -mindirect-branch=, where the return address is at the top of the stack:
      
      __x86_return_thunk:
      	call L2
      L1:
      	pause
      	lfence
      	jmp L1
      L2:
      	lea 8(%rsp), %rsp|lea 4(%esp), %esp
      	ret
      
      and the function return becomes
      
      	jmp __x86_return_thunk
      
      -mindirect-branch= tests are updated with -mfunction-return=keep to
      avoid false test failures when -mfunction-return=thunk is added to
      RUNTESTFLAGS for "make check".
      
      gcc/
      
      	* config/i386/i386-protos.h (ix86_output_function_return): New.
      	* config/i386/i386.c (ix86_set_indirect_branch_type): Also
      	set function_return_type.
      	(indirect_thunk_name): Add ret_p to indicate thunk for function
      	return.
      	(output_indirect_thunk_function): Pass false to
      	indirect_thunk_name.
      	(ix86_output_indirect_branch): Likewise.
      	(output_indirect_thunk_function): Create alias for function
      	return thunk if regno < 0.
      	(ix86_output_function_return): New function.
      	(ix86_handle_fndecl_attribute): Handle function_return.
      	(ix86_attribute_table): Add function_return.
      	* config/i386/i386.h (machine_function): Add
      	function_return_type.
      	* config/i386/i386.md (simple_return_internal): Use
      	ix86_output_function_return.
      	(simple_return_internal_long): Likewise.
      	* config/i386/i386.opt (mfunction-return=): New option.
      	(indirect_branch): Mention -mfunction-return=.
      	* doc/extend.texi: Document function_return function attribute.
      	* doc/invoke.texi: Document -mfunction-return= option.
      
      gcc/testsuite/
      
      	* gcc.target/i386/indirect-thunk-1.c (dg-options): Add
      	-mfunction-return=keep.
      	* gcc.target/i386/indirect-thunk-2.c: Likewise.
      	* gcc.target/i386/indirect-thunk-3.c: Likewise.
      	* gcc.target/i386/indirect-thunk-4.c: Likewise.
      	* gcc.target/i386/indirect-thunk-5.c: Likewise.
      	* gcc.target/i386/indirect-thunk-6.c: Likewise.
      	* gcc.target/i386/indirect-thunk-7.c: Likewise.
      	* gcc.target/i386/indirect-thunk-attr-1.c: Likewise.
      	* gcc.target/i386/indirect-thunk-attr-2.c: Likewise.
      	* gcc.target/i386/indirect-thunk-attr-3.c: Likewise.
      	* gcc.target/i386/indirect-thunk-attr-4.c: Likewise.
      	* gcc.target/i386/indirect-thunk-attr-5.c: Likewise.
      	* gcc.target/i386/indirect-thunk-attr-6.c: Likewise.
      	* gcc.target/i386/indirect-thunk-attr-7.c: Likewise.
      	* gcc.target/i386/indirect-thunk-attr-8.c: Likewise.
      	* gcc.target/i386/indirect-thunk-bnd-1.c: Likewise.
      	* gcc.target/i386/indirect-thunk-bnd-2.c: Likewise.
      	* gcc.target/i386/indirect-thunk-bnd-3.c: Likewise.
      	* gcc.target/i386/indirect-thunk-bnd-4.c: Likewise.
      	* gcc.target/i386/indirect-thunk-extern-1.c: Likewise.
      	* gcc.target/i386/indirect-thunk-extern-2.c: Likewise.
      	* gcc.target/i386/indirect-thunk-extern-3.c: Likewise.
      	* gcc.target/i386/indirect-thunk-extern-4.c: Likewise.
      	* gcc.target/i386/indirect-thunk-extern-5.c: Likewise.
      	* gcc.target/i386/indirect-thunk-extern-6.c: Likewise.
      	* gcc.target/i386/indirect-thunk-extern-7.c: Likewise.
      	* gcc.target/i386/indirect-thunk-inline-1.c: Likewise.
      	* gcc.target/i386/indirect-thunk-inline-2.c: Likewise.
      	* gcc.target/i386/indirect-thunk-inline-3.c: Likewise.
      	* gcc.target/i386/indirect-thunk-inline-4.c: Likewise.
      	* gcc.target/i386/indirect-thunk-inline-5.c: Likewise.
      	* gcc.target/i386/indirect-thunk-inline-6.c: Likewise.
      	* gcc.target/i386/indirect-thunk-inline-7.c: Likewise.
      	* gcc.target/i386/ret-thunk-1.c: New test.
      	* gcc.target/i386/ret-thunk-10.c: Likewise.
      	* gcc.target/i386/ret-thunk-11.c: Likewise.
      	* gcc.target/i386/ret-thunk-12.c: Likewise.
      	* gcc.target/i386/ret-thunk-13.c: Likewise.
      	* gcc.target/i386/ret-thunk-14.c: Likewise.
      	* gcc.target/i386/ret-thunk-15.c: Likewise.
      	* gcc.target/i386/ret-thunk-16.c: Likewise.
      	* gcc.target/i386/ret-thunk-2.c: Likewise.
      	* gcc.target/i386/ret-thunk-3.c: Likewise.
      	* gcc.target/i386/ret-thunk-4.c: Likewise.
      	* gcc.target/i386/ret-thunk-5.c: Likewise.
      	* gcc.target/i386/ret-thunk-6.c: Likewise.
      	* gcc.target/i386/ret-thunk-7.c: Likewise.
      	* gcc.target/i386/ret-thunk-8.c: Likewise.
      	* gcc.target/i386/ret-thunk-9.c: Likewise.
      
      From-SVN: r256661
      H.J. Lu committed
    • x86: Add -mindirect-branch= · da99fd4a
      Add -mindirect-branch= option to convert indirect call and jump to call
      and return thunks.  The default is 'keep', which keeps indirect call and
      jump unmodified.  'thunk' converts indirect call and jump to call and
      return thunk.  'thunk-inline' converts indirect call and jump to inlined
      call and return thunk.  'thunk-extern' converts indirect call and jump to
      external call and return thunk provided in a separate object file.  You
      can control this behavior for a specific function by using the function
      attribute indirect_branch.
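
      For example (a hedged sketch using the attribute documented by this
      patch):

      /* Use inline thunks for indirect calls in this function only,
         whatever the command-line -mindirect-branch= setting is.  */
      __attribute__ ((indirect_branch ("thunk-inline")))
      void
      bar (void (*func_p) (void))
      {
        func_p ();
      }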
      
      Two kinds of thunks are generated.  A memory thunk, where the function
      address is at the top of the stack:
      
      __x86_indirect_thunk:
      	call L2
      L1:
      	pause
      	lfence
      	jmp L1
      L2:
      	lea 8(%rsp), %rsp|lea 4(%esp), %esp
      	ret
      
      Indirect jmp via memory, "jmp mem", is converted to
      
      	push memory
      	jmp __x86_indirect_thunk
      
      Indirect call via memory, "call mem", is converted to
      
      	jmp L2
      L1:
      	push [mem]
      	jmp __x86_indirect_thunk
      L2:
      	call L1
      
      A register thunk, where the function address is in a register, reg:
      
      __x86_indirect_thunk_reg:
      	call	L2
      L1:
      	pause
      	lfence
      	jmp	L1
      L2:
      	movq	%reg, (%rsp)|movl    %reg, (%esp)
      	ret
      
      where reg is one of (r|e)ax, (r|e)dx, (r|e)cx, (r|e)bx, (r|e)si, (r|e)di,
      (r|e)bp, r8, r9, r10, r11, r12, r13, r14 and r15.
      
      Indirect jmp via register, "jmp reg", is converted to
      
      	jmp __x86_indirect_thunk_reg
      
      Indirect call via register, "call reg", is converted to
      
      	call __x86_indirect_thunk_reg
      
      gcc/
      
      	* config/i386/i386-opts.h (indirect_branch): New.
      	* config/i386/i386-protos.h (ix86_output_indirect_jmp): Likewise.
      	* config/i386/i386.c (ix86_using_red_zone): Disallow red-zone
      	with local indirect jump when converting indirect call and jump.
      	(ix86_set_indirect_branch_type): New.
      	(ix86_set_current_function): Call ix86_set_indirect_branch_type.
      	(indirectlabelno): New.
      	(indirect_thunk_needed): Likewise.
      	(indirect_thunk_bnd_needed): Likewise.
      	(indirect_thunks_used): Likewise.
      	(indirect_thunks_bnd_used): Likewise.
      	(INDIRECT_LABEL): Likewise.
      	(indirect_thunk_name): Likewise.
      	(output_indirect_thunk): Likewise.
      	(output_indirect_thunk_function): Likewise.
      	(ix86_output_indirect_branch): Likewise.
      	(ix86_output_indirect_jmp): Likewise.
      	(ix86_code_end): Call output_indirect_thunk_function if needed.
      	(ix86_output_call_insn): Call ix86_output_indirect_branch if
      	needed.
      	(ix86_handle_fndecl_attribute): Handle indirect_branch.
      	(ix86_attribute_table): Add indirect_branch.
      	* config/i386/i386.h (machine_function): Add indirect_branch_type
      	and has_local_indirect_jump.
      	* config/i386/i386.md (indirect_jump): Set has_local_indirect_jump
      	to true.
      	(tablejump): Likewise.
      	(*indirect_jump): Use ix86_output_indirect_jmp.
      	(*tablejump_1): Likewise.
      	(simple_return_indirect_internal): Likewise.
      	* config/i386/i386.opt (mindirect-branch=): New option.
      	(indirect_branch): New.
      	(keep): Likewise.
      	(thunk): Likewise.
      	(thunk-inline): Likewise.
      	(thunk-extern): Likewise.
      	* doc/extend.texi: Document indirect_branch function attribute.
      	* doc/invoke.texi: Document -mindirect-branch= option.
      
      gcc/testsuite/
      
      	* gcc.target/i386/indirect-thunk-1.c: New test.
      	* gcc.target/i386/indirect-thunk-2.c: Likewise.
      	* gcc.target/i386/indirect-thunk-3.c: Likewise.
      	* gcc.target/i386/indirect-thunk-4.c: Likewise.
      	* gcc.target/i386/indirect-thunk-5.c: Likewise.
      	* gcc.target/i386/indirect-thunk-6.c: Likewise.
      	* gcc.target/i386/indirect-thunk-7.c: Likewise.
      	* gcc.target/i386/indirect-thunk-attr-1.c: Likewise.
      	* gcc.target/i386/indirect-thunk-attr-2.c: Likewise.
      	* gcc.target/i386/indirect-thunk-attr-3.c: Likewise.
      	* gcc.target/i386/indirect-thunk-attr-4.c: Likewise.
      	* gcc.target/i386/indirect-thunk-attr-5.c: Likewise.
      	* gcc.target/i386/indirect-thunk-attr-6.c: Likewise.
      	* gcc.target/i386/indirect-thunk-attr-7.c: Likewise.
      	* gcc.target/i386/indirect-thunk-attr-8.c: Likewise.
      	* gcc.target/i386/indirect-thunk-bnd-1.c: Likewise.
      	* gcc.target/i386/indirect-thunk-bnd-2.c: Likewise.
      	* gcc.target/i386/indirect-thunk-bnd-3.c: Likewise.
      	* gcc.target/i386/indirect-thunk-bnd-4.c: Likewise.
      	* gcc.target/i386/indirect-thunk-extern-1.c: Likewise.
      	* gcc.target/i386/indirect-thunk-extern-2.c: Likewise.
      	* gcc.target/i386/indirect-thunk-extern-3.c: Likewise.
      	* gcc.target/i386/indirect-thunk-extern-4.c: Likewise.
      	* gcc.target/i386/indirect-thunk-extern-5.c: Likewise.
      	* gcc.target/i386/indirect-thunk-extern-6.c: Likewise.
      	* gcc.target/i386/indirect-thunk-extern-7.c: Likewise.
      	* gcc.target/i386/indirect-thunk-inline-1.c: Likewise.
      	* gcc.target/i386/indirect-thunk-inline-2.c: Likewise.
      	* gcc.target/i386/indirect-thunk-inline-3.c: Likewise.
      	* gcc.target/i386/indirect-thunk-inline-4.c: Likewise.
      	* gcc.target/i386/indirect-thunk-inline-5.c: Likewise.
      	* gcc.target/i386/indirect-thunk-inline-6.c: Likewise.
      	* gcc.target/i386/indirect-thunk-inline-7.c: Likewise.
      
      From-SVN: r256660
      H.J. Lu committed
    • re PR ipa/83051 (ICE on valid code at -O3: in edge_badness, at ipa-inline.c:1024) · 3f05a4f0
      
      	PR ipa/83051
      	* gcc.c-torture/compile/pr83051.c: New testcase.
      	* ipa-inline.c (edge_badness): Tolerate roundoff errors.
      
      From-SVN: r256659
      Jan Hubicka committed
    • inline_small_functions speedup · 01b9bf06
      After inlining A into B, inline_small_functions updates the information
      for (most) callees and callers of the new B:
      
      	  update_callee_keys (&edge_heap, where, updated_nodes);
            [...]
            /* Our profitability metric can depend on local properties
      	 such as number of inlinable calls and size of the function body.
      	 After inlining these properties might change for the function we
      	 inlined into (since it's body size changed) and for the functions
      	 called by function we inlined (since number of it inlinable callers
      	 might change).  */
            update_caller_keys (&edge_heap, where, updated_nodes, NULL);
      
      These functions in turn call can_inline_edge_p for most of the associated
      edges:
      
      	    if (can_inline_edge_p (edge, false)
      		&& want_inline_small_function_p (edge, false))
      	      update_edge_key (heap, edge);
      
      can_inline_edge_p indirectly calls estimate_calls_size_and_time
      on the caller node, which seems to recursively process all callee
      edges rooted at the node.  It looks from this like the algorithm
      can be at least quadratic in the worst case.
      
      Maybe there's something we can do to make can_inline_edge_p cheaper, but
      since neither of these two calls is responsible for reporting an inline
      failure reason, it seems cheaper to test want_inline_small_function_p
      first, so that we don't calculate an estimate for something that we
      already know isn't a "small function".  I think the only change
      needed to make that work is to check for CIF_FINAL_ERROR in
      want_inline_small_function_p; at the moment we rely on can_inline_edge_p
      to make that check.
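
      With the patch applied, the quoted test effectively becomes (a
      sketch; as the ChangeLog below notes, want_inline_small_function_p
      also learns to check CIF_FINAL_ERROR itself):

      	    if (want_inline_small_function_p (edge, false)
      		&& can_inline_edge_p (edge, false))
      	      update_edge_key (heap, edge);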
      
      This cuts the time to build optabs.ii by over 4% with an
      --enable-checking=release compiler on x86_64-linux-gnu.  I've seen more
      dramatic wins on aarch64-linux-gnu due to the NUM_POLY_INT_COEFFS==2
      thing.  The patch doesn't affect the output code.
      
      2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
      
      gcc/
      	* ipa-inline.c (want_inline_small_function_p): Return false if
      	inlining has already failed with CIF_FINAL_ERROR.
      	(update_caller_keys): Call want_inline_small_function_p before
      	can_inline_edge_p.
      	(update_callee_keys): Likewise.
      
      From-SVN: r256658
      Richard Sandiford committed
    • re PR tree-optimization/83501 (strlen(a) not folded after strcpy(a, "...")) · 61760b92
      2018-01-14  Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>
      
      	PR tree-optimization/83501
      	* gcc.dg/strlenopt-39.c: Restrict to i?86 and x86_64-*-* targets.
      
      From-SVN: r256657
      Prathamesh Kulkarni committed
    • rs6000-p8swap.c (rs6000_sum_of_two_registers_p): New function. · a3a821c9
      gcc/ChangeLog:
      
      2018-01-10  Kelvin Nilsen  <kelvin@gcc.gnu.org>
      
      	* config/rs6000/rs6000-p8swap.c (rs6000_sum_of_two_registers_p):
      	New function.
      	(rs6000_quadword_masked_address_p): Likewise.
      	(quad_aligned_load_p): Likewise.
      	(quad_aligned_store_p): Likewise.
      	(const_load_sequence_p): Add comment to describe the outer-most loop.
      	(mimic_memory_attributes_and_flags): New function.
      	(rs6000_gen_stvx): Likewise.
      	(replace_swapped_aligned_store): Likewise.
      	(rs6000_gen_lvx): Likewise.
      	(replace_swapped_aligned_load): Likewise.
      	(replace_swapped_load_constant): Capitalize argument name in
      	comment describing this function.
      	(rs6000_analyze_swaps): Add a third pass to search for vector loads
      	and stores that access quad-word aligned addresses and replace
      	with stvx or lvx instructions when appropriate.
      	* config/rs6000/rs6000-protos.h (rs6000_sum_of_two_registers_p):
      	New function prototype.
      	(rs6000_quadword_masked_address_p): Likewise.
      	(rs6000_gen_lvx): Likewise.
      	(rs6000_gen_stvx): Likewise.
      	* config/rs6000/vsx.md (*vsx_le_perm_load_<mode>): For modes
      	VSX_D (V2DF, V2DI), modify this split to select lvx instruction
      	when memory address is aligned.
      	(*vsx_le_perm_load_<mode>): For modes VSX_W (V4SF, V4SI), modify
      	this split to select lvx instruction when memory address is aligned.
      	(*vsx_le_perm_load_v8hi): Modify this split to select lvx
      	instruction when memory address is aligned.
      	(*vsx_le_perm_load_v16qi): Likewise.
      	(four unnamed splitters): Modify to select the stvx instruction
      	when memory is aligned.
      
      gcc/testsuite/ChangeLog:
      
      2018-01-10  Kelvin Nilsen  <kelvin@gcc.gnu.org>
      
      	* gcc.target/powerpc/pr48857.c: Modify dejagnu directives to look
      	for lvx and stvx instead of lxvd2x and stxvd2x and require
      	little-endian target.  Add comments.
      	* gcc.target/powerpc/swaps-p8-28.c: Add functions for more
      	comprehensive testing.
      	* gcc.target/powerpc/swaps-p8-29.c: Likewise.
      	* gcc.target/powerpc/swaps-p8-30.c: Likewise.
      	* gcc.target/powerpc/swaps-p8-31.c: Likewise.
      	* gcc.target/powerpc/swaps-p8-32.c: Likewise.
      	* gcc.target/powerpc/swaps-p8-33.c: Likewise.
      	* gcc.target/powerpc/swaps-p8-34.c: Likewise.
      	* gcc.target/powerpc/swaps-p8-35.c: Likewise.
      	* gcc.target/powerpc/swaps-p8-36.c: Likewise.
      	* gcc.target/powerpc/swaps-p8-37.c: Likewise.
      	* gcc.target/powerpc/swaps-p8-38.c: Likewise.
      	* gcc.target/powerpc/swaps-p8-39.c: Likewise.
      	* gcc.target/powerpc/swaps-p8-40.c: Likewise.
      	* gcc.target/powerpc/swaps-p8-41.c: Likewise.
      	* gcc.target/powerpc/swaps-p8-42.c: Likewise.
      	* gcc.target/powerpc/swaps-p8-43.c: Likewise.
      	* gcc.target/powerpc/swaps-p8-44.c: Likewise.
      	* gcc.target/powerpc/swaps-p8-45.c: Likewise.
      	* gcc.target/powerpc/vec-extract-2.c: Add comment and remove
      	scan-assembler-not directives that forbid lvx and xxpermdi.
      	* gcc.target/powerpc/vec-extract-3.c: Likewise.
      	* gcc.target/powerpc/vec-extract-5.c: Likewise.
      	* gcc.target/powerpc/vec-extract-6.c: Likewise.
      	* gcc.target/powerpc/vec-extract-7.c: Likewise.
      	* gcc.target/powerpc/vec-extract-8.c: Likewise.
      	* gcc.target/powerpc/vec-extract-9.c: Likewise.
      	* gcc.target/powerpc/vsx-vector-6-le.c: Change
      	scan-assembler-times directives to reflect different numbers of
      	expected xxlnor, xxlor, xvcmpgtdp, and xxland instructions.
      
      libcpp/ChangeLog:
      
      2018-01-10  Kelvin Nilsen  <kelvin@gcc.gnu.org>
      
      	* lex.c (search_line_fast): Remove illegal coercion of an
      	unaligned pointer value to vector pointer type and replace with
      	use of __builtin_vec_vsx_ld () built-in function, which operates
      	on unaligned pointer values.
      
      From-SVN: r256656
      Kelvin Nilsen committed
    • go/types: implement SizesFor for gccgo · ffad1c54
          
          Move the architecture-specific settings out of configure.ac into a new
          shell script goarch.sh.  Use the new script to collect the values for
          all architectures to make them available in go/types.
          
          Also fix cmd/vet to pass the right compiler when it calls SizesFor.
          
          This fixes cmd/vet for systems that are not implemented in the gc
          toolchain, such as alpha and ia64.
          
          Reviewed-on: https://go-review.googlesource.com/87635
      
      From-SVN: r256655
      Ian Lance Taylor committed
    • re PR libstdc++/83601 (std::regex_replace C++14 conformance issue: escaping in SED mode) · 8532713f
      	PR libstdc++/83601
      	* include/bits/regex.tcc (regex_replace): Fix escaping in sed.
      	* testsuite/28_regex/algorithms/regex_replace/char/pr83601.cc: Tests.
      	* testsuite/28_regex/algorithms/regex_replace/wchar_t/pr83601.cc: Tests.
      
      From-SVN: r256654
      Tim Shen committed
    • Daily bump. · 8bc5a5c5
      From-SVN: r256653
      GCC Administrator committed
  3. 13 Jan, 2018 16 commits
    • Allow for lack of VM_MEMORY_OS_ALLOC_ONCE on Mac OS X (PR sanitizer/82824) · 1f7273e5
      	PR sanitizer/82824
      	* lsan/lsan_common_mac.cc: Cherry-pick upstream r322437.
      
      From-SVN: r256650
      Rainer Orth committed
    • re PR fortran/82007 (DTIO write format stored in a string leads to severe errors) · f208c5cc
      2018-01-13  Jerry DeLisle  <jvdelisle@gcc.gnu.org>
      
              PR fortran/82007
              * resolve.c (resolve_transfer): Delete code looking for 'DT'
              format specifiers in format strings. Set formatted to true if a
              format string or format label is present.
              * trans-io.c (get_dtio_proc): Likewise. (transfer_expr): Fix
              whitespace.
      
      From-SVN: r256649
      Jerry DeLisle committed
    • predict.c (determine_unlikely_bbs): Handle correctly BBs which appear in the queue multiple times. · f36180f4
      	* predict.c (determine_unlikely_bbs): Handle correctly BBs
      	which appear in the queue multiple times.
      
      From-SVN: r256648
      Jan Hubicka committed
    • re PR fortran/83744 (ICE in ../../gcc/gcc/fortran/dump-parse-tree.c:3093 while using -fc-prototypes) · 39f309ac
      
      2018-01-13  Thomas Koenig <tkoenig@gcc.gnu.org>
      
      	PR fortran/83744
      	* dump-parse-tree.c (get_c_type_name): Remove extra line.
      	Change for loop to use declaration in for loop. Handle BT_LOGICAL
      	and BT_CHARACTER.
      	(write_decl): Add where argument. Fix indentation. Replace
      	assert with error message. Add typename to warning
      	in comment.
      	(write_type): Adjust locus to call of write_decl.
      	(write_variable): Likewise.
      	(write_proc): Likewise. Replace assert with error message.
      
      From-SVN: r256645
      Thomas Koenig committed
    • Support for aliasing with variable strides · a57776a1
      This patch adds runtime alias checks for loops with variable strides,
      so that we can vectorise them even without a restrict qualifier.
      There are several parts to doing this:
      
      1) For accesses like:
      
           x[i * n] += 1;
      
         we need to check whether n (and thus the DR_STEP) is nonzero.
         vect_analyze_data_ref_dependence records values that need to be
         checked in this way, then prune_runtime_alias_test_list records a
         bounds check on DR_STEP being outside the range [0, 0].
      
      2) For accesses like:
      
           x[i * n] = x[i * n + 1] + 1;
      
         we simply need to test whether abs (n) >= 2.
         prune_runtime_alias_test_list looks for cases like this and tries
         to guess whether it is better to use this kind of check or a check
         for non-overlapping ranges.  (We could do an OR of the two conditions
         at runtime, but that isn't implemented yet.)
      
      3) Checks for overlapping ranges need to cope with variable strides.
         At present the "length" of each segment in a range check is
         represented as an offset from the base that lies outside the
         touched range, in the same direction as DR_STEP.  The length
         can therefore be negative and is sometimes conservative.
      
         With variable steps it's easier to reason about if we split
         this into two:
      
           seg_len:
             distance travelled from the first iteration of interest
             to the last, e.g. DR_STEP * (VF - 1)
      
           access_size:
             the number of bytes accessed in each iteration
      
         with access_size always being a positive constant and seg_len
         possibly being variable.  We can then combine alias checks
         for two accesses that are a constant number of bytes apart by
         adjusting the access size to account for the gap.  This leaves
         the segment length unchanged, which allows the check to be combined
         with further accesses.
      
         When seg_len is positive, the runtime alias check has the form:
      
              base_a >= base_b + seg_len_b + access_size_b
           || base_b >= base_a + seg_len_a + access_size_a
      
         In many accesses the base will be aligned to the access size, which
         allows us to skip the addition:
      
              base_a > base_b + seg_len_b
           || base_b > base_a + seg_len_a
      
         A similar saving is possible with "negative" lengths.
      
         The patch therefore tracks the alignment in addition to seg_len
         and access_size.
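
      As a concrete sketch of case 2) above (the function and parameter
      names here are illustrative), a loop like

      void
      f (int *x, int n, int count)
      {
        int i;
        for (i = 0; i < count; i++)
          x[i * n] = x[i * n + 1] + 1;
      }

      can now be vectorised, guarded by a runtime test that abs (n) >= 2
      or that the accessed ranges do not overlap.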
      
      2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
      	    Alan Hayward  <alan.hayward@arm.com>
      	    David Sherwood  <david.sherwood@arm.com>
      
      gcc/
      	* tree-vectorizer.h (vec_lower_bound): New structure.
      	(_loop_vec_info): Add check_nonzero and lower_bounds.
      	(LOOP_VINFO_CHECK_NONZERO): New macro.
      	(LOOP_VINFO_LOWER_BOUNDS): Likewise.
      	(LOOP_REQUIRES_VERSIONING_FOR_ALIAS): Check lower_bounds too.
      	* tree-data-ref.h (dr_with_seg_len): Add access_size and align
      	fields.  Make seg_len the distance travelled, not including the
      	access size.
      	(dr_direction_indicator): Declare.
      	(dr_zero_step_indicator): Likewise.
      	(dr_known_forward_stride_p): Likewise.
      	* tree-data-ref.c: Include stringpool.h, tree-vrp.h and
      	tree-ssanames.h.
      	(runtime_alias_check_p): Allow runtime alias checks with
      	variable strides.
      	(operator ==): Compare access_size and align.
      	(prune_runtime_alias_test_list): Rework for new distinction between
      	the access_size and seg_len.
      	(create_intersect_range_checks_index): Likewise.  Cope with polynomial
      	segment lengths.
      	(get_segment_min_max): New function.
      	(create_intersect_range_checks): Use it.
      	(dr_step_indicator): New function.
      	(dr_direction_indicator): Likewise.
      	(dr_zero_step_indicator): Likewise.
      	(dr_known_forward_stride_p): Likewise.
      	* tree-loop-distribution.c (data_ref_segment_size): Return
      	DR_STEP * (niters - 1).
      	(compute_alias_check_pairs): Update call to the dr_with_seg_len
      	constructor.
      	* tree-vect-data-refs.c (vect_check_nonzero_value): New function.
      	(vect_preserves_scalar_order_p): New function, split out from...
      	(vect_analyze_data_ref_dependence): ...here.  Check for zero steps.
      	(vect_vfa_segment_size): Return DR_STEP * (length_factor - 1).
      	(vect_vfa_access_size): New function.
      	(vect_vfa_align): Likewise.
      	(vect_compile_time_alias): Take access_size_a and access_b arguments.
      	(dump_lower_bound): New function.
      	(vect_check_lower_bound): Likewise.
      	(vect_small_gap_p): Likewise.
      	(vectorizable_with_step_bound_p): Likewise.
      	(vect_prune_runtime_alias_test_list): Ignore cross-iteration
      	dependencies if the vectorization factor is 1.  Convert the checks
      	for nonzero steps into checks on the bounds of DR_STEP.  Try using
      	a bounds check for variable steps if the minimum required step is
      	relatively small.  Update calls to the dr_with_seg_len
      	constructor and to vect_compile_time_alias.
      	* tree-vect-loop-manip.c (vect_create_cond_for_lower_bounds): New
      	function.
      	(vect_loop_versioning): Call it.
      	* tree-vect-loop.c (vect_analyze_loop_2): Clear LOOP_VINFO_LOWER_BOUNDS
      	when retrying.
      	(vect_estimate_min_profitable_iters): Account for any bounds checks.
      
      gcc/testsuite/
      	* gcc.dg/vect/bb-slp-cond-1.c: Expect loop vectorization rather
      	than SLP vectorization.
      	* gcc.dg/vect/vect-alias-check-10.c: New test.
      	* gcc.dg/vect/vect-alias-check-11.c: Likewise.
      	* gcc.dg/vect/vect-alias-check-12.c: Likewise.
      	* gcc.dg/vect/vect-alias-check-8.c: Likewise.
      	* gcc.dg/vect/vect-alias-check-9.c: Likewise.
      	* gcc.target/aarch64/sve/strided_load_8.c: Likewise.
      	* gcc.target/aarch64/sve/var_stride_1.c: Likewise.
      	* gcc.target/aarch64/sve/var_stride_1.h: Likewise.
      	* gcc.target/aarch64/sve/var_stride_1_run.c: Likewise.
      	* gcc.target/aarch64/sve/var_stride_2.c: Likewise.
      	* gcc.target/aarch64/sve/var_stride_2_run.c: Likewise.
      	* gcc.target/aarch64/sve/var_stride_3.c: Likewise.
      	* gcc.target/aarch64/sve/var_stride_3_run.c: Likewise.
      	* gcc.target/aarch64/sve/var_stride_4.c: Likewise.
      	* gcc.target/aarch64/sve/var_stride_4_run.c: Likewise.
      	* gcc.target/aarch64/sve/var_stride_5.c: Likewise.
      	* gcc.target/aarch64/sve/var_stride_5_run.c: Likewise.
      	* gcc.target/aarch64/sve/var_stride_6.c: Likewise.
      	* gcc.target/aarch64/sve/var_stride_6_run.c: Likewise.
      	* gcc.target/aarch64/sve/var_stride_7.c: Likewise.
      	* gcc.target/aarch64/sve/var_stride_7_run.c: Likewise.
      	* gcc.target/aarch64/sve/var_stride_8.c: Likewise.
      	* gcc.target/aarch64/sve/var_stride_8_run.c: Likewise.
      	* gfortran.dg/vect/vect-alias-check-1.F90: Likewise.
      
      Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
      Co-Authored-By: David Sherwood <david.sherwood@arm.com>
      
      From-SVN: r256644
      Richard Sandiford committed
    • Add support for SVE scatter stores · f307441a
      This is mostly a mechanical extension of the previous gather load
      support to scatter stores.  The internal functions in this case are:
      
        IFN_SCATTER_STORE (base, offsets, scale, values)
        IFN_MASK_SCATTER_STORE (base, offsets, scale, values, mask)
      
      However, one nonobvious change is to vect_analyze_data_ref_access.
      If we're treating an access as a gather load or scatter store
      (i.e. if STMT_VINFO_GATHER_SCATTER_P is true), the existing code
      would create a dummy data_reference whose step is 0.  There's not
      really much else it could do, since the whole point is that the
      step isn't predictable from iteration to iteration.  We then
      went into this code in vect_analyze_data_ref_access:
      
        /* Allow loads with zero step in inner-loop vectorization.  */
        if (loop_vinfo && integer_zerop (step))
          {
            GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) = NULL;
            if (!nested_in_vect_loop_p (loop, stmt))
      	return DR_IS_READ (dr);
      
      I.e. we'd take the step literally and assume that this is a load
      or store to an invariant address.  Loads from invariant addresses
      are supported but stores to them aren't.
      
      The code therefore had the effect of disabling all scatter stores.
      AFAICT this is true of AVX too: although tests like avx512f-scatter-1.c
      test for the correctness of a scatter-like loop, they don't seem to
      check whether a scatter instruction is actually used.
      
      The patch therefore makes vect_analyze_data_ref_access return true
      for scatters.  We do seem to handle the aliasing correctly;
      that's tested by other functions, and is symmetrical to the
      already-working gather case.
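
      For illustration (the names are ours, not from the new testcases),
      the kind of loop that can now use a scatter store:

      void
      f (double *dest, int *index, double *src, int count)
      {
        int i;
        for (i = 0; i < count; i++)
          dest[index[i]] = src[i];	/* store to indexed addresses */
      }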
      
      2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
      	    Alan Hayward  <alan.hayward@arm.com>
      	    David Sherwood  <david.sherwood@arm.com>
      
      gcc/
      	* doc/sourcebuild.texi (vect_scatter_store): Document.
      	* optabs.def (scatter_store_optab, mask_scatter_store_optab): New
      	optabs.
      	* doc/md.texi (scatter_store@var{m}, mask_scatter_store@var{m}):
      	Document.
      	* genopinit.c (main): Add supports_vec_scatter_store and
      	supports_vec_scatter_store_cached to target_optabs.
      	* gimple.h (gimple_expr_type): Handle IFN_SCATTER_STORE and
      	IFN_MASK_SCATTER_STORE.
      	* internal-fn.def (SCATTER_STORE, MASK_SCATTER_STORE): New internal
      	functions.
      	* internal-fn.h (internal_store_fn_p): Declare.
      	(internal_fn_stored_value_index): Likewise.
      	* internal-fn.c (scatter_store_direct): New macro.
      	(expand_scatter_store_optab_fn): New function.
      	(direct_scatter_store_optab_supported_p): New macro.
      	(internal_store_fn_p): New function.
      	(internal_gather_scatter_fn_p): Handle IFN_SCATTER_STORE and
      	IFN_MASK_SCATTER_STORE.
      	(internal_fn_mask_index): Likewise.
      	(internal_fn_stored_value_index): New function.
      	(internal_gather_scatter_fn_supported_p): Adjust operand numbers
      	for scatter stores.
      	* optabs-query.h (supports_vec_scatter_store_p): Declare.
      	* optabs-query.c (supports_vec_scatter_store_p): New function.
      	* tree-vectorizer.h (vect_get_store_rhs): Declare.
      	* tree-vect-data-refs.c (vect_analyze_data_ref_access): Return
      	true for scatter stores.
      	(vect_gather_scatter_fn_p): Handle scatter stores too.
      	(vect_check_gather_scatter): Consider using scatter stores if
      	supports_vec_scatter_store_p.
      	* tree-vect-patterns.c (vect_try_gather_scatter_pattern): Handle
      	scatter stores too.
      	* tree-vect-stmts.c (exist_non_indexing_operands_for_use_p): Use
      	internal_fn_stored_value_index.
      	(check_load_store_masking): Handle scatter stores too.
      	(vect_get_store_rhs): Make public.
      	(vectorizable_call): Use internal_store_fn_p.
      	(vectorizable_store): Handle scatter store internal functions.
      	(vect_transform_stmt): Compare GROUP_STORE_COUNT with GROUP_SIZE
      	when deciding whether the end of the group has been reached.
      	* config/aarch64/aarch64.md (UNSPEC_ST1_SCATTER): New unspec.
      	* config/aarch64/aarch64-sve.md (scatter_store<mode>): New expander.
      	(mask_scatter_store<mode>): New insns.
      
      gcc/testsuite/
      	* lib/target-supports.exp (check_effective_target_vect_scatter_store):
      	New proc.
      	* gcc.dg/vect/pr25413a.c: Expect both loops to be optimized on
      	targets with scatter stores.
      	* gcc.dg/vect/vect-71.c: Restrict XFAIL to targets without scatter
      	stores.
      	* gcc.target/aarch64/sve/mask_scatter_store_1.c: New test.
      	* gcc.target/aarch64/sve/mask_scatter_store_2.c: Likewise.
      	* gcc.target/aarch64/sve/scatter_store_1.c: Likewise.
      	* gcc.target/aarch64/sve/scatter_store_2.c: Likewise.
      	* gcc.target/aarch64/sve/scatter_store_3.c: Likewise.
      	* gcc.target/aarch64/sve/scatter_store_4.c: Likewise.
      	* gcc.target/aarch64/sve/scatter_store_5.c: Likewise.
      	* gcc.target/aarch64/sve/scatter_store_6.c: Likewise.
      	* gcc.target/aarch64/sve/scatter_store_7.c: Likewise.
      	* gcc.target/aarch64/sve/strided_store_1.c: Likewise.
      	* gcc.target/aarch64/sve/strided_store_2.c: Likewise.
      	* gcc.target/aarch64/sve/strided_store_3.c: Likewise.
      	* gcc.target/aarch64/sve/strided_store_4.c: Likewise.
      	* gcc.target/aarch64/sve/strided_store_5.c: Likewise.
      	* gcc.target/aarch64/sve/strided_store_6.c: Likewise.
      	* gcc.target/aarch64/sve/strided_store_7.c: Likewise.
      
      Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
      Co-Authored-By: David Sherwood <david.sherwood@arm.com>
      
      From-SVN: r256643
      Richard Sandiford committed
    • Allow gather loads to be used for grouped accesses · 429ef523
      Following on from the previous patch for strided accesses, this patch
      allows gather loads to be used with grouped accesses, if we otherwise
      would need to fall back to VMAT_ELEMENTWISE.  However, as the comment
      says, this is restricted to single-element groups for now:
      
      	 ??? Although the code can handle all group sizes correctly,
      	 it probably isn't a win to use separate strided accesses based
      	 on nearby locations.  Or, even if it's a win over scalar code,
      	 it might not be a win over vectorizing at a lower VF, if that
      	 allows us to use contiguous accesses.
      
      Single-element groups are an important special case though,
      and this means that code is less sensitive to GCC's classification
      of single accesses with constant steps as "grouped" and ones with
      variable steps as "strided".
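
      An illustrative single-element group (constant step, hence
      classified as "grouped"; the names are ours):

      void
      f (double *dest, double *src, int count)
      {
        int i;
        for (i = 0; i < count; i++)
          dest[i] += src[i * 4];	/* single access with constant step */
      }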
      
      2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
      	    Alan Hayward  <alan.hayward@arm.com>
      	    David Sherwood  <david.sherwood@arm.com>
      
      gcc/
      	* tree-vectorizer.h (vect_gather_scatter_fn_p): Declare.
      	* tree-vect-data-refs.c (vect_gather_scatter_fn_p): Make public.
      	* tree-vect-stmts.c (vect_truncate_gather_scatter_offset): New
      	function.
      	(vect_use_strided_gather_scatters_p): Take a masked_p argument.
      	Use vect_truncate_gather_scatter_offset if we can't treat the
      	operation as a normal gather load or scatter store.
      	(get_group_load_store_type): Take the gather_scatter_info
      	as argument.  Try using a gather load or scatter store for
      	single-element groups.
      	(get_load_store_type): Update calls to get_group_load_store_type
      	and vect_use_strided_gather_scatters_p.
      
      gcc/testsuite/
      	* gcc.target/aarch64/sve/reduc_strict_3.c: Expect FADDA to be used
      	for double_reduc1.
      	* gcc.target/aarch64/sve/strided_load_4.c: New test.
      	* gcc.target/aarch64/sve/strided_load_5.c: Likewise.
      	* gcc.target/aarch64/sve/strided_load_6.c: Likewise.
      	* gcc.target/aarch64/sve/strided_load_7.c: Likewise.
      
      Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
      Co-Authored-By: David Sherwood <david.sherwood@arm.com>
      
      From-SVN: r256642
      Richard Sandiford committed
    • Use gather loads for strided accesses · ab2fc782
      This patch tries to use gather loads for strided accesses,
      rather than falling back to VMAT_ELEMENTWISE.
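
      For illustration (the names are ours), a variable-stride access
      that can now be implemented as a gather load:

      void
      f (double *dest, double *src, int n, int count)
      {
        int i;
        for (i = 0; i < count; i++)
          dest[i] += src[i * n];	/* variable stride n */
      }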
      
      2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
      	    Alan Hayward  <alan.hayward@arm.com>
      	    David Sherwood  <david.sherwood@arm.com>
      
      gcc/
      	* tree-vectorizer.h (vect_create_data_ref_ptr): Take an extra
      	optional tree argument.
      	* tree-vect-data-refs.c (vect_check_gather_scatter): Check for
      	null target hooks.
      	(vect_create_data_ref_ptr): Take the iv_step as an optional argument,
      	but continue to use the current value as a fallback.
      	(bump_vector_ptr): Use operand_equal_p rather than tree_int_cst_compare
      	to compare the updates.
      	* tree-vect-stmts.c (vect_use_strided_gather_scatters_p): New function.
      	(get_load_store_type): Use it when handling a strided access.
      	(vect_get_strided_load_store_ops): New function.
      	(vect_get_data_ptr_increment): Likewise.
      	(vectorizable_load): Handle strided gather loads.  Always pass
      	a step to vect_create_data_ref_ptr and bump_vector_ptr.
      
      gcc/testsuite/
      	* gcc.target/aarch64/sve/strided_load_1.c: New test.
      	* gcc.target/aarch64/sve/strided_load_2.c: Likewise.
      	* gcc.target/aarch64/sve/strided_load_3.c: Likewise.
      
      Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
      Co-Authored-By: David Sherwood <david.sherwood@arm.com>
      
      From-SVN: r256641
      Richard Sandiford committed
    • Add support for SVE gather loads · bfaa08b7
      This patch adds support for SVE gather loads.  It uses basically
      the same analysis code as the AVX gather support, but after that
      there are two major differences:
      
      - It uses new internal functions rather than target built-ins.
        The interface is:
      
           IFN_GATHER_LOAD (base, offsets, scale)
           IFN_MASK_GATHER_LOAD (base, offsets, scale, mask)
      
        which should be reasonably generic.  One of the advantages of
        using internal functions is that other passes can understand what
        the functions do, but a more immediate advantage is that we can
        query the underlying target pattern to see which scales it supports.
      
      - It uses pattern recognition to convert the offset to the right width,
        if it was originally narrower than that.  This avoids having to do
        a widening operation as part of the gather expansion itself.
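
      A typical gather-load loop, for illustration (the names are ours,
      not from the new testcases):

      void
      f (double *dest, double *src, int *index, int count)
      {
        int i;
        for (i = 0; i < count; i++)
          dest[i] = src[index[i]];	/* load from indexed addresses */
      }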
      
      2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
      	    Alan Hayward  <alan.hayward@arm.com>
      	    David Sherwood  <david.sherwood@arm.com>
      
      gcc/
      	* doc/md.texi (gather_load@var{m}): Document.
      	(mask_gather_load@var{m}): Likewise.
      	* genopinit.c (main): Add supports_vec_gather_load and
      	supports_vec_gather_load_cached to target_optabs.
      	* optabs-tree.c (init_tree_optimization_optabs): Use
      	ggc_cleared_alloc to allocate target_optabs.
      	* optabs.def (gather_load_optab, mask_gather_load_optab): New optabs.
      	* internal-fn.def (GATHER_LOAD, MASK_GATHER_LOAD): New internal
      	functions.
      	* internal-fn.h (internal_load_fn_p): Declare.
      	(internal_gather_scatter_fn_p): Likewise.
      	(internal_fn_mask_index): Likewise.
      	(internal_gather_scatter_fn_supported_p): Likewise.
      	* internal-fn.c (gather_load_direct): New macro.
      	(expand_gather_load_optab_fn): New function.
      	(direct_gather_load_optab_supported_p): New macro.
      	(direct_internal_fn_optab): New function.
      	(internal_load_fn_p): Likewise.
      	(internal_gather_scatter_fn_p): Likewise.
      	(internal_fn_mask_index): Likewise.
      	(internal_gather_scatter_fn_supported_p): Likewise.
      	* optabs-query.c (supports_at_least_one_mode_p): New function.
      	(supports_vec_gather_load_p): Likewise.
      	* optabs-query.h (supports_vec_gather_load_p): Declare.
      	* tree-vectorizer.h (gather_scatter_info): Add ifn, element_type
      	and memory_type field.
      	(NUM_PATTERNS): Bump to 15.
      	* tree-vect-data-refs.c: Include internal-fn.h.
      	(vect_gather_scatter_fn_p): New function.
      	(vect_describe_gather_scatter_call): Likewise.
      	(vect_check_gather_scatter): Try using internal functions for
      	gather loads.  Recognize existing calls to a gather load function.
      	(vect_analyze_data_refs): Consider using gather loads if
      	supports_vec_gather_load_p.
      	* tree-vect-patterns.c (vect_get_load_store_mask): New function.
      	(vect_get_gather_scatter_offset_type): Likewise.
      	(vect_convert_mask_for_vectype): Likewise.
      	(vect_add_conversion_to_patterm): Likewise.
      	(vect_try_gather_scatter_pattern): Likewise.
      	(vect_recog_gather_scatter_pattern): New pattern recognizer.
      	(vect_vect_recog_func_ptrs): Add it.
      	* tree-vect-stmts.c (exist_non_indexing_operands_for_use_p): Use
      	internal_fn_mask_index and internal_gather_scatter_fn_p.
      	(check_load_store_masking): Take the gather_scatter_info as an
      	argument and handle gather loads.
      	(vect_get_gather_scatter_ops): New function.
      	(vectorizable_call): Check internal_load_fn_p.
      	(vectorizable_load): Likewise.  Handle gather load internal
      	functions.
      	(vectorizable_store): Update call to check_load_store_masking.
      	* config/aarch64/aarch64.md (UNSPEC_LD1_GATHER): New unspec.
      	* config/aarch64/iterators.md (SVE_S, SVE_D): New mode iterators.
      	* config/aarch64/predicates.md (aarch64_gather_scale_operand_w)
      	(aarch64_gather_scale_operand_d): New predicates.
      	* config/aarch64/aarch64-sve.md (gather_load<mode>): New expander.
      	(mask_gather_load<mode>): New insns.
      
      gcc/testsuite/
      	* gcc.target/aarch64/sve/gather_load_1.c: New test.
      	* gcc.target/aarch64/sve/gather_load_2.c: Likewise.
      	* gcc.target/aarch64/sve/gather_load_3.c: Likewise.
      	* gcc.target/aarch64/sve/gather_load_4.c: Likewise.
      	* gcc.target/aarch64/sve/gather_load_5.c: Likewise.
      	* gcc.target/aarch64/sve/gather_load_6.c: Likewise.
      	* gcc.target/aarch64/sve/gather_load_7.c: Likewise.
      	* gcc.target/aarch64/sve/mask_gather_load_1.c: Likewise.
      	* gcc.target/aarch64/sve/mask_gather_load_2.c: Likewise.
      	* gcc.target/aarch64/sve/mask_gather_load_3.c: Likewise.
      	* gcc.target/aarch64/sve/mask_gather_load_4.c: Likewise.
      	* gcc.target/aarch64/sve/mask_gather_load_5.c: Likewise.
      	* gcc.target/aarch64/sve/mask_gather_load_6.c: Likewise.
      	* gcc.target/aarch64/sve/mask_gather_load_7.c: Likewise.
      
      Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
      Co-Authored-By: David Sherwood <david.sherwood@arm.com>
      
      From-SVN: r256640
      Richard Sandiford committed
    • Add support for in-order addition reduction using SVE FADDA · b781a135
      This patch adds support for in-order floating-point addition reductions,
      which are suitable even in strict IEEE mode.
      
      Previously vect_is_simple_reduction would reject any cases that forbid
      reassociation.  The idea is instead to tentatively accept them as
      "FOLD_LEFT_REDUCTIONs" and only fail later if there is no support
      for them.  Although this patch only handles the particular case of plus
      and minus on floating-point types, there's no reason in principle why
      we couldn't handle other cases.
      
      The reductions use a new fold_left_plus_optab if available, otherwise
      they fall back to elementwise additions or subtractions.
      
      The vect_force_simple_reduction change makes it easier for parloops
      to read the type of reduction.
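
      For illustration (the names are ours), the kind of reduction this
      enables under strict IEEE rules, without -ffast-math:

      double
      f (double *x, int count)
      {
        double sum = 0.0;
        int i;
        for (i = 0; i < count; i++)
          sum += x[i];	/* additions must stay in source order */
        return sum;
      }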
      
      2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
      	    Alan Hayward  <alan.hayward@arm.com>
      	    David Sherwood  <david.sherwood@arm.com>
      
      gcc/
      	* optabs.def (fold_left_plus_optab): New optab.
      	* doc/md.texi (fold_left_plus_@var{m}): Document.
      	* internal-fn.def (IFN_FOLD_LEFT_PLUS): New internal function.
      	* internal-fn.c (fold_left_direct): Define.
      	(expand_fold_left_optab_fn): Likewise.
      	(direct_fold_left_optab_supported_p): Likewise.
      	* fold-const-call.c (fold_const_fold_left): New function.
      	(fold_const_call): Use it to fold CFN_FOLD_LEFT_PLUS.
      	* tree-parloops.c (valid_reduction_p): New function.
      	(gather_scalar_reductions): Use it.
      	* tree-vectorizer.h (FOLD_LEFT_REDUCTION): New vect_reduction_type.
      	(vect_finish_replace_stmt): Declare.
      	* tree-vect-loop.c (fold_left_reduction_fn): New function.
      	(needs_fold_left_reduction_p): New function, split out from...
      	(vect_is_simple_reduction): ...here.  Accept reductions that
      	forbid reassociation, but give them type FOLD_LEFT_REDUCTION.
      	(vect_force_simple_reduction): Also store the reduction type in
      	the assignment's STMT_VINFO_REDUC_TYPE.
      	(vect_model_reduction_cost): Handle FOLD_LEFT_REDUCTION.
      	(merge_with_identity): New function.
      	(vect_expand_fold_left): Likewise.
      	(vectorize_fold_left_reduction): Likewise.
      	(vectorizable_reduction): Handle FOLD_LEFT_REDUCTION.  Leave the
      	scalar phi in place for it.  Check for target support and reject
      	cases that would reassociate the operation.  Defer the transform
      	phase to vectorize_fold_left_reduction.
      	* config/aarch64/aarch64.md (UNSPEC_FADDA): New unspec.
      	* config/aarch64/aarch64-sve.md (fold_left_plus_<mode>): New expander.
      	(*fold_left_plus_<mode>, *pred_fold_left_plus_<mode>): New insns.
      
      gcc/testsuite/
      	* gcc.dg/vect/no-fast-math-vect16.c: Expect the test to pass and
      	check for a message about using in-order reductions.
      	* gcc.dg/vect/pr79920.c: Expect both loops to be vectorized and
      	check for a message about using in-order reductions.
      	* gcc.dg/vect/trapv-vect-reduc-4.c: Expect all three loops to be
      	vectorized and check for a message about using in-order reductions.
      	Expect targets with variable-length vectors to fall back to the
      	fixed-length minimum.
      	* gcc.dg/vect/vect-reduc-6.c: Expect the loop to be vectorized and
      	check for a message about using in-order reductions.
      	* gcc.dg/vect/vect-reduc-in-order-1.c: New test.
      	* gcc.dg/vect/vect-reduc-in-order-2.c: Likewise.
      	* gcc.dg/vect/vect-reduc-in-order-3.c: Likewise.
      	* gcc.dg/vect/vect-reduc-in-order-4.c: Likewise.
      	* gcc.target/aarch64/sve/reduc_strict_1.c: New test.
      	* gcc.target/aarch64/sve/reduc_strict_1_run.c: Likewise.
      	* gcc.target/aarch64/sve/reduc_strict_2.c: Likewise.
      	* gcc.target/aarch64/sve/reduc_strict_2_run.c: Likewise.
      	* gcc.target/aarch64/sve/reduc_strict_3.c: Likewise.
      	* gcc.target/aarch64/sve/slp_13.c: Add floating-point types.
      	* gfortran.dg/vect/vect-8.f90: Expect 22 loops to be vectorized if
      	vect_fold_left_plus.
      
      Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
      Co-Authored-By: David Sherwood <david.sherwood@arm.com>
      
      From-SVN: r256639
      Richard Sandiford committed
    • Remove unnecessary temporary in tree-if-conv.c · b89fa419
      The call to ifc_temp_var in predicate_mem_writes became redundant
      in r230099.  Before that point the mask was calculated using
      fold_build_*s, but now it's calculated by gimple_build and so
      is already a valid gimple value.
      
      As it stands, the call forces an SSA_NAME-to-SSA_NAME copy
      to be created, whereas SLP expects that such redundant copies
      have already been eliminated.
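
      For illustration, the redundant copy looked something like this in
      GIMPLE terms (hand-written pseudocode, not an actual dump):

         mask_1 = c_2 != 0;     /* built by gimple_build; already a
                                   valid gimple value */
         mask_3 = mask_1;       /* the redundant ifc_temp_var copy */
         .MASK_STORE (..., mask_3, ...);

      SLP assumes that copies like mask_3 = mask_1 have already been
      propagated away, so the store should simply use mask_1.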
      
      2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
      
      gcc/
      	* tree-if-conv.c (predicate_mem_writes): Remove redundant
      	call to ifc_temp_var.
      
      From-SVN: r256638
      Richard Sandiford committed
    • Rework the legitimize_address_displacement hook · 9005477f
      This patch:
      
      - tweaks the handling of legitimize_address_displacement
        so that it gets called before rather than after the address has
        been expanded.  This means that we're no longer at the mercy
        of LRA being able to interpret the expanded instructions.
      
      - passes the original offset to legitimize_address_displacement.
      
      - adds SVE support to the AArch64 implementation of
        legitimize_address_displacement.
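
      The idea behind the hook, as a minimal stand-alone sketch (the
      function name and the [0, range) addressing model are illustrative
      assumptions, not GCC's actual interface):

         #include <stdint.h>

         /* Split OFFSET into an anchor that is added to the base
            register once and a residual displacement that fits the
            assumed [0, RANGE) addressing mode; RANGE must be > 0.  */
         static void
         split_displacement (int64_t offset, int64_t range,
                             int64_t *anchor, int64_t *residual)
         {
           *residual = offset % range;
           if (*residual < 0)
             *residual += range;        /* keep the residual in range */
           *anchor = offset - *residual;
         }

      Passing the original offset means the target can choose this split
      itself, before the address is expanded.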
      
      2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
      	    Alan Hayward  <alan.hayward@arm.com>
      	    David Sherwood  <david.sherwood@arm.com>
      
      gcc/
      	* target.def (legitimize_address_displacement): Take the original
      	offset as a poly_int.
      	* targhooks.h (default_legitimize_address_displacement): Update
      	accordingly.
      	* targhooks.c (default_legitimize_address_displacement): Likewise.
      	* doc/tm.texi: Regenerate.
      	* lra-constraints.c (base_plus_disp_to_reg): Take the displacement
      	as an argument, moving assert of ad->disp == ad->disp_term to...
      	(process_address_1): ...here.  Update calls to base_plus_disp_to_reg.
      	Try calling targetm.legitimize_address_displacement before expanding
      	the address rather than afterwards, and adjust for the new interface.
      	* config/aarch64/aarch64.c (aarch64_legitimize_address_displacement):
      	Match the new hook interface.  Handle SVE addresses.
      	* config/sh/sh.c (sh_legitimize_address_displacement): Match the
      	new hook interface.
      
      Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
      Co-Authored-By: David Sherwood <david.sherwood@arm.com>
      
      From-SVN: r256637
      Richard Sandiford committed
    • Add an "early rematerialisation" pass · 5cce8171
      This patch looks for pseudo registers that are live across a call
      and for which no call-preserved hard registers exist.  It then
      recomputes the pseudos as necessary to ensure that they are no
      longer live across a call.  The comment at the head of the file
      describes the approach.
      
      A new target hook selects which modes should be treated in this way.
      By default none are, in which case the pass is skipped very early.
      
      It might also be worth looking for cases like:
      
         C1: R1 := f (...)
         ...
         C2: R2 := f (...)
         C3: R1 := C2
      
      and giving the same value number to C1 and C3, effectively treating
      it like:
      
         C1: R1 := f (...)
         ...
         C2: R2 := f (...)
         C3: R1 := f (...)
      
      Another (much more expensive) enhancement would be to apply value
      numbering to all pseudo registers (not just rematerialisation
      candidates), so that we can handle things like:
      
        C1: R1 := f (...R2...)
        ...
        C2: R1 := f (...R3...)
      
      where R2 and R3 hold the same value.  But the current pass seems
      to catch the vast majority of cases.
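
      A source-level sketch of the situation the pass targets (purely
      illustrative): on AArch64 SVE there are no call-preserved predicate
      registers, so a predicate live across a call would otherwise have
      to be spilled.

         void g (void);

         int
         f (int x, int y)
         {
           int k = x * y;   /* stand-in for a cheap-to-compute value in
                               a register class with no call-preserved
                               registers, e.g. an SVE predicate */
           g ();            /* k is live across this call */
           return k;        /* early remat recomputes k = x * y here,
                               from inputs that do survive the call,
                               instead of spilling k around it */
         }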
      
      2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
      
      gcc/
      	* Makefile.in (OBJS): Add early-remat.o.
      	* target.def (select_early_remat_modes): New hook.
      	* doc/tm.texi.in (TARGET_SELECT_EARLY_REMAT_MODES): New hook.
      	* doc/tm.texi: Regenerate.
      	* targhooks.h (default_select_early_remat_modes): Declare.
      	* targhooks.c (default_select_early_remat_modes): New function.
      	* timevar.def (TV_EARLY_REMAT): New timevar.
      	* passes.def (pass_early_remat): New pass.
      	* tree-pass.h (make_pass_early_remat): Declare.
      	* early-remat.c: New file.
      	* config/aarch64/aarch64.c (aarch64_select_early_remat_modes): New
      	function.
      	(TARGET_SELECT_EARLY_REMAT_MODES): Define.
      
      gcc/testsuite/
      	* gcc.target/aarch64/sve/spill_1.c: Also test that no predicates
      	are spilled.
      	* gcc.target/aarch64/sve/spill_2.c: New test.
      	* gcc.target/aarch64/sve/spill_3.c: Likewise.
      	* gcc.target/aarch64/sve/spill_4.c: Likewise.
      	* gcc.target/aarch64/sve/spill_5.c: Likewise.
      	* gcc.target/aarch64/sve/spill_6.c: Likewise.
      	* gcc.target/aarch64/sve/spill_7.c: Likewise.
      
      From-SVN: r256636
      Richard Sandiford committed
    • Use single-iteration epilogues when peeling for gaps · d1d20a49
      This patch adds support for fully-masking loops that require peeling
      for gaps.  It peels exactly one scalar iteration and uses the masked
      loop to handle the rest.  Previously we would fall back on using a
      standard unmasked loop instead.
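
      As an illustration (mine, not from the patch), the grouped load
      below needs peeling for gaps: the final vector iteration would
      read in[2 * n - 1], an element the scalar loop never touches, so
      one scalar iteration is peeled and the fully-masked loop handles
      the rest.

         void
         f (int *restrict out, int *restrict in, int n)
         {
           for (int i = 0; i < n; ++i)
             out[i] = in[i * 2];   /* group of 2 with a gap: the element
                                      in[i * 2 + 1] is never used */
         }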
      
      2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
      	    Alan Hayward  <alan.hayward@arm.com>
      	    David Sherwood  <david.sherwood@arm.com>
      
      gcc/
      	* tree-vect-loop-manip.c (vect_gen_scalar_loop_niters): Replace
      	vfm1 with a bound_epilog parameter.
      	(vect_do_peeling): Update calls accordingly, and move the prologue
      	call earlier in the function.  Treat the base bound_epilog as 0 for
      	fully-masked loops and retain vf - 1 for other loops.  Add 1 to
      	this base when peeling for gaps.
      	* tree-vect-loop.c (vect_analyze_loop_2): Allow peeling for gaps
      	with fully-masked loops.
      	(vect_estimate_min_profitable_iters): Handle the single peeled
      	iteration in that case.
      
      gcc/testsuite/
      	* gcc.target/aarch64/sve/struct_vect_18.c: Check the number
      	of branches.
      	* gcc.target/aarch64/sve/struct_vect_19.c: Likewise.
      	* gcc.target/aarch64/sve/struct_vect_20.c: New test.
      	* gcc.target/aarch64/sve/struct_vect_20_run.c: Likewise.
      	* gcc.target/aarch64/sve/struct_vect_21.c: Likewise.
      	* gcc.target/aarch64/sve/struct_vect_21_run.c: Likewise.
      	* gcc.target/aarch64/sve/struct_vect_22.c: Likewise.
      	* gcc.target/aarch64/sve/struct_vect_22_run.c: Likewise.
      	* gcc.target/aarch64/sve/struct_vect_23.c: Likewise.
      	* gcc.target/aarch64/sve/struct_vect_23_run.c: Likewise.
      
      Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
      Co-Authored-By: David Sherwood <david.sherwood@arm.com>
      
      From-SVN: r256635
      Richard Sandiford committed
    • Allow single-element interleaving for non-power-of-2 strides · 4aa157e8
      This allows LD3 to be used for isolated a[i * 3] accesses, in a
      similar way to the existing use of LD2 and LD4 for isolated
      a[i * 2] and a[i * 4] accesses respectively.
      Given the problems with the cost model underestimating the cost of
      elementwise accesses, the patch continues to reject the VMAT_ELEMENTWISE
      cases that are currently rejected.
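
      For example (an illustration only, not one of the new tests), the
      loop below has a single-element interleaved load with group size 3
      and can now use LD3, loading three vectors and keeping one:

         void
         f (double *restrict dst, double *restrict src, int n)
         {
           for (int i = 0; i < n; ++i)
             dst[i] = src[i * 3];   /* group size 3: not a power of 2 */
         }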
      
      2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
      	    Alan Hayward  <alan.hayward@arm.com>
      	    David Sherwood  <david.sherwood@arm.com>
      
      gcc/
      	* tree-vect-data-refs.c (vect_analyze_group_access_1): Allow
      	single-element interleaving even if the size is not a power of 2.
      	* tree-vect-stmts.c (get_load_store_type): Disallow elementwise
      	accesses for single-element interleaving if the group size is
      	not a power of 2.
      
      gcc/testsuite/
      	* gcc.target/aarch64/sve/struct_vect_18.c: New test.
      	* gcc.target/aarch64/sve/struct_vect_18_run.c: Likewise.
      	* gcc.target/aarch64/sve/struct_vect_19.c: Likewise.
      	* gcc.target/aarch64/sve/struct_vect_19_run.c: Likewise.
      
      Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
      Co-Authored-By: David Sherwood <david.sherwood@arm.com>
      
      From-SVN: r256634
      Richard Sandiford committed
    • Add support for conditional reductions using SVE CLASTB · bb6c2b68
      This patch uses SVE CLASTB to optimise conditional reductions.  It means
      that we no longer need to maintain a separate index vector to record
      the most recent valid value, and no longer need to worry about overflow
      cases.
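
      A typical conditional reduction, as a stand-alone illustration
      (not taken from the new tests):

         int
         last_match (int *a, int *b, int n)
         {
           int last = -1;
           for (int i = 0; i < n; ++i)
             if (a[i] < b[i])
               last = i;   /* keep the value from the last matching
                              iteration */
           return last;
         }

      With CLASTB the vectorized loop extracts the last active element
      directly, rather than maintaining a vector of indices and taking
      its maximum, which is where the old overflow concerns came from.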
      
      2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
      	    Alan Hayward  <alan.hayward@arm.com>
      	    David Sherwood  <david.sherwood@arm.com>
      
      gcc/
      	* doc/md.texi (fold_extract_last_@var{m}): Document.
      	* doc/sourcebuild.texi (vect_fold_extract_last): Likewise.
      	* optabs.def (fold_extract_last_optab): New optab.
      	* internal-fn.def (FOLD_EXTRACT_LAST): New internal function.
      	* internal-fn.c (fold_extract_direct): New macro.
      	(expand_fold_extract_optab_fn): Likewise.
      	(direct_fold_extract_optab_supported_p): Likewise.
      	* tree-vectorizer.h (EXTRACT_LAST_REDUCTION): New vect_reduction_type.
      	* tree-vect-loop.c (vect_model_reduction_cost): Handle
      	EXTRACT_LAST_REDUCTION.
      	(get_initial_def_for_reduction): Do not create an initial vector
      	for EXTRACT_LAST_REDUCTION reductions.
      	(vectorizable_reduction): Leave the scalar phi in place for
      	EXTRACT_LAST_REDUCTIONs.  Try using EXTRACT_LAST_REDUCTION
      	ahead of INTEGER_INDUC_COND_REDUCTION.  Do not generate
      	epilogue code for EXTRACT_LAST_REDUCTION and defer the
      	transform phase to vectorizable_condition.
      	* tree-vect-stmts.c (vect_finish_stmt_generation_1): New function,
      	split out from...
      	(vect_finish_stmt_generation): ...here.
      	(vect_finish_replace_stmt): New function.
      	(vectorizable_condition): Handle EXTRACT_LAST_REDUCTION.
      	* config/aarch64/aarch64-sve.md (fold_extract_last_<mode>): New
      	pattern.
      	* config/aarch64/aarch64.md (UNSPEC_CLASTB): New unspec.
      
      gcc/testsuite/
      	* lib/target-supports.exp
      	(check_effective_target_vect_fold_extract_last): New proc.
      	* gcc.dg/vect/pr65947-1.c: Update dump messages.  Add markup
      	for fold_extract_last.
      	* gcc.dg/vect/pr65947-2.c: Likewise.
      	* gcc.dg/vect/pr65947-3.c: Likewise.
      	* gcc.dg/vect/pr65947-4.c: Likewise.
      	* gcc.dg/vect/pr65947-5.c: Likewise.
      	* gcc.dg/vect/pr65947-6.c: Likewise.
      	* gcc.dg/vect/pr65947-9.c: Likewise.
      	* gcc.dg/vect/pr65947-10.c: Likewise.
      	* gcc.dg/vect/pr65947-12.c: Likewise.
      	* gcc.dg/vect/pr65947-14.c: Likewise.
      	* gcc.dg/vect/pr80631-1.c: Likewise.
      	* gcc.target/aarch64/sve/clastb_1.c: New test.
      	* gcc.target/aarch64/sve/clastb_1_run.c: Likewise.
      	* gcc.target/aarch64/sve/clastb_2.c: Likewise.
      	* gcc.target/aarch64/sve/clastb_2_run.c: Likewise.
      	* gcc.target/aarch64/sve/clastb_3.c: Likewise.
      	* gcc.target/aarch64/sve/clastb_3_run.c: Likewise.
      	* gcc.target/aarch64/sve/clastb_4.c: Likewise.
      	* gcc.target/aarch64/sve/clastb_4_run.c: Likewise.
      	* gcc.target/aarch64/sve/clastb_5.c: Likewise.
      	* gcc.target/aarch64/sve/clastb_5_run.c: Likewise.
      	* gcc.target/aarch64/sve/clastb_6.c: Likewise.
      	* gcc.target/aarch64/sve/clastb_6_run.c: Likewise.
      	* gcc.target/aarch64/sve/clastb_7.c: Likewise.
      	* gcc.target/aarch64/sve/clastb_7_run.c: Likewise.
      
      Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
      Co-Authored-By: David Sherwood <david.sherwood@arm.com>
      
      From-SVN: r256633
      Richard Sandiford committed