1. 12 Jan, 2018 16 commits
    • re PR target/81819 ([RX] internal compiler error: in… · ec952125
      re PR target/81819 ([RX] internal compiler error: in rx_is_restricted_memory_address, at config/rx/rx.c:311)
      
      gcc/
      	PR target/81819
      	* config/rx/rx.c (rx_is_restricted_memory_address):
      	Handle SUBREG case.
      
      From-SVN: r256578
      Oleg Endo committed
    • rs6000: Tune new testcase (PR83629) · eda03189
      It has some problems running on some 64-bit configurations, and the
      bug it is testing for is only on 32-bit; so let's not run it elsewhere.
      
      
      gcc/testsuite/
      	PR target/83629
      	* gcc.target/powerpc/pr83629.c: Require ilp32.
      
      From-SVN: r256577
      Segher Boessenkool committed
    • re PR target/80846 (auto-vectorized AVX2 horizontal sum should narrow to 128b… · c803b2a9
      re PR target/80846 (auto-vectorized AVX2 horizontal sum should narrow to 128b right away, to be more efficient for Ryzen and Intel)
      
      2018-01-12  Richard Biener  <rguenther@suse.de>
      
      	PR tree-optimization/80846
      	* target.def (split_reduction): New target hook.
      	* targhooks.c (default_split_reduction): New function.
      	* targhooks.h (default_split_reduction): Declare.
      	* tree-vect-loop.c (vect_create_epilog_for_reduction): If the
      	target requests it, first reduce vectors by combining low and
      	high parts.
      	* tree-vect-stmts.c (vect_gen_perm_mask_any): Adjust.
      	(get_vectype_for_scalar_type_and_size): Export.
      	* tree-vectorizer.h (get_vectype_for_scalar_type_and_size): Declare.
      
      	* doc/tm.texi.in (TARGET_VECTORIZE_SPLIT_REDUCTION): Document.
      	* doc/tm.texi: Regenerate.
      
      	i386/
      	* config/i386/i386.c (ix86_split_reduction): Implement
      	TARGET_VECTORIZE_SPLIT_REDUCTION.
      
      	* gcc.target/i386/pr80846-1.c: New testcase.
      	* gcc.target/i386/pr80846-2.c: Likewise.
      
      From-SVN: r256576
      Richard Biener committed
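
The idea behind the split-reduction hook above (combine the low and high halves of a wide accumulator first, then finish the horizontal sum on the narrower vector) can be sketched in scalar C. This is only a model under stated assumptions: the names `combine_halves` and `reduce8` are hypothetical, the 8-lane and 4-lane arrays stand in for a 256-bit and a 128-bit vector of `int`, and none of this is the vectorizer's actual code.

```c
/* Scalar model of a split reduction.  All names are hypothetical. */

/* Step 1: add the high half of an 8-lane accumulator onto its low half,
   producing a 4-lane value (models narrowing 256 bits to 128 bits).  */
void combine_halves(const int wide[8], int narrow[4])
{
    for (int i = 0; i < 4; i++)
        narrow[i] = wide[i] + wide[i + 4];
}

/* Step 2: finish the horizontal sum on the narrow value only.  */
int reduce8(const int wide[8])
{
    int narrow[4];
    combine_halves(wide, narrow);
    return narrow[0] + narrow[1] + narrow[2] + narrow[3];
}
```

For the lanes 1 through 8, `reduce8` returns 36, the same result as summing all eight lanes directly; only the order of the additions changes.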
    • re PR target/83368 (alloca after setjmp breaks PIC base reg) · 46336a0e
      	PR target/83368
      	* config/sparc/sparc.h (PIC_OFFSET_TABLE_REGNUM): Set to INVALID_REGNUM
      	in PIC mode except for TARGET_VXWORKS_RTP.
      	* config/sparc/sparc.c: Include cfgrtl.h.
      	(TARGET_INIT_PIC_REG): Define.
      	(TARGET_USE_PSEUDO_PIC_REG): Likewise.
      	(sparc_pic_register_p): New predicate.
      	(sparc_legitimate_address_p): Use it.
      	(sparc_legitimize_pic_address): Likewise.
      	(sparc_delegitimize_address): Likewise.
      	(sparc_mode_dependent_address_p): Likewise.
      	(gen_load_pcrel_sym): Remove 4th parameter.
      	(load_got_register): Adjust call to above.  Remove obsolete stuff.
      	(sparc_expand_prologue): Do not call load_got_register here.
      	(sparc_flat_expand_prologue): Likewise.
      	(sparc_output_mi_thunk): Set the pic_offset_table_rtx object.
      	(sparc_use_pseudo_pic_reg): New function.
      	(sparc_init_pic_reg): Likewise.
      	* config/sparc/sparc.md (vxworks_load_got): Set the GOT register.
      	(builtin_setjmp_receiver): Enable only for TARGET_VXWORKS_RTP.
      
      From-SVN: r256575
      Eric Botcazou committed
    • Add doc for branch_cost effective target. · 7dbf8707
      2018-01-12  Christophe Lyon  <christophe.lyon@linaro.org>
      
      	gcc/
      	* doc/sourcebuild.texi (Effective-Target Keywords, Other attributes):
      	Add item for branch_cost.
      
      From-SVN: r256574
      Christophe Lyon committed
    • re PR rtl-optimization/83565 (RTL combine pass yields wrong rotate result) · 371ae937
      	PR rtl-optimization/83565
      	* rtlanal.c (nonzero_bits1): On WORD_REGISTER_OPERATIONS machines, do
      	not extend the result to a larger mode for rotate operations.
      	(num_sign_bit_copies1): Likewise.
      
      From-SVN: r256572
      Eric Botcazou committed
    • Add dg-require-effective-target indirect_jumps for g++ · c574147e
      2018-01-12  Tom de Vries  <tom@codesourcery.com>
      
      	* g++.dg/ext/label13.C: Add dg-require-effective-target indirect_jumps.
      	* g++.dg/ext/label13a.C: Same.
      	* g++.dg/ext/label14.C: Same.
      	* g++.dg/ext/label2.C: Same.
      	* g++.dg/ext/label3.C: Same.
      	* g++.dg/torture/pr42462.C: Same.
      	* g++.dg/torture/pr42739.C: Same.
      	* g++.dg/warn/Wunused-label-3.C: Same.
      
      From-SVN: r256571
      Tom de Vries committed
    • Add dg-require-effective-target alloca for c++ test-cases · 41287945
      2018-01-12  Tom de Vries  <tom@codesourcery.com>
      
      	* c-c++-common/dwarf2/vla1.c: Add dg-require-effective-target alloca.
      	* g++.dg/Walloca1.C: Same.
      	* g++.dg/cpp0x/pr70338.C: Same.
      	* g++.dg/cpp1y/lambda-generic-vla1.C: Same.
      	* g++.dg/cpp1y/vla10.C: Same.
      	* g++.dg/cpp1y/vla2.C: Same.
      	* g++.dg/cpp1y/vla6.C: Same.
      	* g++.dg/cpp1y/vla8.C: Same.
      	* g++.dg/debug/debug5.C: Same.
      	* g++.dg/debug/debug6.C: Same.
      	* g++.dg/debug/pr54828.C: Same.
      	* g++.dg/diagnostic/pr70105.C: Same.
      	* g++.dg/eh/cleanup5.C: Same.
      	* g++.dg/eh/spbp.C: Same.
      	* g++.dg/ext/tmplattr9.C: Same.
      	* g++.dg/ext/vla10.C: Same.
      	* g++.dg/ext/vla11.C: Same.
      	* g++.dg/ext/vla12.C: Same.
      	* g++.dg/ext/vla15.C: Same.
      	* g++.dg/ext/vla16.C: Same.
      	* g++.dg/ext/vla17.C: Same.
      	* g++.dg/ext/vla3.C: Same.
      	* g++.dg/ext/vla6.C: Same.
      	* g++.dg/ext/vla7.C: Same.
      	* g++.dg/init/array24.C: Same.
      	* g++.dg/init/new47.C: Same.
      	* g++.dg/init/pr55497.C: Same.
      	* g++.dg/opt/pr78201.C: Same.
      	* g++.dg/template/vla2.C: Same.
      	* g++.dg/torture/Wsizeof-pointer-memaccess1.C: Same.
      	* g++.dg/torture/Wsizeof-pointer-memaccess2.C: Same.
      	* g++.dg/torture/pr62127.C: Same.
      	* g++.dg/torture/pr67055.C: Same.
      	* g++.dg/torture/stackalign/eh-alloca-1.C: Same.
      	* g++.dg/torture/stackalign/eh-inline-2.C: Same.
      	* g++.dg/torture/stackalign/eh-vararg-1.C: Same.
      	* g++.dg/torture/stackalign/eh-vararg-2.C: Same.
      	* g++.dg/warn/Wplacement-new-size-5.C: Same.
      	* g++.dg/warn/Wsizeof-pointer-memaccess-1.C: Same.
      	* g++.dg/warn/Wvla-1.C: Same.
      	* g++.dg/warn/Wvla-3.C: Same.
      	* g++.old-deja/g++.ext/array2.C: Same.
      	* g++.old-deja/g++.ext/constructor.C: Same.
      	* g++.old-deja/g++.law/builtin1.C: Same.
      	* g++.old-deja/g++.other/crash12.C: Same.
      	* g++.old-deja/g++.other/eh3.C: Same.
      	* g++.old-deja/g++.pt/array6.C: Same.
      	* g++.old-deja/g++.pt/dynarray.C: Same.
      
      From-SVN: r256570
      Tom de Vries committed
    • Fix g++.dg/cpp0x/inh-ctor30.C · 01da712b
      	* g++.dg/cpp0x/inh-ctor30.C: Allow for alternate mangled form.
      
      From-SVN: r256569
      Rainer Orth committed
    • Link with correct values-*.o files on Solaris (PR target/40411) · c969e34e
      	gcc/testsuite:
      	PR libfortran/67412
      	* gfortran.dg/execute_command_line_2.f90: Remove dg-xfail-run-if
      	on *-*-solaris2.10.
      
      	libstdc++-v3:
      	PR libstdc++/64054
      	* testsuite/27_io/basic_ostream/inserters_arithmetic/char/hexfloat.cc:
      	Remove dg-xfail-run-if.
      
      	gcc:
      	PR target/40411
      	* config/sol2.h (STARTFILE_ARCH_SPEC): Don't use with -shared or
      	-symbolic.
      	Use values-Xc.o for -pedantic.
      	Link with values-xpg4.o for C90, values-xpg6.o otherwise.
      
      From-SVN: r256568
      Rainer Orth committed
    • Include all x86 targets in branch_cost effective target · a7448bdf
      	* lib/target-supports.exp (check_effective_target_branch_cost):
      	Accept all x86 targets.
      
      From-SVN: r256567
      Rainer Orth committed
    • Initialize type_warnings::dyn_count with a default value (PR ipa/83054). · 53b73588
      2018-01-12  Martin Liska  <mliska@suse.cz>
      
      	PR ipa/83054
      	* ipa-devirt.c (final_warning_record::grow_type_warnings):
      	New function.
      	(possible_polymorphic_call_targets): Use it.
      	(ipa_devirt): Likewise.
      2018-01-12  Martin Liska  <mliska@suse.cz>
      
      	PR ipa/83054
      	* g++.dg/warn/pr83054.C: New test.
      
      From-SVN: r256566
      Martin Liska committed
    • Add new verification for profile-count.h. · aae9da9b
      2018-01-12  Martin Liska  <mliska@suse.cz>
      
      	* profile-count.h (enum profile_quality): Use 0 as invalid
      	enum value of profile_quality.
      
      From-SVN: r256565
      Martin Liska committed
    • Add new NDS32 options -mext-perf, -mext-perf2 and -mext-string in the documentation. · b710b08a
      gcc/
      	* doc/invoke.texi (NDS32 Options): Add -mext-perf, -mext-perf2 and
      	-mext-string options.
      
      From-SVN: r256564
      Chung-Ju Wu committed
    • lto-streamer-out.c (DFS::DFS_write_tree_body): Process DECL_DEBUG_EXPR… · c1a7ca7c
      lto-streamer-out.c (DFS::DFS_write_tree_body): Process DECL_DEBUG_EXPR conditional on DECL_HAS_DEBUG_EXPR_P.
      
      2018-01-12  Richard Biener  <rguenther@suse.de>
      
      	* lto-streamer-out.c (DFS::DFS_write_tree_body): Process
      	DECL_DEBUG_EXPR conditional on DECL_HAS_DEBUG_EXPR_P.
      	* tree-streamer-in.c (lto_input_ts_decl_common_tree_pointers):
      	Likewise.
      	* tree-streamer-out.c (write_ts_decl_common_tree_pointers): Likewise.
      
      From-SVN: r256563
      Richard Biener committed
    • Daily bump. · 7b2ce347
      From-SVN: r256561
      GCC Administrator committed
  2. 11 Jan, 2018 24 commits
    • configure.ac (--with-long-double-format): Add support for the configuration… · 8c7a27d5
      configure.ac (--with-long-double-format): Add support for the configuration option to change the default long double...
      
      2018-01-11  Michael Meissner  <meissner@linux.vnet.ibm.com>
      
      	* configure.ac (--with-long-double-format): Add support for the
      	configuration option to change the default long double format on
      	PowerPC systems.
      	* config.gcc (powerpc*-linux*-*): Likewise.
      	* configure: Regenerate.
      	* config/rs6000/rs6000-c.c (rs6000_cpu_cpp_builtins): If long
      	double is IEEE, define __KC__ and __KF__ to allow floatn.h to be
      	used without modification.
      
      From-SVN: r256558
      Michael Meissner committed
    • rs6000-builtin.def (BU_P7_MISC_X): New #define. · 02a03501
      [gcc]
      
      2018-01-11  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
      
      	* config/rs6000/rs6000-builtin.def (BU_P7_MISC_X): New #define.
      	(SPEC_BARRIER): New instantiation of BU_P7_MISC_X.
      	* config/rs6000/rs6000.c (rs6000_expand_builtin): Handle
      	MISC_BUILTIN_SPEC_BARRIER.
      	(rs6000_init_builtins): Likewise.
      	* config/rs6000/rs6000.md (UNSPECV_SPEC_BARRIER): New UNSPECV
      	enum value.
      	(speculation_barrier): New define_insn.
      	* doc/extend.texi: Document __builtin_speculation_barrier.
      
      [gcc/testsuite]
      
      2018-01-11  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
      
      	* gcc.target/powerpc/spec-barr-1.c: New file.
      
      From-SVN: r256557
      Bill Schmidt committed
    • re PR target/83203 (Inefficient int to avx2 vector conversion) · 1ad6e904
      	PR target/83203
      	* config/i386/i386.c (ix86_expand_vector_init_one_nonzero): If one_var
      	is 0, for V{8,16}S[IF] and V[48]D[IF]mode use gen_vec_set<mode>_0.
      	* config/i386/sse.md (VI8_AVX_AVX512F, VI4F_256_512): New mode
      	iterators.
      	(ssescalarmodesuffix): Add 512-bit vectors.  Use "d" or "q" for
      	integral modes instead of "ss" and "sd".
      	(vec_set<mode>_0): New define_insns for 256-bit and 512-bit
      	vectors with 32-bit and 64-bit elements.
      	(vecdupssescalarmodesuffix): New mode attribute.
      	(vec_dup<mode>): Use it.
      
      From-SVN: r256556
      Jakub Jelinek committed
    • i386: Align stack frame if argument is passed on stack · c7a61831
      When a function call is removed, the caller may become a leaf function.
      But if an argument may be passed on the stack, we still need to align the
      stack frame when there is no tail call.
      
      Tested on Linux/i686 and Linux/x86-64.
      
      gcc/
      
      	PR target/83330
      	* config/i386/i386.c (ix86_compute_frame_layout): Align stack
      	frame if argument is passed on stack.
      
      gcc/testsuite/
      
      	PR target/83330
      	* gcc.target/i386/pr83330.c: New test.
      
      From-SVN: r256555
      H.J. Lu committed
    • re PR fortran/79383 (USE statement error) · 278e902c
      2018-01-11  Steven G. Kargl <kargl@gcc.gnu.org>
      
      	PR fortran/79383
      	* gfortran.dg/dtio_31.f03: New test.
      	* gfortran.dg/dtio_32.f03: New test.
      
      From-SVN: r256554
      Steven G. Kargl committed
    • re PR go/83794 (misc/cgo/test uses gigabytes of memory) · fbea3c33
      	PR go/83794
          misc/cgo/test: avoid endless loop when we can't parse notes
          
          Reviewed-on: https://go-review.googlesource.com/87416
      
      From-SVN: r256553
      Ian Lance Taylor committed
    • Add some reproducers for issues found developing the location-wrappers patch · c5269263
      gcc/testsuite/ChangeLog:
      	PR c++/43486
      	* g++.dg/wrappers: New subdirectory.
      	* g++.dg/wrappers/README: New file.
      	* g++.dg/wrappers/alloc.C: New test case.
      	* g++.dg/wrappers/cow-istream-string.C: New test case.
      	* g++.dg/wrappers/cp-stdlib.C: New test case.
      	* g++.dg/wrappers/sanitizer_coverage_libcdep_new.C: New test case.
      	* g++.dg/wrappers/wrapper-around-type-pack-expansion.C: New test
      	case.
      
      From-SVN: r256552
      David Malcolm committed
    • re PR target/82682 (FAIL: gcc.target/i386/pr50038.c scan-assembler-times movzbl… · e2c0d088
      re PR target/82682 (FAIL: gcc.target/i386/pr50038.c scan-assembler-times movzbl 2 (found 3 times) since r253958)
      
      	PR target/82682
      	* ree.c (combine_reaching_defs): Optimize also
      	reg2=exp; reg1=reg2; reg2=any_extend(reg1); into
      	reg2=any_extend(exp); reg1=reg2;, formatting fix.
      
      From-SVN: r256551
      Jakub Jelinek committed
    • PR c++/82728 - wrong -Wunused-but-set-variable · 03943bbd
      	PR c++/82799
      	PR c++/83690
      	* call.c (perform_implicit_conversion_flags): Call mark_rvalue_use.
      	* decl.c (case_conversion): Likewise.
      	* semantics.c (finish_static_assert): Call
      	perform_implicit_conversion_flags.
      
      From-SVN: r256550
      Jason Merrill committed
    • re PR tree-optimization/83189 (internal compiler error: in probability_in, at profile-count.h:1050) · c2893c6e
      	PR middle-end/83189
      	* gimple-ssa-isolate-paths.c (isolate_path): Fix profile update.
      
      From-SVN: r256545
      Jan Hubicka committed
    • re PR middle-end/83718 (ICE: Floating point exception in profile_count::apply_scale) · 0526ed2a
      	PR middle-end/83718
      	* tree-inline.c (copy_cfg_body): Adjust num&den for scaling
      	after they are computed.
      	* g++.dg/torture/pr83718.C: New testcase.
      
      From-SVN: r256544
      Jan Hubicka committed
    • [C++ PATCH] kill unused enum · 2a3af45c
      https://gcc.gnu.org/ml/gcc-patches/2018-01/msg00923.html
      	* method.c (enum mangling_flags): Delete long-dead enum.
      
      From-SVN: r256543
      Nathan Sidwell committed
    • re PR ipa/83178 (g++.dg/ipa/devirt-22.C fail) · 346ac3a8
      	PR ipa/83178
      	* g++.dg/ipa/devirt-22.C: Adjust scan-dump-times count.
      
      From-SVN: r256542
      Martin Jambor committed
    • re PR tree-optimization/83695 (ICE on valid code at -O3: Segmentation fault) · 4e090bcc
      	PR tree-optimization/83695
      	* gimple-loop-linterchange.cc
      	(tree_loop_interchange::interchange_loops): Call scev_reset_htab to
      	reset cached scev information after interchange.
      	(pass_linterchange::execute): Remove call to scev_reset_htab.
      
      	gcc/testsuite
      	PR tree-optimization/83695
      	* gcc.dg/tree-ssa/pr83695.c: New test.
      
      From-SVN: r256541
      Bin Cheng committed
    • [arm][3/3] Implement fp16fml lane intrinsics · eccf4d70
      This patch implements the lane-wise fp16fml intrinsics.
      There are quite a few of them, so I've split them out from
      the other simpler fp16fml intrinsics.
      
      These ones expose instructions such as
      
      vfmal.f16 Dd, Sn, Sm[<index>]  0 <= index <= 1
      vfmal.f16 Qd, Dn, Dm[<index>]  0 <= index <= 3
      vfmsl.f16 Dd, Sn, Sm[<index>]  0 <= index <= 1
      vfmsl.f16 Qd, Dn, Dm[<index>]  0 <= index <= 3
      
      These instructions extract a single half-precision
      floating-point value from one of the source regs
      and perform a vfmal/vfmsl operation as per the
      normal variant with that value.
      
      The nuance here is that some of the intrinsics want
      to do things like:
      
      float32x2_t vfmlal_laneq_low_u32 (float32x2_t __r, float16x4_t __a, float16x8_t __b, const int __index)
      
      
      where the float16x8_t value of '__b' is held in a Q
      register, so we need to be a bit smart about finding
      the right D or S sub-register and translating the
      lane number to a lane in that sub-register, instead
      of just passing the language-level const-int down to
      the assembly instruction.
      
      That's where most of the complexity of this patch comes from
      but hopefully it's orthogonal enough to make sense.
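
The lane translation described above can be sketched as simple index arithmetic. This is a minimal illustration, not the code in the patch: `struct sub_lane` and `split_lane` are hypothetical names, and the sketch assumes little-endian lane numbering and ignores any restriction on which registers the instruction can encode.

```c
/* Hypothetical helper: split a lane index into a wide vector into
   (sub-register, lane within that sub-register).  */
struct sub_lane {
    unsigned sub_reg;   /* which S or D sub-register of the wide register */
    unsigned lane;      /* lane index inside that sub-register */
};

/* elts_per_sub_reg is the number of half-precision elements one
   sub-register holds: 2 for an S register, 4 for a D register.  */
struct sub_lane split_lane(unsigned index, unsigned elts_per_sub_reg)
{
    struct sub_lane r;
    r.sub_reg = index / elts_per_sub_reg;
    r.lane = index % elts_per_sub_reg;
    return r;
}
```

Under these assumptions, lane 5 of a float16x8_t value maps to lane 1 of the third S sub-register (`split_lane(5, 2)`), or to lane 1 of the second D sub-register (`split_lane(5, 4)`).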
      
      Bootstrapped and tested on arm-none-linux-gnueabihf as well as
      armeb-none-eabi.
      
      	* config/arm/arm_neon.h (vfmlal_lane_low_u32, vfmlal_lane_high_u32,
      	vfmlalq_laneq_low_u32, vfmlalq_lane_low_u32, vfmlal_laneq_low_u32,
      	vfmlalq_laneq_high_u32, vfmlalq_lane_high_u32, vfmlal_laneq_high_u32,
      	vfmlsl_lane_low_u32, vfmlsl_lane_high_u32, vfmlslq_laneq_low_u32,
      	vfmlslq_lane_low_u32, vfmlsl_laneq_low_u32, vfmlslq_laneq_high_u32,
      	vfmlslq_lane_high_u32, vfmlsl_laneq_high_u32): Define.
      	* config/arm/arm_neon_builtins.def (vfmal_lane_low,
      	vfmal_lane_lowv4hf, vfmal_lane_lowv8hf, vfmal_lane_high,
      	vfmal_lane_highv4hf, vfmal_lane_highv8hf, vfmsl_lane_low,
      	vfmsl_lane_lowv4hf, vfmsl_lane_lowv8hf, vfmsl_lane_high,
      	vfmsl_lane_highv4hf, vfmsl_lane_highv8hf): New sets of builtins.
      	* config/arm/iterators.md (VFMLSEL2, vfmlsel2): New mode attributes.
      	(V_lane_reg): Likewise.
      	* config/arm/neon.md (neon_vfm<vfml_op>l_lane_<vfml_half><VCVTF:mode>):
      	New define_expand.
      	(neon_vfm<vfml_op>l_lane_<vfml_half><vfmlsel2><mode>): Likewise.
      	(vfmal_lane_low<mode>_intrinsic,
      	vfmal_lane_low<vfmlsel2><mode>_intrinsic,
      	vfmal_lane_high<vfmlsel2><mode>_intrinsic,
      	vfmal_lane_high<mode>_intrinsic, vfmsl_lane_low<mode>_intrinsic,
      	vfmsl_lane_low<vfmlsel2><mode>_intrinsic,
      	vfmsl_lane_high<vfmlsel2><mode>_intrinsic,
      	vfmsl_lane_high<mode>_intrinsic): New define_insns.
      
      	* gcc.target/arm/simd/fp16fml_lane_high.c: New test.
      	* gcc.target/arm/simd/fp16fml_lane_low.c: New test.
      
      From-SVN: r256540
      Kyrylo Tkachov committed
    • [arm][2/3] Implement fp16fml extension for ARMv8.4-A · 06e95715
      This patch adds the +fp16fml extension that enables some
      half-precision floating-point Advanced SIMD instructions,
      available through arm_neon.h intrinsics.
      
      This extension is on by default for armv8.4-a
      if fp16 is available, so it can be enabled by -march=armv8.4-a+fp16.
      
      fp16fml is also available for armv8.2-a and armv8.3-a through the
      +fp16fml option that is added for these architectures.
      
      The new instructions that this patch adds support for are:
      vfmal.f16 Dr, Sm, Sn
      vfmal.f16 Qr, Dm, Dn
      vfmsl.f16 Dr, Sm, Sn
      vfmsl.f16 Qr, Dm, Dn
      
      They interpret their input registers as a vector of half-precision
      floating-point values, extend them to single-precision vectors
      and perform a fused multiply-add or subtract of them with the
      destination vector.
      
      This patch exposes these instructions through arm_neon.h intrinsics.
      The set of intrinsics allows us to do stuff such as perform
      the multiply-add/subtract operation on the low or top half of
      float16x4_t and float16x8_t values.  This maps naturally in aarch64
      to the FMLAL and FMLAL2 instructions but on arm we have to use the
      fact that consecutive NEON registers overlap the wider register
      (i.e. d0 is s0 plus s1, q0 is d0 plus d1 etc). This just means
      we have to be careful to use the right subreg operand print code.
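
A scalar model may help picture the low/high variants for float16x4_t inputs. This is a sketch under assumptions, not the intrinsics themselves: the function names are hypothetical, `float` stands in for the half-precision inputs, "low" is taken to mean elements 0 and 1 and "high" elements 2 and 3, and the multiply and add are written separately whereas the instructions fuse them.

```c
/* Scalar model of the widening multiply-add on the low half.
   Hypothetical names; float stands in for half-precision inputs.  */
void fmlal_low_model(float r[2], const float a[4], const float b[4])
{
    for (int i = 0; i < 2; i++)
        r[i] += a[i] * b[i];          /* elements 0 and 1 */
}

/* Same operation on the high half of the inputs.  */
void fmlal_high_model(float r[2], const float a[4], const float b[4])
{
    for (int i = 0; i < 2; i++)
        r[i] += a[i + 2] * b[i + 2];  /* elements 2 and 3 */
}
```

The subtract forms would differ only in using `-=` in place of `+=`.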
      
      New arm-specific builtins are defined to expand to the new patterns.
      I've managed to compress the define_expands using code, mode and int
      iterators but the define_insns don't compress very well without two-tiered
      iterators (iterator attributes expanding to iterators) which we
      don't support.
      
      Bootstrapped and tested on arm-none-linux-gnueabihf and also on
      armeb-none-eabi.
      
      	* config/arm/arm-cpus.in (fp16fml): New feature.
      	(ALL_SIMD): Add fp16fml.
      	(armv8.2-a): Add fp16fml as an option.
      	(armv8.3-a): Likewise.
      	(armv8.4-a): Add fp16fml as part of fp16.
      	* config/arm/arm.h (TARGET_FP16FML): Define.
      	* config/arm/arm-c.c (arm_cpu_builtins): Define __ARM_FEATURE_FP16_FML
      	when appropriate.
      	* config/arm/arm-modes.def (V2HF): Define.
      	* config/arm/arm_neon.h (vfmlal_low_u32, vfmlsl_low_u32,
      	vfmlal_high_u32, vfmlsl_high_u32, vfmlalq_low_u32,
      	vfmlslq_low_u32, vfmlalq_high_u32, vfmlslq_high_u32): Define.
      	* config/arm/arm_neon_builtins.def (vfmal_low, vfmal_high,
      	vfmsl_low, vfmsl_high): New set of builtins.
      	* config/arm/iterators.md (PLUSMINUS): New code iterator.
      	(vfml_op): New code attribute.
      	(VFMLHALVES): New int iterator.
      	(VFML, VFMLSEL): New mode attributes.
      	(V_reg): Define mapping for V2HF.
      	(V_hi, V_lo): New mode attributes.
      	(VF_constraint): Likewise.
      	(vfml_half, vfml_half_selector): New int attributes.
      	* config/arm/neon.md (neon_vfm<vfml_op>l_<vfml_half><mode>): New
      	define_expand.
      	(vfmal_low<mode>_intrinsic, vfmsl_high<mode>_intrinsic,
      	vfmal_high<mode>_intrinsic, vfmsl_low<mode>_intrinsic):
      	New define_insn.
      	* config/arm/t-arm-elf (v8_fps): Add fp16fml.
      	* config/arm/t-multilib (v8_2_a_simd_variants): Add fp16fml.
      	* config/arm/unspecs.md (UNSPEC_VFML_LO, UNSPEC_VFML_HI): New unspecs.
      	* doc/invoke.texi (ARM Options): Document fp16fml.  Update armv8.4-a
      	documentation.
      	* doc/sourcebuild.texi (arm_fp16fml_neon_ok, arm_fp16fml_neon):
      	Document new effective target and option set.
      
      	* gcc.target/arm/multilib.exp: Add combination tests for fp16fml.
      	* gcc.target/arm/simd/fp16fml_high.c: New test.
      	* gcc.target/arm/simd/fp16fml_low.c: Likewise.
      	* lib/target-supports.exp
      	(check_effective_target_arm_fp16fml_neon_ok_nocache,
      	check_effective_target_arm_fp16fml_neon_ok,
      	add_options_for_arm_fp16fml_neon): New procedures.
      
      From-SVN: r256539
      Kyrylo Tkachov committed
    • [arm][1/3] Add -march=armv8.4-a option · 946c6c45
      This patch adds support for the Armv8.4-A architecture [1]
      in the arm backend. This is done through the new
      -march=armv8.4-a option.
      
      With this patch armv8.4-a is recognised as an argument
      and supports the extensions: simd, fp16, crypto, nocrypto,
      nofp with the familiar meaning of these options.
      Worth noting that there is no dotprod option like in
      armv8.2-a and armv8.3-a because Dot Product support is
      mandatory in Armv8.4-A when simd is available, so when using
      +simd (or fp16, which enables +simd), +dotprod is implied.
      
      The various multilib selection makefile fragments are updated
      too and the multilib.exp test gets a few armv8.4-a combination
      tests.
      
      Bootstrapped and tested on arm-none-linux-gnueabihf.
      
      From-SVN: r256537
      Kyrylo Tkachov committed
    • re PR target/81821 ([RX] xchg_mem<mode> uses wrong memory operand size) · 99eeb64c
      gcc/
      	PR target/81821
      	* config/rx/rx.md (BW): New mode attribute.
      	(sync_lock_test_and_setsi): Add mode suffix to insn output.
      
      From-SVN: r256536
      Oleg Endo committed
    • re PR tree-optimization/83435 (ICE in set_value_range, at tree-vrp.c:211) · b0bd3e52
      2018-01-11  Richard Biener  <rguenther@suse.de>
      
      	PR tree-optimization/83435
      	* graphite.c (canonicalize_loop_form): Ignore fake loop exit edges.
      	* graphite-scop-detection.c (scop_detection::get_sese): Likewise.
      	* tree-vrp.c (add_assert_info): Drop TREE_OVERFLOW if they appear.
      
      	* gcc.dg/graphite/pr83435.c: New testcase.
      
      From-SVN: r256535
      Richard Biener committed
    • [AArch64] Add const_offset field to aarch64_address_info · dc640181
      This patch records the integer value of the address offset in
      aarch64_address_info, so that it doesn't need to be re-extracted
      from the rtx.  The SVE port will make more use of this.  The patch
      also uses poly_int64 routines to manipulate the offset, rather than
      just handling CONST_INTs.
      
      2018-01-11  Richard Sandiford  <richard.sandiford@linaro.org>
      	    Alan Hayward  <alan.hayward@arm.com>
      	    David Sherwood  <david.sherwood@arm.com>
      
      gcc/
      	* config/aarch64/aarch64.c (aarch64_address_info): Add a const_offset
      	field.
      	(aarch64_classify_address): Initialize it.  Track polynomial offsets.
      	(aarch64_print_address_internal): Use it to check for a zero offset.
      
      Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
      Co-Authored-By: David Sherwood <david.sherwood@arm.com>
      
      From-SVN: r256534
      Richard Sandiford committed
    • [AArch64] Set NUM_POLY_INT_COEFFS to 2 · 6a70badb
      This patch switches the AArch64 port to use 2 poly_int coefficients
      and updates code as necessary to keep it compiling.
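
As a rough picture of what two coefficients mean, a value can be modelled as c0 + c1 * x, where x is a non-negative quantity known only at run time. The C sketch below is an assumption-laden model: GCC's real poly_int is a C++ class template with a different interface, and `struct poly2` and its helpers are hypothetical names used only for illustration.

```c
#include <stdbool.h>

/* Hypothetical two-coefficient model: value = c0 + c1 * x,
   where x >= 0 is only known at run time.  */
struct poly2 {
    long c0;   /* constant part */
    long c1;   /* part that scales with the runtime quantity x */
};

/* Addition works coefficient by coefficient.  */
struct poly2 poly2_add(struct poly2 a, struct poly2 b)
{
    struct poly2 r = { a.c0 + b.c0, a.c1 + b.c1 };
    return r;
}

/* The value is a compile-time constant only if it does not depend on x.  */
bool poly2_is_constant(struct poly2 p)
{
    return p.c1 == 0;
}

/* Evaluate once x is known.  */
long poly2_eval(struct poly2 p, long x)
{
    return p.c0 + p.c1 * x;
}
```

Code that previously assumed a plain integer offset has to either carry both coefficients through its arithmetic or check that the value is constant before treating it as one.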
      
      One potentially-significant change is to
      aarch64_hard_regno_caller_save_mode.  The old implementation
      was written in a pretty conservative way: it changed the default
      behaviour for single-register values, but used the default handling
      for multi-register values.
      
      I don't think that's necessary, since the interesting cases for this
      macro are usually the single-register ones.  Multi-register modes take
      up the whole of the constituent registers and the move patterns for all
      multi-register modes should be equally good.
      
      Using the original mode for multi-register cases stops us from using
      SVE modes to spill multi-register NEON values.  This was caught by
      gcc.c-torture/execute/pr47538.c.
      
      Also, aarch64_shift_truncation_mask used GET_MODE_BITSIZE - 1.
      GET_MODE_UNIT_BITSIZE - 1 is equivalent for the cases that it handles
      (which are all scalars), and I think it's more obvious, since if we ever
      do use this for elementwise shifts of vector modes, the mask will depend
      on the number of bits in each element rather than the number of bits in
      the whole vector.
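
The difference between the two masks can be shown with plain arithmetic. The helpers below are hypothetical and only mirror the macro names in spirit: a mode is described by its element width in bits and its element count.

```c
/* Hypothetical helpers: a mode is (bits per element, number of elements).  */

/* Mask derived from the size of the whole mode.  */
unsigned mask_from_mode_bitsize(unsigned unit_bits, unsigned nunits)
{
    return unit_bits * nunits - 1;
}

/* Mask derived from the size of one element.  */
unsigned mask_from_unit_bitsize(unsigned unit_bits)
{
    return unit_bits - 1;
}
```

For a 64-bit scalar both give 63, which is why the change is neutral for the cases handled today. For a vector of four 32-bit elements the first gives 127 while the second gives 31, the value an elementwise shift would need.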
      
      2018-01-11  Richard Sandiford  <richard.sandiford@linaro.org>
      	    Alan Hayward  <alan.hayward@arm.com>
      	    David Sherwood  <david.sherwood@arm.com>
      
      gcc/
      	* config/aarch64/aarch64-modes.def (NUM_POLY_INT_COEFFS): Set to 2.
      	* config/aarch64/aarch64-protos.h (aarch64_initial_elimination_offset):
      	Return a poly_int64 rather than a HOST_WIDE_INT.
      	(aarch64_offset_7bit_signed_scaled_p): Take the offset as a poly_int64
      	rather than a HOST_WIDE_INT.
      	* config/aarch64/aarch64.h (aarch64_frame): Protect with
      	HAVE_POLY_INT_H rather than HOST_WIDE_INT.  Change locals_offset,
      	hard_fp_offset, frame_size, initial_adjust, callee_offset and
      	final_offset from HOST_WIDE_INT to poly_int64.
      	* config/aarch64/aarch64-builtins.c (aarch64_simd_expand_args): Use
      	to_constant when getting the number of units in an Advanced SIMD
      	mode.
      	(aarch64_builtin_vectorized_function): Check for a constant number
      	of units.
      	* config/aarch64/aarch64-simd.md (mov<mode>): Handle polynomial
      	GET_MODE_SIZE.
      	(aarch64_ld<VSTRUCT:nregs>_lane<VALLDIF:mode>): Use the nunits
      	attribute instead of GET_MODE_NUNITS.
      	* config/aarch64/aarch64.c (aarch64_hard_regno_nregs)
      	(aarch64_class_max_nregs): Use the constant_lowest_bound of the
      	GET_MODE_SIZE for fixed-size registers.
      	(aarch64_const_vec_all_same_in_range_p): Use const_vec_duplicate_p.
      	(aarch64_hard_regno_call_part_clobbered, aarch64_classify_index)
      	(aarch64_mode_valid_for_sched_fusion_p, aarch64_classify_address)
      	(aarch64_legitimize_address_displacement, aarch64_secondary_reload)
      	(aarch64_print_operand, aarch64_print_address_internal)
      	(aarch64_address_cost, aarch64_rtx_costs, aarch64_register_move_cost)
      	(aarch64_short_vector_p, aapcs_vfp_sub_candidate)
      	(aarch64_simd_attr_length_rglist, aarch64_operands_ok_for_ldpstp):
      	Handle polynomial GET_MODE_SIZE.
      	(aarch64_hard_regno_caller_save_mode): Likewise.  Return modes
      	wider than SImode without modification.
      	(tls_symbolic_operand_type): Use strip_offset instead of split_const.
      	(aarch64_pass_by_reference, aarch64_layout_arg, aarch64_pad_reg_upward)
      	(aarch64_gimplify_va_arg_expr): Assert that we don't yet handle
      	passing and returning SVE modes.
      	(aarch64_function_value, aarch64_layout_arg): Use gen_int_mode
      	rather than GEN_INT.
      	(aarch64_emit_probe_stack_range): Take the size as a poly_int64
      	rather than a HOST_WIDE_INT, but call sorry if it isn't constant.
      	(aarch64_allocate_and_probe_stack_space): Likewise.
      	(aarch64_layout_frame): Cope with polynomial offsets.
      	(aarch64_save_callee_saves, aarch64_restore_callee_saves): Take the
      	start_offset as a poly_int64 rather than a HOST_WIDE_INT.  Track
      	polynomial offsets.
      	(offset_9bit_signed_unscaled_p, offset_12bit_unsigned_scaled_p)
      	(aarch64_offset_7bit_signed_scaled_p): Take the offset as a
      	poly_int64 rather than a HOST_WIDE_INT.
      	(aarch64_get_separate_components, aarch64_process_components)
      	(aarch64_expand_prologue, aarch64_expand_epilogue)
      	(aarch64_use_return_insn_p): Handle polynomial frame offsets.
      	(aarch64_anchor_offset): New function, split out from...
      	(aarch64_legitimize_address): ...here.
      	(aarch64_builtin_vectorization_cost): Handle polynomial
      	TYPE_VECTOR_SUBPARTS.
      	(aarch64_simd_check_vect_par_cnst_half): Handle polynomial
      	GET_MODE_NUNITS.
      	(aarch64_simd_make_constant, aarch64_expand_vector_init): Get the
      	number of elements from the PARALLEL rather than the mode.
      	(aarch64_shift_truncation_mask): Use GET_MODE_UNIT_BITSIZE
      	rather than GET_MODE_BITSIZE.
      	(aarch64_evpc_trn, aarch64_evpc_uzp, aarch64_evpc_ext)
      	(aarch64_evpc_rev, aarch64_evpc_dup, aarch64_evpc_zip)
      	(aarch64_expand_vec_perm_const_1): Handle polynomial
      	d->perm.length () and d->perm elements.
      	(aarch64_evpc_tbl): Likewise.  Use nelt rather than GET_MODE_NUNITS.
      	Apply to_constant to d->perm elements.
      	(aarch64_simd_valid_immediate, aarch64_vec_fpconst_pow_of_2): Handle
      	polynomial CONST_VECTOR_NUNITS.
      	(aarch64_move_pointer): Take amount as a poly_int64 rather
      	than an int.
      	(aarch64_progress_pointer): Avoid temporary variable.
      	* config/aarch64/aarch64.md (aarch64_<crc_variant>): Use
      	the mode attribute instead of GET_MODE.
      
      Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
      Co-Authored-By: David Sherwood <david.sherwood@arm.com>
      
      From-SVN: r256533
      Richard Sandiford committed
    • [AArch64] Rework interface to add constant/offset routines · f5470a77
      The port had aarch64_add_offset and aarch64_add_constant routines
      that did similar things.  This patch replaces them with an expanded
      version of aarch64_add_offset that takes separate source and
      destination registers.  The new routine also takes a poly_int64 offset
      instead of a HOST_WIDE_INT offset, but it leaves the HOST_WIDE_INT
      case to aarch64_add_offset_1, which is basically a repurposed
      aarch64_add_constant_internal.  The SVE patch will put the handling
      of VL-based constants in aarch64_add_offset, while still using
      aarch64_add_offset_1 for the constant part.
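
      As a rough illustration of the poly_int64 offsets mentioned above, the
      sketch below models a two-coefficient offset as c0 + c1 * x, where x is
      only known at run time.  The names PolyOffset, evaluate, SplitOffset and
      split_offset are hypothetical and are not taken from GCC; only the
      is_constant idea mirrors the real poly_int interface, and the split shown
      here is a simplification rather than the computation GCC performs.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for a two-coefficient poly_int64 (hypothetical type):
// value = coeffs[0] + coeffs[1] * x, where x is a run-time quantity.
struct PolyOffset {
  int64_t coeffs[2];

  // Return true and store the value if the offset does not depend on x.
  bool is_constant(int64_t *value) const {
    if (coeffs[1] != 0)
      return false;
    *value = coeffs[0];
    return true;
  }

  // Evaluate the offset for a given non-negative x.
  int64_t evaluate(int64_t x) const {
    assert(x >= 0);
    return coeffs[0] + coeffs[1] * x;
  }
};

// Hypothetical split into the part a constant-only helper could handle
// and the factor that scales with the run-time quantity.
struct SplitOffset {
  int64_t constant_part;
  int64_t runtime_factor;
};

SplitOffset split_offset(const PolyOffset &offset) {
  return {offset.coeffs[0], offset.coeffs[1]};
}
```

      In this model a caller would pass constant_part to a routine that only
      understands compile-time constants and handle runtime_factor separately,
      which is the division of labour the message describes between
      aarch64_add_offset_1 and aarch64_add_offset.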
      
      The vcall_offset == 0 path in aarch64_output_mi_thunk will use temp0
      as well as temp1 once SVE is added.
      
      A side-effect of the patch is that we now generate:
      
              mov     x29, sp
      
      instead of:
      
              add     x29, sp, 0
      
      in the pr70044.c test.
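
      The side-effect can be pictured with a small toy model of a routine
      that takes separate source and destination registers.  This is a sketch
      under stated assumptions, not the GCC implementation: the function
      add_offset_1 below is hypothetical, works on register names as strings,
      and only covers offsets whose magnitude fits in 24 bits, where one or
      two immediate additions suffice.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Toy model, not GCC code: list the instructions that compute
// dest = src + offset for a compile-time constant offset.
std::vector<std::string> add_offset_1(const std::string &dest,
                                      const std::string &src,
                                      int64_t offset) {
  std::vector<std::string> insns;

  // A zero offset needs no arithmetic: at most a register move.
  if (offset == 0) {
    if (dest != src)
      insns.push_back("mov\t" + dest + ", " + src);
    return insns;
  }

  const std::string op = offset < 0 ? "sub" : "add";
  const uint64_t mag = offset < 0 ? 0 - static_cast<uint64_t>(offset)
                                  : static_cast<uint64_t>(offset);

  // Assumption of this sketch: larger offsets (which would need a
  // scratch register) are out of scope.
  assert(mag < 0x1000000);

  // Split into a low 12-bit part and a 12-bit part shifted left by 12.
  const uint64_t parts[] = {mag & 0xfff, mag & 0xfff000};
  std::string base = src;
  for (uint64_t part : parts) {
    if (part == 0)
      continue;
    insns.push_back(op + "\t" + dest + ", " + base + ", "
                    + std::to_string(part));
    base = dest;  // A second addition builds on the first result.
  }
  return insns;
}
```

      With this model, add_offset_1 ("x29", "sp", 0) yields a single mov,
      while a routine that always emitted an addition would produce the
      older add with a zero immediate.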
      
      2018-01-11  Richard Sandiford  <richard.sandiford@linaro.org>
      	    Alan Hayward  <alan.hayward@arm.com>
      	    David Sherwood  <david.sherwood@arm.com>
      
      gcc/
      	* config/aarch64/aarch64.c (aarch64_force_temporary): Assert that
      	x exists before using it.
      	(aarch64_add_constant_internal): Rename to...
      	(aarch64_add_offset_1): ...this.  Replace regnum with separate
      	src and dest rtxes.  Handle the case in which they're different,
      	including when the offset is zero.  Replace scratchreg with an rtx.
      	Use 2 additions if there is no spare register into which we can
      	move a 16-bit constant.
      	(aarch64_add_constant): Delete.
      	(aarch64_add_offset): Replace reg with separate src and dest
      	rtxes.  Take a poly_int64 offset instead of a HOST_WIDE_INT.
      	Use aarch64_add_offset_1.
      	(aarch64_add_sp, aarch64_sub_sp): Take the scratch register as
      	an rtx rather than an int.  Take the delta as a poly_int64
      	rather than a HOST_WIDE_INT.  Use aarch64_add_offset.
      	(aarch64_expand_mov_immediate): Update uses of aarch64_add_offset.
      	(aarch64_expand_prologue): Update calls to aarch64_sub_sp,
      	aarch64_allocate_and_probe_stack_space and aarch64_add_offset.
      	(aarch64_expand_epilogue): Update calls to aarch64_add_offset
      	and aarch64_add_sp.
      	(aarch64_output_mi_thunk): Use aarch64_add_offset rather than
      	aarch64_add_constant.
      
      gcc/testsuite/
      	* gcc.target/aarch64/pr70044.c: Allow "mov x29, sp" too.
      
      Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
      Co-Authored-By: David Sherwood <david.sherwood@arm.com>
      
      From-SVN: r256532
      Richard Sandiford committed
    • [AArch64] Extra scalar_float_mode patch · 0d0e0188
      This patch makes aarch64_reinterpret_float_as_int use
      scalar_float_mode, in preparation for the switch to
      NUM_POLY_INT_COEFFS==2.
      
      2018-01-11  Richard Sandiford  <richard.sandiford@linaro.org>
      
      gcc/
      	* config/aarch64/aarch64.c (aarch64_reinterpret_float_as_int):
      	Use scalar_float_mode.
      
      From-SVN: r256531
      Richard Sandiford committed
    • [AArch64] Avoid GET_MODE_NUNITS in v8.4 support · f3bd9505
      This patch replaces GET_MODE_NUNITS in some of the v8.4 support
      with equivalent values, in preparation for the switch to
      NUM_POLY_INT_COEFFS==2.
      
      2018-01-11  Richard Sandiford  <richard.sandiford@linaro.org>
      
      gcc/
      	* config/aarch64/aarch64-simd.md
      	(aarch64_fml<f16mac1>l<f16quad>_low<mode>): Avoid GET_MODE_NUNITS.
      	(aarch64_fml<f16mac1>l<f16quad>_high<mode>): Likewise.
      	(aarch64_fml<f16mac1>l_lane_lowv2sf): Likewise.
      	(aarch64_fml<f16mac1>l_lane_highv2sf): Likewise.
      	(aarch64_fml<f16mac1>lq_laneq_lowv4sf): Likewise.
      	(aarch64_fml<f16mac1>lq_laneq_highv4sf): Likewise.
      	(aarch64_fml<f16mac1>l_laneq_lowv2sf): Likewise.
      	(aarch64_fml<f16mac1>l_laneq_highv2sf): Likewise.
      	(aarch64_fml<f16mac1>lq_lane_lowv4sf): Likewise.
      	(aarch64_fml<f16mac1>lq_lane_highv4sf): Likewise.
      
      From-SVN: r256530
      Richard Sandiford committed