1. 01 Feb, 2020 3 commits
  2. 31 Jan, 2020 28 commits
    • c++: Reduce memory consumption for arrays of non-aggregate type. · e98ebda0
      The remaining low-hanging fruit for improvement on memory consumption in the
      14179 testcase was the duplication of the CONSTRUCTOR for the array by
      reshape_init.  This patch changes reshape_init to reuse a single constructor
      for an array of non-aggregate type such as the one in the testcase.
      
      	PR c++/14179
      	* decl.c (reshape_init_array_1): Reuse a single CONSTRUCTOR with
      	non-aggregate elements.
      	(reshape_init_array): Add first_initializer_p parm.
      	(reshape_init_r): Change first_initializer_p from bool to tree.
      	(reshape_init): Pass init to it.
      Jason Merrill committed
    • c++: Reduce memory consumption for large static arrays. · d2b9548f
      PR14179 and the C counterpart PR12245 are about memory consumption of very
      large file-scope arrays.  Recently, location wrappers increased memory
      consumption significantly: in an array of integer constants, each one will
      have a location wrapper, which added up to over 500MB in the 14179
      testcase.  For this kind of testcase tracking these locations isn't worth
      the cost, so this patch turns the wrappers off after 256 elements; any array
      that size or larger isn't likely to be interested in the location of
      individual integer constants.
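
      The cutoff described above can be pictured with a toy model.  This is
      only a sketch: toy_elt, toy_parse_initializer_list and WRAPPER_LIMIT
      are hypothetical names, not code from parser.c; the only detail taken
      from the patch description is the 256-element threshold.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical threshold, mirroring "after 256 elements" in the text.  */
enum { WRAPPER_LIMIT = 256 };

/* Toy stand-in for one parsed initializer element.  */
struct toy_elt {
  long value;
  int has_location;   /* nonzero if a source location was recorded */
  unsigned line;      /* only meaningful when has_location is set */
};

/* Fill ELTS[0..N-1] as if parsing N integer constants on line LINE.
   Only the first WRAPPER_LIMIT elements keep a location.  Returns the
   number of elements that carry one.  */
size_t toy_parse_initializer_list(struct toy_elt *elts, size_t n,
                                  unsigned line)
{
  size_t wrapped = 0;
  for (size_t i = 0; i < n; i++) {
    int wrap = i < WRAPPER_LIMIT;
    elts[i].value = (long) i;
    elts[i].has_location = wrap;
    elts[i].line = wrap ? line : 0;
    wrapped += (size_t) wrap;
  }
  return wrapped;
}
```

      In this model the per-element location cost stops growing once the
      array passes the limit, which is the effect the patch is after.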
      
      	PR c++/14179
      	* parser.c (cp_parser_initializer_list): Suppress location wrappers
      	after 256 elements.
      Jason Merrill committed
    • analyzer: fix ICE with 'const void *' (PR 93457) · 67751724
      gcc/analyzer/ChangeLog:
      	PR analyzer/93457
      	* region-model.cc (make_region_for_type): Use VOID_TYPE_P rather
      	than checking against void_type_node.
      
      gcc/testsuite/ChangeLog:
      	PR analyzer/93457
      	* gcc.dg/analyzer/pr93457.c: New test.
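
      A toy model of why a pointer comparison against a single "void" node
      can miss 'const void'.  All names below (toy_type, toy_void_type_node
      and so on) are hypothetical and only imitate the idea that a qualified
      variant is a distinct node sharing the same type code; this is not the
      analyzer's code.

```c
#include <assert.h>

enum toy_code { TOY_VOID_TYPE, TOY_INTEGER_TYPE };

/* Toy type node: a code plus a qualifier flag.  */
struct toy_type {
  enum toy_code code;
  int is_const;
};

/* The unqualified "void" node and a separate 'const void' variant.  */
const struct toy_type toy_void_type_node = { TOY_VOID_TYPE, 0 };
const struct toy_type toy_const_void_type = { TOY_VOID_TYPE, 1 };

/* Identity check: only matches the one unqualified node.  */
int toy_is_void_by_identity(const struct toy_type *t)
{
  return t == &toy_void_type_node;
}

/* Code check, in the spirit of VOID_TYPE_P: matches every variant.  */
int toy_is_void_by_code(const struct toy_type *t)
{
  return t->code == TOY_VOID_TYPE;
}
```

      With this model the identity check rejects the 'const void' node
      while the code check accepts it.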
      David Malcolm committed
    • analyzer: fix ICE handling void-type (PR 93373) · 09bea584
      gcc/analyzer/ChangeLog:
      	PR analyzer/93373
      	* region-model.cc (ASSERT_COMPAT_TYPES): Convert to...
      	(assert_compat_types): ...this, and bail when either type is NULL,
      	or when VOID_TYPE_P (dst_type).
      	(region_model::get_lvalue): Update for above conversion.
      	(region_model::get_rvalue): Likewise.
      
      gcc/testsuite/ChangeLog:
      	PR analyzer/93373
      	* gcc.dg/analyzer/torture/pr93373.c: New test.
      David Malcolm committed
    • Fix for PR 91333 - suboptimal register allocation for inline asm · 2a07345c
          2020-01-31  Vladimir Makarov  <vmakarov@redhat.com>
      
                  PR rtl-optimization/91333
                  * ira-color.c (bucket_allocno_compare_func): Move conflict hard
                  reg preferences comparison up.
      
          2020-01-31  Vladimir Makarov  <vmakarov@redhat.com>
      
                  PR rtl-optimization/91333
                  * gcc.target/i386/pr91333.c: New.
      Vladimir N. Makarov committed
    • analyzer: fix ICE getting void return value (PR 93379) · f1c807e8
      PR analyzer/93379 reports an ICE within
      region_model::update_for_return_superedge when writing the
      returned svalue_id to the lhs of the call_stmt.
      
      The root cause is that this analyzer code assumed that for any call
      with a non-NULL gimple_call_lhs, the called fndecl would have non-void
      return type, and thus that a non-null svalue_id would be returned from
      region_model::pop_frame.  This isn't the case e.g. for a call with
      conflicting types where the callee returns void but the caller assumes
      int.
      
      This patch fixes the ICE by moving the check for null result so that
      it also guards setting the lhs.
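
      The shape of the fix can be sketched with a toy model.  The names
      here (toy_call, toy_pop_frame, toy_update_for_return, TOY_NULL_ID) are
      hypothetical and stand in for the analyzer's types; the only point
      illustrated is that the null-result check comes before the write to
      the lhs.

```c
#include <assert.h>

/* Hypothetical marker for "no value was returned".  */
enum { TOY_NULL_ID = -1 };

/* Toy call statement: may have a lhs that receives the result.  */
struct toy_call {
  int has_lhs;
  int lhs_written;
  int lhs_value;
};

/* Toy pop_frame: a void callee yields the null id.  */
int toy_pop_frame(int callee_returns_void, int value)
{
  return callee_returns_void ? TOY_NULL_ID : value;
}

/* The null check guards everything that uses the result, including
   the assignment to the lhs.  */
void toy_update_for_return(struct toy_call *call, int result_id)
{
  if (result_id == TOY_NULL_ID)
    return;
  if (call->has_lhs) {
    call->lhs_value = result_id;
    call->lhs_written = 1;
  }
}
```

      A caller that expects an int from a callee returning void then simply
      leaves its lhs untouched in this model.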
      
      gcc/analyzer/ChangeLog:
      	PR analyzer/93379
      	* region-model.cc (region_model::update_for_return_superedge):
      	Move check for null result so that it also guards setting the
      	lhs.
      
      gcc/testsuite/ChangeLog:
      	PR analyzer/93379
      	* gcc.dg/analyzer/torture/pr93379-2.c: New test.
      	* gcc.dg/analyzer/torture/pr93379.c: New test.
      David Malcolm committed
    • analyzer: fix ICE with pointers between stack frames (PR 93438) · 455f58ec
      PR analyzer/93438 reports an ICE when merging two region_models
      in which an older stack frame has a local pointing to a local in
      a more recent stack frame.
      
        stack
          older frame
            int *: "ow" --+
                          |
          newer frame     |
            int: "pk" <---+
      
      The root cause is that the state-merging code assumes that all frame
      regions in the merged model have already been created.
      stack_region::can_merge_p iterates through the frames, creating
      and populating each merged frame in turn, so when it attempts to
      populate the older frame, it attempts to reference the newer frame in
      the merged model, which doesn't exist yet.
      
      This patch reworks stack_region::can_merge_p to use a two-pass approach
      in which all frames in the merged model are created first, and then
      are all populated, fixing the bug.
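
      The two-pass idea can be sketched outside the analyzer.  In this toy
      model a plain copy stands in for the merge, and every name (toy_frame,
      toy_stack, toy_frame_ref, toy_copy_stack) is hypothetical.  The
      "created" flag plays the role of "this frame exists in the merged
      model"; populating frames in a single pass would trip the assertion
      in toy_frame_ref when an older frame points at a newer one.

```c
#include <assert.h>
#include <stddef.h>

enum { TOY_MAX_FRAMES = 8 };

struct toy_frame {
  int created;                  /* frame exists in this stack */
  int local_value;
  struct toy_frame *points_to;  /* NULL, or a frame in the same stack */
};

struct toy_stack {
  size_t depth;                 /* frames[0] is the oldest frame */
  struct toy_frame frames[TOY_MAX_FRAMES];
};

/* Look up a frame; it must already have been created.  */
static struct toy_frame *toy_frame_ref(struct toy_stack *s, size_t idx)
{
  assert(idx < s->depth && s->frames[idx].created);
  return &s->frames[idx];
}

void toy_copy_stack(const struct toy_stack *src, struct toy_stack *dst)
{
  assert(src->depth <= TOY_MAX_FRAMES);
  dst->depth = src->depth;
  for (size_t i = 0; i < TOY_MAX_FRAMES; i++)
    dst->frames[i].created = 0;

  /* Pass 1: create every frame, leaving the contents empty.  */
  for (size_t i = 0; i < src->depth; i++) {
    dst->frames[i].created = 1;
    dst->frames[i].local_value = 0;
    dst->frames[i].points_to = NULL;
  }

  /* Pass 2: populate; any target frame now exists.  */
  for (size_t i = 0; i < src->depth; i++) {
    const struct toy_frame *from = &src->frames[i];
    dst->frames[i].local_value = from->local_value;
    if (from->points_to) {
      size_t target = (size_t) (from->points_to - src->frames);
      dst->frames[i].points_to = toy_frame_ref(dst, target);
    }
  }
}
```

      Setting up frames[0].points_to = &frames[1] reproduces the "ow" to
      "pk" layout from the diagram above.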
      
      gcc/analyzer/ChangeLog:
      	PR analyzer/93438
      	* region-model.cc (stack_region::can_merge_p): Split into a two
      	pass approach, creating all stack regions first, then populating
      	them.
      	(selftest::test_state_merging): Add test coverage for (a) the case
      	of self-merging a model in which a local in an older stack frame
      	points to a local in a more recent stack frame (which previously
      	would ICE), and (b) the case of self-merging a model in which a
      	local points to a global (which previously worked OK).
      
      gcc/testsuite/ChangeLog:
      	PR analyzer/93438
      	* gcc.dg/analyzer/torture/pr93438.c: New test.
      	* gcc.dg/analyzer/torture/pr93438-2.c: New test.
      David Malcolm committed
    • testsuite: Fix up pr91838.C test [PR91838] · 5910b145
      The test FAILs on i686-linux with:
      FAIL: g++.dg/pr91838.C   (test for excess errors)
      Excess errors:
      /home/jakub/src/gcc/gcc/testsuite/g++.dg/pr91838.C:7:8: warning: MMX vector return without MMX enabled changes the ABI [-Wpsabi]
      /home/jakub/src/gcc/gcc/testsuite/g++.dg/pr91838.C:7:3: warning: MMX vector argument without MMX enabled changes the ABI [-Wpsabi]
      and on x86_64-linux, when testing with -m32, it fails to match the
      expected pattern (or both with e.g. -m32/-mno-mmx/-mno-sse testing).
      The test is also in the wrong directory and uses a non-standard way of
      specifying that it requires c++11 or later.
      
      2020-01-31  Jakub Jelinek  <jakub@redhat.com>
      
      	PR rtl-optimization/91838
      	* g++.dg/pr91838.C: Moved to ...
      	* g++.dg/opt/pr91838.C: ... here.  Require c++11 target instead of
      	dg-skip-if for c++98.  Pass -Wno-psabi -w to avoid psabi style
      	warnings on vector arg passing or return.  Add -masm=att on i?86/x86_64.
      	Only check for pxor %xmm0, %xmm0 on lp64 i?86/x86_64.
      Jakub Jelinek committed
    • aarch64: Add Armv8.6 SVE bfloat16 support · 896dff99
      This patch adds support for the SVE intrinsics that map to Armv8.6
      bfloat16 instructions.  This means that svcvtnt is now a base SVE
      function for one type suffix combination; the others are still
      SVE2-specific.
      
      This relies on a binutils fix:
      
          https://sourceware.org/ml/binutils/2020-01/msg00450.html
      
      so anyone testing older binutils 2.34 or binutils master sources will
      need to upgrade to get clean test results.  (At the time of writing,
      no released version of binutils has this bug.)
      
      2020-01-31  Richard Sandiford  <richard.sandiford@arm.com>
      
      gcc/
      	* config/aarch64/aarch64.h (TARGET_SVE_BF16): New macro.
      	* config/aarch64/aarch64-sve-builtins-sve2.h (svcvtnt): Move to
      	aarch64-sve-builtins-base.h.
      	* config/aarch64/aarch64-sve-builtins-sve2.cc (svcvtnt): Move to
      	aarch64-sve-builtins-base.cc.
      	* config/aarch64/aarch64-sve-builtins-base.h (svbfdot, svbfdot_lane)
      	(svbfmlalb, svbfmlalb_lane, svbfmlalt, svbfmlalt_lane, svbfmmla)
      	(svcvtnt): Declare.
      	* config/aarch64/aarch64-sve-builtins-base.cc (svbfdot, svbfdot_lane)
      	(svbfmlalb, svbfmlalb_lane, svbfmlalt, svbfmlalt_lane, svbfmmla)
      	(svcvtnt): New functions.
      	* config/aarch64/aarch64-sve-builtins-base.def (svbfdot, svbfdot_lane)
      	(svbfmlalb, svbfmlalb_lane, svbfmlalt, svbfmlalt_lane, svbfmmla)
      	(svcvtnt): New functions.
      	(svcvt): Add a form that converts f32 to bf16.
      	* config/aarch64/aarch64-sve-builtins-shapes.h (ternary_bfloat)
      	(ternary_bfloat_lane, ternary_bfloat_lanex2, ternary_bfloat_opt_n):
      	Declare.
      	* config/aarch64/aarch64-sve-builtins-shapes.cc (parse_element_type):
      	Treat B as bfloat16_t.
      	(ternary_bfloat_lane_base): New class.
      	(ternary_bfloat_def): Likewise.
      	(ternary_bfloat): New shape.
      	(ternary_bfloat_lane_def): New class.
      	(ternary_bfloat_lane): New shape.
      	(ternary_bfloat_lanex2_def): New class.
      	(ternary_bfloat_lanex2): New shape.
      	(ternary_bfloat_opt_n_def): New class.
      	(ternary_bfloat_opt_n): New shape.
      	* config/aarch64/aarch64-sve-builtins.cc (TYPES_cvt_bfloat): New macro.
      	* config/aarch64/aarch64-sve.md (@aarch64_sve_<sve_fp_op>vnx4sf)
      	(@aarch64_sve_<sve_fp_op>_lanevnx4sf): New patterns.
      	(@aarch64_sve_<optab>_trunc<VNx4SF_ONLY:mode><VNx8BF_ONLY:mode>)
      	(@cond_<optab>_trunc<VNx4SF_ONLY:mode><VNx8BF_ONLY:mode>): Likewise.
      	(*cond_<optab>_trunc<VNx4SF_ONLY:mode><VNx8BF_ONLY:mode>): Likewise.
      	(@aarch64_sve_cvtnt<VNx8BF_ONLY:mode>): Likewise.
      	* config/aarch64/aarch64-sve2.md (@aarch64_sve2_cvtnt<mode>): Key
      	the pattern off the narrow mode instead of the wider one.
      	* config/aarch64/iterators.md (VNx8BF_ONLY): New mode iterator.
      	(UNSPEC_BFMLALB, UNSPEC_BFMLALT, UNSPEC_BFMMLA): New unspecs.
      	(sve_fp_op): Handle them.
      	(SVE_BFLOAT_TERNARY_LONG): New int iterator.
      	(SVE_BFLOAT_TERNARY_LONG_LANE): Likewise.
      
      gcc/testsuite/
      	* lib/target-supports.exp (check_effective_target_aarch64_asm_bf16_ok):
      	New proc.
      	* gcc.target/aarch64/sve/acle/asm/bfdot_f32.c: New test.
      	* gcc.target/aarch64/sve/acle/asm/bfdot_lane_f32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/bfmlalb_f32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/bfmlalb_lane_f32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/bfmlalt_f32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/bfmlalt_lane_f32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/bfmmla_f32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/cvt_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/cvtnt_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/general-c/ternary_bfloat16_1.c: Likewise.
      	* gcc.target/aarch64/sve/acle/general-c/ternary_bfloat16_lane_1.c:
      	Likewise.
      	* gcc.target/aarch64/sve/acle/general-c/ternary_bfloat16_lanex2_1.c:
      	Likewise.
      	* gcc.target/aarch64/sve/acle/general-c/ternary_bfloat16_opt_n_1.c:
      	Likewise.
      Richard Sandiford committed
    • aarch64: Add svbfloat16_t support to arm_sve.h · 02fcd8ac
      This patch adds support for the bfloat16-related vectors to
      arm_sve.h.  It also adds support for functions that just treat
      bfloat16_t as a bag of 16 bits; these functions are available
      for bf16 whenever they're available for other 16-bit types.
      
      Previously "all_data" was used for both data movement and for arithmetic
      that happened to be defined for all data types.  Adding bf16 means we
      need to distinguish between the two cases.
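
      The difference between data movement and arithmetic can be shown in
      portable C.  This sketch is not the arm_sve.h API: toy_bf16_bits and
      the toy_* functions are hypothetical, and the conversions assume that
      float is IEEE binary32, with bfloat16 being its upper 16 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A bfloat16 value held purely as 16 bits.  */
typedef uint16_t toy_bf16_bits;

/* Data movement: reverse a vector without interpreting the bits.
   This works for any 16-bit element type.  */
void toy_bf16_reverse(toy_bf16_bits *v, size_t n)
{
  for (size_t i = 0; i < n / 2; i++) {
    toy_bf16_bits tmp = v[i];
    v[i] = v[n - 1 - i];
    v[n - 1 - i] = tmp;
  }
}

/* Interpretation: widen to float by placing the bits in the upper
   half of a binary32 word (assumes IEEE binary32 float).  */
float toy_bf16_to_float(toy_bf16_bits b)
{
  uint32_t w = (uint32_t) b << 16;
  float f;
  memcpy(&f, &w, sizeof f);
  return f;
}

/* Narrow by truncation: keep the upper 16 bits, drop the rest.  */
toy_bf16_bits toy_float_to_bf16_trunc(float f)
{
  uint32_t w;
  memcpy(&w, &f, sizeof w);
  return (toy_bf16_bits) (w >> 16);
}
```

      Only the last two functions need to know what the bits mean, which is
      roughly the split between "all_data" and "all_arith" described above.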
      
      The patch also reorders the mode definitions in aarch64-modes.def,
      which means we no longer need separate VECTOR_MODE entries for BF
      vectors.
      
      2020-01-31  Richard Sandiford  <richard.sandiford@arm.com>
      
      gcc/
      	* config/aarch64/arm_sve.h: Include arm_bf16.h.
      	* config/aarch64/aarch64-modes.def (BF): Move definition before
      	VECTOR_MODES.  Remove separate VECTOR_MODES for V4BF and V8BF.
      	(SVE_MODES): Handle BF modes.
      	* config/aarch64/aarch64.c (aarch64_classify_vector_mode): Handle
      	BF modes.
      	(aarch64_full_sve_mode): Likewise.
      	* config/aarch64/iterators.md (SVE_STRUCT): Add VNx16BF, VNx24BF
      	and VNx32BF.
      	(SVE_FULL, SVE_FULL_HSD, SVE_ALL): Add VNx8BF.
      	(Vetype, Vesize, Vctype, VEL, Vel, VEL_INT, V128, v128, vwcore)
      	(V_INT_EQUIV, v_int_equiv, V_FP_EQUIV, v_fp_equiv, vector_count)
      	(insn_length, VSINGLE, vsingle, VPRED, vpred, VDOUBLE): Handle the
      	new SVE BF modes.
      	* config/aarch64/aarch64-sve-builtins.h (TYPE_bfloat): New
      	type_class_index.
      	* config/aarch64/aarch64-sve-builtins.cc (TYPES_all_arith): New macro.
      	(TYPES_all_data): Add bf16.
      	(TYPES_reinterpret1, TYPES_reinterpret): Likewise.
      	(register_tuple_type): Increase buffer size.
      	* config/aarch64/aarch64-sve-builtins.def (svbfloat16_t): New type.
      	(bf16): New type suffix.
      	* config/aarch64/aarch64-sve-builtins-base.def (svabd, svadd, svaddv)
      	(svcmpeq, svcmpge, svcmpgt, svcmple, svcmplt, svcmpne, svmad, svmax)
      	(svmaxv, svmin, svminv, svmla, svmls, svmsb, svmul, svsub, svsubr):
      	Change type from all_data to all_arith.
      	* config/aarch64/aarch64-sve-builtins-sve2.def (svaddp, svmaxp)
      	(svminp): Likewise.
      
      gcc/testsuite/
      	* g++.target/aarch64/sve/acle/general-c++/mangle_1.C: Test mangling
      	of svbfloat16_t.
      	* g++.target/aarch64/sve/acle/general-c++/mangle_2.C: Likewise for
      	__SVBfloat16_t.
      	* gcc.target/aarch64/sve/acle/asm/clasta_bf16.c: New test.
      	* gcc.target/aarch64/sve/acle/asm/clastb_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/cnt_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/create2_1.c (create_bf16): Likewise.
      	* gcc.target/aarch64/sve/acle/asm/create3_1.c (create_bf16): Likewise.
      	* gcc.target/aarch64/sve/acle/asm/create4_1.c (create_bf16): Likewise.
      	* gcc.target/aarch64/sve/acle/asm/dup_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/dup_lane_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/dupq_lane_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/ext_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/get2_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/get3_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/get4_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/insr_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/lasta_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/lastb_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/ld1_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/ld1ro_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/ld1rq_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/ld2_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/ld3_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/ld4_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/ldff1_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/ldnf1_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/ldnt1_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/len_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/reinterpret_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/reinterpret_f16.c
      	(reinterpret_f16_bf16_tied1, reinterpret_f16_bf16_untied): Likewise.
      	* gcc.target/aarch64/sve/acle/asm/reinterpret_f32.c
      	(reinterpret_f32_bf16_tied1, reinterpret_f32_bf16_untied): Likewise.
      	* gcc.target/aarch64/sve/acle/asm/reinterpret_f64.c
      	(reinterpret_f64_bf16_tied1, reinterpret_f64_bf16_untied): Likewise.
      	* gcc.target/aarch64/sve/acle/asm/reinterpret_s16.c
      	(reinterpret_s16_bf16_tied1, reinterpret_s16_bf16_untied): Likewise.
      	* gcc.target/aarch64/sve/acle/asm/reinterpret_s32.c
      	(reinterpret_s32_bf16_tied1, reinterpret_s32_bf16_untied): Likewise.
      	* gcc.target/aarch64/sve/acle/asm/reinterpret_s64.c
      	(reinterpret_s64_bf16_tied1, reinterpret_s64_bf16_untied): Likewise.
      	* gcc.target/aarch64/sve/acle/asm/reinterpret_s8.c
      	(reinterpret_s8_bf16_tied1, reinterpret_s8_bf16_untied): Likewise.
      	* gcc.target/aarch64/sve/acle/asm/reinterpret_u16.c
      	(reinterpret_u16_bf16_tied1, reinterpret_u16_bf16_untied): Likewise.
      	* gcc.target/aarch64/sve/acle/asm/reinterpret_u32.c
      	(reinterpret_u32_bf16_tied1, reinterpret_u32_bf16_untied): Likewise.
      	* gcc.target/aarch64/sve/acle/asm/reinterpret_u64.c
      	(reinterpret_u64_bf16_tied1, reinterpret_u64_bf16_untied): Likewise.
      	* gcc.target/aarch64/sve/acle/asm/reinterpret_u8.c
      	(reinterpret_u8_bf16_tied1, reinterpret_u8_bf16_untied): Likewise.
      	* gcc.target/aarch64/sve/acle/asm/rev_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/sel_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/set2_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/set3_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/set4_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/splice_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/st1_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/st2_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/st3_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/st4_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/stnt1_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/tbl_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/trn1_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/trn1q_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/trn2_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/trn2q_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/undef2_1.c (bfloat16_t): Likewise.
      	* gcc.target/aarch64/sve/acle/asm/undef3_1.c (bfloat16_t): Likewise.
      	* gcc.target/aarch64/sve/acle/asm/undef4_1.c (bfloat16_t): Likewise.
      	* gcc.target/aarch64/sve/acle/asm/undef_1.c (bfloat16_t): Likewise.
      	* gcc.target/aarch64/sve/acle/asm/uzp1_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/uzp1q_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/uzp2_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/uzp2q_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/zip1_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/zip1q_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/zip2_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/zip2q_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/annotate_1.c (ret_bf16, ret_bf16x2)
      	(ret_bf16x3, ret_bf16x4): Likewise.
      	* gcc.target/aarch64/sve/pcs/annotate_2.c (fn_bf16, fn_bf16x2)
      	(fn_bf16x3, fn_bf16x4): Likewise.
      	* gcc.target/aarch64/sve/pcs/annotate_3.c (fn_bf16, fn_bf16x2)
      	(fn_bf16x3, fn_bf16x4): Likewise.
      	* gcc.target/aarch64/sve/pcs/annotate_4.c (fn_bf16, fn_bf16x2)
      	(fn_bf16x3, fn_bf16x4): Likewise.
      	* gcc.target/aarch64/sve/pcs/annotate_5.c (fn_bf16, fn_bf16x2)
      	(fn_bf16x3, fn_bf16x4): Likewise.
      	* gcc.target/aarch64/sve/pcs/annotate_6.c (fn_bf16, fn_bf16x2)
      	(fn_bf16x3, fn_bf16x4): Likewise.
      	* gcc.target/aarch64/sve/pcs/annotate_7.c (fn_bf16, fn_bf16x2)
      	(fn_bf16x3, fn_bf16x4): Likewise.
      	* gcc.target/aarch64/sve/pcs/args_5_be_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_5_le_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_6_be_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_6_le_bf16.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/gnu_vectors_1.c (bfloat16x16_t): New
      	typedef.
      	(bfloat16_callee, bfloat16_caller): New tests.
      	* gcc.target/aarch64/sve/pcs/gnu_vectors_2.c (bfloat16x16_t): New
      	typedef.
      	(bfloat16_callee, bfloat16_caller): New tests.
      	* gcc.target/aarch64/sve/pcs/return_4.c (CALLER_BF16): New macro.
      	(callee_bf16, caller_bf16): New tests.
      	* gcc.target/aarch64/sve/pcs/return_4_128.c (CALLER_BF16): New macro.
      	(callee_bf16, caller_bf16): New tests.
      	* gcc.target/aarch64/sve/pcs/return_4_256.c (CALLER_BF16): New macro.
      	(callee_bf16, caller_bf16): New tests.
      	* gcc.target/aarch64/sve/pcs/return_4_512.c (CALLER_BF16): New macro.
      	(callee_bf16, caller_bf16): New tests.
      	* gcc.target/aarch64/sve/pcs/return_4_1024.c (CALLER_BF16): New macro.
      	(callee_bf16, caller_bf16): New tests.
      	* gcc.target/aarch64/sve/pcs/return_4_2048.c (CALLER_BF16): New macro.
      	(callee_bf16, caller_bf16): New tests.
      	* gcc.target/aarch64/sve/pcs/return_5.c (CALLER_BF16): New macro.
      	(callee_bf16, caller_bf16): New tests.
      	* gcc.target/aarch64/sve/pcs/return_5_128.c (CALLER_BF16): New macro.
      	(callee_bf16, caller_bf16): New tests.
      	* gcc.target/aarch64/sve/pcs/return_5_256.c (CALLER_BF16): New macro.
      	(callee_bf16, caller_bf16): New tests.
      	* gcc.target/aarch64/sve/pcs/return_5_512.c (CALLER_BF16): New macro.
      	(callee_bf16, caller_bf16): New tests.
      	* gcc.target/aarch64/sve/pcs/return_5_1024.c (CALLER_BF16): New macro.
      	(callee_bf16, caller_bf16): New tests.
      	* gcc.target/aarch64/sve/pcs/return_5_2048.c (CALLER_BF16): New macro.
      	(callee_bf16, caller_bf16): New tests.
      	* gcc.target/aarch64/sve/pcs/return_6.c (bfloat16_t): New typedef.
      	(callee_bf16, caller_bf16): New tests.
      	* gcc.target/aarch64/sve/pcs/return_6_128.c (bfloat16_t): New typedef.
      	(callee_bf16, caller_bf16): New tests.
      	* gcc.target/aarch64/sve/pcs/return_6_256.c (bfloat16_t): New typedef.
      	(callee_bf16, caller_bf16): New tests.
      	* gcc.target/aarch64/sve/pcs/return_6_512.c (bfloat16_t): New typedef.
      	(callee_bf16, caller_bf16): New tests.
      	* gcc.target/aarch64/sve/pcs/return_6_1024.c (bfloat16_t): New typedef.
      	(callee_bf16, caller_bf16): New tests.
      	* gcc.target/aarch64/sve/pcs/return_6_2048.c (bfloat16_t): New typedef.
      	(callee_bf16, caller_bf16): New tests.
      	* gcc.target/aarch64/sve/pcs/return_7.c (callee_bf16): Likewise.
      	(caller_bf16): Likewise.
      	* gcc.target/aarch64/sve/pcs/return_8.c (callee_bf16): Likewise.
      	(caller_bf16): Likewise.
      	* gcc.target/aarch64/sve/pcs/return_9.c (callee_bf16): Likewise.
      	(caller_bf16): Likewise.
      	* gcc.target/aarch64/sve2/acle/asm/tbl2_bf16.c: Likewise.
      	* gcc.target/aarch64/sve2/acle/asm/tbx_bf16.c: Likewise.
      	* gcc.target/aarch64/sve2/acle/asm/whilerw_bf16.c: Likewise.
      	* gcc.target/aarch64/sve2/acle/asm/whilewr_bf16.c: Likewise.
      Richard Sandiford committed
    • aarch64: Add Armv8.6 SVE matrix multiply support · 36696774
      This mostly follows existing practice.  Perhaps the only noteworthy
      thing is that svmmla is split across three extensions (i8mm, f32mm
      and f64mm), any of which can be enabled independently.  The easiest
      way of coping with this seemed to be to add a fourth svmmla entry
      for base SVE, but with no type suffixes.  This means that the
      overloaded function is always available for C, but never successfully
      resolves without the appropriate target feature.
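
      Because the three extensions are independent, user code would
      typically test each feature macro separately.  A minimal sketch,
      using the macro names listed in the ChangeLog below; the function
      name and the bit assignments are hypothetical:

```c
#include <assert.h>

/* Hypothetical bit assignments for the result mask.  */
enum {
  TOY_HAVE_I8MM  = 1 << 0,
  TOY_HAVE_F32MM = 1 << 1,
  TOY_HAVE_F64MM = 1 << 2
};

/* Report which SVE matrix-multiply feature macros the compiler
   defined for this translation unit.  On targets without these
   extensions the result is simply 0.  */
int toy_sve_matmul_features(void)
{
  int mask = 0;
#ifdef __ARM_FEATURE_SVE_MATMUL_INT8
  mask |= TOY_HAVE_I8MM;
#endif
#ifdef __ARM_FEATURE_SVE_MATMUL_FP32
  mask |= TOY_HAVE_F32MM;
#endif
#ifdef __ARM_FEATURE_SVE_MATMUL_FP64
  mask |= TOY_HAVE_F64MM;
#endif
  return mask;
}
```

      Code that calls svmmla for a given type would sit behind the matching
      macro test, since the overload does not resolve otherwise.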
      
      2020-01-31  Dennis Zhang  <dennis.zhang@arm.com>
      	    Matthew Malcomson  <matthew.malcomson@arm.com>
      	    Richard Sandiford  <richard.sandiford@arm.com>
      
      gcc/
      	* doc/invoke.texi (f32mm): Document new AArch64 -march= extension.
      	* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
      	__ARM_FEATURE_SVE_MATMUL_INT8, __ARM_FEATURE_SVE_MATMUL_FP32 and
      	__ARM_FEATURE_SVE_MATMUL_FP64 as appropriate.  Don't define
      	__ARM_FEATURE_MATMUL_FP64.
      	* config/aarch64/aarch64-option-extensions.def (fp, simd, fp16)
      	(sve): Add AARCH64_FL_F32MM to the list of extensions that should
      	be disabled at the same time.
      	(f32mm): New extension.
      	* config/aarch64/aarch64.h (AARCH64_FL_F32MM): New macro.
      	(AARCH64_FL_F64MM): Bump to the next bit up.
      	(AARCH64_ISA_F32MM, TARGET_SVE_I8MM, TARGET_F32MM, TARGET_SVE_F32MM)
      	(TARGET_SVE_F64MM): New macros.
      	* config/aarch64/iterators.md (SVE_MATMULF): New mode iterator.
      	(UNSPEC_FMMLA, UNSPEC_SMATMUL, UNSPEC_UMATMUL, UNSPEC_USMATMUL)
      	(UNSPEC_TRN1Q, UNSPEC_TRN2Q, UNSPEC_UZP1Q, UNSPEC_UZP2Q, UNSPEC_ZIP1Q)
      	(UNSPEC_ZIP2Q): New unspecs.
      	(DOTPROD_US_ONLY, PERMUTEQ, MATMUL, FMMLA): New int iterators.
      	(optab, sur, perm_insn): Handle the new unspecs.
      	(sve_fp_op): Handle UNSPEC_FMMLA.  Resort.
      	* config/aarch64/aarch64-sve.md (@aarch64_sve_ld1ro<mode>): Use
      	TARGET_SVE_F64MM instead of separate tests.
      	(@aarch64_<DOTPROD_US_ONLY:sur>dot_prod<vsi2qi>): New pattern.
      	(@aarch64_<DOTPROD_US_ONLY:sur>dot_prod_lane<vsi2qi>): Likewise.
      	(@aarch64_sve_add_<MATMUL:optab><vsi2qi>): Likewise.
      	(@aarch64_sve_<FMMLA:sve_fp_op><mode>): Likewise.
      	(@aarch64_sve_<PERMUTEQ:optab><mode>): Likewise.
      	* config/aarch64/aarch64-sve-builtins.cc (TYPES_s_float): New macro.
      	(TYPES_s_float_hsd_integer, TYPES_s_float_sd_integer): Use it.
      	(TYPES_s_signed): New macro.
      	(TYPES_s_integer): Use it.
      	(TYPES_d_float): New macro.
      	(TYPES_d_data): Use it.
      	* config/aarch64/aarch64-sve-builtins-shapes.h (mmla): Declare.
      	(ternary_intq_uintq_lane, ternary_intq_uintq_opt_n, ternary_uintq_intq)
      	(ternary_uintq_intq_lane, ternary_uintq_intq_opt_n): Likewise.
      	* config/aarch64/aarch64-sve-builtins-shapes.cc (mmla_def): New class.
      	(svmmla): New shape.
      	(ternary_resize2_opt_n_base): Add TYPE_CLASS2 and TYPE_CLASS3
      	template parameters.
      	(ternary_resize2_lane_base): Likewise.
      	(ternary_resize2_base): New class.
      	(ternary_qq_lane_base): Likewise.
      	(ternary_intq_uintq_lane_def): Likewise.
      	(ternary_intq_uintq_lane): New shape.
      	(ternary_intq_uintq_opt_n_def): New class.
      	(ternary_intq_uintq_opt_n): New shape.
      	(ternary_qq_lane_def): Inherit from ternary_qq_lane_base.
      	(ternary_uintq_intq_def): New class.
      	(ternary_uintq_intq): New shape.
      	(ternary_uintq_intq_lane_def): New class.
      	(ternary_uintq_intq_lane): New shape.
      	(ternary_uintq_intq_opt_n_def): New class.
      	(ternary_uintq_intq_opt_n): New shape.
      	* config/aarch64/aarch64-sve-builtins-base.h (svmmla, svsudot)
      	(svsudot_lane, svtrn1q, svtrn2q, svusdot, svusdot_lane, svusmmla)
      	(svuzp1q, svuzp2q, svzip1q, svzip2q): Declare.
      	* config/aarch64/aarch64-sve-builtins-base.cc (svdot_lane_impl):
      	Generalize to...
      	(svdotprod_lane_impl): ...this new class.
      	(svmmla_impl, svusdot_impl): New classes.
      	(svdot_lane): Update to use svdotprod_lane_impl.
      	(svmmla, svsudot, svsudot_lane, svtrn1q, svtrn2q, svusdot)
      	(svusdot_lane, svusmmla, svuzp1q, svuzp2q, svzip1q, svzip2q): New
      	functions.
      	* config/aarch64/aarch64-sve-builtins-base.def (svmmla): New base
      	function, with no types defined.
      	(svmmla, svusmmla, svsudot, svsudot_lane, svusdot, svusdot_lane): New
      	AARCH64_FL_I8MM functions.
      	(svmmla): New AARCH64_FL_F32MM function.
      	(svld1ro): Depend only on AARCH64_FL_F64MM, not on AARCH64_FL_V8_6.
      	(svmmla, svtrn1q, svtrn2q, svuzp1q, svuzp2q, svzip1q, svzip2q): New
      	AARCH64_FL_F64MM functions.
      	(REQUIRED_EXTENSIONS):
      
      gcc/testsuite/
      	* lib/target-supports.exp (check_effective_target_aarch64_asm_i8mm_ok)
      	(check_effective_target_aarch64_asm_f32mm_ok): New target selectors.
      	* gcc.target/aarch64/pragma_cpp_predefs_2.c: Test handling of
      	__ARM_FEATURE_SVE_MATMUL_INT8, __ARM_FEATURE_SVE_MATMUL_FP32 and
      	__ARM_FEATURE_SVE_MATMUL_FP64.
      	* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h (TEST_TRIPLE_Z)
      	(TEST_TRIPLE_Z_REV2, TEST_TRIPLE_Z_REV, TEST_TRIPLE_LANE_REG)
      	(TEST_TRIPLE_ZX): New macros.
      	* gcc.target/aarch64/sve/acle/asm/ld1ro_f16.c: Remove +sve and
      	rely on +f64mm to enable it.
      	* gcc.target/aarch64/sve/acle/asm/ld1ro_f32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/ld1ro_f64.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/ld1ro_s16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/ld1ro_s32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/ld1ro_s64.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/ld1ro_s8.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/ld1ro_u16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/ld1ro_u32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/ld1ro_u64.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/ld1ro_u8.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/mmla_f32.c: New test.
      	* gcc.target/aarch64/sve/acle/asm/mmla_f64.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/mmla_s32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/mmla_u32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/sudot_lane_s32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/sudot_s32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/trn1q_f16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/trn1q_f32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/trn1q_f64.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/trn1q_s16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/trn1q_s32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/trn1q_s64.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/trn1q_s8.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/trn1q_u16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/trn1q_u32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/trn1q_u64.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/trn1q_u8.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/trn2q_f16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/trn2q_f32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/trn2q_f64.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/trn2q_s16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/trn2q_s32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/trn2q_s64.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/trn2q_s8.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/trn2q_u16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/trn2q_u32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/trn2q_u64.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/trn2q_u8.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/usdot_lane_s32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/usdot_s32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/usmmla_s32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/uzp1q_f16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/uzp1q_f32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/uzp1q_f64.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/uzp1q_s16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/uzp1q_s32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/uzp1q_s64.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/uzp1q_s8.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/uzp1q_u16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/uzp1q_u32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/uzp1q_u64.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/uzp1q_u8.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/uzp2q_f16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/uzp2q_f32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/uzp2q_f64.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/uzp2q_s16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/uzp2q_s32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/uzp2q_s64.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/uzp2q_s8.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/uzp2q_u16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/uzp2q_u32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/uzp2q_u64.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/uzp2q_u8.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/zip1q_f16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/zip1q_f32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/zip1q_f64.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/zip1q_s16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/zip1q_s32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/zip1q_s64.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/zip1q_s8.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/zip1q_u16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/zip1q_u32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/zip1q_u64.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/zip1q_u8.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/zip2q_f16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/zip2q_f32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/zip2q_f64.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/zip2q_s16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/zip2q_s32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/zip2q_s64.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/zip2q_s8.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/zip2q_u16.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/zip2q_u32.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/zip2q_u64.c: Likewise.
      	* gcc.target/aarch64/sve/acle/asm/zip2q_u8.c: Likewise.
      	* gcc.target/aarch64/sve/acle/general-c/mmla_1.c: Likewise.
      	* gcc.target/aarch64/sve/acle/general-c/mmla_2.c: Likewise.
      	* gcc.target/aarch64/sve/acle/general-c/mmla_3.c: Likewise.
      	* gcc.target/aarch64/sve/acle/general-c/mmla_4.c: Likewise.
      	* gcc.target/aarch64/sve/acle/general-c/mmla_5.c: Likewise.
      	* gcc.target/aarch64/sve/acle/general-c/mmla_6.c: Likewise.
      	* gcc.target/aarch64/sve/acle/general-c/mmla_7.c: Likewise.
      	* gcc.target/aarch64/sve/acle/general-c/ternary_intq_uintq_lane_1.c:
      	Likewise.
      	* gcc.target/aarch64/sve/acle/general-c/ternary_intq_uintq_opt_n_1.c:
      	Likewise.
      	* gcc.target/aarch64/sve/acle/general-c/ternary_uintq_intq_1.c:
      	Likewise.
      	* gcc.target/aarch64/sve/acle/general-c/ternary_uintq_intq_lane_1.c:
      	Likewise.
      	* gcc.target/aarch64/sve/acle/general-c/ternary_uintq_intq_opt_n_1.c:
      	Likewise.
      Richard Sandiford committed
    • aarch64: Fix SVE PCS failures for BE & ILP32 · 2171a920
      This patch should (finally!) give clean test results for
      aarch64-sve-pcs.exp for all {be,le}{lp64,ilp32} combinations.
      
      The *_128.c tests require aarch64_little_endian because they test for
      fixed-length 128-bit code, whereas -msve-vector-bits=128 still generates
      VLA code for big-endian.
      
      Some tests require lp64 because they match (64-bit) pointer loads and
      stores.  Others require it because ilp32 adds extra zero extensions.
      
      We still have a non-trivial amount of coverage for -mbig-endian -mabi=ilp32:
      
       # of expected passes            663
       # of unsupported tests          59
      
      2020-01-31  Richard Sandiford  <richard.sandiford@arm.com>
      
      gcc/testsuite/
      	* gcc.target/aarch64/sve/pcs/args_1.c: Require lp64 for
      	check-function-bodies tests.
      	* gcc.target/aarch64/sve/pcs/args_2.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_3.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_4.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/return_1.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/return_1_256.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/return_1_512.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/return_1_1024.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/return_1_2048.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/return_2.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/return_3.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/return_4.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/return_4_256.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/return_4_512.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/return_4_1024.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/return_4_2048.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/return_5.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/return_5_256.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/return_5_512.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/return_5_1024.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/return_5_2048.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/return_6.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/return_6_256.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/return_6_512.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/return_6_1024.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/return_6_2048.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/saves_2_be_nowrap.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/saves_2_be_wrap.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/saves_2_le_nowrap.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/saves_2_le_wrap.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/saves_3.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/saves_4_be.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/saves_4_le.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/varargs_1.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/varargs_2_f16.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/varargs_2_f32.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/varargs_2_f64.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/varargs_2_s16.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/varargs_2_s32.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/varargs_2_s64.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/varargs_2_s8.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/varargs_2_u16.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/varargs_2_u32.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/varargs_2_u64.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/varargs_2_u8.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_5_be_f16.c: Require lp64.
      	* gcc.target/aarch64/sve/pcs/args_5_be_f32.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_5_be_f64.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_5_be_s16.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_5_be_s32.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_5_be_s64.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_5_be_s8.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_5_be_u16.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_5_be_u32.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_5_be_u64.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_5_be_u8.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_5_le_f16.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_5_le_f32.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_5_le_f64.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_5_le_s16.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_5_le_s32.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_5_le_s64.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_5_le_s8.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_5_le_u16.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_5_le_u32.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_5_le_u64.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_5_le_u8.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_6_be_f16.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_6_be_f32.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_6_be_f64.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_6_be_s16.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_6_be_s32.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_6_be_s64.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_6_be_s8.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_6_be_u16.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_6_be_u32.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_6_be_u64.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_6_be_u8.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_6_le_f16.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_6_le_f32.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_6_le_f64.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_6_le_s16.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_6_le_s32.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_6_le_s64.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_6_le_s8.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_6_le_u16.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_6_le_u32.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_6_le_u64.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_6_le_u8.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_7.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_8.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/args_9.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/return_4_128.c: Require lp64 and
      	aarch64_little_endian for check-function-bodies tests.
      	* gcc.target/aarch64/sve/pcs/return_5_128.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/stack_clash_2_128.c: Likewise.
      	* gcc.target/aarch64/sve/pcs/return_1_128.c: Likewise.  Remove
      	target selector from dg-compile.
      	* gcc.target/aarch64/sve/pcs/return_6_128.c: Likewise.
      Richard Sandiford committed
    • libstdc++: Always return a sentinel<I> from __gnu_test::test_range::end() · 6e5a1963
      It seems that in practice std::sentinel_for<I, I> is always true, and so the
      test_range container doesn't help us detect bugs in ranges code in which we
      wrongly assume that a sentinel can be manipulated like an iterator.  Make the
      test_range range more strict by having end() unconditionally return a
      sentinel<I>, and adjust some tests accordingly.
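
A minimal sketch of the distinction this change enforces, written independently of the real testsuite_iterators.h. The names `int_iterator`, `end_sentinel`, `strict_range`, `sum`, and `demo_sentinel` are hypothetical. The point is only that `end()` returns a type that can be compared against an iterator but cannot be incremented, decremented, or dereferenced, so generic code that assumes otherwise fails to compile.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; not the types in testsuite_iterators.h.
struct end_sentinel { const int* last; };

struct int_iterator {
    const int* pos;
    int operator*() const { return *pos; }
    int_iterator& operator++() { ++pos; return *this; }
};

// Only iterator-vs-sentinel comparison is provided.
inline bool operator==(const int_iterator& it, const end_sentinel& s) { return it.pos == s.last; }
inline bool operator!=(const int_iterator& it, const end_sentinel& s) { return !(it == s); }

struct strict_range {
    const int* first;
    const int* last;
    int_iterator begin() const { return {first}; }
    end_sentinel end() const { return {last}; }  // never the iterator type
};

// Generic code must not assume begin() and end() share a type.
template <typename Range>
int sum(const Range& r) {
    int total = 0;
    auto it = r.begin();
    auto stop = r.end();  // deduced separately from `it`
    for (; it != stop; ++it)
        total += *it;
    return total;
}

inline void demo_sentinel() {
    const int data[] = {1, 2, 3, 4};
    strict_range r{data, data + 4};
    assert(sum(r) == 10);
}
```

Writing `--r.end()` or `*r.end()` against `strict_range` is a compile error, which is the kind of mistake a stricter test range is meant to expose.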
      
      libstdc++-v3/ChangeLog:
      
      	* testsuite/24_iterators/range_operations/distance.cc: Do not assume
      	test_range::end() returns the same type as test_range::begin().
      	* testsuite/24_iterators/range_operations/next.cc: Likewise.
      	* testsuite/24_iterators/range_operations/prev.cc: Likewise.
      	* testsuite/util/testsuite_iterators.h (__gnu_test::test_range::end):
      	Always return a sentinel<I>.
      Patrick Palka committed
    • Fix conditional add LRA failure for amdgcn · b9270938
      Fix ICE in testcase gfortran.dg/assumed_rank_bounds_3.f90.
      
      2020-01-31  Andrew Stubbs  <ams@codesourcery.com>
      
      	gcc/
      	* config/gcn/gcn-valu.md (addv64di3_exec): Allow one '0' in each
      	alternative only.
      Andrew Stubbs committed
    • Fix TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL handling. · 828573a5
      The reason for TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL on AMD targets is
      only insn size, as advised in e.g. Software Optimization Guide for the
      AMD Family 15h Processors [1], section 7.1.2, where it is said:
      
      --quote--
      7.1.2 Reduce Instruction Size
      
      Optimization
      
      Reduce the size of instructions when possible.
      
      Rationale
      
      Using smaller instruction sizes improves instruction fetch throughput.
      Specific examples include the following:
      
      * In SIMD code, use the single-precision (PS) form of instructions
      instead of the double-precision (PD) form. For example, for register
      to register moves, MOVAPS achieves the same result as MOVAPD, but uses
      one less byte to encode the instruction and has no prefix byte. Other
      examples in which single-precision forms can be substituted for
      double-precision forms include MOVUPS, MOVNTPS, XORPS, ORPS, ANDPS,
      and SHUFPS.
      ...
      --/quote--
      
      Please note that this optimization applies only to non-AVX forms, as
      demonstrated by:
      
         0:   0f 28 c8                movaps %xmm0,%xmm1
         3:   66 0f 28 c8             movapd %xmm0,%xmm1
         7:   c5 f8 28 d1             vmovaps %xmm1,%xmm2
         b:   c5 f9 28 d1             vmovapd %xmm1,%xmm2
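
The size difference can be read straight off the byte sequences in the listing above. The sketch below only tabulates those four encodings and compares their lengths; the `encoding` struct and the helper names are hypothetical and exist purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record type: a mnemonic plus its encoded bytes.
struct encoding {
    std::string mnemonic;
    std::vector<unsigned char> bytes;
};

// Byte sequences copied from the disassembly listing.
inline std::vector<encoding> listed_encodings() {
    return {
        {"movaps",  {0x0f, 0x28, 0xc8}},
        {"movapd",  {0x66, 0x0f, 0x28, 0xc8}},
        {"vmovaps", {0xc5, 0xf8, 0x28, 0xd1}},
        {"vmovapd", {0xc5, 0xf9, 0x28, 0xd1}},
    };
}

// Bytes saved by choosing the PS form over the PD form.
inline std::ptrdiff_t bytes_saved(const encoding& ps, const encoding& pd) {
    return static_cast<std::ptrdiff_t>(pd.bytes.size()) -
           static_cast<std::ptrdiff_t>(ps.bytes.size());
}

inline void demo_encodings() {
    const std::vector<encoding> e = listed_encodings();
    assert(bytes_saved(e[0], e[1]) == 1);  // legacy SSE: PS is one byte shorter
    assert(bytes_saved(e[2], e[3]) == 0);  // VEX forms: same length
}
```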
      
      Also note that MOVDQA is missing in the above optimization. It is
      harmful to substitute MOVDQA with MOVAPS, as it can (and does)
      introduce +1 cycle forwarding penalty between FLT (FPA/FPM) and INT
      (VALU) FP clusters.
      
      [1] https://www.amd.com/system/files/TechDocs/47414_15h_sw_opt_guide.pdf
      Uros Bizjak committed
    • [amdgcn] Scale number of threads/workers with VGPR usage · 5a28e272
      2020-01-31  Kwok Cheung Yeung  <kcy@codesourcery.com>
      
      	gcc/
      	* config/gcn/mkoffload.c (process_asm): Add sgpr_count and vgpr_count
      	to definition of hsa_kernel_description.  Parse assembly to find SGPR
      	and VGPR count of kernel and store in hsa_kernel_description.
      
      	libgomp/
      	* plugin/plugin-gcn.c (struct hsa_kernel_description): Add sgpr_count
      	and vgpr_count fields.
      	(struct kernel_info): Add a field for a hsa_kernel_description.
      	(run_kernel): Reduce the number of threads/workers if the requested
      	number would require too many VGPRs.
      	(init_basic_kernel_info): Initialize description field with
      	the hsa_kernel_description entry for the kernel.
      Kwok Cheung Yeung committed
    • [Fortran] Disable front-end optimization for OpenACC atomic (PR93462) · 6a97d9ea
              PR fortran/93462
              * frontend-passes.c (gfc_code_walker): For EXEC_OACC_ATOMIC, set
              in_omp_atomic to true to prevent front-end optimization.
      
              PR fortran/93462
              * gfortran.dg/goacc/atomic-1.f90: New.
      Tobias Burnus committed
    • middle-end: Fix logical shift truncation (PR rtl-optimization/91838) · e60b1e23
      This fixes a fall-out from a patch I had submitted two years ago which started
      allowing simplify-rtx to fold logical right shifts by offsets a followed by b
      into >> (a + b).
      
      However this can generate inefficient code when the resulting shift count ends
      up being the same as the size of the shift mode.  This will create some
      undefined behavior on most platforms.
      
      This patch changes the code to truncate to 0 if the shift amount goes out of
      range.  Before my older patch this used to happen in combine when it saw the
      two shifts.  However since we combine them here combine never gets a chance to
      truncate them.
      
      The issue mostly affects GCC 8 and 9 since on 10 the back-end knows how to deal
      with this shift constant but it's better to do the right thing in simplify-rtx.
      
      Note that this doesn't take care of the arithmetic shift, where you could replace
      the constant with MODE_BITS (mode) - 1, but that's not a regression, so punting on it.
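
The folding rule described above can be sketched on plain 32-bit integers. This is an illustration of the arithmetic only, not the simplify-rtx code; `kModeBits`, `shift_twice`, `shift_folded`, and `demo_shift` are hypothetical names, and a 32-bit mode is assumed.

```cpp
#include <cassert>
#include <cstdint>

// Width of the (assumed) shift mode used in this sketch.
constexpr unsigned kModeBits = 32;

// Reference semantics: two logical right shifts applied one after the other.
// Each individual count is assumed to be in range (< kModeBits).
inline std::uint32_t shift_twice(std::uint32_t x, unsigned a, unsigned b) {
    return (x >> a) >> b;
}

// Folded form: a single shift by a + b.  When the combined count reaches the
// mode width every bit has been shifted out, so the result is truncated to 0
// instead of emitting an out-of-range shift.
inline std::uint32_t shift_folded(std::uint32_t x, unsigned a, unsigned b) {
    const unsigned total = a + b;
    if (total >= kModeBits)
        return 0;
    return x >> total;
}

inline void demo_shift() {
    // 16 + 16 == 32: a single ">> 32" would be out of range, the answer is 0.
    assert(shift_folded(0xF0000000u, 16, 16) == shift_twice(0xF0000000u, 16, 16));
    // In-range totals fold to an ordinary shift.
    assert(shift_folded(0x80000000u, 4, 8) == shift_twice(0x80000000u, 4, 8));
}
```

For a logical shift the two forms agree for every in-range pair of counts, which is why truncating to 0 is safe here. The arithmetic-shift case mentioned above would need the sign bit replicated instead, and is not covered by this sketch.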
      
      gcc/ChangeLog:
      
      	PR rtl-optimization/91838
      	* simplify-rtx.c (simplify_binary_operation_1): Update LSHIFTRT case
      	to truncate if allowed or reject combination.
      
      gcc/testsuite/ChangeLog:
      
      	PR rtl-optimization/91838
      	* g++.dg/pr91838.C: New test.
      Tamar Christina committed
    • Fix fast-math-pr55281.c ICE · c63ae7f0
      2020-01-31  Andrew Stubbs  <ams@codesourcery.com>
      
      	gcc/
      	* tree-ssa-loop-ivopts.c (get_iv): Use sizetype for zero-step.
      	(find_inv_vars_cb): Likewise.
      Andrew Stubbs committed
    • calls.c: refactor special_function_p for use by analyzer (v2) · 182ce042
      This patch refactors some code in special_function_p that checks for
      the function being sane to match by name, splitting it out into a new
      maybe_special_function_p, and using it in two places in the analyzer.
      
      gcc/analyzer/ChangeLog:
      	* analyzer.cc (is_named_call_p): Replace tests for fndecl being
      	extern at file scope and having a non-NULL DECL_NAME with a call
      	to maybe_special_function_p.
      	* function-set.cc (function_set::contains_decl_p): Add call to
      	maybe_special_function_p.
      
      gcc/ChangeLog:
      	* calls.c (special_function_p): Split out the check for DECL_NAME
      	being non-NULL and fndecl being extern at file scope into a
      	new maybe_special_function_p and call it.  Drop check for fndecl
      	being non-NULL that was after a usage of DECL_NAME (fndecl).
      	* tree.h (maybe_special_function_p): New inline function.
      David Malcolm committed
    • analyzer: further fixes for comparisons between uncomparable types (PR 93450) · 45eb3e49
      gcc/analyzer/ChangeLog:
      	PR analyzer/93450
      	* constraint-manager.cc
      	(constraint_manager::get_or_add_equiv_class): Only compare constants
      	if their types are compatible.
      	* region-model.cc (constant_svalue::eval_condition): Replace check
      	for identical types with call to types_compatible_p.
      David Malcolm committed
    • Zero-initialise masked load destinations · 95607c12
      Fixes an execution failure in testcase gfortran.dg/assumed_rank_1.f90.
      
      2020-01-30  Andrew Stubbs  <ams@codesourcery.com>
      
      	gcc/
      	* config/gcn/gcn-valu.md (gather<mode>_exec): Move contents ...
      	(mask_gather_load<mode>): ... here, and zero-initialize the
      	destination.
      	(maskload<mode>di): Zero-initialize the destination.
      	* config/gcn/gcn.c:
      Andrew Stubbs committed
    • analyzer: add extrinsic_state::dump · 42f36563
      gcc/analyzer/ChangeLog:
      	* program-state.cc (extrinsic_state::dump_to_pp): New.
      	(extrinsic_state::dump_to_file): New.
      	(extrinsic_state::dump): New.
      	* program-state.h (extrinsic_state::dump_to_pp): New decl.
      	(extrinsic_state::dump_to_file): New decl.
      	(extrinsic_state::dump): New decl.
      	* sm.cc: Include "pretty-print.h".
      	(state_machine::dump_to_pp): New.
      	* sm.h (state_machine::dump_to_pp): New decl.
      David Malcolm committed
    • analyzer: make extrinsic_state field private · ebe9174e
      gcc/analyzer/ChangeLog:
      	* diagnostic-manager.cc (for_each_state_change): Use
      	extrinsic_state::get_num_checkers rather than accessing m_checkers
      	directly.
      	* program-state.cc (program_state::program_state): Likewise.
      	* program-state.h (extrinsic_state::m_checkers): Make private.
      David Malcolm committed
    • Daily bump. · bba54d62
      GCC Administrator committed
    • analyzer: avoid using <string.h> in malloc-1.c · 3e990d79
      This test assumes that memset and strlen have been marked with
      __attribute__((nonnull)), which isn't necessarily the case for an
      arbitrary <string.h>.  This likely explains these failures:
        FAIL: gcc.dg/analyzer/malloc-1.c  (test for warnings, line 417)
        FAIL: gcc.dg/analyzer/malloc-1.c  (test for warnings, line 418)
        FAIL: gcc.dg/analyzer/malloc-1.c  (test for warnings, line 425)
        FAIL: gcc.dg/analyzer/malloc-1.c  (test for warnings, line 429)
      seen in https://gcc.gnu.org/ml/gcc-testresults/2020-01/msg01608.html
      on x86_64-apple-darwin18.
      
      Fix it by using the __builtin_ forms.
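
As a rough illustration of what "using the __builtin_ forms" looks like, the sketch below calls `__builtin_memset` and `__builtin_strlen` directly, so no declaration from <string.h> is involved. It assumes a GCC-compatible compiler; `clear_and_measure` and `demo_builtins` are hypothetical names and are not taken from malloc-1.c.

```cpp
#include <cassert>
#include <cstddef>

// Zero a buffer and measure a short string written into it, without
// including <string.h>.  The caller must pass a buffer of at least 3 bytes.
inline std::size_t clear_and_measure(char* buf, std::size_t size) {
    __builtin_memset(buf, 0, size);
    buf[0] = 'o';
    buf[1] = 'k';
    return __builtin_strlen(buf);
}

inline void demo_builtins() {
    char buf[8];
    assert(clear_and_measure(buf, sizeof buf) == 2);
}
```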
      
      gcc/testsuite/ChangeLog:
      	* gcc.dg/analyzer/malloc-1.c: Remove include of <string.h>.
      	Use __builtin_ forms of memset and strlen throughout.
      David Malcolm committed
    • analyzer: convert conditionals-2.c to a torture test · e34ad101
      gcc/testsuite/ChangeLog:
      	* gcc.dg/analyzer/conditionals-2.c: Move to...
      	* gcc.dg/analyzer/torture/conditionals-2.c: ...here, converting
      	to a torture test.  Remove redundant include.
      David Malcolm committed
    • analyzer: fix ICE in __builtin_isnan (PR 93356) · e978955d
      PR analyzer/93356 reports an ICE handling __builtin_isnan due to a
      failing assertion:
        674     gcc_assert (lhs_ec_id != rhs_ec_id);
      with op=UNORDERED_EXPR, when attempting to add an UNORDERED_EXPR constraint.
      
      This is an overzealous assertion, but underlying it are various forms of
      sloppiness regarding NaN within the analyzer:
      
        (a) the assumption in the constraint_manager that equivalence classes
        are reflexive (X == X), which isn't the case for NaN.
      
        (b) Hardcoding the "honor_nans" param to false when calling
        invert_tree_comparison throughout the analyzer.
      
        (c) Ignoring ORDERED_EXPR, UNORDERED_EXPR, and the UN-prefixed
        comparison codes.
      
      I wrote a patch for this which tracks the NaN-ness of floating-point
      values and uses this to address all of the above.
      
      However, to minimize changes in gcc 10 stage 4, here's a simpler patch
      which rejects attempts to query or add constraints on floating-point
      values, instead treating any floating-point comparison as "unknown", and
      silently dropping the constraints at edges.
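
Point (a) and the "unknown" fallback can be illustrated outside the analyzer. The sketch below is not the region_model code: the `tristate` enum and `eval_self_equal` overloads are hypothetical, and they only show why X == X can be answered for integers but has to be left unknown for floating-point values when NaN-ness is not tracked.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Three-valued result; the names are hypothetical, chosen for this sketch.
enum class tristate { is_false, is_true, unknown };

// Integers: equality is reflexive, so X == X can be answered.
inline tristate eval_self_equal(int) { return tristate::is_true; }

// Floating point: X == X is false when X is NaN, so without tracking
// NaN-ness the conservative answer is "unknown".
inline tristate eval_self_equal(double) { return tristate::unknown; }

inline void demo_nan() {
    const double not_a_number = std::numeric_limits<double>::quiet_NaN();
    assert(!(not_a_number == not_a_number));       // reflexivity fails for NaN
    assert(std::isunordered(not_a_number, 1.0));   // NaN is unordered with everything
}
```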
      
      gcc/analyzer/ChangeLog:
      	PR analyzer/93356
      	* region-model.cc (region_model::eval_condition): In both
      	overloads, bail out immediately on floating-point types.
      	(region_model::eval_condition_without_cm): Likewise.
      	(region_model::add_constraint): Likewise.
      
      gcc/testsuite/ChangeLog:
      	PR analyzer/93356
      	* gcc.dg/analyzer/conditionals-notrans.c (test_float_selfcmp):
      	Add.
      	* gcc.dg/analyzer/conditionals-trans.c: Mark floating point
      	comparison test as failing.
      	(test_float_selfcmp): Add.
      	* gcc.dg/analyzer/data-model-1.c: Mark floating point comparison
      	tests as failing.
      	* gcc.dg/analyzer/torture/pr93356.c: New test.
      
      gcc/ChangeLog:
      	PR analyzer/93356
      	* doc/analyzer.texi (Limitations): Note that constraints on
      	floating-point values are currently ignored.
      David Malcolm committed
  3. 30 Jan, 2020 9 commits
    • Mark switch expression as used to avoid bogus warning · f9eb0973
              PR c/88660
              * c-parser.c (c_parser_switch_statement): Make sure to request
              marking the switch expr as used.
      
              PR c/88660
              * gcc.dg/pr88660.c: New test.
      Jeff Law committed
    • cgraph: Avoid creating multiple *.localalias aliases with the same name [PR93384] · 5fb07870
      The following testcase FAILs on powerpc64le-linux with assembler errors, as we
      emit a call to bar.localalias, then .set bar.localalias, bar twice and then
      another call to bar.localalias.  The problem is that bar.localalias can be created
      at various stages and e.g. ipa-pure-const can slightly adjust the original decl,
      so that the existing bar.localalias isn't considered usable (different
      flags_from_decl_or_type).  In that case, we'd create another bar.localalias, which
      clashes with the existing name.
      
      Fixed by retrying with another name if it is already present.  There shouldn't
      be many localalias aliases: those from different partitions would be lto_priv
      suffixed, and in most cases they would already have the same type/flags/attributes.
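
The retry strategy can be sketched with a plain set of already-used names. This is not the symtab.c implementation: `unique_alias_name` and `demo_alias` are hypothetical, and the ".N" numbering format is an assumption of this sketch rather than the format GCC emits.

```cpp
#include <cassert>
#include <set>
#include <string>

// Pick an alias name that does not clash with any name already emitted.
// The ".N" suffix format is an assumption made for illustration.
inline std::string unique_alias_name(const std::string& symbol,
                                     const std::set<std::string>& taken) {
    const std::string base = symbol + ".localalias";
    if (taken.count(base) == 0)
        return base;
    for (unsigned n = 1;; ++n) {
        const std::string candidate = base + "." + std::to_string(n);
        if (taken.count(candidate) == 0)
            return candidate;
    }
}

inline void demo_alias() {
    const std::set<std::string> taken{"bar.localalias"};
    assert(unique_alias_name("foo", taken) == "foo.localalias");
    assert(unique_alias_name("bar", taken) == "bar.localalias.1");
}
```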
      
      2020-01-30  Jakub Jelinek  <jakub@redhat.com>
      
      	PR lto/93384
      	* symtab.c (symtab_node::noninterposable_alias): If localalias
      	already exists, but is not usable, append numbers after it until
      	a unique name is found.  Formatting fix.
      
      	* gcc.dg/lto/pr93384_0.c: New test.
      	* gcc.dg/lto/pr93384_1.c: New file.
      Jakub Jelinek committed
    • combine: Punt on out of range rotate counts [PR93505] · 56b92750
      What happens on this testcase is that with the out-of-bounds rotate we get:
      Trying 13 -> 16:
         13: r129:SI=r132:DI#0<-<0x20
            REG_DEAD r132:DI
         16: r123:DI=r129:SI<0
            REG_DEAD r129:SI
      Successfully matched this instruction:
      (set (reg/v:DI 123 [ <retval> ])
          (const_int 0 [0]))
      during combine.  So, perhaps we could also change simplify-rtx.c to punt
      if it is out of bounds rather than trying to optimize anything.
      Or, but probably GCC11 material, if we decide that ROTATE/ROTATERT doesn't
      have out of bounds counts or introduce targetm.rotate_truncation_mask,
      we should truncate the argument instead of punting.
      Punting is better for backports though.
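
"Punting" here means declining to simplify rather than producing a value. A minimal sketch of that idea on a 32-bit rotate follows; `fold_rotate_left` and `demo_rotate` are hypothetical names, this is not the combine.c code, and C++17 is assumed for `std::optional`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Fold a 32-bit rotate-left by a constant count.  Returns std::nullopt
// ("punt") when the count is out of range, leaving the expression untouched
// rather than guessing at its meaning.
inline std::optional<std::uint32_t> fold_rotate_left(std::uint32_t value,
                                                     unsigned count) {
    if (count >= 32)
        return std::nullopt;
    if (count == 0)
        return value;  // avoids the undefined shift by 32 below
    return (value << count) | (value >> (32 - count));
}

inline void demo_rotate() {
    // A count of 0x20, as in the dump above, is out of range for 32 bits.
    assert(!fold_rotate_left(1u, 0x20).has_value());
    assert(fold_rotate_left(0x80000001u, 1).value() == 0x00000003u);
}
```

The alternative mentioned above, truncating the count instead of punting, would replace the first check with `count %= 32`; whether that is valid depends on the semantics chosen for ROTATE/ROTATERT.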
      
      2020-01-30  Jakub Jelinek  <jakub@redhat.com>
      
      	PR middle-end/93505
      	* combine.c (simplify_comparison) <case ROTATE>: Punt on out of range
      	rotate counts.
      
      	* gcc.c-torture/compile/pr93505.c: New test.
      Jakub Jelinek committed
    • c++: Fix -Wtype-limits in templates. · 4dd468a0
      When instantiating a template tsubst_copy_and_build suppresses -Wtype-limits
      warnings about e.g. == always being false because it might not always be
      false for an instantiation with other template arguments.  But we should
      warn if the operands don't depend on template arguments.
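
The difference between the two cases can be seen in a small example. This is a sketch and not the testcase from the PR; `equals_300`, `fixed_equals_300`, and `demo_type_limits` are hypothetical names. Compiling it with -Wtype-limits enabled is expected to flag only the non-dependent comparison.

```cpp
#include <cassert>

// Dependent comparison: whether `value == 300` can be true depends on T,
// so a warning for one particular instantiation would be misleading.
template <typename T>
bool equals_300(T value) { return value == 300; }

// Non-dependent comparison: the operand type is fixed, so the result is
// always false whatever T is.
template <typename T>
bool fixed_equals_300(unsigned char value) { return value == 300; }

inline void demo_type_limits() {
    assert(equals_300(300));                   // T = int: can be true
    assert(!equals_300<unsigned char>(44));    // T = unsigned char: never true
    assert(!fixed_equals_300<int>(200));       // false for every T
}
```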
      
      	PR c++/82521
      	* pt.c (tsubst_copy_and_build) [EQ_EXPR]: Only suppress warnings if
      	the expression was dependent before substitution.
      Jason Merrill committed
    • Remove check for maximum symbol name length. · 004ac7b7
              PR fortran/87103
              * expr.c (gfc_check_conformance): Check vsnprintf for truncation.
              * iresolve.c (gfc_get_string): Likewise.
              * symbol.c (gfc_new_symbol): Remove check for maximum symbol
              name length.  Remove redundant 0 setting of new calloc()ed
              gfc_symbol.
      Andrew Benson committed
    • Add LTGT operator support for amdgcn · 59e6d62b
      Fixes ICE in testcase gcc.dg/pr81228.c
      
      2020-01-30  Andrew Stubbs  <ams@codesourcery.com>
      
      	gcc/
      	* config/gcn/gcn.c (print_operand): Handle LTGT.
      	* config/gcn/predicates.md (gcn_fp_compare_operator): Allow ltgt.
      Andrew Stubbs committed
    • Fix "regression" reported by c6x testing. · e0678350
      	* gcc.dg/tree-ssa/ssa-dse-26.c: Make existing dg-final scan
      	conditional on !c6x.  Add dg-final scan pattern for c6x.
      Jeff Law committed
    • PR middle-end/92323 - bogus -Warray-bounds after unrolling despite __builtin_unreachable · 97b40c39
      gcc/testsuite/ChangeLog:
      	* gcc.dg/Warray-bounds-57.c: New test.
      Martin Sebor committed
    • dump CTORs properly wrapped with _Literal with -gimple · bba18325
      This wraps { ... } in _Literal (type) for consumption by the GIMPLE FE.
      
      2020-01-30  Richard Biener  <rguenther@suse.de>
      
      	* tree-pretty-print.c (dump_generic_node): Wrap VECTOR_CST
      	and CONSTRUCTOR in _Literal (type) with TDF_GIMPLE.
      Richard Biener committed