1. 09 Mar, 2020 5 commits
    • alias: Punt after walking too many VALUEs during a toplevel find_base_term call [PR94045] · 2e94d3ee
      As mentioned in the PR, on a largish C++ testcase the compile time
      on i686-linux is about 16 minutes on a fast box, mostly spent in
      find_base_term recursive calls dealing with very deep chains of preserved
      VALUEs during var-tracking.
      
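      The following patch punts once a single toplevel find_base_term call has
      processed too many VALUEs, as controlled by the new
      --param max-find-base-term-values (we already have code to punt if we run
      into a VALUE cycle).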
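      A minimal C++ sketch of the cut-off idea, assuming a toy graph of
      VALUE-like nodes rather than real RTL: `Value`, `find_base_rec`,
      `find_base` and `max_values` are hypothetical names, not alias.c code.
      One visit budget is shared by the whole recursive walk started from a
      single toplevel call, and exhausting it yields the same "no base term"
      answer (a null result) as the existing cycle check.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an RTL VALUE: it either carries a known base
// term directly, or forwards to equivalent locations that must be searched.
struct Value {
  const char *base = nullptr;       // non-null if the base term is known
  std::vector<const Value *> locs;  // equivalent locations to search
};

// Recursive walk.  Every VALUE entered is recorded in VISITED; the walk
// gives up on a cycle, and gives up entirely (PUNTED) once more than
// MAX_VALUES VALUEs have been visited during this toplevel call.
static const char *find_base_rec(const Value *v,
                                 std::unordered_set<const Value *> &visited,
                                 std::size_t max_values, bool &punted) {
  if (punted)
    return nullptr;
  if (v->base)
    return v->base;
  if (!visited.insert(v).second)
    return nullptr;  // VALUE cycle: this path cannot produce a base
  if (visited.size() > max_values) {
    punted = true;   // budget exhausted: stop the whole walk
    return nullptr;
  }
  for (const Value *loc : v->locs)
    if (const char *b = find_base_rec(loc, visited, max_values, punted))
      return b;
  return nullptr;
}

// Toplevel entry point: the visited set and the budget live here, so they
// cover all recursive calls made on behalf of this one query.
const char *find_base(const Value *v, std::size_t max_values) {
  std::unordered_set<const Value *> visited;
  bool punted = false;
  return find_base_rec(v, visited, max_values, punted);
}
```

      With a generous limit the walk reaches the node that carries a base;
      with a small limit on a long chain it stops early and returns null,
      which is the answer the statistics below show the full walk ended up
      with anyway in the measured cases.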
      
      I've gathered statistics on when we punt this way (with BITS_PER_WORD, TU,
      function columns piped through sort | uniq -c | sort -n):
           36 32 ../../gcc/asan.c _Z29initialize_sanitizer_builtinsv.part.0
          108 32 _first_test.go reflect_test.reflect_test..import
         1005 32 /home/jakub/src/gcc/gcc/testsuite/gcc.dg/pr85180.c foo
         1005 32 /home/jakub/src/gcc/gcc/testsuite/gcc.dg/pr87985.c foo
         1005 64 /home/jakub/src/gcc/gcc/testsuite/gcc.dg/pr85180.c foo
         1005 64 /home/jakub/src/gcc/gcc/testsuite/gcc.dg/pr87985.c foo
         2534 32 /home/jakub/src/gcc/gcc/testsuite/gcc.dg/stack-check-9.c f3
         6346 32 ../../gcc/brig/brig-lang.c brig_define_builtins
         6398 32 ../../gcc/d/d-builtins.cc d_define_builtins
         8816 32 ../../gcc/c-family/c-common.c c_common_nodes_and_builtins
         8824 32 ../../gcc/lto/lto-lang.c lto_define_builtins
        41413 32 /home/jakub/src/gcc/gcc/testsuite/gcc.dg/pr43058.c test
      Additionally, for most of these (for the builtins definitions, just one
      was tested) I've verified with a different alias.c change, which didn't
      punt but instead recorded in the toplevel find_base_term, whenever
      visited_vals reached the limit, whether the return value was NULL_RTX or
      something else.  In all these cases the end result was NULL_RTX, so at
      least in these cases the patch should just shorten the time until
      NULL_RTX is returned.
      
      2020-03-09  Jakub Jelinek  <jakub@redhat.com>
      
      	PR rtl-optimization/94045
      	* params.opt (-param=max-find-base-term-values=): New option.
      	* alias.c (find_base_term): Add cut-off for number of visited VALUEs
      	in a single toplevel find_base_term call.
      Jakub Jelinek committed
    • Insert default return_void at the end of coroutine body · 016d0f9e
      Exceptions in a coroutine are not handled correctly because the default
      return_void call is currently inserted before the final suspend point,
      rather than at the end of the original coroutine body.  This patch
      fixes the issue by expanding the code as follows:
        co_await promise.initial_suspend();
        try {
          // The original coroutine body
      
          promise.return_void(); // The default return_void call.
        } catch (...) {
          promise.unhandled_exception();
        }
        final_suspend:
        // ...
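      The control flow of that expansion can be modelled with an ordinary
      function.  This only illustrates the ordering and is not what
      coroutines.cc generates: `Promise` and `run_lowered` are hypothetical,
      and the hook calls stand in for the real co_await and suspend points.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical promise that just logs which hooks get called.
struct Promise {
  std::vector<std::string> log;
  void initial_suspend() { log.push_back("initial_suspend"); }
  void return_void() { log.push_back("return_void"); }
  void unhandled_exception() { log.push_back("unhandled_exception"); }
  void final_suspend() { log.push_back("final_suspend"); }
};

// Plain-function model of the expansion: the implicit return_void() is the
// last statement inside the try block, so it is skipped when the original
// body throws, and unhandled_exception() runs instead.  Both paths then
// reach the final suspend point.
void run_lowered(Promise &p, const std::function<void()> &body) {
  p.initial_suspend();       // co_await promise.initial_suspend();
  try {
    body();                  // the original coroutine body
    p.return_void();         // the default return_void call
  } catch (...) {
    p.unhandled_exception();
  }
  p.final_suspend();         // final_suspend:
}
```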
      
      gcc/cp/
          * coroutines.cc (build_actor_fn): Factor out code inserting the
          default return_void call to...
          (morph_fn_to_coro): ...here, also hoist local var declarations.
      
      gcc/testsuite/
          * g++.dg/coroutines/torture/co-ret-15-default-return_void.C: New.
      Bin Cheng committed
    • [testsuite] Fix PR94019 to check vector char when vect_hw_misalign · cb2c6020
      As PR94019 shows, on targets without misaligned vector access support
      but with realign load support, the vectorized loop ends up using the
      realign scheme.  That scheme generates a mask (control vector) of type
      vector signed char, which breaks the test's negative ("not") scan check.
      
      gcc/testsuite/ChangeLog
      
      2020-03-09  Kewen Lin  <linkw@gcc.gnu.org>
      
          PR testsuite/94019
          * gcc.dg/vect/vect-over-widen-17.c: Don't expect vector char if
          it's without misaligned vector access support.
      Kewen Lin committed
    • [testsuite] Fix PR94023 to guard case under vect_hw_misalign · d5114529
      As PR94023 shows, the expected SLP requires misaligned vector access
      support.  This patch guards the check under the vect_hw_misalign target
      condition to ensure that.
      
      gcc/testsuite/ChangeLog
      
      2020-03-09  Kewen Lin  <linkw@gcc.gnu.org>
      
          PR testsuite/94023
          * gcc.dg/vect/slp-perm-12.c: Expect loop vectorized messages only on
          vect_hw_misalign targets.
      Kewen Lin committed
    • Daily bump. · 0b4ee25b
      GCC Administrator committed
  2. 08 Mar, 2020 4 commits
  3. 07 Mar, 2020 1 commit
  4. 06 Mar, 2020 25 commits
    • analyzer: improvements to region_model::get_representative_tree · 90f7c300
      This patch extends region_model::get_representative_tree so that dumps
      are able to refer to string literals, which I've found useful in
      investigating a state-bloat issue.
      
      Doing so uncovered a bug in the handling of views I introduced in
      r10-7024-ge516294a where the code was
      erroneously using TREE_TYPE on the view region's type, rather than just
      using its type, which the patch also fixes.
      
      gcc/analyzer/ChangeLog:
      	* analyzer.h (class array_region): New forward decl.
      	* program-state.cc (selftest::test_program_state_dumping_2): New.
      	(selftest::analyzer_program_state_cc_tests): Call it.
      	* region-model.cc (array_region::constant_from_key): New.
      	(region_model::get_representative_tree): Handle region_svalue by
      	generating an ADDR_EXPR.
      	(region_model::get_representative_path_var): In view handling,
      	remove erroneous TREE_TYPE when determining the type of the tree.
      	Handle array regions and STRING_CST.
      	(selftest::assert_dump_tree_eq): New.
      	(ASSERT_DUMP_TREE_EQ): New macro.
      	(selftest::test_get_representative_tree): New selftest.
      	(selftest::analyzer_region_model_cc_tests): Call it.
      	* region-model.h (region::dyn_cast_array_region): New vfunc.
      	(array_region::dyn_cast_array_region): New vfunc implementation.
      	(array_region::constant_from_key): New decl.
      
      gcc/testsuite/ChangeLog:
      	* gcc.dg/analyzer/malloc-4.c: Update expected output of leak to
      	reflect fix to region_model::get_representative_path_var, adding
      	the missing "*" from the cast.
      David Malcolm committed
    • analyzer: improvements to state dumping · 41f99ba6
      This patch fixes a bug in which summarized state dumps involving a
      non-NULL pointer to a region for which get_representative_path_var
      returned NULL were erroneously dumped as "NULL".
      
      It also extends sm-state dumps so that they show representative tree
      values, where available.
      
      Finally, it adds some selftest coverage for such dumps.  Doing so
      requires replacing some %qE with a dump_quoted_tree, to avoid
      C vs C++ differences between "make selftest-c" and "make selftest-c++".
      
      gcc/analyzer/ChangeLog:
      	* analyzer.h (dump_quoted_tree): New decl.
      	* engine.cc (exploded_node::dump_dot): Pass region model to
      	sm_state_map::print.
      	* program-state.cc: Include diagnostic-core.h.
      	(sm_state_map::print): Add "model" param and use it to print
      	representative trees.  Only print origin information if non-null.
      	(sm_state_map::dump): Pass NULL for model to print call.
      	(program_state::print): Pass region model to sm_state_map::print.
      	(program_state::dump_to_pp): Use spaces rather than newlines when
      	summarizing.  Pass region_model to sm_state_map::print.
      	(ana::selftest::assert_dump_eq): New function.
      	(ASSERT_DUMP_EQ): New macro.
      	(ana::selftest::test_program_state_dumping): New function.
      	(ana::selftest::analyzer_program_state_cc_tests): Call it.
      	* program-state.h (program_state::print): Add model param.
      	* region-model.cc (dump_quoted_tree): New function.
      	(map_region::print_fields): Use dump_quoted_tree rather than
      	%qE to avoid lang-dependent output.
      	(map_region::dump_child_label): Likewise.
      	(region_model::dump_summary_of_map): For SK_REGION, when
      	get_representative_path_var fails, print the region id rather than
      	erroneously printing NULL.
      	* sm.cc (state_machine::get_state_by_name): New function.
      	* sm.h (state_machine::get_state_by_name): New decl.
      David Malcolm committed
    • Fix mangling ICE [PR94027] · 191bcd0f
      	PR c++/94027
      	* mangle.c (find_substitution): Don't call same_type_p on template
      	args that cannot match.
      
      Now that same_type_p rejects argument packs, we need to be more careful
      when calling it on the contents of a template argument vector.
      
      The mangler needs to do some comparisons to find the special
      substitutions.  While that code looks a little ugly, this seems the
      smallest fix.
      Nathan Sidwell committed
    • [AArch64] Use intrinsics for widening multiplies (PR91598) · 0b839322
      Inline assembler instructions don't have latency info and the scheduler does
      not attempt to schedule them at all - it does not even honor latencies of
      asm source operands.  As a result, SIMD intrinsics which are implemented using
      inline assembler perform very poorly, particularly on in-order cores.
      Add new patterns and intrinsics for widening multiplies, which results in a
      63% speedup for the example in the PR, thus fixing the reported regression.
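      For reference, this is what the widening lane multiplies compute.  The
      sketch is a scalar C++ model of the vmull_lane_s16 and vmlal_lane_s16
      semantics under the usual NEON definitions; the `ref_*` functions and
      the `i16x4`/`i32x4` aliases are hypothetical stand-ins for the
      arm_neon.h intrinsics and vector types, not how arm_neon.h implements
      them.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-ins for int16x4_t and int32x4_t.
using i16x4 = std::array<std::int16_t, 4>;
using i32x4 = std::array<std::int32_t, 4>;

// Model of vmull_lane_s16: each 16-bit lane of A is multiplied by the single
// lane V[LANE], and every product is widened to 32 bits.
i32x4 ref_vmull_lane_s16(const i16x4 &a, const i16x4 &v, int lane) {
  i32x4 r{};
  for (int i = 0; i < 4; ++i)
    r[i] = std::int32_t(a[i]) * std::int32_t(v[lane]);  // widen, then multiply
  return r;
}

// Model of vmlal_lane_s16: the same widening lane multiply, accumulated into
// ACC.  NEON additions wrap around, so the sum is formed in uint32_t.
i32x4 ref_vmlal_lane_s16(const i32x4 &acc, const i16x4 &b, const i16x4 &v,
                         int lane) {
  i32x4 prod = ref_vmull_lane_s16(b, v, lane);
  i32x4 r{};
  for (int i = 0; i < 4; ++i)
    r[i] = std::int32_t(std::uint32_t(acc[i]) + std::uint32_t(prod[i]));
  return r;
}
```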
      
          gcc/
      	PR target/91598
      	* config/aarch64/aarch64-builtins.c (TYPES_TERNOPU_LANE): Add define.
      	* config/aarch64/aarch64-simd.md
      	(aarch64_vec_<su>mult_lane<Qlane>): Add new insn for widening lane mul.
      	(aarch64_vec_<su>mlal_lane<Qlane>): Likewise.
      	* config/aarch64/aarch64-simd-builtins.def: Add intrinsics.
      	* config/aarch64/arm_neon.h:
      	(vmlal_lane_s16): Expand using intrinsics rather than inline asm.
      	(vmlal_lane_u16): Likewise.
      	(vmlal_lane_s32): Likewise.
      	(vmlal_lane_u32): Likewise.
      	(vmlal_laneq_s16): Likewise.
      	(vmlal_laneq_u16): Likewise.
      	(vmlal_laneq_s32): Likewise.
      	(vmlal_laneq_u32): Likewise.
      	(vmull_lane_s16): Likewise.
      	(vmull_lane_u16): Likewise.
      	(vmull_lane_s32): Likewise.
      	(vmull_lane_u32): Likewise.
      	(vmull_laneq_s16): Likewise.
      	(vmull_laneq_u16): Likewise.
      	(vmull_laneq_s32): Likewise.
      	(vmull_laneq_u32): Likewise.
      	* config/aarch64/iterators.md (Vcondtype): New iterator for lane mul.
      	(Qlane): Likewise.
      Wilco Dijkstra committed
    • [AArch64] Fix lane specifier syntax · 3e5c062e
      The syntax for lane specifiers uses a vector element rather than a vector:
      
      fmls    v0.2s, v1.2s, v1.s[1]  // rather than v1.2s[1]
      
      Fix all the lane specifiers to use Vetype which uses the correct element type.
      
          gcc/
      	* aarch64/aarch64-simd.md (aarch64_mla_elt<mode>): Correct lane syntax.
      	(aarch64_mla_elt_<vswap_width_name><mode>): Likewise.
      	(aarch64_mls_elt<mode>): Likewise.
      	(aarch64_mls_elt_<vswap_width_name><mode>): Likewise.
      	(aarch64_fma4_elt<mode>): Likewise.
      	(aarch64_fma4_elt_<vswap_width_name><mode>): Likewise.
      	(aarch64_fma4_elt_to_64v2df): Likewise.
      	(aarch64_fnma4_elt<mode>): Likewise.
      	(aarch64_fnma4_elt_<vswap_width_name><mode>): Likewise.
      	(aarch64_fnma4_elt_to_64v2df): Likewise.
      
          testsuite/
      	* gcc.target/aarch64/fmla_intrinsic_1.c: Check for correct lane syntax.
      	* gcc.target/aarch64/fmls_intrinsic_1.c: Likewise.
      	* gcc.target/aarch64/mla_intrinsic_1.c: Likewise.
      	* gcc.target/aarch64/mls_intrinsic_1.c: Likewise.
      Wilco Dijkstra committed
    • [AArch64][SVE] Add missing movprfx attribute to some ternary arithmetic patterns · 4a5c938b
      The two affected SVE2 patterns in this patch output a movprfx'ed instruction in their second alternative
      but don't set the "movprfx" attribute, which will result in the wrong instruction length being assumed by the midend.
      
      This patch fixes that in the same way as the other SVE patterns in the backend.
      
      Bootstrapped and tested on aarch64-none-linux-gnu.
      
      2020-03-06  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
      
      	* config/aarch64/aarch64-sve2.md (@aarch64_sve_<sve_int_op><mode>):
      	Specify movprfx attribute.
      	(@aarch64_sve_<sve_int_op>_lane_<mode>): Likewise.
      Kyrylo Tkachov committed
    • rs6000: Correct logic to disable NO_SUM_IN_TOC and NO_FP_IN_TOC [PR94065] · 3dcf51ad
      aix61.h, aix71.h and aix72.h intend to prevent SUM_IN_TOC and FP_IN_TOC
      when cmodel=large.  This patch sets the variables associated with the
      target options to 1 to _enable_ NO_SUM_IN_TOC and NO_FP_IN_TOC.
      
      Bootstrapped on powerpc-ibm-aix7.2.0.0
      
      	2020-03-06  David Edelsohn  <dje.gcc@gmail.com>
      	PR target/94065
      	* config/rs6000/aix61.h (TARGET_NO_SUM_IN_TOC): Set to 1 for
      	cmodel=large.
      	(TARGET_NO_FP_IN_TOC): Same.
      	* config/rs6000/aix71.h: Same.
      	* config/rs6000/aix72.h: Same.
      David Edelsohn committed
    • Avoid putting a REG_NOTE on anything other than an INSN in haifa-sched.c · e6ce69ca
      	PR rtl-optimization/93996
      	* haifa-sched.c (remove_notes): Be more careful when adding
      	REG_SAVE_NOTE.
      Andrew Pinski committed
    • arc: Update tumaddsidi4 test. · 4b62b396
      The test uses -O1, and the macu instruction is generated by the
      combiner, not in the expand step.  My previous patch, "arc: Improve code
      gen for 64bit add/sub operations.", splits the 64-bit add at expand
      time, which makes it impossible for the combiner to match the 64-bit
      multiply-and-accumulate, hence the failure.  This patch raises the
      optimization level so that the macu instruction is generated at expand
      time.
      
      xxxx-xx-xx  Claudiu Zissulescu  <claziss@synopsys.com>
      
      	* gcc.target/arc/tumaddsidi4.c: Step-up optimization level.
      
      Signed-off-by: Claudiu Zissulescu <claziss@gmail.com>
      Claudiu Zissulescu committed
    • libstdc++: Add missing friend declaration to join_view::_Sentinel · 6aa2ca21
      The converting constructor of join_view::_Sentinel<true> needs to be able to
      access the private members of join_view::_Sentinel<false>.
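      A reduced sketch of the access problem, assuming a hypothetical
      `Sentinel<Const>` in place of join_view::_Sentinel<_Const> and using
      std::enable_if instead of the C++20 constraint libstdc++ uses.  Each
      specialization befriends the other, so the converting constructor in
      Sentinel<true> may read Sentinel<false>'s private member.

```cpp
#include <type_traits>

// Hypothetical stand-in for join_view::_Sentinel<_Const>.
template <bool Const>
class Sentinel {
  // Without this declaration, the converting constructor below (a member
  // of Sentinel<true>) could not read Sentinel<false>::end_.
  friend class Sentinel<!Const>;
  int end_ = 0;

public:
  Sentinel() = default;
  explicit Sentinel(int end) : end_(end) {}

  // Converting constructor, only viable for Sentinel<true>: it builds the
  // const sentinel from a non-const one.
  template <bool C = Const, typename = typename std::enable_if<C>::type>
  Sentinel(Sentinel<!C> other) : end_(other.end_) {}

  int end() const { return end_; }
};
```

      The conversion only goes one way: a Sentinel<false> can become a
      Sentinel<true>, but not the reverse, because the constructor template
      is removed by SFINAE in Sentinel<false>.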
      
      libstdc++-v3/ChangeLog:
      
      	* include/std/ranges (join_view::_Sentinel<_Const>): Befriend
      	join_view::_Sentinel<!_Const>.
      	* testsuite/std/ranges/adaptors/join.cc: Augment test.
      Patrick Palka committed
    • libstdc++: Give ranges::empty() a concrete return type (PR 93978) · 6d082cd9
      This works around PR 93978 by avoiding having to instantiate the body of
      ranges::empty() when checking the constraints of view_interface::operator
      bool().  When ranges::empty() has an auto return type, then we must instantiate
      its body in order to determine whether the requires expression {
      ranges::empty(_M_derived()); } is well-formed.  But this means instantiating
      view_interface::empty() and hence view_interface::_M_derived(), all before we've
      yet deduced the return type of join_view::end().  (The reason
      view_interface::operator bool() is needed in join_view::end() in the first place
      is because in this function we perform direct initialization of
      join_view::_Sentinel from a join_view, and so we try to find a conversion
      sequence from the latter to the former that goes through this conversion
      operator.)
      
      Giving ranges::empty() a concrete return type of bool should be safe according
      to [range.prim.empty]/4 which says "whenever ranges::empty(E) is a valid
      expression, it has type bool."
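      The effect of a declared return type can be sketched with a small
      function object.  `EmptyFn` and `NoEmptyMember` are hypothetical and
      much simpler than the constrained __cust_access::_Empty: the point is
      only that decltype of a call to a function with a declared return type
      does not instantiate the function body, whereas an `auto` return type
      would force that instantiation in order to deduce the type.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical function object in the spirit of ranges::empty.
struct EmptyFn {
  // Declared (concrete) return type: the result type of a call is known
  // from the declaration alone.
  template <typename R>
  bool operator()(R &r) const { return r.empty(); }
};

// A type with no empty() member.  Querying the call's type still compiles
// because the body of operator() is never instantiated here.  Had the
// return type been `auto`, deducing it would instantiate the body and this
// static_assert would fail to compile.
struct NoEmptyMember {};
static_assert(
    std::is_same<decltype(EmptyFn{}(std::declval<NoEmptyMember &>())),
                 bool>::value,
    "return type is known from the declaration");
```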
      
      This fixes the test case in PR 93978 when compiling without -Wall, but with -Wall
      the test case still fails due to the issue described in PR c++/94038, I think.
      I still don't quite understand why the test case doesn't fail without -O.
      
      libstdc++-v3/ChangeLog:
      
      	PR libstdc++/93978
      	* include/bits/range_access.h (__cust_access::_Empty::operator()):
      	Declare return type to be bool instead of auto.
      	* testsuite/std/ranges/adaptors/93978.cc: New test.
      Patrick Palka committed
    • libstdc++: Fix call to __glibcxx_rwlock_init (PR 94069) · b0815713
      When the target doesn't define PTHREAD_RWLOCK_INITIALIZER we use a
      wrapper around pthread_rwlock_init, but the wrapper only takes one
      argument and we try to call it with two.
      
      This went unnoticed on most targets because they do define the
      PTHREAD_RWLOCK_INITIALIZER macro, but it causes a bootstrap failure on
      darwin8.
      
      	PR libstdc++/94069
      	* include/std/shared_mutex [!PTHREAD_RWLOCK_INITIALIZER]
      	(__shared_mutex_pthread::__shared_mutex_pthread()): Remove incorrect
      	second argument to __glibcxx_rwlock_init.
      	* testsuite/30_threads/shared_timed_mutex/94069.cc: New test.
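      A sketch of the two initialization paths, assuming POSIX threads (link
      with -pthread where the C library requires it).  `SharedMutexSketch`
      and `rwlock_init_one_arg` are hypothetical names; the wrapper only
      mirrors the one-argument shape the commit describes for
      __glibcxx_rwlock_init.  On targets that define
      PTHREAD_RWLOCK_INITIALIZER, the fallback constructor is not compiled.

```cpp
#include <pthread.h>

#ifndef PTHREAD_RWLOCK_INITIALIZER
// Hypothetical one-argument wrapper: it supplies default attributes itself,
// so callers must pass only the lock.
static int rwlock_init_one_arg(pthread_rwlock_t *rw) {
  return pthread_rwlock_init(rw, nullptr);
}
#endif

class SharedMutexSketch {
#ifdef PTHREAD_RWLOCK_INITIALIZER
  pthread_rwlock_t rwlock_ = PTHREAD_RWLOCK_INITIALIZER;  // static init path
#else
  pthread_rwlock_t rwlock_;
#endif

public:
#ifdef PTHREAD_RWLOCK_INITIALIZER
  SharedMutexSketch() = default;
#else
  // Fallback path: exactly one argument, matching the wrapper.
  SharedMutexSketch() { rwlock_init_one_arg(&rwlock_); }
#endif
  ~SharedMutexSketch() { pthread_rwlock_destroy(&rwlock_); }
  SharedMutexSketch(const SharedMutexSketch &) = delete;
  SharedMutexSketch &operator=(const SharedMutexSketch &) = delete;

  bool try_lock_shared() { return pthread_rwlock_tryrdlock(&rwlock_) == 0; }
  void unlock_shared() { pthread_rwlock_unlock(&rwlock_); }
};
```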
      Jonathan Wakely committed
    • libstdc++: Fix failing filesystem::path tests (PR 93244) · 180eeeae
      The checks for PR 93244 don't actually pass on Windows (which is the
      target where the bug is present) because of a different bug, PR 94063.
      
      This adjusts the tests to not be affected by 94063 so that they verify
      that 93244 was fixed.
      
      	PR libstdc++/93244
      	* testsuite/27_io/filesystem/path/generic/generic_string.cc: Adjust
      	test to not fail due to PR 94063.
      	* testsuite/27_io/filesystem/path/generic/utf.cc: Likewise.
      	* testsuite/27_io/filesystem/path/generic/wchar_t.cc: Likewise.
      Jonathan Wakely committed
    • libstdc++: Deal with ENOSYS == ENOTSUP · 28119fba
      zTPF uses the same numeric value for ENOSYS and ENOTSUP.
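      Because two case labels with the same value are ill-formed, code that
      switches on errno values has to guard one of them.  A minimal sketch of
      that pattern, using a hypothetical `errno_name` function rather than
      the actual system_error.cc code:

```cpp
#include <cerrno>
#include <string>

// Maps a few errno values to their names.  ENOTSUP gets its own case only
// when it differs from ENOSYS; where they are equal (as on zTPF), a second
// label would be a duplicate case value and fail to compile.
std::string errno_name(int e) {
  switch (e) {
    case ENOSYS:
      return "ENOSYS";
#if defined(ENOTSUP) && ENOTSUP != ENOSYS
    case ENOTSUP:
      return "ENOTSUP";
#endif
    default:
      return "other";
  }
}
```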
      
      libstdc++-v3/ChangeLog:
      
      2020-03-06  Andreas Krebbel  <krebbel@linux.ibm.com>
      
      	* src/c++11/system_error.cc: Omit the ENOTSUP case statement if it
      	would match ENOSYS.
      Andreas Krebbel committed
    • ACLE intrinsics: BFloat16 load intrinsics for AArch32 · eb637e76
      2020-03-06  Delia Burduv  <delia.burduv@arm.com>
      
      	* config/arm/arm_neon.h (vld2_bf16): New.
      	(vld2q_bf16): New.
      	(vld3_bf16): New.
      	(vld3q_bf16): New.
      	(vld4_bf16): New.
      	(vld4q_bf16): New.
      	(vld2_dup_bf16): New.
      	(vld2q_dup_bf16): New.
      	(vld3_dup_bf16): New.
      	(vld3q_dup_bf16): New.
      	(vld4_dup_bf16): New.
      	(vld4q_dup_bf16): New.
      	* config/arm/arm_neon_builtins.def
      	(vld2): Changed to VAR13 and added v4bf, v8bf.
      	(vld2_dup): Changed to VAR8 and added v4bf, v8bf.
      	(vld3): Changed to VAR13 and added v4bf, v8bf.
      	(vld3_dup): Changed to VAR8 and added v4bf, v8bf.
      	(vld4): Changed to VAR13 and added v4bf, v8bf.
      	(vld4_dup): Changed to VAR8 and added v4bf, v8bf.
      	* config/arm/iterators.md (VDXBF2): New iterator.
      	* config/arm/neon.md (neon_vld2): Use new iterators.
      	(neon_vld2_dup<mode>): Use new iterators.
      	(neon_vld3<mode>): Likewise.
      	(neon_vld3qa<mode>): Likewise.
      	(neon_vld3qb<mode>): Likewise.
      	(neon_vld3_dup<mode>): Likewise.
      	(neon_vld4<mode>): Likewise.
      	(neon_vld4qa<mode>): Likewise.
      	(neon_vld4qb<mode>): Likewise.
      	(neon_vld4_dup<mode>): Likewise.
      	(neon_vld2_dupv8bf): New.
      	(neon_vld3_dupv8bf): Likewise.
      	(neon_vld4_dupv8bf): Likewise.
      
      	* gcc.target/arm/simd/bf16_vldn_1.c: New test.
      Delia Burduv committed
    • ACLE intrinsics: BFloat16 store (vst<n>{q}_bf16) intrinsics for AArch32 · ff229375
      2020-03-06  Delia Burduv  <delia.burduv@arm.com>
      
      	* config/arm/arm_neon.h (bfloat16x4x2_t): New typedef.
      	(bfloat16x8x2_t): New typedef.
      	(bfloat16x4x3_t): New typedef.
      	(bfloat16x8x3_t): New typedef.
      	(bfloat16x4x4_t): New typedef.
      	(bfloat16x8x4_t): New typedef.
      	(vst2_bf16): New.
      	(vst2q_bf16): New.
      	(vst3_bf16): New.
      	(vst3q_bf16): New.
      	(vst4_bf16): New.
      	(vst4q_bf16): New.
      	* config/arm/arm-builtins.c (v2bf_UP): Define.
      	(VAR13): New.
      	(arm_init_simd_builtin_types): Init Bfloat16x2_t eltype.
      	* config/arm/arm-modes.def (V2BF): New mode.
      	* config/arm/arm-simd-builtin-types.def
      	(Bfloat16x2_t): New entry.
      	* config/arm/arm_neon_builtins.def
      	(vst2): Changed to VAR13 and added v4bf, v8bf.
      	(vst3): Changed to VAR13 and added v4bf, v8bf.
      	(vst4): Changed to VAR13 and added v4bf, v8bf.
      	* config/arm/iterators.md (VDXBF): New iterator.
      	(VQ2BF): New iterator.
      	* config/arm/neon.md (neon_vst2<mode>): Used new iterators.
      	(neon_vst2<mode>): Used new iterators.
      	(neon_vst3<mode>): Used new iterators.
      	(neon_vst3<mode>): Used new iterators.
      	(neon_vst3qa<mode>): Used new iterators.
      	(neon_vst3qb<mode>): Used new iterators.
      	(neon_vst4<mode>): Used new iterators.
      	(neon_vst4<mode>): Used new iterators.
      	(neon_vst4qa<mode>): Used new iterators.
      	(neon_vst4qb<mode>): Used new iterators.
      
      	* gcc.target/arm/simd/bf16_vstn_1.c: New test.
      Delia Burduv committed
    • aarch64: ACLE intrinsics for BFCVTN, BFCVTN2 and BFCVT · 1f520d34
      This patch adds the Armv8.6-a ACLE intrinsics for bfcvtn, bfcvtn2 and
      bfcvt as part of the BFloat16 extension.
      (https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics)
      The intrinsics are declared in arm_bf16.h and arm_neon.h and the RTL
      patterns are defined in aarch64-simd.md.
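      As background for what these conversions produce, here is a software
      sketch of narrowing a float32 to bfloat16 bits with round-to-nearest-
      even, keeping the upper 16 bits of the IEEE single-precision encoding.
      `float_to_bf16_bits` is a hypothetical helper; it assumes the default
      rounding mode and does not model any FPCR-controlled behavior (such as
      denormal flushing) of the BFCVT instructions.

```cpp
#include <cstdint>
#include <cstring>

// Returns the 16-bit bfloat16 encoding of F, rounding to nearest-even.
std::uint16_t float_to_bf16_bits(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);  // reinterpret the float's bits
  if ((bits & 0x7fffffffu) > 0x7f800000u)            // NaN: keep it a NaN,
    return std::uint16_t((bits >> 16) | 0x0040u);    // forcing the quiet bit
  // Adding 0x7fff plus the lowest kept bit rounds to nearest, ties to even.
  // Large finite values correctly round up to infinity.
  bits += 0x7fffu + ((bits >> 16) & 1u);
  return std::uint16_t(bits >> 16);
}
```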
      
      2020-03-06  Delia Burduv  <delia.burduv@arm.com>
      
      gcc/
      	* config/aarch64/aarch64-simd-builtins.def
      	(bfcvtn): New built-in function.
      	(bfcvtn_q): New built-in function.
      	(bfcvtn2): New built-in function.
      	(bfcvt): New built-in function.
      	* config/aarch64/aarch64-simd.md
      	(aarch64_bfcvtn<q><mode>): New pattern.
      	(aarch64_bfcvtn2v8bf): New pattern.
      	(aarch64_bfcvtbf): New pattern.
      	* config/aarch64/arm_bf16.h (float32_t): New typedef.
      	(vcvth_bf16_f32): New intrinsic.
      	* config/aarch64/arm_neon.h (vcvt_bf16_f32): New intrinsic.
      	(vcvtq_low_bf16_f32): New intrinsic.
      	(vcvtq_high_bf16_f32): New intrinsic.
      	* config/aarch64/iterators.md (V4SF_TO_BF): New mode iterator.
      	(UNSPEC_BFCVTN): New UNSPEC.
      	(UNSPEC_BFCVTN2): New UNSPEC.
      	(UNSPEC_BFCVT): New UNSPEC.
      	* config/arm/types.md (bf_cvt): New type.
      
      gcc/testsuite/
      	* gcc.target/aarch64/advsimd-intrinsics/bfcvt-compile.c: New test.
      	* gcc.target/aarch64/advsimd-intrinsics/bfcvt-nobf16.c: New test.
      	* gcc.target/aarch64/advsimd-intrinsics/bfcvt-nosimd.c: New test.
      	* gcc.target/aarch64/advsimd-intrinsics/bfcvtnq2-untied.c: New test.
      Delia Burduv committed
    • Fix error format string. · 655e5c29
      gcc/ChangeLog:
      
      2020-03-06  Andreas Krebbel  <krebbel@linux.ibm.com>
      
      	* config/s390/s390.md ("tabort"): Get rid of two consecutive
      	blanks in format string.
      Andreas Krebbel committed
    • re PR tree-optimization/90883 (Generated code is worse if returned struct is unnamed) · 46275300
      After adding --param max-inline-insns-size=1, all targets remove the
      redundant store at dse1, except that some targets, such as AArch64 and
      MIPS, expand the struct initialization into a loop due to CLEAR_RATIO.
      
      Tested on cross compiler of riscv32, riscv64, x86, x86_64, mips, mips64,
      aarch64, nds32 and arm.
      
      gcc/testsuite/ChangeLog
      
      	PR tree-optimization/90883
      	* g++.dg/tree-ssa/pr90883.c: Add --param max-inline-insns-size=1.
      	Add aarch64-*-* mips*-*-* to XFAIL.
      Kito Cheng committed
    • i386: Properly encode vector registers in vector move · 5358e8f5
      On x86, when AVX and AVX512 are enabled, vector move instructions can
      be encoded with either 2-byte/3-byte VEX (AVX) or 4-byte EVEX (AVX512):
      
         0:	c5 f9 6f d1          	vmovdqa %xmm1,%xmm2
         4:	62 f1 fd 08 6f d1    	vmovdqa64 %xmm1,%xmm2
      
      We prefer VEX encoding over EVEX since VEX is shorter.  Also AVX512F
      only supports 512-bit vector moves.  AVX512F + AVX512VL supports 128-bit
      and 256-bit vector moves.  xmm16-xmm31 and ymm16-ymm31 are disallowed in
      128-bit and 256-bit modes when AVX512VL is disabled.  Mode attributes on
      x86 vector move patterns indicate target preferences of vector move
      encoding.  For scalar register to register move, we can use 512-bit
      vector move instructions to move 32-bit/64-bit scalar if AVX512VL isn't
      available.  With AVX512F and AVX512VL, we should use VEX encoding for
      128-bit/256-bit vector moves if upper 16 vector registers aren't used.
      This patch adds a function, ix86_output_ssemov, to generate vector moves:
      
      1. If zmm registers are used, use EVEX encoding.
      2. If xmm16-xmm31/ymm16-ymm31 registers aren't used, SSE or VEX encoding
      will be generated.
      3. If xmm16-xmm31/ymm16-ymm31 registers are used:
         a. With AVX512VL, AVX512VL vector moves will be generated.
         b. Without AVX512VL, xmm16-xmm31/ymm16-ymm31 register to register
            move will be done with zmm register move.
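      The selection rules above can be sketched as a small decision function.
      This is a hypothetical model (`MoveOperands`, `Encoding` and
      `choose_ssemov_encoding` are made up for illustration): it looks only at
      the move width, register numbers 0-31 and AVX512VL availability, and
      treats every operand as a register, whereas ix86_output_ssemov works on
      the actual insn operands.

```cpp
// Which kind of instruction the model picks for a vector move.
enum class Encoding {
  LegacySseOrVex,  // rule 2: SSE or VEX encoding
  Evex128_256,     // rule 3a: AVX512VL 128/256-bit EVEX move
  EvexZmm          // rules 1 and 3b: 512-bit EVEX (zmm) move
};

struct MoveOperands {
  int width_bits;  // 128, 256 or 512
  int dst_regno;   // vector register number, 0-31
  int src_regno;   // vector register number, 0-31
};

Encoding choose_ssemov_encoding(const MoveOperands &m, bool has_avx512vl) {
  if (m.width_bits == 512)
    return Encoding::EvexZmm;                        // rule 1
  bool upper16 = m.dst_regno >= 16 || m.src_regno >= 16;
  if (!upper16)
    return Encoding::LegacySseOrVex;                 // rule 2
  return has_avx512vl ? Encoding::Evex128_256        // rule 3a
                      : Encoding::EvexZmm;           // rule 3b: widen to zmm
}
```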
      
      There is no need to set the mode attribute to XImode explicitly since
      ix86_output_ssemov can properly encode xmm16-xmm31/ymm16-ymm31 registers
      with and without AVX512VL.
      
      Tested on AVX2 and AVX512 with and without --with-arch=native.
      
      gcc/
      
      	PR target/89229
      	PR target/89346
      	* config/i386/i386-protos.h (ix86_output_ssemov): New prototype.
      	* config/i386/i386.c (ix86_get_ssemov): New function.
      	(ix86_output_ssemov): Likewise.
      	* config/i386/sse.md (VMOVE:mov<mode>_internal): Call
      	ix86_output_ssemov for TYPE_SSEMOV.  Remove TARGET_AVX512VL
      	check.
      	(*movxi_internal_avx512f): Call ix86_output_ssemov for TYPE_SSEMOV.
      	(*movoi_internal_avx): Call ix86_output_ssemov for TYPE_SSEMOV.
      	Remove ext_sse_reg_operand and TARGET_AVX512VL check.
      	(*movti_internal): Likewise.
      	(*movtf_internal): Call ix86_output_ssemov for TYPE_SSEMOV.
      
      gcc/testsuite/
      
      	PR target/89229
      	PR target/89346
      	* gcc.target/i386/avx512vl-vmovdqa64-1.c: Updated.
      	* gcc.target/i386/pr89229-2a.c: New test.
      	* gcc.target/i386/pr89229-2b.c: Likewise.
      	* gcc.target/i386/pr89229-2c.c: Likewise.
      	* gcc.target/i386/pr89229-3a.c: Likewise.
      	* gcc.target/i386/pr89229-3b.c: Likewise.
      	* gcc.target/i386/pr89229-3c.c: Likewise.
      	* gcc.target/i386/pr89346.c: Likewise.
      H.J. Lu committed
    • Daily bump. · 34ec7d53
      GCC Administrator committed
    • [PATCH][testsuite] Fix pr80481.C after epilogue vectorization · 22a75da9
              * g++.dg/pr80481.C: Disable epilogue vectorization.
      Andre Vieira committed
  5. 05 Mar, 2020 5 commits
    • c: ignore initializers for elements of variable-size types [PR93577] · c9d70946
      Bug 93577 is a C front-end tree-checking ICE when processing
      initializers for structs that use the VLA-in-struct extension.  It is
      apparently a regression, although it isn't very clear to me exactly
      when it was introduced: tests I made with various past compilers
      produced inconclusive results, including, e.g., ICEs appearing with
      64-bit-host compilers for some versions but not with 32-bit-host
      compilers for the same versions.  There is an error for such
      initializers, but other processing that still takes place for them
      results in the ICE.
      
      This patch ensures that processing of initializers for variable-size
      types stops earlier to avoid the code that results in the ICE (and
      ensures it stops earlier for error_mark_node to avoid ICEs in the
      check for variable-size types), adjusts the conditions for the "empty
      scalar initializer" diagnostic to avoid consequent excess errors in
      the case of a bad type name, and adds tests for a few variations on
      what such initializers might look like, as well as tests for cases
      identified from ICEs seen with an earlier version of this patch.
      
      Bootstrapped with no regressions for x86_64-pc-linux-gnu.
      
      	PR c/93577
      gcc/c:
      	* c-typeck.c (pop_init_level): Do not diagnose initializers as
      	empty when initialized type is error_mark_node.
      	(set_designator, process_init_element): Ignore initializers for
      	elements of a variable-size type or of error_mark_node.
      
      gcc/testsuite:
      	* gcc.dg/pr93577-1.c, gcc.dg/pr93577-2.c, gcc.dg/pr93577-3.c,
      	gcc.dg/pr93577-4.c, gcc.dg/pr93577-5.c, gcc.dg/pr93577-6.c: New
      	tests.
      	* gcc.dg/vla-init-1.c: Expect fewer errors about VLA initializer.
      Joseph Myers committed
    • Fix location maybe_diag_overlap passes to diagnostics so that diagnostic pragmas work better. · 55ace4d1
      	PR tree-optimization/91890
      	* gimple-ssa-warn-restrict.c (maybe_diag_overlap): Remove LOC argument.
      	Use gimple_or_expr_nonartificial_location.
      	(check_bounds_overlap): Drop LOC argument to maybe_diag_access_bounds.
      	Use gimple_or_expr_nonartificial_location.
      	* gimple.c (gimple_or_expr_nonartificial_location): New function.
      	* gimple.h (gimple_or_expr_nonartificial_location): Declare it.
      	* tree-ssa-strlen.c (maybe_warn_overflow): Use
      	gimple_or_expr_nonartificial_location.
      	(maybe_diag_stxncpy_trunc, handle_builtin_stxncpy_strncat): Likewise.
      	(maybe_warn_pointless_strcmp): Likewise.
      
      	* gcc.dg/pragma-diag-8.c: New test.
      Jeff Law committed
    • i386: Fix some -O0 avx2intrin.h and xopintrin.h intrinsic macros [PR94046] · 3a0e583b
      As the testcases show, the -O0 macros for intrinsics that require
      constant argument(s) should first cast each argument to the type that
      the -O1+ inline function uses, and only then to whatever type, e.g., a
      builtin needs.  The PR reported one macro that violated this.  I grepped
      for all double casts and then filtered out the sensible ones, where the
      first __m{128,256,512}{,d,i} cast is followed by a cast to a same-sized
      __v* type with the same kind of element type (float, double, integral).
      The remaining 7 macros were using mismatched casts, and I double-checked
      them against the inline function types.
      
      2020-03-05  Jakub Jelinek  <jakub@redhat.com>
      
      	PR target/94046
      	* config/i386/avx2intrin.h (_mm_mask_i32gather_ps): Fix first cast of
      	SRC and MASK arguments to __m128 from __m128d.
      	(_mm256_mask_i32gather_ps): Fix first cast of MASK argument to __m256
      	from __m256d.
      	(_mm_mask_i64gather_ps): Fix first cast of MASK argument to __m128
      	from __m128d.
      	* config/i386/xopintrin.h (_mm_permute2_pd): Fix first cast of C
      	argument to __m128i from __m128d.
      	(_mm256_permute2_pd): Fix first cast of C argument to __m256i from
      	__m256d.
      	(_mm_permute2_ps): Fix first cast of C argument to __m128i from __m128.
      	(_mm256_permute2_ps): Fix first cast of C argument to __m256i from
      	__m256.
      
      	* g++.target/i386/pr94046-1.C: New test.
      	* g++.target/i386/pr94046-2.C: New test.
      Jakub Jelinek committed
    • [AArch32] ACLE intrinsics bfloat16 vmmla and vfma<b/t> for AArch32 AdvSIMD · 2d22ab64
      Commit rest of the 43031fbd content.
      I screwed up on the "git add" commands there.
      Kyrylo Tkachov committed