1. 24 Feb, 2020 8 commits
    • libstdc++: Add default_sentinel support to stream iterators · 120e8734
      Missing pieces of P0896R4 "The One Ranges Proposal" for C++20.
      
      	* include/bits/stream_iterator.h (istream_iterator(default_sentinel_t)):
      	Add constructor.
      	(operator==(istream_iterator, default_sentinel_t)): Add operator.
      	(ostream_iterator::difference_type): Define to ptrdiff_t for C++20.
      	* include/bits/streambuf_iterator.h
      	(istreambuf_iterator(default_sentinel_t)): Add constructor.
      	(operator==(istreambuf_iterator, default_sentinel_t)): Add operator.
      	* testsuite/24_iterators/istream_iterator/cons/sentinel.cc:
      	New test.
      	* testsuite/24_iterators/istream_iterator/sentinel.cc: New test.
      	* testsuite/24_iterators/istreambuf_iterator/cons/sentinel.cc:
      	New test.
      	* testsuite/24_iterators/istreambuf_iterator/sentinel.cc: New test.
      Jonathan Wakely committed
    • PR78353: Fix testcases · e03069be
      Skip the test if armv7-a is not supported at link time. This is the case
      when the toolchain targets an M-profile CPU by default and has no
      A-profile multilib: the link step fails because it tries to mix
      M-profile startup files with the A-profile testcase.
      
      2020-02-24  Christophe Lyon  <christophe.lyon@linaro.org>
      
      	PR lto/78353
      	* gcc.target/arm/pr78353-1.c: Add arm_arch_v7a_multilib effective
      	target.
      	* gcc.target/arm/pr78353-2.c: Likewise.
      Christophe Lyon committed
    • libstdc++: enable_view has false positives (LWG 3326) · 3841739c
      	* include/std/ranges (__deep_const_range, __enable_view_impl): Remove.
      	(ranges::enable_view): Simplify (LWG 3326).
      	* include/bits/range_access.h (ranges::enable_view): Declare.
      	* include/bits/regex.h (__enable_view_impl): Remove partial
      	specialization.
      	* include/bits/stl_multiset.h (__enable_view_impl): Likewise.
      	* include/bits/stl_set.h (__enable_view_impl): Likewise.
      	* include/bits/unordered_set.h (__enable_view_impl): Likewise.
      	* include/debug/multiset.h (__enable_view_impl): Likewise.
      	* include/debug/set.h (__enable_view_impl): Likewise.
      	* include/debug/unordered_set (__enable_view_impl): Likewise.
      	* include/experimental/string_view (ranges::enable_view): Define
      	partial specialization.
      	* include/std/span (ranges::enable_view): Likewise.
      	* include/std/string_view (ranges::enable_view): Likewise.
      	* testsuite/std/ranges/view.cc: Check satisfaction of updated concept.
      Jonathan Wakely committed
    • sccvn: Handle bitfields in push_partial_def [PR93582] · 7f5617b0
      The following patch adds support for bitfields to push_partial_def.
      Previously pd.offset and pd.size were counted in bytes and maxsizei
      in bits; now everything is counted in bits.
      
      Not really sure how much of the further code can be outlined and moved;
      e.g. the full def and partial def code don't have much of anything in
      common (the partial defs case basically has some load bit range and a
      set of store bit ranges that at least partially overlap, and we need to
      handle all the different cases: negative or non-negative pd.offset,
      little vs. big endian, a size so small that we need to preserve the
      original bits on both sides of the byte, a size that fits or is too
      large).  Perhaps the storing of some value into the middle of an
      existing buffer (i.e. what push_partial_def now does in the loop) could
      be shared, but the candidate for sharing would most likely be
      store-merging rather than the other spots in sccvn, and I think it is
      better not to touch store-merging at this stage.
      
      Yes, I've thought about trying to do everything in place, but the code
      is already quite hard to understand and get right, and if we tried to
      do the optimization on the fly it would need more special cases and,
      for gcov coverage, more testcases to cover it.  Most of the time the
      sizes will be small.  Furthermore, for bitfields native_encode_expr
      actually stores the number of bytes in the mode rather than, say, the
      actual bitsize rounded up to bytes, so it wouldn't be just a matter of
      saving/restoring bytes at the start and end; we might need up to 7
      further bytes, e.g. for __int128 bitfields.  Perhaps we could have just
      a fast path for the case where everything is byte aligned (and, for
      integral types, the mode bitsize equals the size too)?
      
      2020-02-24  Jakub Jelinek  <jakub@redhat.com>
      
      	PR tree-optimization/93582
      	* tree-ssa-sccvn.c (vn_walk_cb_data::push_partial_def): Consider
      	pd.offset and pd.size to be counted in bits rather than bytes, add
      	support for maxsizei that is not a multiple of BITS_PER_UNIT and
      	handle bitfield stores and loads.
      	(vn_reference_lookup_3): Don't call ranges_known_overlap_p with
      	uncomparable quantities - bytes vs. bits.  Allow push_partial_def
      	on offsets/sizes that aren't multiple of BITS_PER_UNIT and adjust
      	pd.offset/pd.size to be counted in bits rather than bytes.
      	Formatting fix.  Rename shadowed len variable to buflen.
      
      	* gcc.dg/tree-ssa/pr93582-4.c: New test.
      	* gcc.dg/tree-ssa/pr93582-5.c: New test.
      	* gcc.dg/tree-ssa/pr93582-6.c: New test.
      	* gcc.dg/tree-ssa/pr93582-7.c: New test.
      	* gcc.dg/tree-ssa/pr93582-8.c: New test.
      Jakub Jelinek committed
    • OpenACC tile clause – apply exit/cycle checks (PR 93552) · 2bd8c3ff
              PR fortran/93552
              * match.c (match_exit_cycle): With OpenACC, check the kernels loop
              directive and tile clause as well.
      
              PR fortran/93552
              * gfortran.dg/goacc/tile-4.f90: New.
      Tobias Burnus committed
    • PR47785: Add support for handling Xassembler/Wa options with LTO. · f1a681a1
      2020-02-24  Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>
      	    Kugan Vivekandarajah  <kugan.vivekanandarajah@linaro.org>
      
      	PR driver/47785
      	* gcc.c (putenv_COLLECT_AS_OPTIONS): New function.
      	(driver::main): Call putenv_COLLECT_AS_OPTIONS.
      	* opts-common.c (parse_options_from_collect_gcc_options): New function.
      	(prepend_xassembler_to_collect_as_options): Likewise.
      	* opts.h (parse_options_from_collect_gcc_options): Declare prototype.
      	(prepend_xassembler_to_collect_as_options): Likewise.
      	* lto-opts.c (lto_write_options): Stream assembler options
      	in COLLECT_AS_OPTIONS.
      	* lto-wrapper.c (xassembler_options_error): New static variable.
      	(get_options_from_collect_gcc_options): Move parsing options code to
      	parse_options_from_collect_gcc_options and call it.
      	(merge_and_complain): Validate -Xassembler options.
      	(append_compiler_options): Handle OPT_Xassembler.
      	(run_gcc): Append command line -Xassembler options to
      	collect_gcc_options.
      	* doc/invoke.texi: Add documentation about using Xassembler
      	options with LTO.
      
      testsuite/
      	* gcc.target/arm/pr78353-1.c: New test.
      	* gcc.target/arm/pr78353-2.c: Likewise.
      Prathamesh Kulkarni committed
    • RISC-V: Adjust floating point code gen for LTGT compare · 9069e948
       - Using gcc.dg/torture/pr91323.c as the testcase, so no new testcase
         is introduced.

       - We previously used 3 eq compares for an LTGT compare, in order to
         prevent exception flags being set when any input is NaN.

       - According to the latest GCC documentation for LTGT and the
         discussion on pr91323, LTGT should signal on NaNs, like GE/GT/LE/LT.

       - So we now expand (LTGT a b) to ((LT a b) | (GT a b)) to match the
         documentation.

       - Tested rv64gc/rv32gc bare-metal/linux on qemu and
         rv64gc on a HiFive Unleashed board with linux.
      
      ChangeLog
      
      gcc/
      
      Kito Cheng  <kito.cheng@sifive.com>
      
      	* config/riscv/riscv.c (riscv_emit_float_compare): Change the code gen
      	for LTGT.
      	(riscv_rtx_costs): Update cost model for LTGT.
      Kito Cheng committed
    • Daily bump. · c7bfe1aa
      GCC Administrator committed
  2. 23 Feb, 2020 5 commits
  3. 22 Feb, 2020 4 commits
  4. 21 Feb, 2020 23 commits
    • Fix handling of floating-point homogeneous aggregates. · 01af7e0a
      	2020-02-21  John David Anglin  <danglin@gcc.gnu.org>
      
      	* gcc/config/pa/pa.c (pa_function_value): Fix check for word and
      	double-word size when handling aggregate return values.
      	* gcc/config/pa/som.h (ASM_DECLARE_FUNCTION_NAME): Fix to indicate
      	that homogeneous SFmode and DFmode aggregates are passed and returned
      	in general registers.
      John David Anglin committed
    • i18n: Fix translation of --help [PR93759] · 8d1780b5
      The first two hunks make sure we actually translate what has been marked
      for translation, i.e. the cl_options[...].help strings, rather than those
      strings amended in various ways, like:
      _("%s  Same as %s."), help, ...
      or
      "%s  %s", help, _(use_diagnosed_msg)
      
      The exgettext changes attempt to make sure that the cl_options[...].help
      strings are marked as no-c-format, because otherwise if they happen
      to contain a % character, such as the 90% substring, they will be marked
      as c-format, which they aren't.
      
      2020-02-21  Jakub Jelinek  <jakub@redhat.com>
      
      	PR translation/93759
      	* opts.c (print_filtered_help): Translate help before appending
      	messages to it rather than after that.
      
      	* exgettext: For *.opt help texts, use __opt_help_text("...")
      	rather than _("...") in the $emsg file and pass options that
      	say that this implies no-c-format.
      Jakub Jelinek committed
    • lra: Stop registers being incorrectly marked live v2 [PR92989] · d11676de
      This PR is about a case in which the clobbers at the start of
      an EH receiver can lead to registers becoming unnecessarily
      live in predecessor blocks.  My first attempt at fixing this
      made sure that we update the bb liveness info based on the
      real live set:
      
        http://gcc.gnu.org/g:e648e57efca6ce6d751ef8c2038608817b514fb4
      
      But it turns out that the clobbered registers were also added to
      the "gen" set of LRA's private liveness problem, where "gen" in
      this context means "generates a requirement for a live value".
      So the clobbered registers could still end up live via that
      mechanism instead.
      
      This patch therefore reverts the patch above and takes the other
      approach floated in the original patch description: model the full
      clobber by making the registers live and then dead again.
      
      There's no specific need to revert the original patch, since the
      code should no longer be sensitive to the order of the bb liveness
      update and the modelling of the clobber.  But given that there's
      no specific need to keep the original patch either, it seemed better
      to restore the code to the more well-tested order.
      
      Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
      
      Richard
      
      2020-02-19  Richard Sandiford  <richard.sandiford@arm.com>
      
      gcc/
	PR rtl-optimization/92989
      	* lra-lives.c (process_bb_lives): Restore the original order
      	of the bb liveness update.  Call make_hard_regno_dead for each
      	register clobbered at the start of an EH receiver.
      Richard Sandiford committed
    • Do not propagate self-dependent value (PR ipa/93763) (ChangeLog) · 25f0909a
                  PR ipa/93763
                  * ipa-cp.c (self_recursively_generated_p): Mark self-dependent value as
                  self-recursively generated.
      Jeff Law committed
    • Do not propagate self-dependent value (PR ipa/93763) · 47772af1
              PR ipa/93763
              * ipa-cp.c (self_recursively_generated_p): Mark self-dependent value as
              self-recursively generated.
      Feng Xue committed
    • Darwin: Fix wrong quoting on an error string (PR93860). · 147add96
      The quotes should surround all of the literal content from the pragma
      that has incorrect usage.
      
      2020-02-21  Iain Sandoe  <iain@sandoe.co.uk>
      
      PR target/93860
      * config/darwin-c.c (pop_field_alignment): Adjust quoting of
      error string.
      Iain Sandoe committed
    • PR c++/93753 - ICE on a flexible array followed by a member in an anonymous… · dbfba41e
      PR c++/93753 - ICE on a flexible array followed by a member in an anonymous struct with an initializer
      
      gcc/cp/ChangeLog:
      
	PR c++/93753
      	* class.c (check_flexarrays): Tighten up a test for potential members
      	of anonymous structs or unions.
      
      gcc/testsuite/ChangeLog:
      
	PR c++/93753
      	* g++.dg/ext/flexary36.C: New test.
      	* g++.dg/lto/pr93166_0.C: Make struct with flexarray valid.
      Martin Sebor committed
    • libstdc++: Define <=> for tuple, optional and variant · 9e589880
      Another piece of P1614R2.
      
      	* include/std/optional (operator<=>(optional<T>, optional<U>))
      	(operator<=>(optional<T>, nullopt), operator<=>(optional<T>, U)):
      	Define for C++20.
      	* include/std/tuple (__tuple_cmp): New helper function for <=>.
      	(operator<=>(tuple<T...>, tuple<U>...)): Define for C++20.
      	* include/std/variant (operator<=>(variant<T...>, variant<T...>))
      	(operator<=>(monostate, monostate)): Define for C++20.
      	* testsuite/20_util/optional/relops/three_way.cc: New test.
      	* testsuite/20_util/tuple/comparison_operators/three_way.cc: New test.
      	* testsuite/20_util/variant/89851.cc: Move to ...
      	* testsuite/20_util/variant/relops/89851.cc: ... here.
      	* testsuite/20_util/variant/90008.cc: Move to ...
      	* testsuite/20_util/variant/relops/90008.cc: ... here.
      	* testsuite/20_util/variant/relops/three_way.cc: New test.
      Jonathan Wakely committed
    • [PATCH, GCC/ARM] Add MVE target check to sourcebuild.texi · 131fbdd7
      Follow up to: https://gcc.gnu.org/ml/gcc-patches/2020-02/msg01109.html
      
      Committed as obvious.
      
      gcc/ChangeLog:
      
      2020-02-21  Mihail Ionescu  <mihail.ionescu@arm.com>
      
      	* doc/sourcebuild.texi (arm_v8_1m_mve_ok):
      	Document new target supports option.
      Mihail Ionescu committed
    • arm: ACLE I8MM multiply-accumulate · 436016f4
      This patch adds intrinsics for matrix multiply-accumulate instructions
      including vmmlaq_s32, vmmlaq_u32, and vusmmlaq_s32.
      
      gcc/ChangeLog:
      
      2020-02-21  Dennis Zhang  <dennis.zhang@arm.com>
      
      	* config/arm/arm_neon.h (vmmlaq_s32, vmmlaq_u32, vusmmlaq_s32): New.
      	* config/arm/arm_neon_builtins.def (smmla, ummla, usmmla): New.
      	* config/arm/iterators.md (MATMUL): New iterator.
      	(sup): Add UNSPEC_MATMUL_S, UNSPEC_MATMUL_U, and UNSPEC_MATMUL_US.
      	(mmla_sfx): New attribute.
      	* config/arm/neon.md (neon_<sup>mmlav16qi): New.
      	* config/arm/unspecs.md (UNSPEC_MATMUL_S, UNSPEC_MATMUL_U): New.
      	(UNSPEC_MATMUL_US): New.
      
      gcc/testsuite/ChangeLog:
      
      2020-02-21  Dennis Zhang  <dennis.zhang@arm.com>
      
      	* gcc.target/arm/simd/vmmla_1.c: New test.
      Dennis Zhang committed
    • testsuite: Add -fcommon to gcc.target/i386/pr69052.c · b59506cd
      This testcase is susceptible to memory location details and starts to
      fail with the default of -fno-common.  Use -fcommon to set the expected
      testing conditions.
      
      	* gcc.target/i386/pr69052.c: Require target ia32.
      	(dg-options): Add -fcommon and remove -pie.
      Uros Bizjak committed
    • [PATCH, GCC/ARM] Fix MVE scalar shift tests · bf5582c3
      *** gcc/ChangeLog ***
      
      2020-02-21  Mihail-Calin Ionescu  <mihail.ionescu@arm.com>
      
      	* config/arm/arm.md: Prevent scalar shifts from being
      	used when big endian is enabled.
      
      *** gcc/testsuite/ChangeLog ***
      
      2020-02-21  Mihail-Calin Ionescu  <mihail.ionescu@arm.com>
      
      	* gcc.target/arm/armv8_1m-shift-imm-1.c: Add MVE target checks.
      	* gcc.target/arm/armv8_1m-shift-reg-1.c: Likewise.
      	* lib/target-supports.exp
      	(check_effective_target_arm_v8_1m_mve_ok_nocache): New.
      	(check_effective_target_arm_v8_1m_mve_ok): New.
      	(add_options_for_v8_1m_mve): New.
      Mihail Ionescu committed
    • testsuite: Require vect_multiple_sizes for scan-tree-dump in vect-epilogues.c · b150c838
      Default testsuite flags do not enable V8QI (MMX) vector mode for
      32bit x86 targets.  Require vect_multiple_sizes effective target in
      scan-tree-dump to avoid "LOOP EPILOGUE VECTORIZED" failure.
      
	* gcc.dg/vect/vect-epilogues.c (scan-tree-dump): Require
	vect_multiple_sizes effective target.
      Uros Bizjak committed
    • Adapt libgomp acc_get_property.f90 test · 83d45e1d
      The commit r10-6721-g8d1a1cb1 has changed
      the name of the type that is used for the return value of the Fortran
      acc_get_property function without adapting the test acc_get_property.f90.
      
      2020-02-21  Frederik Harwath  <frederik@codesourcery.com>
      
      	* testsuite/libgomp.oacc-fortran/acc_get_property.f90: Adapt to
      	changes from 2020-02-19, i.e. use integer(c_size_t) instead of
      	integer(acc_device_property) for the type of the return value of
      	acc_get_property.
      Frederik Harwath committed
    • tree-optimization: fix access path oracle on mismatched array refs [PR93586] · 91e50b2a
      nonoverlapping_array_refs_p is not supposed to give meaningful results
      when the bases of ref1 and ref2 are not the same or completely
      disjoint, and here it is called on c[0][j_2][0] and c[0][1], so the
      bases in the sense of this function are "c[0][j_2]" and "c[0]", which
      do partially overlap.  nonoverlapping_array_refs_p however walks pairs
      of array references, and in this case it fails to note that once it has
      walked across the first mismatched pair it is no longer safe to compare
      the rest.
      
      The reason why it continues matching is because it hopes it will
      eventually get pair of COMPONENT_REFs from types of same size and use
      TBAA to conclude that their addresses must be either same or completely
      disjoint.
      
      This patch makes the loop terminate early while popping all the
      remaining pairs so walking can continue.  We could re-synchronize on
      arrays of the same size with TBAA, but this is a bit fishy (because we
      try to support some sort of partial array overlaps) and hard to
      implement (because of zero-sized arrays and VLAs), so I think it is not
      worth the effort.
      
      In addition I noticed that the function is not safe with
      !flag_strict_aliasing and added early exits in the places where we set
      seen_unmatched_ref_p, since later we do not check that flag in:
      
             /* If we skipped array refs on type of different sizes, we can
       	 no longer be sure that there are not partial overlaps.  */
             if (seen_unmatched_ref_p
       	  && !operand_equal_p (TYPE_SIZE (type1), TYPE_SIZE (type2), 0))
       	{
       	  ++alias_stats
       	    .nonoverlapping_refs_since_match_p_may_alias;
      	}
      
	PR tree-optimization/93586
	* tree-ssa-alias.c (nonoverlapping_array_refs_p): Finish the array
	walk after mismatched array refs; do not use type size information
	to recover from unmatched references with !flag_strict_aliasing.
      
      	* gcc.dg/torture/pr93586.c: New testcase.
      Jan Hubicka committed
    • amdgcn: Use correct offset mode for gather/scatter · b5fb73b6
      The scatter/gather pattern names changed for GCC 10, but I hadn't noticed.
      This switches the patterns to the new offset mode scheme.
      
      2020-02-21  Andrew Stubbs  <ams@codesourcery.com>
      
      	gcc/
      	* config/gcn/gcn-valu.md (gather_load<mode>): Rename to ...
      	(gather_load<mode>v64si): ... this and set operand 2 to V64SI.
      	(scatter_store<mode>): Rename to ...
      	(scatter_store<mode>v64si): ... this and set operand 1 to V64SI.
      	(scatter<mode>_exec): Delete. Move contents ...
      	(mask_scatter_store<mode>): ... here, and rename that to ...
      	(mask_gather_load<mode>v64si): ... this. Set operand 2 to V64SI.
      	Remove mode conversion.
      	(mask_gather_load<mode>): Rename to ...
      	(mask_scatter_store<mode>v64si): ... this. Set operand 1 to V64SI.
      	Remove mode conversion.
      	* config/gcn/gcn.c (gcn_expand_scaled_offsets): Remove mode conversion.
      Andrew Stubbs committed
    • sra: Only verify sizes of scalar accesses (PR 93845) · 4d6bf96b
      The testcase is another example - in addition to the recent PR 93516 -
      where the SRA access verifier is confused by the fact that
      get_ref_base_and_extent can return different sizes for the same type,
      depending on whether the reference is a COMPONENT_REF or not.  In the
      previous bug I decided to keep the verifier check for aggregate types,
      even though it is not really important, and instead avoid the easily
      detectable type-within-the-same-type situation.  This testcase is
      however the result of a fairly random-looking type cast and so cannot
      be handled in the same way.
      
      Because the check is not really important for aggregates, this patch
      simply disables it for non-register types.
      
      2020-02-21  Martin Jambor  <mjambor@suse.cz>
      
      	PR tree-optimization/93845
      	* tree-sra.c (verify_sra_access_forest): Only test access size of
      	scalar types.
      
      	testsuite/
      	* g++.dg/tree-ssa/pr93845.C: New test.
      Martin Jambor committed
    • amdgcn: Align VGPR pairs · 3abfd4f3
      Aligning the registers is not needed by the architecture, but doing so
      allows us to remove the requirement for bug-prone early-clobber
      constraints from many split patterns (and avoid adding more in future).
      
      2020-02-21  Andrew Stubbs  <ams@codesourcery.com>
      
      	gcc/
      	* config/gcn/gcn.c (gcn_hard_regno_mode_ok): Align VGPR pairs.
      	* config/gcn/gcn-valu.md (addv64di3): Remove early-clobber.
      	(addv64di3_exec): Likewise.
      	(subv64di3): Likewise.
      	(subv64di3_exec): Likewise.
      	(addv64di3_zext): Likewise.
      	(addv64di3_zext_exec): Likewise.
      	(addv64di3_zext_dup): Likewise.
      	(addv64di3_zext_dup_exec): Likewise.
      	(addv64di3_zext_dup2): Likewise.
      	(addv64di3_zext_dup2_exec): Likewise.
      	(addv64di3_sext_dup2): Likewise.
      	(addv64di3_sext_dup2_exec): Likewise.
      	(<expander>v64di3): Likewise.
      	(<expander>v64di3_exec): Likewise.
      	(*<reduc_op>_dpp_shr_v64di): Likewise.
      	(*plus_carry_dpp_shr_v64di): Likewise.
      	* config/gcn/gcn.md (adddi3): Likewise.
      	(addptrdi3): Likewise.
      	(<expander>di3): Likewise.
      Andrew Stubbs committed
    • amdgcn: fix mode in vec_series · 2291d1fd
      2020-02-21  Andrew Stubbs  <ams@codesourcery.com>
      
      	gcc/
      	* config/gcn/gcn-valu.md (vec_seriesv64di): Use gen_vec_duplicatev64di.
      Andrew Stubbs committed
    • aarch64: Add SVE support for -mlow-precision-sqrt · a0ee8352
      SVE was missing support for -mlow-precision-sqrt, which meant that
      -march=armv8.2-a+sve -mlow-precision-sqrt could cause a performance
      regression compared to -march=armv8.2-a -mlow-precision-sqrt.
      
      2020-02-21  Richard Sandiford  <richard.sandiford@arm.com>
      
      gcc/
      	* config/aarch64/aarch64.c (aarch64_emit_approx_sqrt): Add SVE
      	support.  Use aarch64_emit_mult instead of emitting multiplication
      	instructions directly.
      	* config/aarch64/aarch64-sve.md (sqrt<mode>2, rsqrt<mode>2)
      	(@aarch64_rsqrte<mode>, @aarch64_rsqrts<mode>): New expanders.
      
      gcc/testsuite/
      	* gcc.target/aarch64/sve/rsqrt_1.c: New test.
      	* gcc.target/aarch64/sve/rsqrt_1_run.c: Likewise.
      	* gcc.target/aarch64/sve/sqrt_1.c: Likewise.
      	* gcc.target/aarch64/sve/sqrt_1_run.c: Likewise.
      Richard Sandiford committed
    • aarch64: Add SVE support for -mlow-precision-div · 04f307cb
      SVE was missing support for -mlow-precision-div, which meant that
      -march=armv8.2-a+sve -mlow-precision-div could cause a performance
      regression compared to -march=armv8.2-a -mlow-precision-div.
      
      I ended up doing this much later than originally intended, sorry...
      
      2020-02-21  Richard Sandiford  <richard.sandiford@arm.com>
      
      gcc/
      	* config/aarch64/aarch64.c (aarch64_emit_mult): New function.
      	(aarch64_emit_approx_div): Add SVE support.  Use aarch64_emit_mult
      	instead of emitting multiplication instructions directly.
      	* config/aarch64/iterators.md (SVE_COND_FP_BINARY_OPTAB): New iterator.
      	* config/aarch64/aarch64-sve.md (div<mode>3, @aarch64_frecpe<mode>)
      	(@aarch64_frecps<mode>): New expanders.
      
      gcc/testsuite/
      	* gcc.target/aarch64/sve/recip_1.c: New test.
      	* gcc.target/aarch64/sve/recip_1_run.c: Likewise.
      	* gcc.target/aarch64/sve/recip_2.c: Likewise.
      	* gcc.target/aarch64/sve/recip_2_run.c: Likewise.
      Richard Sandiford committed
    • aarch64: Bump AARCH64_APPROX_MODE to 64 bits · d87778ed
      We now have more than 32 scalar and vector float modes, so the
      32-bit AARCH64_APPROX_MODE would invoke UB for some of them.
      Bumping to a 64-bit mask fixes that... for now.
      
      Ideally we'd have a static assert to trap this, but logically
      it would go at file scope.  I think it would be better to wait
      until the switch to C++11, so that we can use static_assert
      directly.
      
      2020-02-21  Richard Sandiford  <richard.sandiford@arm.com>
      
      gcc/
      	* config/aarch64/aarch64-protos.h (AARCH64_APPROX_MODE): Operate
      	on and produce uint64_ts rather than ints.
      	(AARCH64_APPROX_NONE, AARCH64_APPROX_ALL): Change to uint64_ts.
      	(cpu_approx_modes): Change the fields from unsigned int to uint64_t.
      Richard Sandiford committed
    • aarch64: Avoid creating an unused register · 0df28e68
      The rsqrt path of aarch64_emit_approx_sqrt created a pseudo
      register that it never used.
      
      2020-02-21  Richard Sandiford  <richard.sandiford@arm.com>
      
      gcc/
      	* config/aarch64/aarch64.c (aarch64_emit_approx_sqrt): Don't create
      	an unused xmsk register when handling approximate rsqrt.
      Richard Sandiford committed