1. 04 May, 2018 26 commits
    • [BRIGFE] Fix handling of NOPs. · 73def6ea
      From-SVN: r259958
      Pekka Jääskeläinen committed
    • [BRIGFE] phsa-specific optimizations · 080dc243
      Add flag -fassume-phsa that is on by default. If -fno-assume-phsa
      is given, these optimizations are disabled.
      
      With this flag, gccbrig can generate GENERIC that assumes we are
      targeting a phsa-runtime based implementation, which allows us
      to expose the work-item context accesses to retrieve WI IDs etc.
      which helps optimizers.
      
      The first optimization that takes advantage of this gets rid of
      the setworkitemid calls whenever we have non-inlined calls that
      use IDs internally.
      
      Other optimizations added in this commit:
      
      - expand absoluteid to similar level of simplicity as workitemid.
      At the moment absoluteid is the best indexing ID to end up with
      WG vectorization.
      - propagate ID variables closer to their uses. This is mainly
      to avoid known useless casts, which confuse at least scalar
      evolution analysis.
      - use signed long long for storing IDs. Unsigned integers have
      defined wraparound semantics, which confuse at least scalar
      evolution analysis, leading to unvectorizable WI loops.
      - also refactor some BRIG function generation helpers to brig_function.
      - no point in having the wi-loop as a for-loop. It's really
      a do...while and SCEV can analyze it just fine still.
      - add consts to ptrs etc. in BRIG builtin defs.
      Improves optimization opportunities.
      - add qualifiers to generated function parameters.
      Const and restrict on the hidden local/private pointers,
      the arg buffer and the context pointer help some optimizations.
      
      From-SVN: r259957
      Pekka Jääskeläinen committed
    • [BRIGFE] do not allow optimizations based on known C builtins · 60a3d46c
      It can break inputs that have similarly named functions.
      
      From-SVN: r259949
      Pekka Jääskeläinen committed
    • [BRIGFE] The modulo in ID computation should not be needed. · f986735a
      The case where a dim is greater than the grid size doesn't seem
      to be mentioned in the specs or tested by the PRM test suite.
      
      From-SVN: r259944
      Pekka Jääskeläinen committed
    • [BRIGFE] Enable whole program optimizations · 637f3cde
      HSA assumes all program scope HSAIL symbols can be queried from
      the host runtime API, thus cannot be removed by the IPA.
      
      Getting some inlining happening in the finalized binary required:
      * explicitly marking the 'prog' scope functions and the launcher
      function "externally_visible" to avoid the inliner removing it
      * also the host_def ptr is set to externally visible, otherwise
      IPA assumes it's never set
      * adding the 'inline' keyword to functions to enable inlining,
      otherwise GCC defaults to replaceable functions (one can link
      over the previous one) which cannot be inlined
      * replacing all calls to declarations with calls to definitions to
      enable the inliner to find the definition
      * fixing missing hidden argument types in the generated functions.
      These were ignored silently until GCC started to be able to
      inline calls to such functions.
      * do not gimplify before fixing the call targets. Otherwise the
      calls get detached and the definitions are not found. The reason
      why this happens is not clear, but gimplifying only after call
      target decl->def conversion fixes this.
      
      From-SVN: r259943
      Pekka Jääskeläinen committed
    • [BRIGFE] fix an alloca stack underflow · 1b40975c
      We didn't reserve additional space for the alloca frame pointers that
      need to be saved in the alloca space.
      
      Fixes libgomp.c++/target-6.C execution test.
      
      From-SVN: r259942
      Pekka Jääskeläinen committed
    • * uk.po: Update. · 534fe823
      From-SVN: r259938
      Joseph Myers committed
    • re PR go/85630 (GCC 8.1.0: Filesystem pollution during build: .cache dir in $HOME) · cceec155
      	PR go/85630
      	* Makefile.am (CHECK_ENV): Set GOCACHE.
      	(ECHO_ENV): Update for setting of GOCACHE.
      	* Makefile.in: Rebuild.
      
      From-SVN: r259937
      Ian Lance Taylor committed
    • vsx-vector-6.h (foo): Add test for vec_max, vec_trunc. · 53481a28
      gcc/testsuite/ChangeLog:
      
      2018-05-04 Carl Love  <cel@us.ibm.com>
      	* gcc.target/powerpc/vsx-vector-6.h (foo): Add test for vec_max,
      	vec_trunc.
      	* gcc.target/powerpc/vsx-vector-6-le.c (dg-final): Update xvcmpeqdp,
      	xvcmpgtdp, xvcmpgedp counts. Add xxsel counts.
      	* gcc.target/powerpc/vsx-vector-6-be.c (dg-final): Update xvcmpgtdp,
      	xvcmpgedp counts. Add xxsel counts.
      
      From-SVN: r259936
      Carl Love committed
    • libgo: fix for unaligned read in go-unwind.c's read_encoded_value() · 772455c9
          
          Change code to work properly reading unaligned data on architectures
          that don't support unaligned reads. This fixes a regression (broke
          Solaris/sparc) introduced in https://golang.org/cl/90235.
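A minimal sketch of the usual portable technique for this kind of fix, assuming a byte-wise copy through `std::memcpy`. The helper name `read_unaligned` is hypothetical and this is not the actual go-unwind.c code; it only illustrates why copying bytes avoids a misaligned load on strict-alignment targets such as SPARC.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from libgo): read a T from an address that may
// not be suitably aligned for T. std::memcpy has no alignment requirement,
// so no misaligned load is ever issued through a T*.
template <typename T>
T read_unaligned(const unsigned char* p) {
  T value;
  std::memcpy(&value, p, sizeof value);
  return value;
}

// Usage sketch: the value starts at offset 1, which is misaligned for
// a 32-bit integer.
inline void read_unaligned_example() {
  const std::uint32_t expected = 0x11223344u;
  unsigned char buf[1 + sizeof expected] = {};
  std::memcpy(buf + 1, &expected, sizeof expected);
  assert(read_unaligned<std::uint32_t>(buf + 1) == expected);
}
```

Dereferencing `reinterpret_cast<const std::uint32_t*>(buf + 1)` instead would be undefined behaviour and can trap on targets without hardware support for unaligned accesses.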
          
          Reviewed-on: https://go-review.googlesource.com/111296
      
      From-SVN: r259935
      Ian Lance Taylor committed
    • libffi PowerPC64 ELFv1 fp arg fixes · 71d372eb
      The ELFv1 ABI says: "Single precision floating point values are mapped
      to the second word in a single doubleword" and also "Floating point
      registers f1 through f13 are used consecutively to pass up to 13
      floating point values, one member aggregates passed by value
      containing a floating point value, and to pass complex floating point
      values".
      
      libffi wasn't expecting float args in the second word, and wasn't
      passing one member aggregates in fp registers.  This patch fixes those
      problems, making use of the existing ELFv2 homogeneous aggregate
      support since a one element fp struct is a special case of a
      homogeneous aggregate.
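The slot layout described above can be sketched as follows. This is an illustration only, not libffi code: `float_offset_in_slot` and `store_float_arg` are hypothetical names, and the little-endian offset of 0 is an assumption made for the sketch. Only the big-endian case (second word of the doubleword) comes from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

static_assert(sizeof(float) == 4, "sketch assumes a 4-byte float");

// Hypothetical helper: byte offset of a float inside an 8-byte parameter slot.
constexpr std::size_t float_offset_in_slot(bool big_endian) {
  // Big-endian: second word (bytes 4..7), as the commit message describes.
  // Little-endian: offset 0 is an assumption of this sketch.
  return big_endian ? 4 : 0;
}

// Hypothetical helper: clear the slot, then place the float at that offset.
inline void store_float_arg(unsigned char (&slot)[8], float value,
                            bool big_endian) {
  std::memset(slot, 0, sizeof slot);
  std::memcpy(slot + float_offset_in_slot(big_endian), &value, sizeof value);
}

// Usage sketch: the float lands in bytes 4..7 of a big-endian slot.
inline void store_float_example() {
  unsigned char slot[8];
  store_float_arg(slot, 1.5f, true);
  float back;
  std::memcpy(&back, slot + 4, sizeof back);
  assert(back == 1.5f);
}
```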
      
      I've also set a flag when returning pointers that might be used one
      day.  This is just a tidy-up, since the ppc64 assembly support code
      currently doesn't test FLAG_RETURNS_64BITS for integer types.
      
      	* src/powerpc/ffi_linux64.c (discover_homogeneous_aggregate):
      	Compile for ELFv1 too, handling single element aggregates.
      	(ffi_prep_cif_linux64_core): Call discover_homogeneous_aggregate
      	for ELFv1.  Set FLAG_RETURNS_64BITS for FFI_TYPE_POINTER return.
      	(ffi_prep_args64): Call discover_homogeneous_aggregate for ELFv1,
      	and handle single element structs containing float or double
      	as if the element wasn't wrapped in a struct.  Store floats in
      	second word of doubleword slot when big-endian.
      	(ffi_closure_helper_LINUX64): Similarly.
      
      From-SVN: r259934
      Alan Modra committed
    • bb-reorder.c (sanitize_hot_paths): Release hot_bbs_to_check. · dd172744
      2018-05-04  Richard Biener  <rguenther@suse.de>
      
      	* bb-reorder.c (sanitize_hot_paths): Release hot_bbs_to_check.
      	* gimple-ssa-store-merging.c
      	(imm_store_chain_info::output_merged_store): Remove redundant create,
      	release split_store vector contents on failure.
      	* tree-vect-slp.c (vect_schedule_slp_instance): Avoid leaking
      	scalar stmt vector on cache hit.
      
      From-SVN: r259932
      Richard Biener committed
    • rs6000: Remove Xilinx FP · 2c2aa74d
      This removes the special Xilinx FP support.  It was deprecated in
      GCC 8.
      
      After this patch all of TARGET_{DOUBLE,SINGLE}_FLOAT,
      TARGET_{DF,SF}_INSN, and TARGET_{DF,SF}_FPR are replaced by
      TARGET_HARD_FLOAT.  Also the fp_type attribute is deleted.
      
      
      	* common/config/rs6000/rs6000-common.c (rs6000_handle_option): Remove
      	Xilinx FP support.
      	* config.gcc (powerpc-xilinx-eabi*): Remove.
      	* config/rs6000/predicates.md (easy_fp_constant): Remove Xilinx FP
      	support.
      	(fusion_addis_mem_combo_load): Ditto.
      	* config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Remove Xilinx
      	FP support.
      	(rs6000_cpu_cpp_builtins): Ditto.
      	* config/rs6000/rs6000-linux.c
      	(rs6000_linux_float_exceptions_rounding_supported_p): Ditto.
      	* config/rs6000/rs6000-opts.h (enum fpu_type_t): Delete.
      	* config/rs6000/rs6000.c (rs6000_debug_reg_global): Remove Xilinx FP
      	support.
      	(rs6000_setup_reg_addr_masks): Ditto.
      	(rs6000_init_hard_regno_mode_ok): Ditto.
      	(rs6000_option_override_internal): Ditto.
      	(legitimate_lo_sum_address_p): Ditto.
      	(rs6000_legitimize_address): Ditto.
      	(rs6000_legitimize_reload_address): Ditto.
      	(rs6000_legitimate_address_p): Ditto.
      	(abi_v4_pass_in_fpr): Ditto.
      	(setup_incoming_varargs): Ditto.
      	(rs6000_gimplify_va_arg): Ditto.
      	(rs6000_split_multireg_move): Ditto.
      	(rs6000_savres_strategy): Ditto.
      	(rs6000_emit_prologue_components): Ditto.
      	(rs6000_emit_epilogue_components): Ditto.
      	(rs6000_emit_prologue): Ditto.
      	(rs6000_emit_epilogue): Ditto.
      	(rs6000_elf_file_end): Ditto.
      	(rs6000_function_value): Ditto.
      	(rs6000_libcall_value): Ditto.
      	* config/rs6000/rs6000.h: Ditto.
      	(TARGET_MINMAX_SF, TARGET_MINMAX_DF): Delete, merge to ...
      	(TARGET_MINMAX): ... this.  New.
      	(TARGET_SF_FPR, TARGET_DF_FPR, TARGET_SF_INSN, TARGET_DF_INSN): Delete.
      	* config/rs6000/rs6000.md: Remove Xilinx FP support.
      	(*movsi_internal1_single): Delete.
      	* config/rs6000/rs6000.opt (msingle-float, mdouble-float, msimple-fpu,
      	mfpu=, mxilinx-fpu): Delete.
      	* config/rs6000/singlefp.h: Delete.
      	* config/rs6000/sysv4.h: Remove Xilinx FP support.
      	* config/rs6000/t-rs6000: Ditto.
      	* config/rs6000/t-xilinx: Delete.
      	* gcc/config/rs6000/titan.md: Adjust for fp_type removal.
      	* gcc/config/rs6000/vsx.md: Remove Xilinx FP support.
      	(VStype_simple): Delete.
      	(VSfptype_simple, VSfptype_mul, VSfptype_div, VSfptype_sqrt): Delete.
      	* config/rs6000/xfpu.h: Delete.
      	* config/rs6000/xfpu.md: Delete.
      	* config/rs6000/xilinx.h: Delete.
      	* config/rs6000/xilinx.opt: Delete.
      	* gcc/doc/invoke.texi (RS/6000 and PowerPC Options): Remove
      	-msingle-float, -mdouble-float, -msimple-fpu, -mfpu=, and -mxilinx-fpu.
      
      From-SVN: r259929
      Segher Boessenkool committed
    • PR libstdc++/85642 fix is_nothrow_default_constructible<optional<T>> · d6ed6b07
      Add missing noexcept keyword to default constructor of each
      _Optional_payload specialization.
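A small compile-time check of the property this fix restores, assuming a C++17 compiler with a conforming standard library. `MayThrow` is a hypothetical type introduced for the illustration. The standard declares `optional()` as `noexcept`, so the trait should hold even when the contained type's own default constructor can throw.

```cpp
#include <optional>
#include <type_traits>

// Hypothetical type whose own default constructor is allowed to throw.
struct MayThrow {
  MayThrow() noexcept(false) {}
};

static_assert(!std::is_nothrow_default_constructible_v<MayThrow>,
              "the element type itself may throw");

// A default-constructed optional is disengaged and never constructs a
// MayThrow, so its default constructor must be noexcept.
static_assert(std::is_nothrow_default_constructible_v<std::optional<MayThrow>>,
              "optional<T>() is noexcept regardless of T");
static_assert(std::is_nothrow_default_constructible_v<std::optional<int>>,
              "also holds for trivially constructible T");
```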
      
      	PR libstdc++/85642 fix is_nothrow_default_constructible<optional<T>>
      	* include/std/optional (_Optional_payload): Add noexcept to default
      	constructor. Re-indent.
      	(_Optional_payload<_Tp, true, true, true>): Likewise. Add noexcept to
      	constructor for copying disengaged payloads.
      	(_Optional_payload<_Tp, true, false, true>): Likewise.
      	(_Optional_payload<_Tp, true, true, false>): Likewise.
      	(_Optional_payload<_Tp, true, false, false>): Likewise.
      	* testsuite/20_util/optional/cons/85642.cc: New.
      	* testsuite/20_util/optional/cons/value_neg.cc: Adjust dg-error lines.
      
      From-SVN: r259928
      Jonathan Wakely committed
    • [expand] Handle null target in expand_builtin_goacc_parlevel_id_size · 39bc9f83
      2018-05-04  Tom de Vries  <tom@codesourcery.com>
      
      	PR libgomp/85639
      	* builtins.c (expand_builtin_goacc_parlevel_id_size): Handle null target
      	if ignore == 0.
      
      From-SVN: r259927
      Tom de Vries committed
    • re PR ada/85635 (typo in link.c for BSD platforms) · 5759c56d
      	PR ada/85635
      	* link.c (BSD platforms): Add missing backslash.
      
      From-SVN: r259925
      John Marino committed
    • re PR tree-optimization/85627 (ICE in update_phi_components in tree-complex.c) · 7d187fdf
      2018-05-04  Richard Biener  <rguenther@suse.de>
      
      	PR middle-end/85627
      	* tree-complex.c (update_complex_assignment): We are always in SSA form.
      	(expand_complex_div_wide): Likewise.
      	(expand_complex_operations_1): Likewise.
      	(expand_complex_libcall): Preserve EH info of the original stmt.
      	(tree_lower_complex): Handle removed blocks.
      	* tree.c (build_common_builtin_nodes): Do not set ECF_NOTHROW
      	on complex multiplication and division libcall builtins.
      
      	* g++.dg/torture/pr85627.C: New testcase.
      
      From-SVN: r259923
      Richard Biener committed
    • re PR lto/85574 (LTO bootstrapped binaries differ) · 9b5713f7
      2018-05-04  Richard Biener  <rguenther@suse.de>
      
      	PR middle-end/85574
      	* fold-const.c (negate_expr_p): Restrict negation of operand
      	zero of a division to when we know that can happen without
      	overflow.
      	(fold_negate_expr_1): Likewise.
      
      	* gcc.dg/torture/pr85574.c: New testcase.
      	* gcc.dg/torture/pr57656.c: Use dg-additional-options.
      
      From-SVN: r259922
      Richard Biener committed
    • re PR tree-optimization/85466 (Performance is slow when doing 'branchless' conditional style math operations) · 04782385
      
      	PR libstdc++/85466
      	* real.h (real_nextafter): Declare.
      	* real.c (real_nextafter): New function.
      	* fold-const-call.c (fold_const_nextafter): New function.
      	(fold_const_call_sss): Call it for CASE_CFN_NEXTAFTER and
      	CASE_CFN_NEXTTOWARD.
      	(fold_const_call_1): For CASE_CFN_NEXTTOWARD call fold_const_call_sss
      	even when arg1_mode is different from arg0_mode.
      
      	* gcc.dg/nextafter-1.c: New test.
      	* gcc.dg/nextafter-2.c: New test.
      	* gcc.dg/nextafter-3.c: New test.
      	* gcc.dg/nextafter-4.c: New test.
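For orientation, here is a short sketch of the values that calls like these produce, using only the standard `<cmath>` functions. It says nothing about how `fold_const_nextafter` is implemented. It only shows the kind of constant-argument call the change lets the compiler evaluate at compile time.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Calls with constant arguments, the kind that can be folded at compile time.
inline void nextafter_example() {
  using lim = std::numeric_limits<double>;
  // The next double above 1.0 is exactly 1.0 + epsilon.
  assert(std::nextafter(1.0, 2.0) == 1.0 + lim::epsilon());
  // When both arguments are equal, the value is returned unchanged.
  assert(std::nextafter(1.0, 1.0) == 1.0);
  // Stepping up from zero yields the smallest positive subnormal.
  assert(std::nextafter(0.0, 1.0) == lim::denorm_min());
  // nexttoward takes a long double direction; the result keeps the type of
  // the first argument.
  assert(std::nexttoward(1.0f, 2.0L) > 1.0f);
}
```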
      
      From-SVN: r259921
      Jakub Jelinek committed
    • cmd/go: update mkalldocs.sh · 105073e1
          
          Update mkalldocs.sh from the current master sources, replacing the old
          mkdoc.sh.
          
          Reviewed-on: https://go-review.googlesource.com/111096
      
      From-SVN: r259920
      Ian Lance Taylor committed
    • cmd/go: enable tests of vet tool · 28fc5502
          
          Since gofrontend does have the vet tool now, we can test it.
          
          Reviewed-on: https://go-review.googlesource.com/111095
      
      From-SVN: r259919
      Ian Lance Taylor committed
    • cmd/go: update to match recent changes to gc · 65229328
          
          In https://golang.org/cl/111097 the gc version of cmd/go was updated
          to include some gofrontend-specific changes. The gofrontend code
          already has different versions of those changes; this CL makes the
          gofrontend match the upstream code.
          
          Reviewed-on: https://go-review.googlesource.com/111099
      
      From-SVN: r259918
      Ian Lance Taylor committed
    • Daily bump. · e7902c2c
      From-SVN: r259917
      GCC Administrator committed
  2. 03 May, 2018 14 commits
    • PR c++/85600 - virtual delete failure. · 9cbc7d65
      	* init.c (build_delete): Always save_expr when deleting.
      
      From-SVN: r259913
      Jason Merrill committed
    • PR libstdc++/82644 define TR1 hypergeometric functions in strict modes · 86f66562
      Following a recent change for PR 82644 the non-standard hypergeometric
      functions are not defined by <cmath> when __STRICT_ANSI__ is defined
      (e.g. for -std=c++17, or -std=c++14 -D__STDCPP_WANT_MATH_SPEC_FUNCS__).
      That caused errors in <tr1/cmath> because the using-declarations for
      tr1::hyperg et al are invalid in strict modes.
      
      The solution is to define the TR1 hypergeometric functions inline in
      <tr1/cmath> if __STRICT_ANSI__ is defined.
      
      	PR libstdc++/82644
      	* include/tr1/cmath [__STRICT_ANSI__] (hypergf, hypergl, hyperg): Use
      	inline definitions instead of using-declarations.
      	[__STRICT_ANSI__] (conf_hypergf, conf_hypergl, conf_hyperg): Likewise.
      	* testsuite/tr1/5_numerical_facilities/special_functions/
      	07_conf_hyperg/compile_cxx17.cc: New.
      	* testsuite/tr1/5_numerical_facilities/special_functions/
      	17_hyperg/compile_cxx17.cc: New.
      
      From-SVN: r259912
      Jonathan Wakely committed
    • [C++ Patch] Kill -ffriend-injection · 6c072e21
      https://gcc.gnu.org/ml/gcc-patches/2018-05/msg00175.html
      
      	* doc/extend.texi (Deprecated Features): Remove
      	-ffriend-injection.
      	(Backwards Compatibility): Likewise.
      	* doc/invoke.texi (C++ Language Options): Likewise.
      	(C++ Dialect Options): Likewise.
      
      	c-family/
      	* c.opt (ffriend-injection): Remove functionality, issue warning.
      
      	cp/
      	* decl.c (cxx_init_decl_processing): Remove flag_friend_injection.
      	* name-lookup.c (do_pushdecl): Likewise.
      
      	testsuite/
      	Remove -ffriend-injection.
      	* g++.old-deja/g++.jason/scoping15.C: Delete.
      	* g++.old-deja/g++.mike/net43.C: Delete.
      
      From-SVN: r259904
      Nathan Sidwell committed
    • re PR target/85530 ([X86] _mm512_mullox_epi64 and _mm512_mask_mullox_epi64 not implemented) · 503ac4e0
      	PR target/85530
      	* config/i386/avx512fintrin.h (_mm512_mullox_epi64,
      	_mm512_mask_mullox_epi64): New intrinsics.
      
      	* gcc.target/i386/avx512f-vpmullq-1.c: New test.
      	* gcc.target/i386/avx512f-vpmullq-2.c: New test.
      	* gcc.target/i386/avx512dq-vpmullq-3.c: New test.
      	* gcc.target/i386/avx512dq-vpmullq-4.c: New test.
      
      From-SVN: r259903
      Jakub Jelinek committed
    • PR libstdc++/84769 qualify call to std::get<0> · 1ee021f2
      	PR libstdc++/84769
      	* include/std/variant (visit): Qualify std::get call.
      
      From-SVN: r259902
      Jonathan Wakely committed
    • PR libstdc++/85632 fix wraparound in filesystem::space · 2e023647
      On 32-bit targets any values over 4GB would wrap and produce the wrong
      result.
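A minimal sketch of the wraparound, assuming the block count and block size arrive as 32-bit unsigned values (as they can on a 32-bit target). The function names are hypothetical and this is not the libstdc++ source. It only contrasts arithmetic in the narrow operand type with arithmetic in the wider result type.

```cpp
#include <cstdint>

// Hypothetical: the product is computed in 32 bits and wraps modulo 2^32.
constexpr std::uint32_t bytes_narrow(std::uint32_t blocks,
                                     std::uint32_t block_size) {
  return blocks * block_size;
}

// Hypothetical: widen one operand first so the product is computed in the
// result type (uintmax_t), as the ChangeLog entry describes.
constexpr std::uintmax_t bytes_wide(std::uint32_t blocks,
                                    std::uint32_t block_size) {
  return static_cast<std::uintmax_t>(blocks) * block_size;
}

// 2,000,000 blocks of 4096 bytes is about 8.2 GB, well over 4 GB.
static_assert(bytes_wide(2000000u, 4096u) == 8192000000ull, "full value");
static_assert(bytes_narrow(2000000u, 4096u) == 3897032704u, "wrapped value");
```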
      
      	PR libstdc++/85632 use uintmax_t for arithmetic
      	* src/filesystem/ops.cc (experimental::filesystem::space): Perform
      	arithmetic in result type.
      	* src/filesystem/std-ops.cc (filesystem::space): Likewise.
      	* testsuite/27_io/filesystem/operations/space.cc: Check total capacity
      	is greater than free space.
      	* testsuite/experimental/filesystem/operations/space.cc: New.
      
      From-SVN: r259901
      Jonathan Wakely committed
    • compiler: avoid crashing on invalid non-integer array length · d18734b5
          
          Tweak the array type checking code to avoid crashing on array types
          whose length expressions are explicit non-integer types (for example,
          "float64(10)"). If such constructs are seen, issue an "invalid array
          bound" error.
          
          Fixes golang/go#13486.
          
          Reviewed-on: https://go-review.googlesource.com/91975
      
      From-SVN: r259900
      Ian Lance Taylor committed
    • Update .po files. · 4e0c5f94
      	* be.po, da.po, de.po, el.po, es.po, fi.po, fr.po, hr.po, id.po,
      	ja.po, nl.po, ru.po, sr.po, sv.po, tr.po, uk.po, vi.po, zh_CN.po,
      	zh_TW.po: Update.
      
      From-SVN: r259897
      Joseph Myers committed
    • Add tests for std::remove_cvref · adba76a3
      	* testsuite/20_util/remove_cvref/requirements/alias_decl.cc: New.
      	* testsuite/20_util/remove_cvref/requirements/explicit_instantiation.cc:
      	New.
      	* testsuite/20_util/remove_cvref/value.cc: New.
      	* testsuite/20_util/remove_cvref/value_ext.cc: New.
      
      From-SVN: r259896
      Jonathan Wakely committed
    • PR libstdc++/84087 add default arguments to basic_string members (LWG 2268) · 852ee53c
      This change was a DR against C++11 and so should have been implemented
      years ago.
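A short usage sketch of what the defaulted arguments permit, assuming a standard library that implements LWG 2268. The wrapper name `append_tail` is hypothetical. With the final `size_type` argument defaulted to `npos`, the substring overloads can be called with just a starting position.

```cpp
#include <cassert>
#include <string>

// Hypothetical wrapper: append everything in src from position pos onward.
// The count argument of append(const basic_string&, size_type, size_type)
// is omitted and defaults to npos.
inline std::string append_tail(std::string dest, const std::string& src,
                               std::string::size_type pos) {
  return dest.append(src, pos);
}

inline void default_args_example() {
  assert(append_tail("abc", "hello", 2) == "abcllo");

  // compare(pos1, n1, str, pos2) likewise omits the final count:
  // "ell" (from "hello") is compared against "ell" (from "xell").
  const std::string a = "hello";
  const std::string b = "xell";
  assert(a.compare(1, 3, b, 1) == 0);
}
```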
      
      	PR libstdc++/84087 LWG DR 2268 basic_string default arguments
      	* include/bits/basic_string.h [_GLIBCXX_USE_CXX11_ABI=1]
      	(append(const basic_string&, size_type, size_type)
      	(assign(const basic_string&, size_type, size_type)
      	(insert(size_type, const basic_string&, size_type, size_type)
      	(replace(size_type,size_type,const basic_string&,size_type,size_type)
      	(compare(size_type,size_type,const basic_string&,size_type,size_type)):
      	Add default arguments (LWG 2268).
      	[_GLIBCXX_USE_CXX11_ABI=0]
      	(append(const basic_string&, size_type, size_type)
      	(assign(const basic_string&, size_type, size_type)
      	(insert(size_type, const basic_string&, size_type, size_type)
      	(replace(size_type,size_type,const basic_string&,size_type,size_type)
      	(compare(size_type,size_type,const basic_string&,size_type,size_type)):
      	Likewise.
      	* testsuite/21_strings/basic_string/dr2268.cc: New test.
      
      From-SVN: r259895
      Jonathan Wakely committed
    • PR libstdc++/84535 constrain std::thread constructor · d49b3426
      The standard requires that the std::thread constructor is constrained so
      it can't be called with a first argument of type std::thread. The
      current implementation only meets that requirement if the constructor is
      called with one argument, by using deleted overloads. This uses an
      enable_if constraint to enforce the requirement for any number of
      arguments.
      
      Also add a static assertion to give a more readable error for invalid
      arguments that cannot be invoked. Also simplify _Invoker to reduce the
      error cascade for ill-formed instantiations with non-invocable
      arguments.
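The constraint described above can be sketched on a stand-in class. `Worker` and `not_same` are hypothetical names, and the callable is invoked synchronously for simplicity, so this is not the `<thread>` implementation. It assumes C++17 and shows an `enable_if` constraint that rejects a first argument of the class's own type for any number of arguments, plus a static assertion for non-invocable arguments.

```cpp
#include <cassert>
#include <functional>
#include <type_traits>
#include <utility>

class Worker {
  // SFINAE helper: well-formed only if T, stripped of references and
  // cv-qualifiers, is not Worker itself.
  template <typename T>
  using not_same = std::enable_if_t<
      !std::is_same_v<std::remove_cv_t<std::remove_reference_t<T>>, Worker>>;

 public:
  Worker() = default;
  Worker(Worker&&) = default;
  Worker(const Worker&) = delete;

  template <typename Callable, typename... Args,
            typename = not_same<Callable>>
  explicit Worker(Callable&& f, Args&&... args) : ran_(true) {
    static_assert(std::is_invocable_v<Callable, Args...>,
                  "Worker arguments must be invocable");
    // A real thread would run this on a new thread; the sketch runs it inline.
    std::invoke(std::forward<Callable>(f), std::forward<Args>(args)...);
  }

  bool ran() const { return ran_; }

 private:
  bool ran_ = false;
};

// The constraint applies for any number of arguments, not just one.
static_assert(!std::is_constructible_v<Worker, Worker&, int>, "");
static_assert(!std::is_constructible_v<Worker, const Worker&, int>, "");
static_assert(std::is_constructible_v<Worker, void (*)(int), int>, "");

inline void worker_example() {
  int hits = 0;
  auto add = [&hits](int n) { hits += n; };
  Worker w(add, 3);
  assert(w.ran() && hits == 3);
}
```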
      
      	PR libstdc++/84535
      	* include/std/thread (thread::__not_same): New SFINAE helper.
      	(thread::thread(_Callable&&, _Args&&...)): Add SFINAE constraint that
      	first argument is not a std::thread. Add static assertion to check
      	INVOKE expression is valid.
      	(thread::thread(thread&), thread::thread(const thread&&)): Remove.
      	(thread::_Invoker::_M_invoke, thread::_Invoker::operator()): Use
      	__invoke_result for return types and remove exception specifications.
      	* testsuite/30_threads/thread/cons/84535.cc: New.
      
      From-SVN: r259893
      Jonathan Wakely committed
    • [testsuite] Add scan-offload-tree-dump · 63f12215
      2018-05-03  Tom de Vries  <tom@codesourcery.com>
      
      	PR testsuite/85106
      	* lib/scanoffloadtree.exp: New file.
      
      	* testsuite/lib/libgomp-dg.exp (libgomp-dg-test): Add save-temps to
      	extra_tool_flags if it contains an -foffload=-fdump-* flag.
      	* testsuite/lib/libgomp.exp: Include scanoffloadtree.exp.
      	* testsuite/libgomp.oacc-c/vec.c: Use scan-offload-tree-dump.
      
      	* doc/sourcebuild.texi (Commands for use in dg-final, Scan optimization
      	dump files): Add offload-tree.
      
      From-SVN: r259892
      Tom de Vries committed
    • re PR tree-optimization/85615 (ICE at -O2 and above on valid code on x86_64-linux-gnu: in dfs_enumerate_from, at cfganal.c:1197) · a378f85c
      
      2018-05-03  Richard Biener  <rguenther@suse.de>
      
      	PR tree-optimization/85615
      	* tree-ssa-threadupdate.c (thread_block_1): Only allow exits
      	to loops not nested in BBs loop father to avoid creating multi-entry
      	loops.
      
      	* gcc.dg/torture/pr85615.c: New testcase.
      
      From-SVN: r259891
      Richard Biener committed
    • [tree-complex.c] PR tree-optimization/70291: Inline floating-point complex multiplication more aggressively · b7244ccb
      
      We can improve the performance of complex floating-point multiplications by inlining the expansion a bit more aggressively.
      We can inline complex x = a * b as:
      x = (ar*br - ai*bi) + i(ar*bi + br*ai);
      if (isunordered (__real__ x, __imag__ x))
        x = __muldc3 (a, b); //Or __mulsc3 for single-precision
      
      That way, in the common case where no NaNs are produced, we avoid the libgcc call, and we fall back to the
      NaN handling in libgcc only if either component of the expansion is NaN.
      
      The implementation is in expand_complex_multiplication in tree-complex.c, and the above expansion
      is done at -O1 and above when not optimising for size.
      At -O0 and -Os the single call to libgcc is emitted.
      
      For the code:
      __complex double
      foo (__complex double a, __complex double b)
      {
        return a * b;
      }
      
      We will now emit at -O2 for aarch64:
      foo:
              fmul    d16, d1, d3
              fmul    d6, d1, d2
              fnmsub  d5, d0, d2, d16
              fmadd   d4, d0, d3, d6
              fcmp    d5, d4
              bvs     .L8
              fmov    d1, d4
              fmov    d0, d5
              ret
      .L8:
              stp     x29, x30, [sp, -16]!
              mov     x29, sp
              bl      __muldc3
              ldp     x29, x30, [sp], 16
              ret
      
      Instead of just a branch to __muldc3.
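The same fast-path and fallback shape can be written at source level as a rough sketch. `mul_fast_path` is a hypothetical name, and the slow path here is the library's `operator*` for `std::complex`, standing in for the direct `__muldc3` call that the compiler emits.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// Hypothetical source-level analogue of the expansion described above.
inline std::complex<double> mul_fast_path(std::complex<double> a,
                                          std::complex<double> b) {
  const double ar = a.real(), ai = a.imag();
  const double br = b.real(), bi = b.imag();

  // Straightforward expansion: (ar + i*ai) * (br + i*bi).
  const double re = ar * br - ai * bi;
  const double im = ar * bi + br * ai;

  // isunordered is true if either component is NaN. Only then take the
  // slow path, which handles the NaN and infinity corner cases.
  if (std::isunordered(re, im))
    return a * b;
  return {re, im};
}

inline void mul_fast_path_example() {
  // (1 + 2i) * (3 + 4i) = -5 + 10i, computed without the fallback.
  assert(mul_fast_path({1.0, 2.0}, {3.0, 4.0}) ==
         std::complex<double>(-5.0, 10.0));
}
```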
      
      	PR tree-optimization/70291
      	* tree-complex.c (expand_complex_libcall): Add type, inplace_p
      	arguments.  Change return type to tree.  Emit libcall as a new
      	statement rather than replacing existing one when inplace_p is true.
      	(expand_complex_multiplication_components): New function.
      	(expand_complex_multiplication): Expand floating-point complex
      	multiplication using the above.
      	(expand_complex_division): Rename inner_type parameter to type.
      	Update expand_complex_libcall call-site.
      	(expand_complex_operations_1): Update expand_complex_multiplication
      	and expand_complex_division call-sites.
      
      	* gcc.dg/complex-6.c: New test.
      	* gcc.dg/complex-7.c: Likewise.
      
      From-SVN: r259889
      Kyrylo Tkachov committed