1. 23 Oct, 2018 9 commits
  2. 22 Oct, 2018 25 commits
    • symtab.c (symtab_node::increase_alignment): Correct max alignment check. · 97580980
      	* symtab.c (symtab_node::increase_alignment): Correct max
      	alignment check.
      
      From-SVN: r265404
      Paul Koning committed
    • re PR tree-optimization/87633 (ice in compare_range_with_value, at vr-values.c:1702) · f3842847
      2018-10-22  Yury Gribov  <tetra2005@gmail.com>
      
      gcc/
      	PR tree-optimization/87633
      	* match.pd: Do not generate unordered integer comparisons.
      
      gcc/testsuite/
      	PR tree-optimization/87633
      	* g++.dg/pr87633.C: New test.
      
      From-SVN: r265399
      Yury Gribov committed
    • combine: Do not combine moves from hard registers · 8d2d3958
      On most targets every function starts with moves from the parameter
      passing (hard) registers into pseudos.  Similarly, after every call
      there is a move from the return register into a pseudo.  These moves
      usually combine with later instructions (leaving pretty much the same
      instruction, just with a hard reg instead of a pseudo).
      
      This isn't a good idea.  Register allocation can get rid of unnecessary
      moves just fine, and moving the parameter passing registers into many
      later instructions tends to prevent good register allocation.  This
      patch disallows combining moves from a hard (non-fixed) register.
      
      This also avoids the problem mentioned in PR87600 #c3 (combining hard
      registers into inline assembler is problematic).
      
      Because the register move can often be combined with other instructions
      *itself*, for example for setting some condition code, this patch adds
      extra copies via new pseudos after every copy-from-hard-reg.
      
      On some targets this reduces average code size.  On others it increases
      it a bit, 0.1% or 0.2% or so.  (I tested this on all *-linux targets).
      
      
      	PR rtl-optimization/87600
      	* combine.c: Add include of expr.h.
      	(cant_combine_insn_p): Do not combine moves from any hard non-fixed
      	register to a pseudo.
      	(make_more_copies): New function, add a copy to a new pseudo after
      	the moves from hard registers into pseudos.
      	(rest_of_handle_combine): Declare rebuild_jump_labels_after_combine
      	later.  Call make_more_copies.
      
      From-SVN: r265398
      Segher Boessenkool committed
    • re PR testsuite/87694 (problem in g++.dg/concepts/memfun-err.C starting with r263343) · f3b13f46
      	PR testsuite/87694
      	* g++.dg/concepts/memfun-err.C: Make it a compile test.
      
      From-SVN: r265397
      Marek Polacek committed
    • Don't double-count early-clobber matches. · dbe7895c
      Given a pattern with a number of operands:
      
      (match_operand 0 "" "=&v")
      (match_operand 1 "" " v0")
      (match_operand 2 "" " v0")
      (match_operand 3 "" " v0")
      
      GCC will currently increment "reject" once, for operand 0, and then decrement
      it once for each of the other operands, ending with reject == -2 and an
      assertion failure.  If there's a conflict then it might try to decrement reject
      yet again.
      
      Incidentally, what these patterns are trying to achieve is an allocation in
      which operand 0 may match one of the other operands, but may not partially
      overlap any of them.  Ideally there'd be a better way to do this.
      
      In any case, it will affect any pattern in which multiple operands may (or
      must) match an early-clobber operand.
      
      The patch only allows a reject-- when one has not already occurred, for that
      operand.
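
The guard described above can be sketched as a small model. This is not the code in lra-constraints.c: the function `score_alternative`, the `guard` flag, and the constant `N_OPERANDS` are hypothetical names for illustration, and only `reject` and `matching_early_clobber` are taken from the commit message.

```c
#include <stdbool.h>

#define N_OPERANDS 4  /* Hypothetical: operand 0 is "=&v", 1..3 are " v0".  */

/* Illustrative model only.  Returns the final reject value for an
   alternative in which operands 1..3 all match early-clobber operand 0.
   When GUARD is false, every match decrements reject (the old behavior);
   when it is true, the decrement happens at most once per matched operand.  */
int
score_alternative (bool guard)
{
  bool matching_early_clobber[N_OPERANDS] = { false };
  int reject = 0;

  reject++;  /* Penalty recorded for the early-clobber operand 0.  */

  for (int nop = 1; nop < N_OPERANDS; nop++)
    {
      int m = 0;  /* Operand NOP matches operand 0.  */
      if (!guard || !matching_early_clobber[m])
        {
          reject--;  /* Undo the penalty for the matched operand.  */
          matching_early_clobber[m] = true;
        }
    }
  return reject;
}
```

In this model, `score_alternative (false)` reproduces the reject == -2 outcome described above, while `score_alternative (true)` ends at 0.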
      
      2018-10-22  Andrew Stubbs  <ams@codesourcery.com>
      
      	gcc/
      	* lra-constraints.c (process_alt_operands): New local array,
      	matching_early_clobber.  Check matching_early_clobber before
      	decrementing reject, and set matching_early_clobber after.
      
      From-SVN: r265393
      Andrew Stubbs committed
    • rs6000: Handle print_operand_address for unexpected RTL (PR87598) · b333d8b6
      As the PR shows, the user can force this to be called on at least some
      RTL that is not a valid address.  Most targets treat this as if the
      user knows best; let's do the same.
      
      
      	PR target/87598
      	* config/rs6000/rs6000.c (print_operand_address): For unexpected RTL
      	call output_addr_const and hope for the best.
      
      From-SVN: r265392
      Segher Boessenkool committed
    • 2018-10-22 Richard Biener <rguenther@suse.de> · e86087ee
      	* gimple-ssa-evrp-analyze.c
      	(evrp_range_analyzer::record_ranges_from_incoming_edge): Be
      	smarter about what ranges to use.
      	* tree-vrp.c (add_assert_info): Dump here.
      	(register_edge_assert_for_2): Instead of here at multiple but
      	not all places.
      
      	* gcc.dg/tree-ssa/evrp12.c: New testcase.
      	* gcc.dg/predict-6.c: Adjust.
      	* gcc.dg/tree-ssa/vrp33.c: Disable EVRP.
      	* gcc.dg/tree-ssa/vrp02.c: Likewise.
      	* gcc.dg/tree-ssa/cunroll-9.c: Likewise.
      
      From-SVN: r265391
      Richard Biener committed
    • re PR middle-end/63155 (memory hog) · d1e14d97
      2018-10-22  Steven Bosscher <steven@gcc.gnu.org>
      	Richard Biener  <rguenther@suse.de>
      
      	* bitmap.h: Update data structure documentation, including a
      	description of bitmap views as either linked-lists or splay trees.
      	(struct bitmap_element_def): Update comments for splay tree bitmaps.
      	(struct bitmap_head_def): Likewise.
      	(bitmap_list_view, bitmap_tree_view): New prototypes.
      	(bitmap_initialize_stat): Initialize a bitmap_head's indx and
      	tree_form fields.
      	(bmp_iter_set_init): Assert the iterated bitmaps are in list form.
      	(bmp_iter_and_init, bmp_iter_and_compl_init): Likewise.
      	* bitmap.c (bitmap_elem_to_freelist): Unregister overhead of a
      	released bitmap element here.
      	(bitmap_element_free): Remove.
      	(bitmap_elt_clear_from): Work on splay tree bitmaps.
      	(bitmap_list_link_element): Renamed from bitmap_element_link.  Move
      	this function and similar ones so that linked-list bitmap
      	implementation functions are grouped.
      	(bitmap_list_unlink_element): Renamed from bitmap_element_unlink,
      	and moved for grouping.
      	(bitmap_list_insert_element_after): Renamed from
      	bitmap_elt_insert_after, and moved for grouping.
      	(bitmap_list_find_element): New function spliced from bitmap_find_bit.
      	(bitmap_tree_link_left, bitmap_tree_link_right,
      	bitmap_tree_rotate_left, bitmap_tree_rotate_right, bitmap_tree_splay,
      	bitmap_tree_link_element, bitmap_tree_unlink_element,
      	bitmap_tree_find_element): New functions for splay-tree bitmap
      	implementation.
      	(bitmap_element_link, bitmap_element_unlink, bitmap_elt_insert_after):
      	Renamed and moved, see above entries.
      	(bitmap_tree_listify_from): New function to convert part of a splay
      	tree bitmap to a linked-list bitmap.
      	(bitmap_list_view): Convert a splay tree bitmap to linked-list form.
      	(bitmap_tree_view): Convert a linked-list bitmap to splay tree form.
      	(bitmap_find_bit): Remove.
      	(bitmap_clear, bitmap_clear_bit, bitmap_set_bit,
      	bitmap_single_bit_set_p, bitmap_first_set_bit, bitmap_last_set_bit):
      	Handle splay tree bitmaps.
      	(bitmap_copy, bitmap_count_bits, bitmap_and, bitmap_and_into,
      	bitmap_elt_copy, bitmap_and_compl, bitmap_and_compl_into,
      	bitmap_compl_and_into, bitmap_elt_ior, bitmap_ior, bitmap_ior_into,
      	bitmap_xor, bitmap_xor_into, bitmap_equal_p, bitmap_intersect_p,
      	bitmap_intersect_compl_p, bitmap_ior_and_compl,
      	bitmap_ior_and_compl_into, bitmap_set_range, bitmap_clear_range,
      	bitmap_hash): Reject trying to act on splay tree bitmaps.  Make
      	corresponding changes to use linked-list specific bitmap_element
      	manipulation functions as applicable for efficiency.
      	(bitmap_tree_to_vec): New function.
      	(debug_bitmap_elt_file): New function split out from ...
      	(debug_bitmap_file): ... here.  Handle splay tree bitmaps.
      	(bitmap_print): Likewise.
      
      	PR tree-optimization/63155
      	* tree-ssa-propagate.c (ssa_prop_init): Use tree-view for the
      	SSA edge worklists.
      	* tree-ssa-coalesce.c (coalesce_ssa_name): Populate used_in_copies
      	in tree-view.
      
      From-SVN: r265390
      Steven Bosscher committed
    • Index... · ddec5aea
      Index: gcc/config/rs6000/emmintrin.h
      ===================================================================
      --- gcc/config/rs6000/emmintrin.h	(revision 265318)
      +++ gcc/config/rs6000/emmintrin.h	(working copy)
      @@ -85,7 +85,7 @@ typedef double __m128d __attribute__ ((__vector_si
       typedef long long __m128i_u __attribute__ ((__vector_size__ (16), __may_alias__, __aligned__ (1)));
       typedef double __m128d_u __attribute__ ((__vector_size__ (16), __may_alias__, __aligned__ (1)));
       
      -/* Define two value permute mask */
      +/* Define two value permute mask.  */
       #define _MM_SHUFFLE2(x,y) (((x) << 1) | (y))
       
       /* Create a vector with element 0 as F and the rest zero.  */
      @@ -201,7 +201,7 @@ _mm_store_pd (double *__P, __m128d __A)
       extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
       _mm_storeu_pd (double *__P, __m128d __A)
       {
      -  *(__m128d *)__P = __A;
      +  *(__m128d_u *)__P = __A;
       }
       
       /* Stores the lower DPFP value.  */
      @@ -2175,7 +2175,7 @@ _mm_maskmoveu_si128 (__m128i __A, __m128i __B, cha
       {
         __v2du hibit = { 0x7f7f7f7f7f7f7f7fUL, 0x7f7f7f7f7f7f7f7fUL};
         __v16qu mask, tmp;
      -  __m128i *p = (__m128i*)__C;
      +  __m128i_u *p = (__m128i_u*)__C;
       
         tmp = (__v16qu)_mm_loadu_si128(p);
         mask = (__v16qu)vec_cmpgt ((__v16qu)__B, (__v16qu)hibit);
      Index: gcc/config/rs6000/xmmintrin.h
      ===================================================================
      --- gcc/config/rs6000/xmmintrin.h	(revision 265318)
      +++ gcc/config/rs6000/xmmintrin.h	(working copy)
      @@ -85,6 +85,10 @@
          vector types, and their scalar components.  */
       typedef float __m128 __attribute__ ((__vector_size__ (16), __may_alias__));
       
      +/* Unaligned version of the same type.  */
      +typedef float __m128_u __attribute__ ((__vector_size__ (16), __may_alias__,
      +				       __aligned__ (1)));
      +
       /* Internal data types for implementing the intrinsics.  */
       typedef float __v4sf __attribute__ ((__vector_size__ (16)));
       
      @@ -172,7 +176,7 @@ _mm_store_ps (float *__P, __m128 __A)
       extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
       _mm_storeu_ps (float *__P, __m128 __A)
       {
      -  *(__m128 *)__P = __A;
      +  *(__m128_u *)__P = __A;
       }
       
       /* Store four SPFP values in reverse order.  The address must be aligned.  */
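
The effect of the `__aligned__ (1)` typedefs in the diff above can be illustrated with a minimal, self-contained sketch. It assumes a compiler that supports the GCC vector extensions; the names `v4sf`, `v4sf_u`, `store_unaligned` and `demo_sum` are hypothetical and are not part of the rs6000 headers.

```c
/* Aligned 16-byte vector of four floats (GCC/Clang vector extension).  */
typedef float v4sf __attribute__ ((__vector_size__ (16), __may_alias__));

/* Unaligned variant, modeled on the __m128_u typedef added by the patch.  */
typedef float v4sf_u __attribute__ ((__vector_size__ (16), __may_alias__,
				     __aligned__ (1)));

/* Store A to P, which need not be 16-byte aligned.  Writing through the
   __aligned__ (1) type tells the compiler not to assume vector alignment.  */
void
store_unaligned (float *p, v4sf a)
{
  *(v4sf_u *) p = a;
}

/* Hypothetical helper: stores {1, 2, 3, 4} starting at element 1 of BUF
   and returns the sum of the four stored elements.  */
float
demo_sum (void)
{
  float buf[8] = { 0 };
  v4sf v = { 1.0f, 2.0f, 3.0f, 4.0f };
  store_unaligned (buf + 1, v);
  return buf[1] + buf[2] + buf[3] + buf[4];
}
```

Casting to the aligned type instead, as the removed lines did, would let the compiler assume 16-byte alignment for a pointer that may not have it.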
      
      From-SVN: r265389
      William Schmidt committed
    • Revert r263947. · d78bcb13
      2018-10-22  Martin Liska  <mliska@suse.cz>
      
      	PR tree-optimization/87686
      	Revert
      	2018-08-29  Martin Liska  <mliska@suse.cz>
      
      	* tree-switch-conversion.c (switch_conversion::expand):
      	Strengthen assumption about gswitch statements.
      2018-10-22  Martin Liska  <mliska@suse.cz>
      
      	PR tree-optimization/87686
      	* g++.dg/tree-ssa/pr87686.C: New test.
      
      From-SVN: r265388
      Martin Liska committed
    • Iterate -std=c++-* in i386.exp. · c7acc296
      2018-10-22  Jakub Jelinek  <jakub@redhat.com>
      
      	* g++.target/i386/i386.exp: Use g++-dg-runtest to iterate
      	properly -std= options.
      
      From-SVN: r265387
      Jakub Jelinek committed
    • Simplify comparison of attrs in IPA ICF. · 14762cd0
      2018-10-22  Martin Liska  <mliska@suse.cz>
      
      	* ipa-icf.c (sem_item::compare_attributes): Remove.
      	(sem_item::compare_referenced_symbol_properties): Use
      	attribute_list_equal instead.
      	(sem_function::equals_wpa): Likewise.
      	* ipa-icf.h: Remove compare_attributes.
      
      From-SVN: r265386
      Martin Liska committed
    • scop-4.c: Avoid out-of-bound access. · f79de13a
      2018-10-22  Richard Biener  <rguenther@suse.de>
      
      	* gcc.dg/graphite/scop-4.c: Avoid out-of-bound access.
      
      From-SVN: r265385
      Richard Biener committed
    • utils.c (unchecked_convert): Use local variables for the biased and reverse SSO… · 39c61276
      utils.c (unchecked_convert): Use local variables for the biased and reverse SSO attributes of both types.
      
      	* gcc-interface/utils.c (unchecked_convert): Use local variables for
      	the biased and reverse SSO attributes of both types.
      	Further extend the processing of integral types in the presence of
      	reverse SSO to all scalar types.
      
      From-SVN: r265381
      Eric Botcazou committed
    • trans.c (Pragma_to_gnu): Use a simple memory constraint in all cases. · 9e4cacfa
      	* gcc-interface/trans.c (Pragma_to_gnu) <Pragma_Inspection_Point>: Use
      	a simple memory constraint in all cases.
      
      	* gcc-interface/lang-specs.h: Bump copyright year.
      
      From-SVN: r265378
      Eric Botcazou committed
    • warn19.ad[sb]: New test. · bbc96027
      	* gnat.dg/warn19.ad[sb]: New test.
      	* gnat.dg/warn19_pkg.ads: New helper.
      
      From-SVN: r265377
      Eric Botcazou committed
    • re PR c/87682 (gcc/mem-stats.h:172: possible broken comparison operator ?) · 2c2f8674
      2018-10-22  Richard Biener  <rguenther@suse.de>
      
      	PR middle-end/87682
      	* mem-stats.h (mem_usage::operator==): Fix pasto.
      
      From-SVN: r265376
      Richard Biener committed
    • re PR bootstrap/87640 (internal compiler error: in check, at tree-vrp.c:155) · 893ade8b
      2018-10-22  Richard Biener  <rguenther@suse.de>
      
      	PR tree-optimization/87640
      	* tree-vrp.c (set_value_range_with_overflow): Decompose
      	incomplete result.
      	(extract_range_from_binary_expr_1): Adjust.
      
      	* gcc.dg/torture/pr87640.c: New testcase.
      
      From-SVN: r265375
      Richard Biener committed
    • S/390: Add the forgotten test for r265371 · 9470d3ec
      The test is part of the originally posted change
      (https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01173.html), but was
      forgotten during svn commit.
      
      From-SVN: r265373
      Ilya Leoshkevich committed
    • Add a fun parameter to three stmt_could_throw... functions · 36bbc05d
      This long patch does only one simple thing: it adds an explicit function
      parameter to the predicates stmt_could_throw_p, stmt_can_throw_external
      and stmt_can_throw_internal.
      
      My motivation was the ability to use stmt_can_throw_external in IPA
      analysis phase without the need to push cfun.  As I have discovered,
      we were already doing that in cgraph.c, which this patch avoids as
      well.  In the process, I had to add a struct function parameter to
      stmt_could_throw_p and decided to also change the interface of
      stmt_can_throw_internal just for the sake of some minimal consistency.
      
      In the process I have discovered that calling method
      cgraph_node::create_version_clone_with_body (used by ipa-split,
      ipa-sra, OMP simd and multiple_target) leads to calls of
      stmt_can_throw_external with NULL cfun.  I have worked around this by
      making stmt_can_throw_external and stmt_could_throw_p gracefully
      accept NULL and just be pessimistic in that case.  The problem with
      fixing this in a better way is that struct function for the clone is
      created after cloning edges where we attempt to push the yet not
      existing cfun, and moving it before would require a bit of surgery in
      tree-inline.c.  A slightly hackish but simpler fix might be to
      explicitely pass the "old" function to symbol_table::create_edge
      because it should be just as good at that moment.  In any event, that
      is a topic for another patch.
      
      I believe that currently we incorrectly use cfun in
      maybe_clean_eh_stmt_fn and maybe_duplicate_eh_stmt_fn, both in
      tree-eh.c, and so I have fixed these cases too.  The bulk of other
      changes is just mechanical adding of cfun to all users.
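
The interface change described above, passing the function explicitly rather than reading a global and treating NULL pessimistically, might look roughly like the following sketch. The types `function_ctx` and `stmt`, their fields, and the name `stmt_could_throw_sketch` are hypothetical stand-ins, not GCC's real `struct function`, gimple statements, or predicate.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC's struct function and a statement.  */
struct function_ctx { bool can_throw_non_call_exceptions; };
struct stmt { bool is_call; bool may_trap; };

/* Sketch: the function context is an explicit parameter instead of a
   global such as cfun.  A NULL context is accepted and answered
   pessimistically, i.e. a trapping statement is assumed to throw.  */
bool
stmt_could_throw_sketch (const struct function_ctx *fun,
			 const struct stmt *s)
{
  if (s->is_call)
    return true;   /* Calls are treated as potentially throwing.  */
  if (!s->may_trap)
    return false;  /* Nothing that could raise an exception.  */
  if (fun == NULL)
    return true;   /* No context available: be pessimistic.  */
  return fun->can_throw_non_call_exceptions;
}
```

A caller that has no current function, such as the cloning case mentioned above, can pass NULL and still get a safe (if conservative) answer.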
      
      Bootstrapped and tested on x86_64-linux (also with extra NULLing and
      restoring cfun to double check it is not used in a place I missed), OK
      for trunk?
      
      Thanks,
      
      Martin
      
      2018-10-22  Martin Jambor  <mjambor@suse.cz>
      
      	* tree-eh.h (stmt_could_throw_p): Add function parameter.
      	(stmt_can_throw_external): Likewise.
      	(stmt_can_throw_internal): Likewise.
      	* tree-eh.c (lower_eh_constructs_2): Pass cfun to stmt_could_throw_p.
      	(lower_eh_constructs_2): Likewise.
      	(stmt_could_throw_p): Add fun parameter, use it instead of cfun.
      	(stmt_can_throw_external): Likewise.
      	(stmt_can_throw_internal): Likewise.
      	(maybe_clean_eh_stmt_fn): Pass cfun to stmt_could_throw_p.
      	(maybe_clean_or_replace_eh_stmt): Pass cfun to stmt_could_throw_p.
      	(maybe_duplicate_eh_stmt_fn): Pass new_fun to stmt_could_throw_p.
      	(maybe_duplicate_eh_stmt): Pass cfun to stmt_could_throw_p.
      	(pass_lower_eh_dispatch::execute): Pass cfun to
      	stmt_can_throw_external.
      	(cleanup_empty_eh): Likewise.
      	(verify_eh_edges): Pass cfun to stmt_could_throw_p.
      	* cgraph.c (cgraph_edge::set_call_stmt): Pass a function to
      	stmt_can_throw_external instead of pushing it to cfun.
      	(symbol_table::create_edge): Likewise.
      	* gimple-fold.c (fold_builtin_atomic_compare_exchange): Pass cfun to
      	stmt_can_throw_internal.
      	* gimple-ssa-evrp.c (evrp_dom_walker::before_dom_children): Pass cfun
      	to stmt_could_throw_p.
      	* gimple-ssa-store-merging.c (handled_load): Pass cfun to
      	stmt_can_throw_internal.
      	(pass_store_merging::execute): Likewise.
      	* gimple-ssa-strength-reduction.c
      	(find_candidates_dom_walker::before_dom_children): Pass cfun to
      	stmt_could_throw_p.
      	* gimplify-me.c (gimple_regimplify_operands): Pass cfun to
      	stmt_can_throw_internal.
      	* ipa-pure-const.c (check_call): Pass cfun to stmt_could_throw_p and
      	to stmt_can_throw_external.
      	(check_stmt): Pass cfun to stmt_could_throw_p.
      	(check_stmt): Pass cfun to stmt_can_throw_external.
      	(pass_nothrow::execute): Likewise.
      	* trans-mem.c (expand_call_tm): Pass cfun to stmt_can_throw_internal.
      	* tree-cfg.c (is_ctrl_altering_stmt): Pass cfun to
      	stmt_can_throw_internal.
      	(verify_gimple_in_cfg): Pass cfun to stmt_could_throw_p.
      	(stmt_can_terminate_bb_p): Pass cfun to stmt_can_throw_external.
      	(gimple_purge_dead_eh_edges): Pass cfun to stmt_can_throw_internal.
      	* tree-complex.c (expand_complex_libcall): Pass cfun to
      	stmt_could_throw_p and to stmt_can_throw_internal.
      	(expand_complex_multiplication): Pass cfun to stmt_can_throw_internal.
      	* tree-inline.c (copy_edges_for_bb): Likewise.
      	(maybe_move_debug_stmts_to_successors): Likewise.
      	* tree-outof-ssa.c (ssa_is_replaceable_p): Pass cfun to
      	stmt_could_throw_p.
      	* tree-parloops.c (oacc_entry_exit_ok_1): Likewise.
      	* tree-sra.c (scan_function): Pass cfun to stmt_can_throw_external.
      	* tree-ssa-alias.c (stmt_kills_ref_p): Pass cfun to
      	stmt_can_throw_internal.
      	* tree-ssa-ccp.c (optimize_atomic_bit_test_and): Likewise.
      	* tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Pass cfun to
      	stmt_could_throw_p.
      	(mark_aliased_reaching_defs_necessary_1): Pass cfun to
      	stmt_can_throw_internal.
      	* tree-ssa-forwprop.c (pass_forwprop::execute): Likewise.
      	* tree-ssa-loop-im.c (movement_possibility): Pass cfun to
      	stmt_could_throw_p.
      	* tree-ssa-loop-ivopts.c (find_givs_in_stmt_scev): Likewise.
      	(add_autoinc_candidates): Pass cfun to stmt_can_throw_internal.
      	* tree-ssa-math-opts.c (pass_cse_reciprocals::execute): Likewise.
      	(convert_mult_to_fma_1): Likewise.
      	(convert_to_divmod): Likewise.
      	* tree-ssa-phiprop.c (propagate_with_phi): Likewise.
      	* tree-ssa-pre.c (compute_avail): Pass cfun to stmt_could_throw_p.
      	* tree-ssa-propagate.c
      	(substitute_and_fold_dom_walker::before_dom_children): Likewise.
      	* tree-ssa-reassoc.c (suitable_cond_bb): Likewise.
      	(maybe_optimize_range_tests): Likewise.
      	(linearize_expr_tree): Likewise.
      	(reassociate_bb): Likewise.
      	* tree-ssa-sccvn.c (copy_reference_ops_from_call): Likewise.
      	* tree-ssa-scopedtables.c (hashable_expr_equal_p): Likewise.
      	* tree-ssa-strlen.c (adjust_last_stmt): Likewise.
      	(handle_char_store): Likewise.
      	* tree-vect-data-refs.c (vect_find_stmt_data_reference): Pass cfun to
      	stmt_can_throw_internal.
      	* tree-vect-patterns.c (check_bool_pattern): Pass cfun to
      	stmt_could_throw_p.
      	* tree-vect-stmts.c (vect_finish_stmt_generation_1): Likewise.
      	(vectorizable_call): Pass cfun to stmt_can_throw_internal.
      	(vectorizable_simd_clone_call): Likewise.
      	* value-prof.c (gimple_ic): Pass cfun to stmt_could_throw_p.
      	(gimple_stringop_fixed_value): Likewise.
      
      From-SVN: r265372
      Martin Jambor committed
    • S/390: Make "b" constraint match literal pool references · 3703b60c
      Improves the code generation by getting rid of redundant LAs, as seen
      in the following example:
      
      	-	la	%r1,0(%r13)
      	-	lg	%r4,0(%r1)
      	+	lg	%r4,0(%r13)
      
      This also makes it possible to proceed with the merge of movdi_64 and movdi_larl.
      Currently LRA decides to spill literal pool references back to the
      literal pool, because it preliminarily chooses alternatives with
      CT_MEMORY constraints without calling
      satisfies_memory_constraint_p (). Later on it notices that the
      constraint is wrong and fixes it by spilling.  The constraint in this
      case is "b", and the operand is a literal pool reference.  There is
      no reason to reject them.  The current behavior was introduced,
      apparently unintentionally, by
      https://gcc.gnu.org/ml/gcc-patches/2010-09/msg00812.html
      
      The patch affects a little bit more than mentioned in the subject,
      because it changes s390_loadrelative_operand_p (), which is called not
      only for checking the "b" constraint.  However, the only caller for
      which it should really not accept literal pool references is
      s390_check_qrst_address (), so it was changed to explicitly do so.
      
      gcc/ChangeLog:
      
      2018-10-22  Ilya Leoshkevich  <iii@linux.ibm.com>
      
      	* config/s390/s390.c (s390_loadrelative_operand_p): Accept
      	literal pool references.
      	(s390_check_qrst_address): Adapt to the new behavior of
      	s390_loadrelative_operand_p ().
      
      gcc/testsuite/ChangeLog:
      
      2018-10-22  Ilya Leoshkevich  <iii@linux.ibm.com>
      
      	* gcc.target/s390/litpool-int.c: New test.
      
      From-SVN: r265371
      Ilya Leoshkevich committed
    • i386: Enable AVX512 memory broadcast for INT andnot · a48be73b
      Many AVX512 vector operations can broadcast from a scalar memory source.
      This patch enables memory broadcast for INT andnot operations.
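
As a rough scalar model of what a broadcast andnot computes, the sketch below reads one scalar from memory and combines it with every lane. The names `andnot_bcst_model`, `andnot_bcst_demo` and `LANES` are hypothetical; whether a compiler emits an embedded-broadcast instruction for code like this depends on the target flags and compiler version, so this only illustrates the semantics.

```c
#define LANES 16  /* Sixteen 32-bit lanes, as in a 512-bit vector.  */

/* Scalar model of an integer andnot with a broadcast second operand:
   every lane computes ~a[i] & *scalar, so *scalar is read from a single
   memory location and reused for all lanes.  */
void
andnot_bcst_model (int *dst, const int *a, const int *scalar)
{
  for (int i = 0; i < LANES; i++)
    dst[i] = ~a[i] & *scalar;
}

/* Hypothetical helper: returns lane 3 of ~{0, 1, 2, ...} & 0xFF.  */
int
andnot_bcst_demo (void)
{
  int a[LANES], dst[LANES];
  int mask = 0xFF;
  for (int i = 0; i < LANES; i++)
    a[i] = i;
  andnot_bcst_model (dst, a, &mask);
  return dst[3];
}
```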
      
      gcc/
      
      	PR target/72782
      	* config/i386/sse.md (*andnot<mode>3_bcst): New.
      
      gcc/testsuite/
      
      	PR target/72782
      	* gcc.target/i386/avx512f-andn-di-zmm-1.c: New test.
      	* gcc.target/i386/avx512f-andn-si-zmm-1.c: Likewise.
      	* gcc.target/i386/avx512f-andn-si-zmm-2.c: Likewise.
      	* gcc.target/i386/avx512f-andn-si-zmm-3.c: Likewise.
      	* gcc.target/i386/avx512f-andn-si-zmm-4.c: Likewise.
      	* gcc.target/i386/avx512f-andn-si-zmm-5.c: Likewise.
      	* gcc.target/i386/avx512vl-andn-si-xmm-1.c: Likewise.
      	* gcc.target/i386/avx512vl-andn-si-ymm-1.c: Likewise.
      
      From-SVN: r265370
      H.J. Lu committed
    • i386: Enable AVX512 memory broadcast for INT logic · 0844e432
      Many AVX512 vector operations can broadcast from a scalar memory source.
      This patch enables memory broadcast for INT logic operations.
      
      gcc/
      
      	PR target/72782
      	* config/i386/sse.md (*<code><mode>3_bcst): New.
      
      gcc/testsuite/
      
      	PR target/72782
      	* gcc.target/i386/avx512f-and-di-zmm-1.c: New test.
      	* gcc.target/i386/avx512f-and-si-zmm-1.c: Likewise.
      	* gcc.target/i386/avx512f-and-si-zmm-2.c: Likewise.
      	* gcc.target/i386/avx512f-and-si-zmm-3.c: Likewise.
      	* gcc.target/i386/avx512f-and-si-zmm-4.c: Likewise.
      	* gcc.target/i386/avx512f-and-si-zmm-5.c: Likewise.
      	* gcc.target/i386/avx512f-and-si-zmm-6.c: Likewise.
      	* gcc.target/i386/avx512f-or-di-zmm-1.c: Likewise.
      	* gcc.target/i386/avx512f-or-si-zmm-1.c: Likewise.
      	* gcc.target/i386/avx512f-or-si-zmm-2.c: Likewise.
      	* gcc.target/i386/avx512f-or-si-zmm-3.c: Likewise.
      	* gcc.target/i386/avx512f-or-si-zmm-4.c: Likewise.
      	* gcc.target/i386/avx512f-or-si-zmm-5.c: Likewise.
      	* gcc.target/i386/avx512f-or-si-zmm-6.c: Likewise.
      	* gcc.target/i386/avx512f-xor-di-zmm-1.c: Likewise.
      	* gcc.target/i386/avx512f-xor-si-zmm-1.c: Likewise.
      	* gcc.target/i386/avx512f-xor-si-zmm-2.c: Likewise.
      	* gcc.target/i386/avx512f-xor-si-zmm-3.c: Likewise.
      	* gcc.target/i386/avx512f-xor-si-zmm-4.c: Likewise.
      	* gcc.target/i386/avx512f-xor-si-zmm-5.c: Likewise.
      	* gcc.target/i386/avx512f-xor-si-zmm-6.c: Likewise.
      	* gcc.target/i386/avx512vl-and-si-xmm-1.c: Likewise.
      	* gcc.target/i386/avx512vl-and-si-ymm-1.c: Likewise.
      	* gcc.target/i386/avx512vl-or-si-xmm-1.c: Likewise.
      	* gcc.target/i386/avx512vl-or-si-ymm-1.c: Likewise.
      	* gcc.target/i386/avx512vl-xor-si-xmm-1.c: Likewise.
      	* gcc.target/i386/avx512vl-xor-si-ymm-1.c: Likewise.
      
      From-SVN: r265369
      H.J. Lu committed
    • i386: Enable AVX512 memory broadcast for INT add · 26d50717
      Many AVX512 vector operations can broadcast from a scalar memory source.
      This patch enables memory broadcast for INT add operations.
      
      gcc/
      
      	PR target/72782
      	* config/i386/sse.md (avx512bcst): Updated for V4SI, V2DI, V8SI,
      	V4DI, V16SI and V8DI.
      	(*sub<mode>3<mask_name>_bcst): New.
      	(*add<mode>3<mask_name>_bcst): Likewise.
      
      gcc/testsuite/
      
      	PR target/72782
      	* gcc.target/i386/avx512f-add-di-zmm-1.c: New test.
      	* gcc.target/i386/avx512f-add-si-zmm-1.c: Likewise.
      	* gcc.target/i386/avx512f-add-si-zmm-2.c: Likewise.
      	* gcc.target/i386/avx512f-add-si-zmm-3.c: Likewise.
      	* gcc.target/i386/avx512f-add-si-zmm-4.c: Likewise.
      	* gcc.target/i386/avx512f-add-si-zmm-5.c: Likewise.
      	* gcc.target/i386/avx512f-add-si-zmm-6.c: Likewise.
      	* gcc.target/i386/avx512f-sub-di-zmm-1.c: Likewise.
      	* gcc.target/i386/avx512f-sub-si-zmm-1.c: Likewise.
      	* gcc.target/i386/avx512f-sub-si-zmm-2.c: Likewise.
      	* gcc.target/i386/avx512f-sub-si-zmm-3.c: Likewise.
      	* gcc.target/i386/avx512f-sub-si-zmm-4.c: Likewise.
      	* gcc.target/i386/avx512f-sub-si-zmm-5.c: Likewise.
      	* gcc.target/i386/avx512vl-add-si-xmm-1.c: Likewise.
      	* gcc.target/i386/avx512vl-add-si-ymm-1.c: Likewise.
      	* gcc.target/i386/avx512vl-sub-si-xmm-1.c: Likewise.
      	* gcc.target/i386/avx512vl-sub-si-ymm-1.c: Likewise.
      
      From-SVN: r265368
      H.J. Lu committed
    • Daily bump. · 0067ddcc
      From-SVN: r265366
      GCC Administrator committed
  3. 21 Oct, 2018 6 commits
    • emmintrin.h (_mm_movemask_pd): Replace __vector __m64 with __vector unsigned… · 5d9c5a96
      emmintrin.h (_mm_movemask_pd): Replace __vector __m64 with __vector unsigned long long for compatibility.
      
      2018-10-21  Bill Schmidt  <wschmidt@linux.ibm.com>
      	    Jinsong Ji  <jji@us.ibm.com>
      
      	* config/rs6000/emmintrin.h (_mm_movemask_pd): Replace __vector
      	__m64 with __vector unsigned long long for compatibility.
      	(_mm_movemask_epi8): Likewise.
      	* config/rs6000/xmmintrin.h (_mm_cvtps_pi32): Likewise.
      	(_mm_cvttps_pi32): Likewise.
      	(_mm_cvtpi32_ps): Likewise.
      	(_mm_cvtps_pi16): Likewise.
      	(_mm_loadh_pi): Likewise.
      	(_mm_storeh_pi): Likewise.
      	(_mm_movehl_ps): Likewise.
      	(_mm_movelh_ps): Likewise.
      	(_mm_loadl_pi): Likewise.
      	(_mm_storel_pi): Likewise.
      	(_mm_movemask_ps): Likewise.
      	(_mm_shuffle_pi16): Likewise.
      
      From-SVN: r265362
      William Schmidt committed
    • Move testsuite ChangeLog entries to testsuite/ChangeLog · 9d165ca6
      From-SVN: r265360
      H.J. Lu committed
    • i386: Update AVX512 FMSUB/FNMADD/FNMSUB tests · 3be6195b
      Update AVX512 tests to test the newly added FMSUB, FNMADD and FNMSUB
      builtin functions.
      
      	PR target/72782
      	* gcc.target/i386/avx-1.c (__builtin_ia32_vfmsubpd512_mask): New.
      	(__builtin_ia32_vfmsubpd512_maskz): Likewise.
      	(__builtin_ia32_vfmsubps512_mask): Likewise.
      	(__builtin_ia32_vfmsubps512_maskz): Likewise.
      	(__builtin_ia32_vfnmaddpd512_mask3): Likewise.
      	(__builtin_ia32_vfnmaddpd512_maskz): Likewise.
      	(__builtin_ia32_vfnmaddps512_mask3): Likewise.
      	(__builtin_ia32_vfnmaddps512_maskz): Likewise.
      	(__builtin_ia32_vfnmsubpd512_maskz): Likewise.
      	(__builtin_ia32_vfnmsubps512_maskz): Likewise.
      	* testsuite/gcc.target/i386/sse-13.c
      	(__builtin_ia32_vfmsubpd512_mask): Likewise.
      	(__builtin_ia32_vfmsubpd512_maskz): Likewise.
      	(__builtin_ia32_vfmsubps512_mask): Likewise.
      	(__builtin_ia32_vfmsubps512_maskz): Likewise.
      	(__builtin_ia32_vfnmaddpd512_mask3): Likewise.
      	(__builtin_ia32_vfnmaddpd512_maskz): Likewise.
      	(__builtin_ia32_vfnmaddps512_mask3): Likewise.
      	(__builtin_ia32_vfnmaddps512_maskz): Likewise.
      	(__builtin_ia32_vfnmsubpd512_maskz): Likewise.
      	(__builtin_ia32_vfnmsubps512_maskz): Likewise.
      	* testsuite/gcc.target/i386/sse-23.c
      	(__builtin_ia32_vfmsubpd512_mask): Likewise.
      	(__builtin_ia32_vfmsubpd512_maskz): Likewise.
      	(__builtin_ia32_vfmsubps512_mask): Likewise.
      	(__builtin_ia32_vfmsubps512_maskz): Likewise.
      	(__builtin_ia32_vfnmaddpd512_mask3): Likewise.
      	(__builtin_ia32_vfnmaddpd512_maskz): Likewise.
      	(__builtin_ia32_vfnmaddps512_mask3): Likewise.
      	(__builtin_ia32_vfnmaddps512_maskz): Likewise.
      	(__builtin_ia32_vfnmsubpd512_maskz): Likewise.
      	(__builtin_ia32_vfnmsubps512_maskz): Likewise.
      
      From-SVN: r265359
      H.J. Lu committed
    • i386: Enable AVX512 memory broadcast for FNMSUB · 38ef6fb1
      Many AVX512 vector operations can broadcast from a scalar memory source.
      This patch enables memory broadcast for FNMSUB operations.  In order to
      support AVX512 memory broadcast for FNMSUB, FNMSUB builtin functions are
      also added, instead of passing the negated value to FMA builtin functions.
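
For reference, the arithmetic involved can be modeled in plain C. This is only a sketch of the scalar semantics with hypothetical function names; a real fused instruction rounds once, whereas this version may round twice, and it does not use any of the builtins listed below.

```c
/* Scalar reference semantics of FNMSUB: -(a * b) - c.  */
double
fnmsub_ref (double a, double b, double c)
{
  return -(a * b) - c;
}

/* The alternative formulation: feed negated operands to a plain
   multiply-add, which yields the same value for exact inputs.  */
double
fnmsub_via_negated_fma_ref (double a, double b, double c)
{
  return (-a) * b + (-c);
}
```

Both functions agree on inputs that are exactly representable, for example a = 2, b = 3, c = 4.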
      
      gcc/
      
      	PR target/72782
      	* config/i386/avx512fintrin.h (_mm512_fnmsub_round_pd): Use
      	__builtin_ia32_vfnmsubpd512_mask.
      	(_mm512_mask_fnmsub_round_pd): Likewise.
      	(_mm512_fnmsub_pd): Likewise.
      	(_mm512_mask_fnmsub_pd): Likewise.
      	(_mm512_maskz_fnmsub_round_pd): Use
      	__builtin_ia32_vfnmsubpd512_maskz.
      	(_mm512_maskz_fnmsub_pd): Likewise.
      	(_mm512_fnmsub_round_ps): Use __builtin_ia32_vfnmsubps512_mask.
      	(_mm512_mask_fnmsub_round_ps): Likewise.
      	(_mm512_fnmsub_ps): Likewise.
      	(_mm512_mask_fnmsub_ps): Likewise.
      	(_mm512_maskz_fnmsub_round_ps): Use
      	__builtin_ia32_vfnmsubps512_maskz.
      	(_mm512_maskz_fnmsub_ps): Likewise.
      	* config/i386/avx512vlintrin.h (_mm256_mask_fnmsub_pd): Use
      	__builtin_ia32_vfnmsubpd256_mask.
      	(_mm256_maskz_fnmsub_pd): Use __builtin_ia32_vfnmsubpd256_maskz.
      	(_mm_mask_fnmsub_pd): Use __builtin_ia32_vfnmsubpd128_mask.
      	(_mm_maskz_fnmsub_pd): Use __builtin_ia32_vfnmsubpd128_maskz.
      	(_mm256_mask_fnmsub_ps): Use __builtin_ia32_vfnmsubps256_mask.
      	(_mm256_maskz_fnmsub_ps): Use __builtin_ia32_vfnmsubps256_maskz.
      	(_mm_mask_fnmsub_ps): Use __builtin_ia32_vfnmsubps128_mask.
      	(_mm_maskz_fnmsub_ps): Use __builtin_ia32_vfnmsubps128_maskz.
      	* config/i386/fmaintrin.h (_mm_fnmsub_pd): Use
      	__builtin_ia32_vfnmsubpd.
      	(_mm256_fnmsub_pd): Use __builtin_ia32_vfnmsubpd256.
      	(_mm_fnmsub_ps): Use __builtin_ia32_vfnmsubps.
      	(_mm256_fnmsub_ps): Use __builtin_ia32_vfnmsubps256.
      	(_mm_fnmsub_sd): Use __builtin_ia32_vfnmsubsd3.
      	(_mm_fnmsub_ss): Use __builtin_ia32_vfnmsubss3.
      	* config/i386/i386-builtin.def: Add
      	__builtin_ia32_vfnmsubpd256_mask,
      	__builtin_ia32_vfnmsubpd256_maskz,
      	__builtin_ia32_vfnmsubpd128_mask,
      	__builtin_ia32_vfnmsubpd128_maskz,
      	__builtin_ia32_vfnmsubps256_mask,
      	__builtin_ia32_vfnmsubps256_maskz,
      	__builtin_ia32_vfnmsubps128_mask,
      	__builtin_ia32_vfnmsubps128_maskz,
      	__builtin_ia32_vfnmsubpd512_mask,
      	__builtin_ia32_vfnmsubpd512_maskz,
      	__builtin_ia32_vfnmsubps512_mask,
      	__builtin_ia32_vfnmsubps512_maskz, __builtin_ia32_vfnmsubss3,
      	__builtin_ia32_vfnmsubsd3, __builtin_ia32_vfnmsubps,
      	__builtin_ia32_vfnmsubpd, __builtin_ia32_vfnmsubps256 and
      	__builtin_ia32_vfnmsubpd256.
      	* config/i386/sse.md (fma4i_fnmsub_<mode>): New.
      	(<avx512>_fnmsub_<mode>_maskz<round_expand_name>): Likewise.
      	(*<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name>_bcst_1):
      	Likewise.
      	(*<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name>_bcst_2):
      	Likewise.
      	(*<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name>_bcst_3):
      	Likewise.
      	(fmai_vmfnmsub_<mode><round_name>): Likewise.
      
      gcc/testsuite/
      
      	PR target/72782
      	* gcc.target/i386/avx512f-fnmsub-df-zmm-1.c: New test.
      	* gcc.target/i386/avx512f-fnmsub-sf-zmm-1.c: Likewise.
      	* gcc.target/i386/avx512f-fnmsub-sf-zmm-2.c: Likewise.
      	* gcc.target/i386/avx512f-fnmsub-sf-zmm-3.c: Likewise.
      	* gcc.target/i386/avx512f-fnmsub-sf-zmm-4.c: Likewise.
      	* gcc.target/i386/avx512f-fnmsub-sf-zmm-5.c: Likewise.
      	* gcc.target/i386/avx512f-fnmsub-sf-zmm-6.c: Likewise.
      	* gcc.target/i386/avx512f-fnmsub-sf-zmm-7.c: Likewise.
      	* gcc.target/i386/avx512f-fnmsub-sf-zmm-8.c: Likewise.
      	* gcc.target/i386/avx512vl-fnmsub-sf-xmm-1.c: Likewise.
      	* gcc.target/i386/avx512vl-fnmsub-sf-ymm-1.c: Likewise.
      
      From-SVN: r265358
      H.J. Lu committed
    • i386: Enable AVX512 memory broadcast for FNMADD · 5ca94977
      Many AVX512 vector operations can broadcast from a scalar memory source.
      This patch enables memory broadcast for FNMADD operations.  In order to
      support AVX512 memory broadcast for FNMADD, FNMADD builtin functions are
      also added, instead of passing the negated value to FMA builtin functions.
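
The ChangeLog below distinguishes `_mask` and `_maskz` builtins. As a hedged sketch of the difference, assuming the usual AVX512 convention that the merge-masked intrinsic keeps the first source operand in unselected lanes while the zero-masked one clears them, a scalar C model of FNMADD could look like this. The function names are hypothetical and this is not GCC's code.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 16  /* assumed: one 512-bit vector of single-precision floats */

/* Merge-masking model: lanes whose mask bit is clear keep a[i].  */
void model_mask_fnmadd(float *dst, const float *a, uint16_t k,
                       const float *b, const float *c)
{
    assert(dst && a && b && c);
    for (int i = 0; i < LANES; i++) {
        float r = -(a[i] * b[i]) + c[i];      /* FNMADD: -(a * b) + c */
        dst[i] = ((k >> i) & 1) ? r : a[i];
    }
}

/* Zero-masking model: lanes whose mask bit is clear become 0.  */
void model_maskz_fnmadd(float *dst, uint16_t k, const float *a,
                        const float *b, const float *c)
{
    assert(dst && a && b && c);
    for (int i = 0; i < LANES; i++) {
        float r = -(a[i] * b[i]) + c[i];
        dst[i] = ((k >> i) & 1) ? r : 0.0f;
    }
}
```

With mask `0x0005`, only lanes 0 and 2 receive the computed value; the two models differ only in what the remaining lanes hold.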
      
      gcc/
      
      	PR target/72782
      	* config/i386/avx512fintrin.h (_mm512_fnmadd_round_pd): Use
      	__builtin_ia32_vfnmaddpd512_mask.
      	(_mm512_mask_fnmadd_round_pd): Likewise.
      	(_mm512_fnmadd_pd): Likewise.
      	(_mm512_mask_fnmadd_pd): Likewise.
      	(_mm512_maskz_fnmadd_round_pd): Use
      	__builtin_ia32_vfnmaddpd512_maskz.
      	(_mm512_maskz_fnmadd_pd): Likewise.
      	(_mm512_fnmadd_round_ps): Use __builtin_ia32_vfnmaddps512_mask.
      	(_mm512_mask_fnmadd_round_ps): Likewise.
      	(_mm512_fnmadd_ps): Likewise.
      	(_mm512_mask_fnmadd_ps): Likewise.
      	(_mm512_maskz_fnmadd_round_ps): Use
      	__builtin_ia32_vfnmaddps512_maskz.
      	(_mm512_maskz_fnmadd_ps): Likewise.
      	* config/i386/avx512vlintrin.h (_mm256_mask_fnmadd_pd): Use
      	__builtin_ia32_vfnmaddpd256_mask.
      	(_mm256_maskz_fnmadd_pd): Use __builtin_ia32_vfnmaddpd256_maskz.
      	(_mm_mask_fnmadd_pd): Use __builtin_ia32_vfnmaddpd128_mask.
      	(_mm_maskz_fnmadd_pd): Use __builtin_ia32_vfnmaddpd128_maskz.
      	(_mm256_mask_fnmadd_ps): Use __builtin_ia32_vfnmaddps256_mask.
      	(_mm256_maskz_fnmadd_ps): Use __builtin_ia32_vfnmaddps256_maskz.
      	(_mm_mask_fnmadd_ps): Use __builtin_ia32_vfnmaddps128_mask.
      	(_mm_maskz_fnmadd_ps): Use __builtin_ia32_vfnmaddps128_maskz.
      	* config/i386/fmaintrin.h (_mm_fnmadd_pd): Use
      	__builtin_ia32_vfnmaddpd.
      	(_mm256_fnmadd_pd): Use __builtin_ia32_vfnmaddpd256.
      	(_mm_fnmadd_ps): Use __builtin_ia32_vfnmaddps.
      	(_mm256_fnmadd_ps): Use __builtin_ia32_vfnmaddps256.
      	(_mm_fnmadd_sd): Use __builtin_ia32_vfnmaddsd3.
      	(_mm_fnmadd_ss): Use __builtin_ia32_vfnmaddss3.
      	* config/i386/i386-builtin.def: Add
      	__builtin_ia32_vfnmaddpd256_mask,
      	__builtin_ia32_vfnmaddpd256_maskz,
      	__builtin_ia32_vfnmaddpd128_mask,
      	__builtin_ia32_vfnmaddpd128_maskz,
      	__builtin_ia32_vfnmaddps256_mask,
      	__builtin_ia32_vfnmaddps256_maskz,
      	__builtin_ia32_vfnmaddps128_mask,
      	__builtin_ia32_vfnmaddps128_maskz,
      	__builtin_ia32_vfnmaddpd512_mask,
      	__builtin_ia32_vfnmaddpd512_maskz,
      	__builtin_ia32_vfnmaddps512_mask,
      	__builtin_ia32_vfnmaddps512_maskz, __builtin_ia32_vfnmaddss3,
      	__builtin_ia32_vfnmaddsd3, __builtin_ia32_vfnmaddps,
      	__builtin_ia32_vfnmaddpd, __builtin_ia32_vfnmaddps256 and
      	__builtin_ia32_vfnmaddpd256.
      	* config/i386/sse.md (fma4i_fnmadd_<mode>): New.
      	(<avx512>_fnmadd_<mode>_maskz<round_expand_name>): Likewise.
      	(*<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name>_bcst_1):
      	Likewise.
      	(*<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name>_bcst_2):
      	Likewise.
      	(*<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name>_bcst_3):
      	Likewise.
      	(fmai_vmfnmadd_<mode><round_name>): Likewise.
      
      gcc/testsuite/
      
      	PR target/72782
      	* gcc.target/i386/avx512f-fnmadd-df-zmm-1.c: New test.
      	* gcc.target/i386/avx512f-fnmadd-sf-zmm-1.c: Likewise.
      	* gcc.target/i386/avx512f-fnmadd-sf-zmm-2.c: Likewise.
      	* gcc.target/i386/avx512f-fnmadd-sf-zmm-3.c: Likewise.
      	* gcc.target/i386/avx512f-fnmadd-sf-zmm-4.c: Likewise.
      	* gcc.target/i386/avx512f-fnmadd-sf-zmm-5.c: Likewise.
      	* gcc.target/i386/avx512f-fnmadd-sf-zmm-6.c: Likewise.
      	* gcc.target/i386/avx512f-fnmadd-sf-zmm-7.c: Likewise.
      	* gcc.target/i386/avx512f-fnmadd-sf-zmm-8.c: Likewise.
      	* gcc.target/i386/avx512vl-fnmadd-sf-xmm-1.c: Likewise.
      	* gcc.target/i386/avx512vl-fnmadd-sf-ymm-1.c: Likewise.
      
      From-SVN: r265357
      H.J. Lu committed
    • Enable AVX512 memory broadcast for FMSUB · fe7f972d
      Many AVX512 vector operations can broadcast from a scalar memory source.
      This patch enables memory broadcast for FMSUB operations.  In order to
      support AVX512 memory broadcast for FMSUB, FMSUB builtin functions are
      also added, instead of passing the negated value to FMA builtin functions.
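
For orientation, here is a sketch of the kind of user code where a broadcast memory operand could apply: a scalar loaded from memory is splatted with `_mm512_set1_ps` and fed to `_mm512_fmsub_ps`. The function name and the multiple-of-16 length requirement are assumptions made for the example, and whether a given compiler version actually folds the load into an embedded-broadcast operand depends on the compiler and options. A portable fallback is used when AVX512F is not enabled at compile time.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __AVX512F__
#include <immintrin.h>
#endif

/* Hypothetical helper: dst[i] = a[i] * b[i] - *c for n floats.
   n is assumed to be a multiple of 16.  */
void fmsub_scalar_c(float *dst, const float *a, const float *b,
                    const float *c, size_t n)
{
    assert(n % 16 == 0);
#ifdef __AVX512F__
    for (size_t i = 0; i < n; i += 16) {
        __m512 va = _mm512_loadu_ps(a + i);
        __m512 vb = _mm512_loadu_ps(b + i);
        __m512 vc = _mm512_set1_ps(*c);   /* scalar from memory, all lanes */
        _mm512_storeu_ps(dst + i, _mm512_fmsub_ps(va, vb, vc));
    }
#else
    /* Portable fallback with the same per-element result.  */
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] * b[i] - *c;
#endif
}
```

Inspecting the generated assembly (for example with `-O2 -mavx512f -S`) is one way to check whether the scalar load was folded into the FMSUB instruction.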
      
      gcc/
      
      	PR target/72782
      	* config/i386/avx512fintrin.h (_mm512_fmsub_round_pd): Use
      	__builtin_ia32_vfmsubpd512_mask.
      	(_mm512_mask_fmsub_round_pd): Likewise.
      	(_mm512_fmsub_pd): Likewise.
      	(_mm512_mask_fmsub_pd): Likewise.
      	(_mm512_maskz_fmsub_round_pd): Use
      	__builtin_ia32_vfmsubpd512_maskz.
      	(_mm512_maskz_fmsub_pd): Likewise.
      	(_mm512_fmsub_round_ps): Use __builtin_ia32_vfmsubps512_mask.
      	(_mm512_mask_fmsub_round_ps): Likewise.
      	(_mm512_fmsub_ps): Likewise.
      	(_mm512_mask_fmsub_ps): Likewise.
      	(_mm512_maskz_fmsub_round_ps): Use
      	__builtin_ia32_vfmsubps512_maskz.
      	(_mm512_maskz_fmsub_ps): Likewise.
      	* config/i386/avx512vlintrin.h (_mm256_mask_fmsub_pd): Use
      	__builtin_ia32_vfmsubpd256_mask.
      	(_mm256_maskz_fmsub_pd): Use __builtin_ia32_vfmsubpd256_maskz.
      	(_mm_mask_fmsub_pd): Use __builtin_ia32_vfmsubpd128_mask.
      	(_mm_maskz_fmsub_pd): Use __builtin_ia32_vfmsubpd128_maskz.
      	(_mm256_mask_fmsub_ps): Use __builtin_ia32_vfmsubps256_mask.
      	(_mm256_maskz_fmsub_ps): Use __builtin_ia32_vfmsubps256_maskz.
      	(_mm_mask_fmsub_ps): Use __builtin_ia32_vfmsubps128_mask.
      	(_mm_maskz_fmsub_ps): Use __builtin_ia32_vfmsubps128_maskz.
      	* config/i386/fmaintrin.h (_mm_fmsub_pd): Use
      	__builtin_ia32_vfmsubpd.
      	(_mm256_fmsub_pd): Use __builtin_ia32_vfmsubpd256.
      	(_mm_fmsub_ps): Use __builtin_ia32_vfmsubps.
      	(_mm256_fmsub_ps): Use __builtin_ia32_vfmsubps256.
      	(_mm_fmsub_sd): Use __builtin_ia32_vfmsubsd3.
      	(_mm_fmsub_ss): Use __builtin_ia32_vfmsubss3.
      	* config/i386/i386-builtin.def: Add
      	__builtin_ia32_vfmsubpd256_mask,
      	__builtin_ia32_vfmsubpd256_maskz,
      	__builtin_ia32_vfmsubpd128_mask,
      	__builtin_ia32_vfmsubpd128_maskz,
      	__builtin_ia32_vfmsubps256_mask,
      	__builtin_ia32_vfmsubps256_maskz,
      	__builtin_ia32_vfmsubps128_mask,
      	__builtin_ia32_vfmsubps128_maskz,
      	__builtin_ia32_vfmsubpd512_mask,
      	__builtin_ia32_vfmsubpd512_maskz,
      	__builtin_ia32_vfmsubps512_mask,
      	__builtin_ia32_vfmsubps512_maskz, __builtin_ia32_vfmsubss3,
      	__builtin_ia32_vfmsubsd3, __builtin_ia32_vfmsubps,
      	__builtin_ia32_vfmsubpd, __builtin_ia32_vfmsubps256 and
      	__builtin_ia32_vfmsubpd256.
      	* config/i386/sse.md (fma4i_fmsub_<mode>): New.
      	(<avx512>_fmsub_<mode>_maskz<round_expand_name>): Likewise.
      	(*<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>_bcst_1):
      	Likewise.
      	(*<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>_bcst_2):
      	Likewise.
      	(*<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>_bcst_3):
      	Likewise.
      	(fmai_vmfmsub_<mode><round_name>): Likewise.
      
      gcc/testsuite/
      
      	PR target/72782
      	* gcc.target/i386/avx512f-fmsub-df-zmm-1.c: New test.
      	* gcc.target/i386/avx512f-fmsub-sf-zmm-1.c: Likewise.
      	* gcc.target/i386/avx512f-fmsub-sf-zmm-2.c: Likewise.
      	* gcc.target/i386/avx512f-fmsub-sf-zmm-3.c: Likewise.
      	* gcc.target/i386/avx512f-fmsub-sf-zmm-4.c: Likewise.
      	* gcc.target/i386/avx512f-fmsub-sf-zmm-5.c: Likewise.
      	* gcc.target/i386/avx512f-fmsub-sf-zmm-6.c: Likewise.
      	* gcc.target/i386/avx512f-fmsub-sf-zmm-7.c: Likewise.
      	* gcc.target/i386/avx512f-fmsub-sf-zmm-8.c: Likewise.
      	* gcc.target/i386/avx512vl-fmsub-sf-xmm-1.c: Likewise.
      	* gcc.target/i386/avx512vl-fmsub-sf-ymm-1.c: Likewise.
      
      From-SVN: r265356
      H.J. Lu committed