- 18 May, 2018 2 commits
There are four optabs for various forms of fused multiply-add: fma, fms, fnma and fnms. Of these, only fma had a direct gimple representation. For the other three we relied on special pattern-matching during expand, although tree-ssa-math-opts.c did have some code to try to second-guess what expand would do. This patch removes the old FMA_EXPR representation of fma and introduces four new internal functions, one for each optab. IFN_FMA is tied to BUILT_IN_FMA* while the other three are independent directly-mapped internal functions. It's then possible to do the pattern-matching in match.pd, and tree-ssa-math-opts.c (via folding) can select the exact FMA-based operation. The BRIG & HSA parts are a best guess, but seem relatively simple. 2018-05-18 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * doc/sourcebuild.texi (scalar_all_fma): Document. * tree.def (FMA_EXPR): Delete. * internal-fn.def (FMA, FMS, FNMA, FNMS): New internal functions. * internal-fn.c (ternary_direct): New macro. (expand_ternary_optab_fn): Likewise. (direct_ternary_optab_supported_p): Likewise. * Makefile.in (build/genmatch.o): Depend on case-fn-macros.h. * builtins.c (fold_builtin_fma): Delete. (fold_builtin_3): Don't call it. * cfgexpand.c (expand_debug_expr): Remove FMA_EXPR handling. * expr.c (expand_expr_real_2): Likewise. * fold-const.c (operand_equal_p): Likewise. (fold_ternary_loc): Likewise. * gimple-pretty-print.c (dump_ternary_rhs): Likewise. * gimple.c (DEFTREECODE): Likewise. * gimplify.c (gimplify_expr): Likewise. * optabs-tree.c (optab_for_tree_code): Likewise. * tree-cfg.c (verify_gimple_assign_ternary): Likewise. * tree-eh.c (operation_could_trap_p): Likewise. (stmt_could_throw_1_p): Likewise. * tree-inline.c (estimate_operator_cost): Likewise. * tree-pretty-print.c (dump_generic_node): Likewise. (op_code_prio): Likewise. * tree-ssa-loop-im.c (stmt_cost): Likewise. * tree-ssa-operands.c (get_expr_operands): Likewise. 
* tree.c (commutative_ternary_tree_code, add_expr): Likewise. * fold-const-call.h (fold_fma): Delete. * fold-const-call.c (fold_const_call_ssss): Handle CFN_FMS, CFN_FNMA and CFN_FNMS. (fold_fma): Delete. * genmatch.c (combined_fn): New enum. (commutative_ternary_tree_code): Remove FMA_EXPR handling. (commutative_op): New function. (commutate): Use it. Handle more than 2 operands. (dt_operand::gen_gimple_expr): Use commutative_op. (parser::parse_expr): Allow :c to be used with non-binary operators if the commutative operand is known. * gimple-ssa-backprop.c (backprop::process_builtin_call_use): Handle CFN_FMS, CFN_FNMA and CFN_FNMS. (backprop::process_assign_use): Remove FMA_EXPR handling. * hsa-gen.c (gen_hsa_insns_for_operation_assignment): Likewise. (gen_hsa_fma): New function. (gen_hsa_insn_for_internal_fn_call): Use it for IFN_FMA, IFN_FMS, IFN_FNMA and IFN_FNMS. * match.pd: Add folds for IFN_FMS, IFN_FNMA and IFN_FNMS. * gimple-fold.h (follow_all_ssa_edges): Declare. * gimple-fold.c (follow_all_ssa_edges): New function. * tree-ssa-math-opts.c (convert_mult_to_fma_1): Use the gimple_build interface and use follow_all_ssa_edges to fold the result. (convert_mult_to_fma): Use direct_internal_fn_supported_p instead of checking for optabs directly. * config/i386/i386.c (ix86_add_stmt_cost): Recognize FMAs as calls rather than FMA_EXPRs. * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Create a call to IFN_FMA instead of an FMA_EXPR. gcc/brig/ * brigfrontend/brig-function.cc (brig_function::get_builtin_for_hsa_opcode): Use BUILT_IN_FMA for BRIG_OPCODE_FMA. (brig_function::get_tree_code_for_hsa_opcode): Treat BUILT_IN_FMA as a call. gcc/c/ * gimple-parser.c (c_parser_gimple_postfix_expression): Remove __FMA_EXPR handling. gcc/cp/ * constexpr.c (cxx_eval_constant_expression): Remove FMA_EXPR handling. (potential_constant_expression_1): Likewise. gcc/testsuite/ * lib/target-supports.exp (check_effective_target_scalar_all_fma): New proc. 
* gcc.dg/fma-1.c: New test. * gcc.dg/fma-2.c: Likewise. * gcc.dg/fma-3.c: Likewise. * gcc.dg/fma-4.c: Likewise. * gcc.dg/fma-5.c: Likewise. * gcc.dg/fma-6.c: Likewise. * gcc.dg/fma-7.c: Likewise. * gcc.dg/gimplefe-26.c: Use .FMA instead of __FMA and require scalar_all_fma. * gfortran.dg/reassoc_7.f: Pass -ffp-contract=off. * gfortran.dg/reassoc_8.f: Likewise. * gfortran.dg/reassoc_9.f: Likewise. * gfortran.dg/reassoc_10.f: Likewise. From-SVN: r260348
Richard Sandiford committed
From-SVN: r260347
GCC Administrator committed
- 17 May, 2018 22 commits
* line-map.c (linemap_init): Use placement new. * system.h: #include <new>. From-SVN: r260343
Jason Merrill committed
gcc/ * expr.c (do_tablejump): When converting index to Pmode, if we have a sign extended promoted subreg, and the range does not have the sign bit set, then do a sign extend. * config/riscv/riscv.c (riscv_extend_comparands): In unsigned QImode test, check for sign extended subreg and/or constant operands, and do a sign extend in that case. gcc/testsuite/ * gcc.target/riscv/switch-qi.c: New. * gcc.target/riscv/switch-si.c: New. From-SVN: r260340
Jim Wilson committed
2018-05-17 Steve Ellcey <sellcey@cavium.com> * config/aarch64/thunderx2t99.md (thunderx2t99_ls_both): Delete. (thunderx2t99_multiple): Delete pseudo-units from used cpus. Add untyped. (thunderx2t99_alu_shift): Remove alu_shift_reg, alus_shift_reg. Change logics_shift_reg to logics_shift_imm. (thunderx2t99_fp_loadpair_basic): Delete. (thunderx2t99_fp_storepair_basic): Delete. (thunderx2t99_asimd_int): Add neon_sub and neon_sub_q types. (thunderx2t99_asimd_polynomial): Delete. (thunderx2t99_asimd_fp_simple): Add neon_fp_mul_s_scalar_q and neon_fp_mul_d_scalar_q. (thunderx2t99_asimd_fp_conv): Add *int_to_fp* types. (thunderx2t99_asimd_misc): Delete neon_dup and neon_dup_q. (thunderx2t99_asimd_recip_step): Add missing *sqrt* types. (thunderx2t99_asimd_lut): Add missing tbl types. (thunderx2t99_asimd_ext): Delete. (thunderx2t99_asimd_load1_1_mult): Delete. (thunderx2t99_asimd_load1_2_mult): Delete. (thunderx2t99_asimd_load1_ldp): New. (thunderx2t99_asimd_load1): New. (thunderx2t99_asimd_load2): Add missing *load2* types. (thunderx2t99_asimd_load3): New. (thunderx2t99_asimd_load4): New. (thunderx2t99_asimd_store1_1_mult): Delete. (thunderx2t99_asimd_store1_2_mult): Delete. (thunderx2t99_asimd_store2_mult): Delete. (thunderx2t99_asimd_store2_onelane): Delete. (thunderx2t99_asimd_store_stp): New. (thunderx2t99_asimd_store1): New. (thunderx2t99_asimd_store2): New. (thunderx2t99_asimd_store3): New. (thunderx2t99_asimd_store4): New. From-SVN: r260335
Steve Ellcey committed
2018-05-17 Jerome Lambourg <lambourg@adacore.com> gcc/ * config/arm/arm_cmse.h (cmse_nsfptr_create, cmse_is_nsfptr): Remove #include <stdint.h>. Replace intptr_t with __INTPTR_TYPE__. libgcc/ * config/arm/cmse.c (cmse_check_address_range): Replace UINTPTR_MAX with __UINTPTR_MAX__ and uintptr_t with __UINTPTR_TYPE__. From-SVN: r260330
Jerome Lambourg committed
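The point of this change is that `arm_cmse.h` and `cmse.c` stop depending on `<stdint.h>` and rely on macros the compiler predefines instead. A minimal sketch of an address-range wrap-around check written in the same style follows; `range_fits_p` is a hypothetical helper and is not the libgcc `cmse_check_address_range` implementation.

```c
/* Hypothetical helper, not libgcc's cmse_check_address_range: returns 1
   if the byte range [start, start + size) fits in the address space
   without wrapping.  It uses only the compiler-predefined
   __UINTPTR_TYPE__, __UINTPTR_MAX__ and __SIZE_TYPE__ macros, so no
   <stdint.h> or <stddef.h> include is needed.  */
static int
range_fits_p (__UINTPTR_TYPE__ start, __SIZE_TYPE__ size)
{
  if (size == 0)
    return 1;                   /* An empty range always fits.  */
  /* The last byte is start + size - 1; compare without overflowing.  */
  return size - 1 <= __UINTPTR_MAX__ - start;
}
```

For example, a 16-byte range starting 7 bytes below `__UINTPTR_MAX__` is rejected, while one starting 15 bytes below it (ending exactly at the top of the address space) is accepted.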
PR target/85698 * config/rs6000/rs6000.c (rs6000_output_move_128bit): Check dest operand. * gcc.target/powerpc/pr85698.c: New test. Co-Authored-By: Segher Boessenkool <segher@kernel.crashing.org> From-SVN: r260329
Pat Haugen committed
Because path.cc is compiled with -std=gnu++17 the static constexpr data member is implicitly 'inline' and so no definition gets emitted unless it gets used in that translation unit. Other translation units built as C++11 or C++14 still require a namespace-scope definition of the variable, so mark the definition as used. PR libstdc++/85818 * src/filesystem/path.cc (path::preferred_separator): Add used attribute. * testsuite/experimental/filesystem/path/preferred_separator.cc: New. From-SVN: r260326
Jonathan Wakely committed
PR libstdc++/85812 * libsupc++/cxxabi_init_exception.h (__cxa_free_exception): Declare. * libsupc++/exception_ptr.h (make_exception_ptr) [__cpp_exceptions]: Refactor to separate non-throwing and throwing implementations. [__cpp_rtti && !_GLIBCXX_HAVE_CDTOR_CALLABI]: Deallocate the memory if constructing the object throws. From-SVN: r260323
Jonathan Wakely committed
2018-05-17 Richard Biener <rguenther@suse.de> * tree-ssa-dse.c (dse_classify_store): Fix iterator increment for pruning loop and prune defs feeding only already visited PHIs. From-SVN: r260322
Richard Biener committed
2018-05-17 Richard Biener <rguenther@suse.de> * tree-ssa-sccvn.c (vn_reference_lookup_3): Improve memset handling. * gcc.dg/tree-ssa/ssa-fre-63.c: New testcase. From-SVN: r260318
Richard Biener committed
PR tree-optimization/85793 * tree-vect-stmts.c (vectorizable_load): Handle 1 element-wise load for VMAT_ELEMENTWISE. gcc/testsuite * gcc.dg/vect/pr85793.c: New test. Co-Authored-By: Richard Biener <rguenther@suse.de> From-SVN: r260317
Bin Cheng committed
This patch gets the gimple FE to parse calls to internal functions. The only non-obvious thing was how the functions should be written to avoid clashes with real function names. One option would be to go the magic number of underscores route, but we already do that for built-in functions, and it would be good to keep them visually distinct. In the end I borrowed the local/internal label convention from asm and used: x = .SQRT (y); 2018-05-17 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * internal-fn.h (lookup_internal_fn): Declare. * internal-fn.c (lookup_internal_fn): New function. * gimple.c (gimple_build_call_from_tree): Handle calls to internal functions. * gimple-pretty-print.c (dump_gimple_call): Print "." before internal function names. * tree-pretty-print.c (dump_generic_node): Likewise. * tree-ssa-scopedtables.c (expr_hash_elt::print): Likewise. gcc/c/ * gimple-parser.c: Include internal-fn.h. (c_parser_gimple_statement): Treat a leading CPP_DOT as a call. (c_parser_gimple_call_internal): New function. (c_parser_gimple_postfix_expression): Use it to handle CPP_DOT. Fix typos in comment. gcc/testsuite/ * gcc.dg/gimplefe-28.c: New test. * gcc.dg/asan/use-after-scope-9.c: Adjust expected output for internal function calls. * gcc.dg/goacc/loop-processing-1.c: Likewise. From-SVN: r260316
Richard Sandiford committed
This patch makes the function versions of gimple_build and gimple_simplify take combined_fns rather than built_in_codes, so that they work with internal functions too. The old gimple_builds were unused, so no existing callers need to be updated. 2018-05-17 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * gimple-fold.h (gimple_build): Make the function forms take combined_fn rather than built_in_function. (gimple_simplify): Likewise. * gimple-match-head.c (gimple_simplify): Likewise. * gimple-fold.c (gimple_build): Likewise. * tree-vect-loop.c (get_initial_def_for_reduction): Use gimple_build rather than gimple_build_call_internal. (get_initial_defs_for_reduction): Likewise. (vect_create_epilog_for_reduction): Likewise. (vectorizable_live_operation): Likewise. From-SVN: r260315
Richard Sandiford committed
2018-05-17 Martin Liska <mliska@suse.cz> * gimple-ssa-sprintf.c (format_directive): Do not use space in between 'G_' and '('. 2018-05-17 Martin Liska <mliska@suse.cz> * c-warn.c (overflow_warning): Do not use space in between 'G_' and '('. 2018-05-17 Martin Liska <mliska@suse.cz> * gcc.dg/plugin/ggcplug.c (plugin_init): Do not use space in between 'G_' and '('. From-SVN: r260314
Martin Liska committed
PR target/85323 * config/i386/i386.c (ix86_fold_builtin): Handle masked shifts even if the mask is not all ones. * gcc.target/i386/pr85323-7.c: New test. * gcc.target/i386/pr85323-8.c: New test. * gcc.target/i386/pr85323-9.c: New test. From-SVN: r260313
Jakub Jelinek committed
PR target/85323 * config/i386/i386.c (ix86_fold_builtin): Fold shift builtins by vector. (ix86_gimple_fold_builtin): Likewise. * gcc.target/i386/pr85323-4.c: New test. * gcc.target/i386/pr85323-5.c: New test. * gcc.target/i386/pr85323-6.c: New test. From-SVN: r260312
Jakub Jelinek committed
PR target/85323 * config/i386/i386.c: Include tree-vector-builder.h. (ix86_vector_shift_count): New function. (ix86_fold_builtin): Fold shift builtins by scalar count. (ix86_gimple_fold_builtin): Likewise. * gcc.target/i386/pr85323-1.c: New test. * gcc.target/i386/pr85323-2.c: New test. * gcc.target/i386/pr85323-3.c: New test. From-SVN: r260311
Jakub Jelinek committed
* config/i386/avx512fintrin.h (_mm512_set_epi16, _mm512_set_epi8, _mm512_setzero): New intrinsics. * gcc.target/i386/avx512f-set-v32hi-1.c: New test. * gcc.target/i386/avx512f-set-v32hi-2.c: New test. * gcc.target/i386/avx512f-set-v32hi-3.c: New test. * gcc.target/i386/avx512f-set-v32hi-4.c: New test. * gcc.target/i386/avx512f-set-v32hi-5.c: New test. * gcc.target/i386/avx512f-set-v64qi-1.c: New test. * gcc.target/i386/avx512f-set-v64qi-2.c: New test. * gcc.target/i386/avx512f-set-v64qi-3.c: New test. * gcc.target/i386/avx512f-set-v64qi-4.c: New test. * gcc.target/i386/avx512f-set-v64qi-5.c: New test. * gcc.target/i386/avx512f-setzero-1.c: New test. From-SVN: r260310
Jakub Jelinek committed
In the testcase in this patch we create an SLP vector with only two elements. Our current vector initialisation code will first duplicate the first element to both lanes, then overwrite the top lane with a new value. This duplication can be clunky and wasteful. It would be better to use the fact that we will always be overwriting the remaining bits, and simply move the first element to the correct place (implicitly zeroing all other bits). This reduces the generated code for this case, and can allow more efficient addressing modes and other second-order benefits for AArch64 code which has been vectorized to V2DI mode. Note that the change is generic enough to catch the case for any vector mode, but is expected to be most useful for 2x64-bit vectorization. Unfortunately, on its own, this would cause failures in gcc.target/aarch64/load_v2vec_lanes_1.c and gcc.target/aarch64/store_v2vec_lanes.c, which expect to see many more vec_merge and vec_duplicate for their simplifications to apply. To fix this, add a special case to the AArch64 code if we are loading from two memory addresses, and use the load_pair_lanes patterns directly. We also need a new pattern in simplify-rtx.c:simplify_ternary_operation to catch: (vec_merge:OUTER (vec_duplicate:OUTER x:INNER) (subreg:OUTER y:INNER 0) (const_int N)) and simplify it to: (vec_concat:OUTER x:INNER y:INNER) or (vec_concat y x). This is similar to the existing patterns which are tested in this function, without requiring the second operand to also be a vec_duplicate. * config/aarch64/aarch64.c (aarch64_expand_vector_init): Modify code generation for cases where splatting a value is not useful. * simplify-rtx.c (simplify_ternary_operation): Simplify vec_merge across a vec_duplicate and a paradoxical subreg forming a vector mode to a vec_concat. * gcc.target/aarch64/vect-slp-dup.c: New. Co-Authored-By: Kyrylo Tkachov <kyrylo.tkachov@arm.com> From-SVN: r260309
James Greenhalgh committed
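The two-element case described above corresponds to source code like the following, written with the GCC/Clang `vector_size` extension. This is only an illustrative sketch of the input; `make_pair` is a hypothetical name, and the exact AArch64 instruction sequence it compiles to is not guaranteed.

```c
/* GCC/Clang vector extension: two 64-bit integer lanes (V2DI).  */
typedef long long v2di __attribute__ ((vector_size (16)));

/* Build a two-lane vector from two scalars: the kind of initialisation
   aarch64_expand_vector_init expands.  Per the commit message, GCC now
   moves A into lane 0 (zeroing the rest) and then inserts B, instead of
   first duplicating A into both lanes.  The instructions emitted depend
   on the compiler version and options.  */
static v2di
make_pair (long long a, long long b)
{
  v2di v = { a, b };
  return v;
}
```

Whatever instructions are chosen, lane 0 of the result holds `a` and lane 1 holds `b`.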
2018-05-17 Paolo Carlini <paolo.carlini@oracle.com> PR c++/85713 * g++.dg/cpp1y/lambda-generic-85713-2.C: New. From-SVN: r260308
Paolo Carlini committed
2018-05-17 Olga Makhotina <olga.makhotina@intel.com> gcc/ * config.gcc: Support "goldmont-plus". * config/i386/driver-i386.c (host_detect_local_cpu): Detect "goldmont-plus". * config/i386/i386-c.c (ix86_target_macros_internal): Handle PROCESSOR_GOLDMONT_PLUS. * config/i386/i386.c (m_GOLDMONT_PLUS): Define. (processor_target_table): Add "goldmont-plus". (PTA_GOLDMONT_PLUS): Define. (ix86_lea_outperforms): Add TARGET_GOLDMONT_PLUS. (get_builtin_code_for_version): Handle PROCESSOR_GOLDMONT_PLUS. (fold_builtin_cpu): Add M_INTEL_GOLDMONT_PLUS. (fold_builtin_cpu): Add "goldmont-plus". (ix86_add_stmt_cost): Add TARGET_GOLDMONT_PLUS. (ix86_option_override_internal): Add "goldmont-plus". * config/i386/i386.h (processor_costs): Define TARGET_GOLDMONT_PLUS. (processor_type): Add PROCESSOR_GOLDMONT_PLUS. * config/i386/x86-tune.def: Add m_GOLDMONT_PLUS. * doc/invoke.texi: Add goldmont-plus as x86 -march=/-mtune= CPU type. libgcc/ * config/i386/cpuinfo.h (processor_types): Add INTEL_GOLDMONT_PLUS. * config/i386/cpuinfo.c (get_intel_cpu): Detect Goldmont Plus. gcc/testsuite/ * gcc.target/i386/builtin_target.c: Test goldmont-plus. * gcc.target/i386/funcspec-56.inc: Test arch=goldmont-plus. From-SVN: r260307
Olga Makhotina committed
2018-05-17 Richard Biener <rguenther@suse.de> PR tree-optimization/85757 * tree-ssa-dse.c (dse_classify_store): Record a PHI def and remove defs that only feed that PHI from further processing. * gcc.dg/tree-ssa/ssa-dse-34.c: New testcase. From-SVN: r260306
Richard Biener committed
From-SVN: r260304
GCC Administrator committed
- 16 May, 2018 16 commits
re PR c++/85363 (Throwing exception from member constructor (brace initializer vs initializer list)) PR c++/85363 * call.c (set_flags_from_callee): Handle AGGR_INIT_EXPRs too. * tree.c (bot_manip): Call set_flags_from_callee for AGGR_INIT_EXPRs too. * g++.dg/cpp0x/initlist-throw1.C: New test. * g++.dg/cpp0x/initlist-throw2.C: New test. From-SVN: r260300
Marek Polacek committed
gcc/ * config/riscv/riscv.md (<optab>si3_mask, <optab>si3_mask_1): Prepend asterisk to name. (<optab>di3_mask, <optab>di3_mask_1): Likewise. From-SVN: r260299
Jim Wilson committed
DWARF5 defines a small header for .debug_str_offsets. Since we only use it for split dwarf .dwo files we don't need to keep track of the actual index offset in an attribute. gcc/ChangeLog * dwarf2out.c (count_index_strings): New function. (output_indirect_strings): Call count_index_strings and generate header for dwarf_version >= 5. From-SVN: r260298
Mark Wielaard committed
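For illustration, the DWARF 5 `.debug_str_offsets` header can be sketched as below, assuming 32-bit DWARF and a little-endian target. The layout (unit length, version 5, two bytes of padding, then one offset per indexed string) follows the DWARF 5 specification; `write_str_offsets_header` is a hypothetical helper, and `count` stands in for the value `count_index_strings` computes.

```c
#include <stddef.h>
#include <stdint.h>

/* Write the DWARF 5 .debug_str_offsets header for 32-bit DWARF,
   little-endian, into OUT (at least 8 bytes) and return its size.
   Layout: unit_length (4 bytes, not counting itself), version (2 bytes,
   = 5), padding (2 bytes, = 0).  COUNT 4-byte string offsets follow.  */
static size_t
write_str_offsets_header (uint8_t *out, uint32_t count)
{
  /* The unit length covers version + padding + the offsets array.  */
  uint32_t unit_length = 2 + 2 + 4 * count;
  size_t n = 0;

  for (int i = 0; i < 4; i++)           /* unit_length, little-endian */
    out[n++] = (uint8_t) (unit_length >> (8 * i));
  out[n++] = 5;                         /* version, low byte */
  out[n++] = 0;                         /* version, high byte */
  out[n++] = 0;                         /* padding */
  out[n++] = 0;
  return n;
}
```

For three indexed strings the header bytes are `10 00 00 00 05 00 00 00`: a unit length of 16 covers the 4 bytes of version and padding plus 3 * 4 bytes of offsets.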
We already emit DWARF5 attributes and tables for indirect addresses and string offsets, but still use GNU forms. Add a new helper function dwarf_FORM () for emitting the right form. Currently we only use the uleb128 forms. But DWARF5 also allows 1, 2, 3 and 4 byte forms (DW_FORM_strx[1234] and DW_FORM_addrx[1234]) which might be more space efficient. gcc/ChangeLog * dwarf2out.c (dwarf_FORM): New function. (set_indirect_string): Use dwarf_FORM. (reset_indirect_string): Likewise. (size_of_die): Likewise. (value_format): Likewise. (output_die): Likewise. (add_skeleton_AT_string): Likewise. (output_macinfo_op): Likewise. (index_string): Likewise. (output_index_string_offset): Likewise. (output_index_string): Likewise. From-SVN: r260297
Mark Wielaard committed
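The idea behind `dwarf_FORM ()` can be sketched as a mapping from the DWARF 5 form a caller asks for to the GNU extension used before DWARF 5. The form codes below come from the DWARF 5 specification and the GNU split-DWARF extensions; `pick_form` and `smallest_strx` are hypothetical names, and `smallest_strx` only illustrates the 1-4 byte `DW_FORM_strx[1234]` forms the message mentions as a possible space saving, which this patch does not use.

```c
/* Form codes from the DWARF 5 specification, plus the GNU split-DWARF
   extensions used before DWARF 5.  */
enum dw_form
{
  DW_FORM_strx = 0x1a,
  DW_FORM_addrx = 0x1b,
  DW_FORM_strx1 = 0x25,
  DW_FORM_strx2 = 0x26,
  DW_FORM_strx3 = 0x27,
  DW_FORM_strx4 = 0x28,
  DW_FORM_GNU_addr_index = 0x1f01,
  DW_FORM_GNU_str_index = 0x1f02
};

/* Callers always ask for the DWARF 5 form; for older versions map it to
   the equivalent GNU extension.  Other forms pass through unchanged.  */
static enum dw_form
pick_form (enum dw_form form, int dwarf_version)
{
  if (dwarf_version < 5)
    {
      if (form == DW_FORM_strx)
        return DW_FORM_GNU_str_index;
      if (form == DW_FORM_addrx)
        return DW_FORM_GNU_addr_index;
    }
  return form;
}

/* Smallest fixed-size strx form able to encode INDEX
   (assumes INDEX fits in 32 bits).  */
static enum dw_form
smallest_strx (unsigned long index)
{
  if (index <= 0xffUL)
    return DW_FORM_strx1;
  if (index <= 0xffffUL)
    return DW_FORM_strx2;
  if (index <= 0xffffffUL)
    return DW_FORM_strx3;
  return DW_FORM_strx4;
}
```

Keeping the version check in one helper means each emitting site (`size_of_die`, `value_format`, `output_die` and so on in the list above) can name a single form regardless of the DWARF version.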
gcc/ChangeLog: 2018-05-16 Carl Love <cel@us.ibm.com> * config/rs6000/rs6000.md (prefetch): Generate ISA 2.06 instructions dcbt and dcbtstt with TH=16 if operands[2] is 0 and Power 8 or newer. From-SVN: r260296
Carl Love committed
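The locality operand (`operands[2]`) corresponds to the third argument of `__builtin_prefetch`; 0 means the data is not expected to be reused. A minimal sketch of a streaming loop using such prefetches follows; `scale_stream` and the prefetch distance of 16 elements are assumptions for illustration, and only on Power8 or newer would GCC be expected to emit the `dcbt`/`dcbtstt` TH=16 forms for it.

```c
/* Multiply SRC by K into DST, prefetching 16 elements ahead.
   __builtin_prefetch (addr, rw, locality): rw is 0 for a read and 1 for
   a write; locality 0 means no temporal locality.  Per this commit, on
   Power8 and newer GCC can expand such prefetches to dcbt / dcbtstt with
   TH=16; other targets may emit a different hint or none at all.  */
static void
scale_stream (const double *src, double *dst, int n, double k)
{
  for (int i = 0; i < n; i++)
    {
      if (i + 16 < n)
        {
          __builtin_prefetch (&src[i + 16], 0, 0);  /* read, no reuse */
          __builtin_prefetch (&dst[i + 16], 1, 0);  /* write, no reuse */
        }
      dst[i] = src[i] * k;
    }
}
```

Prefetches are hints only, so the loop computes the same result whether or not the target honours them.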
2018-05-16 Martin Jambor <mjambor@suse.cz> * ipa-prop.c (ipa_free_all_edge_args): Remove. * ipa-prop.h (ipa_free_all_edge_args): Likewise. From-SVN: r260295
Martin Jambor committed
gcc/testsuite/ChangeLog: 2018-05-16 Carl Love <cel@us.ibm.com> * gcc.target/powerpc/vsx-vector-6-be.c: Remove file. * gcc.target/powerpc/vsx-vector-6-be.p7.c: New test file. * gcc.target/powerpc/vsx-vector-6-be.p8.c: New test file. * gcc.target/powerpc/vsx-vector-6-le.c (dg-final): Update counts for xvcmpeqdp., xvcmpgtdp., xvcmpgedp., xxlxor, xvrdpi. From-SVN: r260294
Carl Love committed
This patch improves register allocation of fma by preferring to update the accumulator register. This is done by adding fma insns with operand 1 as the accumulator. The register allocator considers copy preferences only in operand order, so if the first operand is dead, it has the highest chance of being reused as the destination. As a result, code using fma often gets better register allocation. Performance of SPECFP2017 improves by over 0.5% on some implementations, with no effect on others. The generated code is more readable too; in a simple example we now generate: fmadd s16, s2, s1, s16 fmadd s7, s17, s16, s7 fmadd s6, s16, s7, s6 fmadd s5, s7, s6, s5 instead of: fmadd s16, s16, s2, s1 fmadd s7, s7, s16, s6 fmadd s6, s6, s7, s5 fmadd s5, s5, s6, s4 gcc/ * config/aarch64/aarch64.md (fma<mode>4): Change into expand pattern. (fnma<mode>4): Likewise. (fms<mode>4): Likewise. (fnms<mode>4): Likewise. (aarch64_fma<mode>4): Rename insn, reorder accumulator operand. (aarch64_fnma<mode>4): Likewise. (aarch64_fms<mode>4): Likewise. (aarch64_fnms<mode>4): Likewise. (aarch64_fnmadd<mode>4): Likewise. From-SVN: r260292
Wilco Dijkstra committed
From-SVN: r260290
Jason Merrill committed
2018-05-16 Richard Biener <rguenther@suse.de> * tree-vectorizer.h (struct stmt_info_for_cost): Add where member. (dump_stmt_cost): Declare. (add_stmt_cost): Dump cost we add. (add_stmt_costs): New function. (vect_model_simple_cost, vect_model_store_cost, vect_model_load_cost): No longer exported. (vect_analyze_stmt): Adjust prototype. (vectorizable_condition): Likewise. (vectorizable_live_operation): Likewise. (vectorizable_reduction): Likewise. (vectorizable_induction): Likewise. * tree-vect-loop.c (vect_analyze_loop_operations): Create local cost vector to pass to vectorizable_ and record afterwards. (vect_model_reduction_cost): Take cost vector argument and adjust. (vect_model_induction_cost): Likewise. (vectorizable_reduction): Likewise. (vectorizable_induction): Likewise. (vectorizable_live_operation): Likewise. * tree-vect-slp.c (vect_create_new_slp_node): Initialize SLP_TREE_NUMBER_OF_VEC_STMTS. (vect_analyze_slp_cost_1): Remove. (vect_analyze_slp_cost): Likewise. (vect_slp_analyze_node_operations): Take visited args and a target cost vector. Avoid processing already visited stmt sets. (vect_slp_analyze_operations): Use a local cost vector to gather costs and register those of non-discarded instances. (vect_bb_vectorization_profitable_p): Use add_stmt_costs. (vect_schedule_slp_instance): Remove copying of SLP_TREE_NUMBER_OF_VEC_STMTS. Instead assert that it is not zero. * tree-vect-stmts.c (record_stmt_cost): Remove path directly adding cost. Record cost entry location. (vect_prologue_cost_for_slp_op): Function to compute cost of a constant or invariant generated for SLP vect in the prologue, split out from vect_analyze_slp_cost_1. (vect_model_simple_cost): Make static. Adjust for SLP costing. (vect_model_promotion_demotion_cost): Likewise. (vect_model_store_cost): Likewise, make static. (vect_model_load_cost): Likewise. (vectorizable_bswap): Add cost vector arg and adjust. (vectorizable_call): Likewise. (vectorizable_simd_clone_call): Likewise. 
(vectorizable_conversion): Likewise. (vectorizable_assignment): Likewise. (vectorizable_shift): Likewise. (vectorizable_operation): Likewise. (vectorizable_store): Likewise. (vectorizable_load): Likewise. (vectorizable_condition): Likewise. (vectorizable_comparison): Likewise. (can_vectorize_live_stmts): Likewise. (vect_analyze_stmt): Likewise. (vect_transform_stmt): Adjust calls to vectorizable_*. * tree-vectorizer.c: Include gimple-pretty-print.h. (dump_stmt_cost): New function. From-SVN: r260289
Richard Biener committed
2018-05-16 Richard Biener <rguenther@suse.de> * params.def (PARAM_DSE_MAX_ALIAS_QUERIES_PER_STORE): New param. * doc/invoke.texi (dse-max-alias-queries-per-store): Document. * tree-ssa-dse.c: Include tree-ssa-loop.h. (check_name): New callback. (dse_classify_store): Track cycles via a visited bitmap of PHI defs and simplify handling of in-loop and across loop dead stores and properly fail for loop-variant refs. Handle byte-tracking with multiple defs. Use PARAM_DSE_MAX_ALIAS_QUERIES_PER_STORE for limiting the walk. * gcc.dg/tree-ssa/ssa-dse-32.c: New testcase. * gcc.dg/tree-ssa/ssa-dse-33.c: Likewise. * gcc.dg/uninit-pr81897-2.c: Use -fno-tree-dse. From-SVN: r260288
Richard Biener committed
The SLP unrolling factor is calculated by finding the smallest scalar type for each SLP statement and taking the number of required lanes from the vector versions of those scalar types. E.g. for an int32->int64 conversion, it's the vector of int32s rather than the vector of int64s that determines the unroll factor. We rely on tree-vect-patterns.c to replace boolean operations like: bool a, b, c; a = b & c; with integer operations of whatever the best size is in context. E.g. if b and c are fed by comparisons of ints, a, b and c will become the appropriate size for an int comparison. For most targets this means that a, b and c will end up as int-sized themselves, but on targets like SVE and AVX512 with packed vector booleans, they'll instead become a small bitfield like :1, padded to a byte for memory purposes. The SLP code would then take these scalar types and try to calculate the vector type for them, causing the unroll factor to be much higher than necessary. This patch tries to make the SLP code use the same approach as the loop vectorizer, by splitting out the code that calculates the statement vector type and the vector type that should be used for the number of units. 2018-05-16 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * tree-vectorizer.h (vect_get_vector_types_for_stmt): Declare. (vect_get_mask_type_for_stmt): Likewise. * tree-vect-slp.c (vect_two_operations_perm_ok_p): New function, split out from... (vect_build_slp_tree_1): ...here. Use vect_get_vector_types_for_stmt to determine the statement's vector type and the vector type that should be used for calculating nunits. Deal with cases in which the type has to be deferred. (vect_slp_analyze_node_operations): Use vect_get_vector_types_for_stmt and vect_get_mask_type_for_stmt to calculate STMT_VINFO_VECTYPE. * tree-vect-loop.c (vect_determine_vf_for_stmt_1) (vect_determine_vf_for_stmt): New functions, split out from... (vect_determine_vectorization_factor): ...here. 
* tree-vect-stmts.c (vect_get_vector_types_for_stmt) (vect_get_mask_type_for_stmt): New functions, split out from vect_determine_vectorization_factor. gcc/testsuite/ * gcc.target/aarch64/sve/vcond_10.c: New test. * gcc.target/aarch64/sve/vcond_10_run.c: Likewise. * gcc.target/aarch64/sve/vcond_11.c: Likewise. * gcc.target/aarch64/sve/vcond_11_run.c: Likewise. From-SVN: r260287
Richard Sandiford committed
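The lane arithmetic in the first paragraph of this message can be sketched with a simplified model: lanes = vector size / smallest scalar size, and the unrolling factor is the least common multiple of lanes and group size, divided by the group size. The model and the names `slp_unroll_factor` and `gcd_u` are assumptions for illustration rather than GCC's code, and the model assumes fixed-length vectors (unlike SVE).

```c
/* Greatest common divisor, used to form the least common multiple.  */
static unsigned
gcd_u (unsigned a, unsigned b)
{
  while (b != 0)
    {
      unsigned t = a % b;
      a = b;
      b = t;
    }
  return a;
}

/* Simplified model: each statement's lane count comes from its smallest
   scalar type, and a group of GROUP_SIZE scalar statements is unrolled
   until it fills a whole number of vectors.  */
static unsigned
slp_unroll_factor (unsigned vector_bytes, unsigned smallest_scalar_bytes,
                   unsigned group_size)
{
  unsigned nunits = vector_bytes / smallest_scalar_bytes;
  unsigned lcm = nunits / gcd_u (nunits, group_size) * group_size;
  return lcm / group_size;
}
```

With 16-byte vectors and a group of 4 statements, int32 operands give 4 lanes and an unrolling factor of 1, while booleans padded to one byte give 16 lanes and a factor of 4: the kind of unnecessary increase the message describes.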
2018-05-16 Richard Biener <rguenther@suse.de> * tree-cfg.c (verify_gimple_assign_ternary): Properly verify the [VEC_]COND_EXPR embedded comparison. From-SVN: r260283
Richard Biener committed
gcc/ChangeLog: PR tree-optimization/85753 * gimple-ssa-warn-restrict.c (builtin_memref::builtin_memref): Handle RECORD_TYPE in addition to ARRAY_TYPE. gcc/testsuite/ChangeLog: PR tree-optimization/85753 * gcc.dg/Wrestrict-10.c: Adjust. * gcc.dg/Wrestrict-16.c: New test. From-SVN: r260280
Martin Sebor committed
* cp-tree.h (cp_expr): Remove copy constructor. * mangle.c (struct releasing_vec): Declare copy constructor. From-SVN: r260279
Jason Merrill committed
From-SVN: r260277
GCC Administrator committed