- 12 Jan, 2018 16 commits
re PR target/81819 ([RX] internal compiler error: in rx_is_restricted_memory_address, at config/rx/rx.c:311) gcc/ PR target/81819 * config/rx/rx.c (rx_is_restricted_memory_address): Handle SUBREG case. From-SVN: r256578
Oleg Endo committed -
It has some problems running on some 64-bit configurations, and the bug it is testing for is only on 32-bit; so let's not run it elsewhere. gcc/testsuite/ PR target/83629 * gcc.target/powerpc/pr83629.c: Require ilp32. From-SVN: r256577
Segher Boessenkool committed -
re PR target/80846 (auto-vectorized AVX2 horizontal sum should narrow to 128b right away, to be more efficient for Ryzen and Intel) 2018-01-12 Richard Biener <rguenther@suse.de> PR tree-optimization/80846 * target.def (split_reduction): New target hook. * targhooks.c (default_split_reduction): New function. * targhooks.h (default_split_reduction): Declare. * tree-vect-loop.c (vect_create_epilog_for_reduction): If the target requests first reduce vectors by combining low and high parts. * tree-vect-stmts.c (vect_gen_perm_mask_any): Adjust. (get_vectype_for_scalar_type_and_size): Export. * tree-vectorizer.h (get_vectype_for_scalar_type_and_size): Declare. * doc/tm.texi.in (TARGET_VECTORIZE_SPLIT_REDUCTION): Document. * doc/tm.texi: Regenerate. i386/ * config/i386/i386.c (ix86_split_reduction): Implement TARGET_VECTORIZE_SPLIT_REDUCTION. * gcc.target/i386/pr80846-1.c: New testcase. * gcc.target/i386/pr80846-2.c: Likewise. From-SVN: r256576
Richard Biener committed -
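The reduction strategy named in the ChangeLog above ("first reduce vectors by combining low and high parts") can be illustrated in scalar C. This is only a sketch of the idea, not the vectorizer's code: `halving_sum` is a hypothetical helper, and a plain array stands in for a vector register.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not GCC code: sum N ints (N a power of two, at
   most 64) by repeatedly adding the high half onto the low half, so
   each step works on a vector half as wide as the previous one.  */
int halving_sum(const int *v, size_t n)
{
    int tmp[64];

    assert(n > 0 && n <= 64 && (n & (n - 1)) == 0);
    for (size_t i = 0; i < n; i++)
        tmp[i] = v[i];

    /* 8 lanes -> 4 lanes -> 2 lanes -> 1 lane.  */
    for (size_t width = n / 2; width >= 1; width /= 2)
        for (size_t i = 0; i < width; i++)
            tmp[i] += tmp[i + width];

    return tmp[0];
}
```

On a target that prefers narrow vectors, the first halving step corresponds to adding the upper 128 bits of a 256-bit accumulator onto the lower 128 bits, so the rest of the reduction runs at the narrower width.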
PR target/83368 * config/sparc/sparc.h (PIC_OFFSET_TABLE_REGNUM): Set to INVALID_REGNUM in PIC mode except for TARGET_VXWORKS_RTP. * config/sparc/sparc.c: Include cfgrtl.h. (TARGET_INIT_PIC_REG): Define. (TARGET_USE_PSEUDO_PIC_REG): Likewise. (sparc_pic_register_p): New predicate. (sparc_legitimate_address_p): Use it. (sparc_legitimize_pic_address): Likewise. (sparc_delegitimize_address): Likewise. (sparc_mode_dependent_address_p): Likewise. (gen_load_pcrel_sym): Remove 4th parameter. (load_got_register): Adjust call to above. Remove obsolete stuff. (sparc_expand_prologue): Do not call load_got_register here. (sparc_flat_expand_prologue): Likewise. (sparc_output_mi_thunk): Set the pic_offset_table_rtx object. (sparc_use_pseudo_pic_reg): New function. (sparc_init_pic_reg): Likewise. * config/sparc/sparc.md (vxworks_load_got): Set the GOT register. (builtin_setjmp_receiver): Enable only for TARGET_VXWORKS_RTP. From-SVN: r256575
Eric Botcazou committed -
2018-01-12 Christophe Lyon <christophe.lyon@linaro.org> gcc/ * doc/sourcebuild.texi (Effective-Target Keywords, Other attributes): Add item for branch_cost. From-SVN: r256574
Christophe Lyon committed -
PR rtl-optimization/83565 * rtlanal.c (nonzero_bits1): On WORD_REGISTER_OPERATIONS machines, do not extend the result to a larger mode for rotate operations. (num_sign_bit_copies1): Likewise. From-SVN: r256572
Eric Botcazou committed -
2018-01-12 Tom de Vries <tom@codesourcery.com> * g++.dg/ext/label13.C: Add dg-require-effective-target indirect_jumps. * g++.dg/ext/label13a.C: Same. * g++.dg/ext/label14.C: Same. * g++.dg/ext/label2.C: Same. * g++.dg/ext/label3.C: Same. * g++.dg/torture/pr42462.C: Same. * g++.dg/torture/pr42739.C: Same. * g++.dg/warn/Wunused-label-3.C: Same. From-SVN: r256571
Tom de Vries committed -
2018-01-12 Tom de Vries <tom@codesourcery.com> * c-c++-common/dwarf2/vla1.c: Add dg-require-effective-target alloca. * g++.dg/Walloca1.C: Same. * g++.dg/cpp0x/pr70338.C: Same. * g++.dg/cpp1y/lambda-generic-vla1.C: Same. * g++.dg/cpp1y/vla10.C: Same. * g++.dg/cpp1y/vla2.C: Same. * g++.dg/cpp1y/vla6.C: Same. * g++.dg/cpp1y/vla8.C: Same. * g++.dg/debug/debug5.C: Same. * g++.dg/debug/debug6.C: Same. * g++.dg/debug/pr54828.C: Same. * g++.dg/diagnostic/pr70105.C: Same. * g++.dg/eh/cleanup5.C: Same. * g++.dg/eh/spbp.C: Same. * g++.dg/ext/tmplattr9.C: Same. * g++.dg/ext/vla10.C: Same. * g++.dg/ext/vla11.C: Same. * g++.dg/ext/vla12.C: Same. * g++.dg/ext/vla15.C: Same. * g++.dg/ext/vla16.C: Same. * g++.dg/ext/vla17.C: Same. * g++.dg/ext/vla3.C: Same. * g++.dg/ext/vla6.C: Same. * g++.dg/ext/vla7.C: Same. * g++.dg/init/array24.C: Same. * g++.dg/init/new47.C: Same. * g++.dg/init/pr55497.C: Same. * g++.dg/opt/pr78201.C: Same. * g++.dg/template/vla2.C: Same. * g++.dg/torture/Wsizeof-pointer-memaccess1.C: Same. * g++.dg/torture/Wsizeof-pointer-memaccess2.C: Same. * g++.dg/torture/pr62127.C: Same. * g++.dg/torture/pr67055.C: Same. * g++.dg/torture/stackalign/eh-alloca-1.C: Same. * g++.dg/torture/stackalign/eh-inline-2.C: Same. * g++.dg/torture/stackalign/eh-vararg-1.C: Same. * g++.dg/torture/stackalign/eh-vararg-2.C: Same. * g++.dg/warn/Wplacement-new-size-5.C: Same. * g++.dg/warn/Wsizeof-pointer-memaccess-1.C: Same. * g++.dg/warn/Wvla-1.C: Same. * g++.dg/warn/Wvla-3.C: Same. * g++.old-deja/g++.ext/array2.C: Same. * g++.old-deja/g++.ext/constructor.C: Same. * g++.old-deja/g++.law/builtin1.C: Same. * g++.old-deja/g++.other/crash12.C: Same. * g++.old-deja/g++.other/eh3.C: Same. * g++.old-deja/g++.pt/array6.C: Same. * g++.old-deja/g++.pt/dynarray.C: Same. From-SVN: r256570
Tom de Vries committed -
* g++.dg/cpp0x/inh-ctor30.C: Allow for alternate mangled form. From-SVN: r256569
Rainer Orth committed -
gcc/testsuite: PR libfortran/67412 * gfortran.dg/execute_command_line_2.f90: Remove dg-xfail-run-if on *-*-solaris2.10. libstdc++-v3: PR libstdc++/64054 * testsuite/27_io/basic_ostream/inserters_arithmetic/char/hexfloat.cc: Remove dg-xfail-run-if. gcc: PR target/40411 * config/sol2.h (STARTFILE_ARCH_SPEC): Don't use with -shared or -symbolic. Use values-Xc.o for -pedantic. Link with values-xpg4.o for C90, values-xpg6.o otherwise. From-SVN: r256568
Rainer Orth committed -
* lib/target-supports.exp (check_effective_target_branch_cost): Accept all x86 targets. From-SVN: r256567
Rainer Orth committed -
2018-01-12 Martin Liska <mliska@suse.cz> PR ipa/83054 * ipa-devirt.c (final_warning_record::grow_type_warnings): New function. (possible_polymorphic_call_targets): Use it. (ipa_devirt): Likewise. 2018-01-12 Martin Liska <mliska@suse.cz> PR ipa/83054 * g++.dg/warn/pr83054.C: New test. From-SVN: r256566
Martin Liska committed -
2018-01-12 Martin Liska <mliska@suse.cz> * profile-count.h (enum profile_quality): Use 0 as invalid enum value of profile_quality. From-SVN: r256565
Martin Liska committed -
gcc/ * doc/invoke.texi (NDS32 Options): Add -mext-perf, -mext-perf2 and -mext-string options. From-SVN: r256564
Chung-Ju Wu committed -
lto-streamer-out.c (DFS::DFS_write_tree_body): Process DECL_DEBUG_EXPR conditional on DECL_HAS_DEBUG_EXPR_P. 2018-01-12 Richard Biener <rguenther@suse.de> * lto-streamer-out.c (DFS::DFS_write_tree_body): Process DECL_DEBUG_EXPR conditional on DECL_HAS_DEBUG_EXPR_P. * tree-streamer-in.c (lto_input_ts_decl_common_tree_pointers): Likewise. * tree-streamer-out.c (write_ts_decl_common_tree_pointers): Likewise. From-SVN: r256563
Richard Biener committed -
From-SVN: r256561
GCC Administrator committed
-
- 11 Jan, 2018 24 commits
configure.ac (--with-long-double-format): Add support for the configuration option to change the default long double... 2018-01-11 Michael Meissner <meissner@linux.vnet.ibm.com> * configure.ac (--with-long-double-format): Add support for the configuration option to change the default long double format on PowerPC systems. * config.gcc (powerpc*-linux*-*): Likewise. * configure: Regenerate. * config/rs6000/rs6000-c.c (rs6000_cpu_cpp_builtins): If long double is IEEE, define __KC__ and __KF__ to allow floatn.h to be used without modification. From-SVN: r256558
Michael Meissner committed -
[gcc] 2018-01-11 Bill Schmidt <wschmidt@linux.vnet.ibm.com> * config/rs6000/rs6000-builtin.def (BU_P7_MISC_X): New #define. (SPEC_BARRIER): New instantiation of BU_P7_MISC_X. * config/rs6000/rs6000.c (rs6000_expand_builtin): Handle MISC_BUILTIN_SPEC_BARRIER. (rs6000_init_builtins): Likewise. * config/rs6000/rs6000.md (UNSPECV_SPEC_BARRIER): New UNSPECV enum value. (speculation_barrier): New define_insn. * doc/extend.texi: Document __builtin_speculation_barrier. [gcc/testsuite] 2018-01-11 Bill Schmidt <wschmidt@linux.vnet.ibm.com> * gcc.target/powerpc/spec-barr-1.c: New file. From-SVN: r256557
Bill Schmidt committed -
PR target/83203 * config/i386/i386.c (ix86_expand_vector_init_one_nonzero): If one_var is 0, for V{8,16}S[IF] and V[48]D[IF]mode use gen_vec_set<mode>_0. * config/i386/sse.md (VI8_AVX_AVX512F, VI4F_256_512): New mode iterators. (ssescalarmodesuffix): Add 512-bit vectors. Use "d" or "q" for integral modes instead of "ss" and "sd". (vec_set<mode>_0): New define_insns for 256-bit and 512-bit vectors with 32-bit and 64-bit elements. (vecdupssescalarmodesuffix): New mode attribute. (vec_dup<mode>): Use it. From-SVN: r256556
Jakub Jelinek committed -
When a function call is removed, the function containing it may become a leaf function. But if an argument may be passed on the stack, we need to align the stack frame when there is no tail call. Tested on Linux/i686 and Linux/x86-64. gcc/ PR target/83330 * config/i386/i386.c (ix86_compute_frame_layout): Align stack frame if argument is passed on stack. gcc/testsuite/ PR target/83330 * gcc.target/i386/pr83330.c: New test. From-SVN: r256555
H.J. Lu committed -
2018-01-11 Steven G. Kargl <kargl@gcc.gnu.org> PR fortran/79383 * gfortran.dg/dtio_31.f03: New test. * gfortran.dg/dtio_32.f03: New test. From-SVN: r256554
Steven G. Kargl committed -
PR go/83794 misc/cgo/test: avoid endless loop when we can't parse notes Reviewed-on: https://go-review.googlesource.com/87416 From-SVN: r256553
Ian Lance Taylor committed -
gcc/testsuite/ChangeLog: PR c++/43486 * g++.dg/wrappers: New subdirectory. * g++.dg/wrappers/README: New file. * g++.dg/wrappers/alloc.C: New test case. * g++.dg/wrappers/cow-istream-string.C: New test case. * g++.dg/wrappers/cp-stdlib.C: New test case. * g++.dg/wrappers/sanitizer_coverage_libcdep_new.C: New test case. * g++.dg/wrappers/wrapper-around-type-pack-expansion.C: New test case. From-SVN: r256552
David Malcolm committed -
re PR target/82682 (FAIL: gcc.target/i386/pr50038.c scan-assembler-times movzbl 2 (found 3 times) since r253958) PR target/82682 * ree.c (combine_reaching_defs): Optimize also reg2=exp; reg1=reg2; reg2=any_extend(reg1); into reg2=any_extend(exp); reg1=reg2;, formatting fix. From-SVN: r256551
Jakub Jelinek committed -
PR c++/82799 PR c++/83690 * call.c (perform_implicit_conversion_flags): Call mark_rvalue_use. * decl.c (case_conversion): Likewise. * semantics.c (finish_static_assert): Call perform_implicit_conversion_flags. From-SVN: r256550
Jason Merrill committed -
PR middle-end/83189 * gimple-ssa-isolate-paths.c (isolate_path): Fix profile update. From-SVN: r256545
Jan Hubicka committed -
PR middle-end/83718 * tree-inline.c (copy_cfg_body): Adjust num&den for scaling after they are computed. * g++.dg/torture/pr83718.C: New testcase. From-SVN: r256544
Jan Hubicka committed -
https://gcc.gnu.org/ml/gcc-patches/2018-01/msg00923.html * method.c (enum mangling_flags): Delete long-dead enum. From-SVN: r256543
Nathan Sidwell committed -
PR ipa/83178 * g++.dg/ipa/devirt-22.C: Adjust scan-dump-times count. From-SVN: r256542
Martin Jambor committed -
PR tree-optimization/83695 * gimple-loop-linterchange.cc (tree_loop_interchange::interchange_loops): Call scev_reset_htab to reset cached scev information after interchange. (pass_linterchange::execute): Remove call to scev_reset_htab. gcc/testsuite PR tree-optimization/83695 * gcc.dg/tree-ssa/pr83695.c: New test. From-SVN: r256541
Bin Cheng committed -
This patch implements the lane-wise fp16fml intrinsics. There are quite a few of them, so I've split them up from the other simpler fp16fml intrinsics. These ones expose instructions such as:
vfmal.f16 Dd, Sn, Sm[<index>] 0 <= index <= 1
vfmal.f16 Qd, Dn, Dm[<index>] 0 <= index <= 3
vfmsl.f16 Dd, Sn, Sm[<index>] 0 <= index <= 1
vfmsl.f16 Qd, Dn, Dm[<index>] 0 <= index <= 3
These instructions extract a single half-precision floating-point value from one of the source regs and perform a vfmal/vfmsl operation as per the normal variant with that value. The nuance here is that some of the intrinsics want to do things like:
float32x2_t vfmlal_laneq_low_u32 (float32x2_t __r, float16x4_t __a, float16x8_t __b, const int __index)
where the float16x8_t value of '__b' is held in a Q register, so we need to be a bit smart about finding the right D or S sub-register and translating the lane number to a lane in that sub-register, instead of just passing the language-level const-int down to the assembly instruction. That's where most of the complexity of this patch comes from, but hopefully it's orthogonal enough to make sense. Bootstrapped and tested on arm-none-linux-gnueabihf as well as armeb-none-eabi. * config/arm/arm_neon.h (vfmlal_lane_low_u32, vfmlal_lane_high_u32, vfmlalq_laneq_low_u32, vfmlalq_lane_low_u32, vfmlal_laneq_low_u32, vfmlalq_laneq_high_u32, vfmlalq_lane_high_u32, vfmlal_laneq_high_u32, vfmlsl_lane_low_u32, vfmlsl_lane_high_u32, vfmlslq_laneq_low_u32, vfmlslq_lane_low_u32, vfmlsl_laneq_low_u32, vfmlslq_laneq_high_u32, vfmlslq_lane_high_u32, vfmlsl_laneq_high_u32): Define. * config/arm/arm_neon_builtins.def (vfmal_lane_low, vfmal_lane_lowv4hf, vfmal_lane_lowv8hf, vfmal_lane_high, vfmal_lane_highv4hf, vfmal_lane_highv8hf, vfmsl_lane_low, vfmsl_lane_lowv4hf, vfmsl_lane_lowv8hf, vfmsl_lane_high, vfmsl_lane_highv4hf, vfmsl_lane_highv8hf): New sets of builtins. * config/arm/iterators.md (VFMLSEL2, vfmlsel2): New mode attributes. (V_lane_reg): Likewise. 
* config/arm/neon.md (neon_vfm<vfml_op>l_lane_<vfml_half><VCVTF:mode>): New define_expand. (neon_vfm<vfml_op>l_lane_<vfml_half><vfmlsel2><mode>): Likewise. (vfmal_lane_low<mode>_intrinsic, vfmal_lane_low<vfmlsel2><mode>_intrinsic, vfmal_lane_high<vfmlsel2><mode>_intrinsic, vfmal_lane_high<mode>_intrinsic, vfmsl_lane_low<mode>_intrinsic, vfmsl_lane_low<vfmlsel2><mode>_intrinsic, vfmsl_lane_high<vfmlsel2><mode>_intrinsic, vfmsl_lane_high<mode>_intrinsic): New define_insns. * gcc.target/arm/simd/fp16fml_lane_high.c: New test. * gcc.target/arm/simd/fp16fml_lane_low.c: New test. From-SVN: r256540
Kyrylo Tkachov committed -
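The lane translation described in the message above (picking the S or D sub-register of a Q register and the lane within it) can be sketched as plain arithmetic. This is an illustration under stated assumptions, not the code in neon.md: the names `subreg_lane`, `q_lane_to_s` and `q_lane_to_d` are hypothetical, lanes are numbered in little-endian order, and the register-range restrictions of the real instruction encodings are ignored. It relies only on the AArch32 register overlap, where qN covers d(2N) and d(2N+1), and s(4N) through s(4N+3).

```c
#include <assert.h>

/* Hypothetical result type: a sub-register number plus the lane of the
   half-precision element inside that sub-register.  */
struct subreg_lane {
    unsigned reg;
    unsigned lane;
};

/* Map half-precision lane LANE (0..7) of Q register QREG to an S
   register, which holds two half-precision values.  */
struct subreg_lane q_lane_to_s(unsigned qreg, unsigned lane)
{
    assert(lane < 8);
    struct subreg_lane r = { 4 * qreg + lane / 2, lane % 2 };
    return r;
}

/* Map half-precision lane LANE (0..7) of Q register QREG to a D
   register, which holds four half-precision values.  */
struct subreg_lane q_lane_to_d(unsigned qreg, unsigned lane)
{
    assert(lane < 8);
    struct subreg_lane r = { 2 * qreg + lane / 4, lane % 4 };
    return r;
}
```

For example, lane 5 of q1 lands in lane 1 of s6 for the S form and in lane 1 of d3 for the D form, which matches the index ranges 0..1 and 0..3 quoted in the message.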
This patch adds the +fp16fml extension that enables some half-precision floating-point Advanced SIMD instructions, available through arm_neon.h intrinsics. This extension is on by default for armv8.4-a if fp16 is available, so it can be enabled by -march=armv8.4-a+fp16. fp16fml is also available for armv8.2-a and armv8.3-a through the +fp16fml option that is added for these architectures. The new instructions that this patch adds support for are:
vfmal.f16 Dr, Sm, Sn
vfmal.f16 Qr, Dm, Dn
vfmsl.f16 Dr, Sm, Sn
vfmsl.f16 Qr, Dm, Dn
They interpret their input registers as a vector of half-precision floating-point values, extend them to single-precision vectors and perform a fused multiply-add or subtract of them with the destination vector. This patch exposes these instructions through arm_neon.h intrinsics. The set of intrinsics lets us, for example, perform the multiply-add/subtract operation on the low or high half of float16x4_t and float16x8_t values. This maps naturally in aarch64 to the FMLAL and FMLAL2 instructions, but on arm we have to use the fact that consecutive NEON registers overlap the wider register (i.e. d0 is s0 plus s1, q0 is d0 plus d1 etc). This just means we have to be careful to use the right subreg operand print code. New arm-specific builtins are defined to expand to the new patterns. I've managed to compress the define_expands using code, mode and int iterators but the define_insns don't compress very well without two-tiered iterators (iterator attributes expanding to iterators) which we don't support. Bootstrapped and tested on arm-none-linux-gnueabihf and also on armeb-none-eabi. * config/arm/arm-cpus.in (fp16fml): New feature. (ALL_SIMD): Add fp16fml. (armv8.2-a): Add fp16fml as an option. (armv8.3-a): Likewise. (armv8.4-a): Add fp16fml as part of fp16. * config/arm/arm.h (TARGET_FP16FML): Define. * config/arm/arm-c.c (arm_cpu_builtins): Define __ARM_FEATURE_FP16_FML when appropriate. * config/arm/arm-modes.def (V2HF): Define. 
* config/arm/arm_neon.h (vfmlal_low_u32, vfmlsl_low_u32, vfmlal_high_u32, vfmlsl_high_u32, vfmlalq_low_u32, vfmlslq_low_u32, vfmlalq_high_u32, vfmlslq_high_u32): Define. * config/arm/arm_neon_builtins.def (vfmal_low, vfmal_high, vfmsl_low, vfmsl_high): New set of builtins. * config/arm/iterators.md (PLUSMINUS): New code iterator. (vfml_op): New code attribute. (VFMLHALVES): New int iterator. (VFML, VFMLSEL): New mode attributes. (V_reg): Define mapping for V2HF. (V_hi, V_lo): New mode attributes. (VF_constraint): Likewise. (vfml_half, vfml_half_selector): New int attributes. * config/arm/neon.md (neon_vfm<vfml_op>l_<vfml_half><mode>): New define_expand. (vfmal_low<mode>_intrinsic, vfmsl_high<mode>_intrinsic, vfmal_high<mode>_intrinsic, vfmsl_low<mode>_intrinsic): New define_insn. * config/arm/t-arm-elf (v8_fps): Add fp16fml. * config/arm/t-multilib (v8_2_a_simd_variants): Add fp16fml. * config/arm/unspecs.md (UNSPEC_VFML_LO, UNSPEC_VFML_HI): New unspecs. * doc/invoke.texi (ARM Options): Document fp16fml. Update armv8.4-a documentation. * doc/sourcebuild.texi (arm_fp16fml_neon_ok, arm_fp16fml_neon): Document new effective target and option set. * gcc.target/arm/multilib.exp: Add combination tests for fp16fml. * gcc.target/arm/simd/fp16fml_high.c: New test. * gcc.target/arm/simd/fp16fml_low.c: Likewise. * lib/target-supports.exp (check_effective_target_arm_fp16fml_neon_ok_nocache, check_effective_target_arm_fp16fml_neon_ok, add_options_for_arm_fp16fml_neon): New procedures. From-SVN: r256539
Kyrylo Tkachov committed -
This patch adds support for the Armv8.4-A architecture [1] in the arm backend. This is done through the new -march=armv8.4-a option. With this patch armv8.4-a is recognised as an argument and supports the extensions: simd, fp16, crypto, nocrypto, nofp with the familiar meaning of these options. Worth noting that there is no dotprod option like in armv8.2-a and armv8.3-a because Dot Product support is mandatory in Armv8.4-A when simd is available, so when using +simd (or fp16, which enables +simd), +dotprod is implied. The various multilib selection makefile fragments are updated too and the multilib.exp test gets a few armv8.4-a combination tests. Bootstrapped and tested on arm-none-linux-gnueabihf. From-SVN: r256537
Kyrylo Tkachov committed -
gcc/ PR target/81821 * config/rx/rx.md (BW): New mode attribute. (sync_lock_test_and_setsi): Add mode suffix to insn output. From-SVN: r256536
Oleg Endo committed -
2018-01-11 Richard Biener <rguenther@suse.de> PR tree-optimization/83435 * graphite.c (canonicalize_loop_form): Ignore fake loop exit edges. * graphite-scop-detection.c (scop_detection::get_sese): Likewise. * tree-vrp.c (add_assert_info): Drop TREE_OVERFLOW if they appear. * gcc.dg/graphite/pr83435.c: New testcase. From-SVN: r256535
Richard Biener committed -
This patch records the integer value of the address offset in aarch64_address_info, so that it doesn't need to be re-extracted from the rtx. The SVE port will make more use of this. The patch also uses poly_int64 routines to manipulate the offset, rather than just handling CONST_INTs. 2018-01-11 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * config/aarch64/aarch64.c (aarch64_address_info): Add a const_offset field. (aarch64_classify_address): Initialize it. Track polynomial offsets. (aarch64_print_address_internal): Use it to check for a zero offset. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256534
Richard Sandiford committed -
This patch switches the AArch64 port to use 2 poly_int coefficients and updates code as necessary to keep it compiling. One potentially-significant change is to aarch64_hard_regno_caller_save_mode. The old implementation was written in a pretty conservative way: it changed the default behaviour for single-register values, but used the default handling for multi-register values. I don't think that's necessary, since the interesting cases for this macro are usually the single-register ones. Multi-register modes take up the whole of the constituent registers and the move patterns for all multi-register modes should be equally good. Using the original mode for multi-register cases stops us from using SVE modes to spill multi-register NEON values. This was caught by gcc.c-torture/execute/pr47538.c. Also, aarch64_shift_truncation_mask used GET_MODE_BITSIZE - 1. GET_MODE_UNIT_BITSIZE - 1 is equivalent for the cases that it handles (which are all scalars), and I think it's more obvious, since if we ever do use this for elementwise shifts of vector modes, the mask will depend on the number of bits in each element rather than the number of bits in the whole vector. 2018-01-11 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * config/aarch64/aarch64-modes.def (NUM_POLY_INT_COEFFS): Set to 2. * config/aarch64/aarch64-protos.h (aarch64_initial_elimination_offset): Return a poly_int64 rather than a HOST_WIDE_INT. (aarch64_offset_7bit_signed_scaled_p): Take the offset as a poly_int64 rather than a HOST_WIDE_INT. * config/aarch64/aarch64.h (aarch64_frame): Protect with HAVE_POLY_INT_H rather than HOST_WIDE_INT. Change locals_offset, hard_fp_offset, frame_size, initial_adjust, callee_offset and final_offset from HOST_WIDE_INT to poly_int64. * config/aarch64/aarch64-builtins.c (aarch64_simd_expand_args): Use to_constant when getting the number of units in an Advanced SIMD mode. 
(aarch64_builtin_vectorized_function): Check for a constant number of units. * config/aarch64/aarch64-simd.md (mov<mode>): Handle polynomial GET_MODE_SIZE. (aarch64_ld<VSTRUCT:nregs>_lane<VALLDIF:mode>): Use the nunits attribute instead of GET_MODE_NUNITS. * config/aarch64/aarch64.c (aarch64_hard_regno_nregs) (aarch64_class_max_nregs): Use the constant_lowest_bound of the GET_MODE_SIZE for fixed-size registers. (aarch64_const_vec_all_same_in_range_p): Use const_vec_duplicate_p. (aarch64_hard_regno_call_part_clobbered, aarch64_classify_index) (aarch64_mode_valid_for_sched_fusion_p, aarch64_classify_address) (aarch64_legitimize_address_displacement, aarch64_secondary_reload) (aarch64_print_operand, aarch64_print_address_internal) (aarch64_address_cost, aarch64_rtx_costs, aarch64_register_move_cost) (aarch64_short_vector_p, aapcs_vfp_sub_candidate) (aarch64_simd_attr_length_rglist, aarch64_operands_ok_for_ldpstp): Handle polynomial GET_MODE_SIZE. (aarch64_hard_regno_caller_save_mode): Likewise. Return modes wider than SImode without modification. (tls_symbolic_operand_type): Use strip_offset instead of split_const. (aarch64_pass_by_reference, aarch64_layout_arg, aarch64_pad_reg_upward) (aarch64_gimplify_va_arg_expr): Assert that we don't yet handle passing and returning SVE modes. (aarch64_function_value, aarch64_layout_arg): Use gen_int_mode rather than GEN_INT. (aarch64_emit_probe_stack_range): Take the size as a poly_int64 rather than a HOST_WIDE_INT, but call sorry if it isn't constant. (aarch64_allocate_and_probe_stack_space): Likewise. (aarch64_layout_frame): Cope with polynomial offsets. (aarch64_save_callee_saves, aarch64_restore_callee_saves): Take the start_offset as a poly_int64 rather than a HOST_WIDE_INT. Track polynomial offsets. (offset_9bit_signed_unscaled_p, offset_12bit_unsigned_scaled_p) (aarch64_offset_7bit_signed_scaled_p): Take the offset as a poly_int64 rather than a HOST_WIDE_INT. 
(aarch64_get_separate_components, aarch64_process_components) (aarch64_expand_prologue, aarch64_expand_epilogue) (aarch64_use_return_insn_p): Handle polynomial frame offsets. (aarch64_anchor_offset): New function, split out from... (aarch64_legitimize_address): ...here. (aarch64_builtin_vectorization_cost): Handle polynomial TYPE_VECTOR_SUBPARTS. (aarch64_simd_check_vect_par_cnst_half): Handle polynomial GET_MODE_NUNITS. (aarch64_simd_make_constant, aarch64_expand_vector_init): Get the number of elements from the PARALLEL rather than the mode. (aarch64_shift_truncation_mask): Use GET_MODE_UNIT_BITSIZE rather than GET_MODE_BITSIZE. (aarch64_evpc_trn, aarch64_evpc_uzp, aarch64_evpc_ext) (aarch64_evpc_rev, aarch64_evpc_dup, aarch64_evpc_zip) (aarch64_expand_vec_perm_const_1): Handle polynomial d->perm.length () and d->perm elements. (aarch64_evpc_tbl): Likewise. Use nelt rather than GET_MODE_NUNITS. Apply to_constant to d->perm elements. (aarch64_simd_valid_immediate, aarch64_vec_fpconst_pow_of_2): Handle polynomial CONST_VECTOR_NUNITS. (aarch64_move_pointer): Take amount as a poly_int64 rather than an int. (aarch64_progress_pointer): Avoid temporary variable. * config/aarch64/aarch64.md (aarch64_<crc_variant>): Use the mode attribute instead of GET_MODE. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256533
Richard Sandiford committed -
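The "2 poly_int coefficients" mentioned above can be pictured as a value of the form c0 + c1 * x, where x is only known at run time (for SVE, it depends on the vector length). The following is a simplified C model of that idea, not GCC's poly_int template: the type `poly2` and its helpers are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a two-coefficient polynomial integer:
   value = c0 + c1 * x, with x unknown at compile time.  */
typedef struct {
    int64_t c0;
    int64_t c1;
} poly2;

/* Addition is coefficient-wise.  */
poly2 poly2_add(poly2 a, poly2 b)
{
    poly2 r = { a.c0 + b.c0, a.c1 + b.c1 };
    return r;
}

/* The value is a compile-time constant only when the x coefficient is
   zero; in that case store it through OUT and return true.  */
bool poly2_is_constant(poly2 a, int64_t *out)
{
    if (a.c1 != 0)
        return false;
    *out = a.c0;
    return true;
}

/* Evaluate the polynomial once the runtime value of x is known.  */
int64_t poly2_eval(poly2 a, int64_t x)
{
    return a.c0 + a.c1 * x;
}
```

Code that previously assumed a HOST_WIDE_INT frame offset has to handle the case where the constant check fails, which is roughly why so many functions in the ChangeLog are listed as "Handle polynomial ...".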
The port had aarch64_add_offset and aarch64_add_constant routines that did similar things. This patch replaces them with an expanded version of aarch64_add_offset that takes separate source and destination registers. The new routine also takes a poly_int64 offset instead of a HOST_WIDE_INT offset, but it leaves the HOST_WIDE_INT case to aarch64_add_offset_1, which is basically a repurposed aarch64_add_constant_internal. The SVE patch will put the handling of VL-based constants in aarch64_add_offset, while still using aarch64_add_offset_1 for the constant part. The vcall_offset == 0 path in aarch64_output_mi_thunk will use temp0 as well as temp1 once SVE is added. A side-effect of the patch is that we now generate: mov x29, sp instead of: add x29, sp, 0 in the pr70044.c test. 2018-01-11 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * config/aarch64/aarch64.c (aarch64_force_temporary): Assert that x exists before using it. (aarch64_add_constant_internal): Rename to... (aarch64_add_offset_1): ...this. Replace regnum with separate src and dest rtxes. Handle the case in which they're different, including when the offset is zero. Replace scratchreg with an rtx. Use 2 additions if there is no spare register into which we can move a 16-bit constant. (aarch64_add_constant): Delete. (aarch64_add_offset): Replace reg with separate src and dest rtxes. Take a poly_int64 offset instead of a HOST_WIDE_INT. Use aarch64_add_offset_1. (aarch64_add_sp, aarch64_sub_sp): Take the scratch register as an rtx rather than an int. Take the delta as a poly_int64 rather than a HOST_WIDE_INT. Use aarch64_add_offset. (aarch64_expand_mov_immediate): Update uses of aarch64_add_offset. (aarch64_expand_prologue): Update calls to aarch64_sub_sp, aarch64_allocate_and_probe_stack_space and aarch64_add_offset. (aarch64_expand_epilogue): Update calls to aarch64_add_offset and aarch64_add_sp. 
(aarch64_output_mi_thunk): Use aarch64_add_offset rather than aarch64_add_constant. gcc/testsuite/ * gcc.target/aarch64/pr70044.c: Allow "mov x29, sp" too. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256532
Richard Sandiford committed -
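The "Use 2 additions" case in the ChangeLog above can be illustrated with the AArch64 ADD (immediate) encoding, which takes a 12-bit unsigned immediate optionally shifted left by 12 bits. The sketch below is an assumption about how such a split could look, not the body of aarch64_add_offset_1; `split_add_offset` is a hypothetical name and only non-negative offsets below 2^24 are handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: split OFF into two parts that each fit an
   AArch64 ADD immediate: a plain 12-bit part (LO) and a 12-bit part
   shifted left by 12 (HI).  Returns false when OFF needs more than
   24 bits, in which case a scratch register would be required.  */
bool split_add_offset(uint32_t off, uint32_t *lo, uint32_t *hi)
{
    if (off >= (UINT32_C(1) << 24))
        return false;
    *lo = off & UINT32_C(0xfff);      /* add dest, src, #lo            */
    *hi = off & UINT32_C(0xfff000);   /* add dest, dest, #hi (lsl 12)  */
    return true;
}
```

When both parts are zero and the source and destination registers differ, a plain register move is enough, which is consistent with the "mov x29, sp" output noted in the message.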
In preparation for the switch to NUM_POLY_INT_COEFFS==2. 2018-01-11 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * config/aarch64/aarch64.c (aarch64_reinterpret_float_as_int): Use scalar_float_mode. From-SVN: r256531
Richard Sandiford committed -
This patch replaces GET_MODE_NUNITS in some of the v8.4 support with equivalent values, in preparation for the switch to NUM_POLY_INT_COEFFS==2. 2018-01-11 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * config/aarch64/aarch64-simd.md (aarch64_fml<f16mac1>l<f16quad>_low<mode>): Avoid GET_MODE_NUNITS. (aarch64_fml<f16mac1>l<f16quad>_high<mode>): Likewise. (aarch64_fml<f16mac1>l_lane_lowv2sf): Likewise. (aarch64_fml<f16mac1>l_lane_highv2sf): Likewise. (aarch64_fml<f16mac1>lq_laneq_lowv4sf): Likewise. (aarch64_fml<f16mac1>lq_laneq_highv4sf): Likewise. (aarch64_fml<f16mac1>l_laneq_lowv2sf): Likewise. (aarch64_fml<f16mac1>l_laneq_highv2sf): Likewise. (aarch64_fml<f16mac1>lq_lane_lowv4sf): Likewise. (aarch64_fml<f16mac1>lq_lane_highv4sf): Likewise. From-SVN: r256530
Richard Sandiford committed
-