- 31 Jan, 2020 18 commits
-
-
This mostly follows existing practice. Perhaps the only noteworthy thing is that svmmla is split across three extensions (i8mm, f32mm and f64mm), any of which can be enabled independently. The easiest way of coping with this seemed to be to add a fourth svmmla entry for base SVE, but with no type suffixes. This means that the overloaded function is always available for C, but never successfully resolves without the appropriate target feature. 2020-01-31 Dennis Zhang <dennis.zhang@arm.com> Matthew Malcomson <matthew.malcomson@arm.com> Richard Sandiford <richard.sandiford@arm.com> gcc/ * doc/invoke.texi (f32mm): Document new AArch64 -march= extension. * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define __ARM_FEATURE_SVE_MATMUL_INT8, __ARM_FEATURE_SVE_MATMUL_FP32 and __ARM_FEATURE_SVE_MATMUL_FP64 as appropriate. Don't define __ARM_FEATURE_MATMUL_FP64. * config/aarch64/aarch64-option-extensions.def (fp, simd, fp16) (sve): Add AARCH64_FL_F32MM to the list of extensions that should be disabled at the same time. (f32mm): New extension. * config/aarch64/aarch64.h (AARCH64_FL_F32MM): New macro. (AARCH64_FL_F64MM): Bump to the next bit up. (AARCH64_ISA_F32MM, TARGET_SVE_I8MM, TARGET_F32MM, TARGET_SVE_F32MM) (TARGET_SVE_F64MM): New macros. * config/aarch64/iterators.md (SVE_MATMULF): New mode iterator. (UNSPEC_FMMLA, UNSPEC_SMATMUL, UNSPEC_UMATMUL, UNSPEC_USMATMUL) (UNSPEC_TRN1Q, UNSPEC_TRN2Q, UNSPEC_UZP1Q, UNSPEC_UZP2Q, UNSPEC_ZIP1Q) (UNSPEC_ZIP2Q): New unspeccs. (DOTPROD_US_ONLY, PERMUTEQ, MATMUL, FMMLA): New int iterators. (optab, sur, perm_insn): Handle the new unspecs. (sve_fp_op): Handle UNSPEC_FMMLA. Resort. * config/aarch64/aarch64-sve.md (@aarch64_sve_ld1ro<mode>): Use TARGET_SVE_F64MM instead of separate tests. (@aarch64_<DOTPROD_US_ONLY:sur>dot_prod<vsi2qi>): New pattern. (@aarch64_<DOTPROD_US_ONLY:sur>dot_prod_lane<vsi2qi>): Likewise. (@aarch64_sve_add_<MATMUL:optab><vsi2qi>): Likewise. (@aarch64_sve_<FMMLA:sve_fp_op><mode>): Likewise. (@aarch64_sve_<PERMUTEQ:optab><mode>): Likewise. * config/aarch64/aarch64-sve-builtins.cc (TYPES_s_float): New macro. (TYPES_s_float_hsd_integer, TYPES_s_float_sd_integer): Use it. (TYPES_s_signed): New macro. (TYPES_s_integer): Use it. (TYPES_d_float): New macro. (TYPES_d_data): Use it. * config/aarch64/aarch64-sve-builtins-shapes.h (mmla): Declare. (ternary_intq_uintq_lane, ternary_intq_uintq_opt_n, ternary_uintq_intq) (ternary_uintq_intq_lane, ternary_uintq_intq_opt_n): Likewise. * config/aarch64/aarch64-sve-builtins-shapes.cc (mmla_def): New class. (svmmla): New shape. (ternary_resize2_opt_n_base): Add TYPE_CLASS2 and TYPE_CLASS3 template parameters. (ternary_resize2_lane_base): Likewise. (ternary_resize2_base): New class. (ternary_qq_lane_base): Likewise. (ternary_intq_uintq_lane_def): Likewise. (ternary_intq_uintq_lane): New shape. (ternary_intq_uintq_opt_n_def): New class (ternary_intq_uintq_opt_n): New shape. (ternary_qq_lane_def): Inherit from ternary_qq_lane_base. (ternary_uintq_intq_def): New class. (ternary_uintq_intq): New shape. (ternary_uintq_intq_lane_def): New class. (ternary_uintq_intq_lane): New shape. (ternary_uintq_intq_opt_n_def): New class. (ternary_uintq_intq_opt_n): New shape. * config/aarch64/aarch64-sve-builtins-base.h (svmmla, svsudot) (svsudot_lane, svtrn1q, svtrn2q, svusdot, svusdot_lane, svusmmla) (svuzp1q, svuzp2q, svzip1q, svzip2q): Declare. * config/aarch64/aarch64-sve-builtins-base.cc (svdot_lane_impl): Generalize to... (svdotprod_lane_impl): ...this new class. (svmmla_impl, svusdot_impl): New classes. (svdot_lane): Update to use svdotprod_lane_impl. (svmmla, svsudot, svsudot_lane, svtrn1q, svtrn2q, svusdot) (svusdot_lane, svusmmla, svuzp1q, svuzp2q, svzip1q, svzip2q): New functions. * config/aarch64/aarch64-sve-builtins-base.def (svmmla): New base function, with no types defined. (svmmla, svusmmla, svsudot, svsudot_lane, svusdot, svusdot_lane): New AARCH64_FL_I8MM functions. (svmmla): New AARCH64_FL_F32MM function. (svld1ro): Depend only on AARCH64_FL_F64MM, not on AARCH64_FL_V8_6. (svmmla, svtrn1q, svtrn2q, svuz1q, svuz2q, svzip1q, svzip2q): New AARCH64_FL_F64MM function. (REQUIRED_EXTENSIONS): gcc/testsuite/ * lib/target-supports.exp (check_effective_target_aarch64_asm_i8mm_ok) (check_effective_target_aarch64_asm_f32mm_ok): New target selectors. * gcc.target/aarch64/pragma_cpp_predefs_2.c: Test handling of __ARM_FEATURE_SVE_MATMUL_INT8, __ARM_FEATURE_SVE_MATMUL_FP32 and __ARM_FEATURE_SVE_MATMUL_FP64. * gcc.target/aarch64/sve/acle/asm/test_sve_acle.h (TEST_TRIPLE_Z): (TEST_TRIPLE_Z_REV2, TEST_TRIPLE_Z_REV, TEST_TRIPLE_LANE_REG) (TEST_TRIPLE_ZX): New macros. * gcc.target/aarch64/sve/acle/asm/ld1ro_f16.c: Remove +sve and rely on +f64mm to enable it. * gcc.target/aarch64/sve/acle/asm/ld1ro_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1ro_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1ro_s16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1ro_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1ro_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1ro_s8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1ro_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1ro_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1ro_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1ro_u8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mmla_f32.c: New test. * gcc.target/aarch64/sve/acle/asm/mmla_f64.c: Likewise, * gcc.target/aarch64/sve/acle/asm/mmla_s32.c: Likewise, * gcc.target/aarch64/sve/acle/asm/mmla_u32.c: Likewise, * gcc.target/aarch64/sve/acle/asm/sudot_lane_s32.c: Likewise, * gcc.target/aarch64/sve/acle/asm/sudot_s32.c: Likewise, * gcc.target/aarch64/sve/acle/asm/trn1q_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/trn1q_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/trn1q_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/trn1q_s16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/trn1q_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/trn1q_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/trn1q_s8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/trn1q_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/trn1q_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/trn1q_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/trn1q_u8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/trn2q_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/trn2q_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/trn2q_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/trn2q_s16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/trn2q_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/trn2q_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/trn2q_s8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/trn2q_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/trn2q_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/trn2q_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/trn2q_u8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/usdot_lane_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/usdot_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/usmmla_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/uzp1q_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/uzp1q_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/uzp1q_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/uzp1q_s16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/uzp1q_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/uzp1q_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/uzp1q_s8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/uzp1q_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/uzp1q_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/uzp1q_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/uzp1q_u8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/uzp2q_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/uzp2q_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/uzp2q_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/uzp2q_s16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/uzp2q_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/uzp2q_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/uzp2q_s8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/uzp2q_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/uzp2q_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/uzp2q_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/uzp2q_u8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/zip1q_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/zip1q_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/zip1q_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/zip1q_s16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/zip1q_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/zip1q_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/zip1q_s8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/zip1q_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/zip1q_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/zip1q_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/zip1q_u8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/zip2q_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/zip2q_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/zip2q_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/zip2q_s16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/zip2q_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/zip2q_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/zip2q_s8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/zip2q_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/zip2q_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/zip2q_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/zip2q_u8.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/mmla_1.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/mmla_2.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/mmla_3.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/mmla_4.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/mmla_5.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/mmla_6.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/mmla_7.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/ternary_intq_uintq_lane_1.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/ternary_intq_uintq_opt_n_1.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/ternary_uintq_intq_1.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/ternary_uintq_intq_lane_1.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/ternary_uintq_intq_opt_n_1.c: Likewise.
Richard Sandiford committed -
This patch should (finally!) give clean test results for aarch64-sve-pcs.exp for all {be,le}{lp64,ilp32} combinations. The *_128.c tests require aarch64_little_endian because they test for fixed-length 128-bit code, whereas -msve-vector-bits=128 still generates VLA code for big-endian. Some tests require lp64 because they match (64-bit) pointer loads and stores. Others require it because ilp32 adds extra zero extensions. We still have a non-trivial amount of coverage for -mbig-endian -mabi=ilp32: # of expected passes 663 # of unsupported tests 59 2020-01-31 Richard Sandiford <richard.sandiford@arm.com> gcc/testsuite/ * gcc.target/aarch64/sve/pcs/args_1.c: Require lp64 for check-function-bodies tests. * gcc.target/aarch64/sve/pcs/args_2.c: Likewise. * gcc.target/aarch64/sve/pcs/args_3.c: Likewise. * gcc.target/aarch64/sve/pcs/args_4.c: Likewise. * gcc.target/aarch64/sve/pcs/return_1.c: Likewise. * gcc.target/aarch64/sve/pcs/return_1_256.c: Likewise. * gcc.target/aarch64/sve/pcs/return_1_512.c: Likewise. * gcc.target/aarch64/sve/pcs/return_1_1024.c: Likewise. * gcc.target/aarch64/sve/pcs/return_1_2048.c: Likewise. * gcc.target/aarch64/sve/pcs/return_2.c: Likewise. * gcc.target/aarch64/sve/pcs/return_3.c: Likewise. * gcc.target/aarch64/sve/pcs/return_4.c: Likewise. * gcc.target/aarch64/sve/pcs/return_4_256.c: Likewise. * gcc.target/aarch64/sve/pcs/return_4_512.c: Likewise. * gcc.target/aarch64/sve/pcs/return_4_1024.c: Likewise. * gcc.target/aarch64/sve/pcs/return_4_2048.c: Likewise. * gcc.target/aarch64/sve/pcs/return_5.c: Likewise. * gcc.target/aarch64/sve/pcs/return_5_256.c: Likewise. * gcc.target/aarch64/sve/pcs/return_5_512.c: Likewise. * gcc.target/aarch64/sve/pcs/return_5_1024.c: Likewise. * gcc.target/aarch64/sve/pcs/return_5_2048.c: Likewise. * gcc.target/aarch64/sve/pcs/return_6.c: Likewise. * gcc.target/aarch64/sve/pcs/return_6_256.c: Likewise. * gcc.target/aarch64/sve/pcs/return_6_512.c: Likewise. * gcc.target/aarch64/sve/pcs/return_6_1024.c: Likewise. * gcc.target/aarch64/sve/pcs/return_6_2048.c: Likewise. * gcc.target/aarch64/sve/pcs/saves_2_be_nowrap.c: Likewise. * gcc.target/aarch64/sve/pcs/saves_2_be_wrap.c: Likewise. * gcc.target/aarch64/sve/pcs/saves_2_le_nowrap.c: Likewise. * gcc.target/aarch64/sve/pcs/saves_2_le_wrap.c: Likewise. * gcc.target/aarch64/sve/pcs/saves_3.c: Likewise. * gcc.target/aarch64/sve/pcs/saves_4_be.c: Likewise. * gcc.target/aarch64/sve/pcs/saves_4_le.c: Likewise. * gcc.target/aarch64/sve/pcs/varargs_1.c: Likewise. * gcc.target/aarch64/sve/pcs/varargs_2_f16.c: Likewise. * gcc.target/aarch64/sve/pcs/varargs_2_f32.c: Likewise. * gcc.target/aarch64/sve/pcs/varargs_2_f64.c: Likewise. * gcc.target/aarch64/sve/pcs/varargs_2_s16.c: Likewise. * gcc.target/aarch64/sve/pcs/varargs_2_s32.c: Likewise. * gcc.target/aarch64/sve/pcs/varargs_2_s64.c: Likewise. * gcc.target/aarch64/sve/pcs/varargs_2_s8.c: Likewise. * gcc.target/aarch64/sve/pcs/varargs_2_u16.c: Likewise. * gcc.target/aarch64/sve/pcs/varargs_2_u32.c: Likewise. * gcc.target/aarch64/sve/pcs/varargs_2_u64.c: Likewise. * gcc.target/aarch64/sve/pcs/varargs_2_u8.c: Likewise. * gcc.target/aarch64/sve/pcs/args_5_be_f16.c: Require lp64. * gcc.target/aarch64/sve/pcs/args_5_be_f32.c: Likewise. * gcc.target/aarch64/sve/pcs/args_5_be_f64.c: Likewise. * gcc.target/aarch64/sve/pcs/args_5_be_s16.c: Likewise. * gcc.target/aarch64/sve/pcs/args_5_be_s32.c: Likewise. * gcc.target/aarch64/sve/pcs/args_5_be_s64.c: Likewise. * gcc.target/aarch64/sve/pcs/args_5_be_s8.c: Likewise. * gcc.target/aarch64/sve/pcs/args_5_be_u16.c: Likewise. * gcc.target/aarch64/sve/pcs/args_5_be_u32.c: Likewise. * gcc.target/aarch64/sve/pcs/args_5_be_u64.c: Likewise. * gcc.target/aarch64/sve/pcs/args_5_be_u8.c: Likewise. * gcc.target/aarch64/sve/pcs/args_5_le_f16.c: Likewise. * gcc.target/aarch64/sve/pcs/args_5_le_f32.c: Likewise. * gcc.target/aarch64/sve/pcs/args_5_le_f64.c: Likewise. * gcc.target/aarch64/sve/pcs/args_5_le_s16.c: Likewise. * gcc.target/aarch64/sve/pcs/args_5_le_s32.c: Likewise. * gcc.target/aarch64/sve/pcs/args_5_le_s64.c: Likewise. * gcc.target/aarch64/sve/pcs/args_5_le_s8.c: Likewise. * gcc.target/aarch64/sve/pcs/args_5_le_u16.c: Likewise. * gcc.target/aarch64/sve/pcs/args_5_le_u32.c: Likewise. * gcc.target/aarch64/sve/pcs/args_5_le_u64.c: Likewise. * gcc.target/aarch64/sve/pcs/args_5_le_u8.c: Likewise. * gcc.target/aarch64/sve/pcs/args_6_be_f16.c: Likewise. * gcc.target/aarch64/sve/pcs/args_6_be_f32.c: Likewise. * gcc.target/aarch64/sve/pcs/args_6_be_f64.c: Likewise. * gcc.target/aarch64/sve/pcs/args_6_be_s16.c: Likewise. * gcc.target/aarch64/sve/pcs/args_6_be_s32.c: Likewise. * gcc.target/aarch64/sve/pcs/args_6_be_s64.c: Likewise. * gcc.target/aarch64/sve/pcs/args_6_be_s8.c: Likewise. * gcc.target/aarch64/sve/pcs/args_6_be_u16.c: Likewise. * gcc.target/aarch64/sve/pcs/args_6_be_u32.c: Likewise. * gcc.target/aarch64/sve/pcs/args_6_be_u64.c: Likewise. * gcc.target/aarch64/sve/pcs/args_6_be_u8.c: Likewise. * gcc.target/aarch64/sve/pcs/args_6_le_f16.c: Likewise. * gcc.target/aarch64/sve/pcs/args_6_le_f32.c: Likewise. * gcc.target/aarch64/sve/pcs/args_6_le_f64.c: Likewise. * gcc.target/aarch64/sve/pcs/args_6_le_s16.c: Likewise. * gcc.target/aarch64/sve/pcs/args_6_le_s32.c: Likewise. * gcc.target/aarch64/sve/pcs/args_6_le_s64.c: Likewise. * gcc.target/aarch64/sve/pcs/args_6_le_s8.c: Likewise. * gcc.target/aarch64/sve/pcs/args_6_le_u16.c: Likewise. * gcc.target/aarch64/sve/pcs/args_6_le_u32.c: Likewise. * gcc.target/aarch64/sve/pcs/args_6_le_u64.c: Likewise. * gcc.target/aarch64/sve/pcs/args_6_le_u8.c: Likewise. * gcc.target/aarch64/sve/pcs/args_7.c: Likewise. * gcc.target/aarch64/sve/pcs/args_8.c: Likewise. * gcc.target/aarch64/sve/pcs/args_9.c: Likewise. * gcc.target/aarch64/sve/pcs/return_4_128.c: Require lp64 and aarch64_little_endian for check-function-bodies tests. * gcc.target/aarch64/sve/pcs/return_5_128.c: Likewise. * gcc.target/aarch64/sve/pcs/stack_clash_2_128.c: Likewise. * gcc.target/aarch64/sve/pcs/return_1_128.c: Likewise. Remove target selector from dg-compile. * gcc.target/aarch64/sve/pcs/return_6_128.c: Likewise.
Richard Sandiford committed -
It seems that in practice std::sentinel_for<I, I> is always true, and so the test_range container doesn't help us detect bugs in ranges code in which we wrongly assume that a sentinel can be manipulated like an iterator. Make the test_range range more strict by having end() unconditionally return a sentinel<I>, and adjust some tests accordingly. libstdc++-v3/ChangeLog: * testsuite/24_iterators/range_operations/distance.cc: Do not assume test_range::end() returns the same type as test_range::begin(). * testsuite/24_iterators/range_operations/next.cc: Likewise. * testsuite/24_iterators/range_operations/prev.cc: Likewise. * testsuite/util/testsuite_iterators.h (__gnu_test::test_range::end): Always return a sentinel<I>.
Patrick Palka committed -
Fix ICE in testcase gfortran.dg/assumed_rank_bounds_3.f90. 2020-01-31 Andrew Stubbs <ams@codesourcery.com> gcc/ * config/gcn/gcn-valu.md (addv64di3_exec): Allow one '0' in each alternative only.
Andrew Stubbs committed -
The reason for TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL on AMD target is only insn size, as advised in e.g. Software Optimization Guide for the AMD Family 15h Processors [1], section 7.1.2, where it is said: --quote-- 7.1.2 Reduce Instruction SizeOptimization Reduce the size of instructions when possible. Rationale Using smaller instruction sizes improves instruction fetch throughput. Specific examples include the following: *In SIMD code, use the single-precision (PS) form of instructions instead of the double-precision (PD) form. For example, for register to register moves, MOVAPS achieves the same result as MOVAPD, but uses one less byte to encode the instruction and has no prefix byte. Other examples in which single-precision forms can be substituted for double-precision forms include MOVUPS, MOVNTPS, XORPS, ORPS, ANDPS, and SHUFPS. ... --/quote-- Please note that this optimization applies only to non-AVX forms, as demonstrated by: 0: 0f 28 c8 movaps %xmm0,%xmm1 3: 66 0f 28 c8 movapd %xmm0,%xmm1 7: c5 f8 28 d1 vmovaps %xmm1,%xmm2 b: c5 f9 28 d1 vmovapd %xmm1,%xmm2 Also note that MOVDQA is missing in the above optimization. It is harmful to substitute MOVDQA with MOVAPS, as it can (and does) introduce +1 cycle forwarding penalty between FLT (FPA/FPM) and INT (VALU) FP clusters. [1] https://www.amd.com/system/files/TechDocs/47414_15h_sw_opt_guide.pdf
Uros Bizjak committed -
2020-01-31 Kwok Cheung Yeung <kcy@codesourcery.com> gcc/ * config/gcn/mkoffload.c (process_asm): Add sgpr_count and vgpr_count to definition of hsa_kernel_description. Parse assembly to find SGPR and VGPR count of kernel and store in hsa_kernel_description. libgomp/ * plugin/plugin-gcn.c (struct hsa_kernel_description): Add sgpr_count and vgpr_count fields. (struct kernel_info): Add a field for a hsa_kernel_description. (run_kernel): Reduce the number of threads/workers if the requested number would require too many VGPRs. (init_basic_kernel_info): Initialize description field with the hsa_kernel_description entry for the kernel.
Kwok Cheung Yeung committed -
PR fortran/93462 * frontend-passes.c (gfc_code_walker): For EXEC_OACC_ATOMIC, set in_omp_atomic to true prevent front-end optimization. PR fortran/93462 * gfortran.dg/goacc/atomic-1.f90: New.
Tobias Burnus committed -
This fixes a fall-out from a patch I had submitted two years ago which started allowing simplify-rtx to fold logical right shifts by offsets a followed by b into >> (a + b). However this can generate inefficient code when the resulting shift count ends up being the same as the size of the shift mode. This will create some undefined behavior on most platforms. This patch changes to code to truncate to 0 if the shift amount goes out of range. Before my older patch this used to happen in combine when it saw the two shifts. However since we combine them here combine never gets a chance to truncate them. The issue mostly affects GCC 8 and 9 since on 10 the back-end knows how to deal with this shift constant but it's better to do the right thing in simplify-rtx. Note that this doesn't take care of the Arithmetic shift where you could replace the constant with MODE_BITS (mode) - 1, but that's not a regression so punting it. gcc/ChangeLog: PR rtl-optimization/91838 * simplify-rtx.c (simplify_binary_operation_1): Update LSHIFTRT case to truncate if allowed or reject combination. gcc/testsuite/ChangeLog: PR rtl-optimization/91838 * g++.dg/pr91838.C: New test.
Tamar Christina committed -
2020-01-31 Andrew Stubbs <ams@codesourcery.com> gcc/ * tree-ssa-loop-ivopts.c (get_iv): Use sizetype for zero-step. (find_inv_vars_cb): Likewise.
Andrew Stubbs committed -
This patch refactors some code in special_function_p that checks for the function being sane to match by name, splitting it out into a new maybe_special_function_p, and using it it two places in the analyzer. gcc/analyzer/ChangeLog: * analyzer.cc (is_named_call_p): Replace tests for fndecl being extern at file scope and having a non-NULL DECL_NAME with a call to maybe_special_function_p. * function-set.cc (function_set::contains_decl_p): Add call to maybe_special_function_p. gcc/ChangeLog: * calls.c (special_function_p): Split out the check for DECL_NAME being non-NULL and fndecl being extern at file scope into a new maybe_special_function_p and call it. Drop check for fndecl being non-NULL that was after a usage of DECL_NAME (fndecl). * tree.h (maybe_special_function_p): New inline function.
David Malcolm committed -
gcc/analyzer/ChangeLog: PR analyzer/93450 * constraint-manager.cc (constraint_manager::get_or_add_equiv_class): Only compare constants if their types are compatible. * region-model.cc (constant_svalue::eval_condition): Replace check for identical types with call to types_compatible_p.
David Malcolm committed -
Fixes an execution failure in testcase gfortran.dg/assumed_rank_1.f90. 2020-01-30 Andrew Stubbs <ams@codesourcery.com> gcc/ * config/gcn/gcn-valu.md (gather<mode>_exec): Move contents ... (mask_gather_load<mode>): ... here, and zero-initialize the destination. (maskload<mode>di): Zero-initialize the destination. * config/gcn/gcn.c:
Andrew Stubbs committed -
gcc/analyzer/ChangeLog: * program-state.cc (extrinsic_state::dump_to_pp): New. (extrinsic_state::dump_to_file): New. (extrinsic_state::dump): New. * program-state.h (extrinsic_state::dump_to_pp): New decl. (extrinsic_state::dump_to_file): New decl. (extrinsic_state::dump): New decl. * sm.cc: Include "pretty-print.h". (state_machine::dump_to_pp): New. * sm.h (state_machine::dump_to_pp): New decl.
David Malcolm committed -
gcc/analyzer/ChangeLog: * diagnostic-manager.cc (for_each_state_change): Use extrinsic_state::get_num_checkers rather than accessing m_checkers directly. * program-state.cc (program_state::program_state): Likewise. * program-state.h (extrinsic_state::m_checkers): Make private.
David Malcolm committed -
GCC Administrator committed
-
This test assumes that memset and strlen have been marked with __attribute__((nonnull)), which isn't necessarily the case for an arbitrary <string.h>. This likely explains these failures: FAIL: gcc.dg/analyzer/malloc-1.c (test for warnings, line 417) FAIL: gcc.dg/analyzer/malloc-1.c (test for warnings, line 418) FAIL: gcc.dg/analyzer/malloc-1.c (test for warnings, line 425) FAIL: gcc.dg/analyzer/malloc-1.c (test for warnings, line 429) seen in https://gcc.gnu.org/ml/gcc-testresults/2020-01/msg01608.html on x86_64-apple-darwin18. Fix it by using the __builtin_ forms. gcc/testsuite/ChangeLog: * gcc.dg/analyzer/malloc-1.c: Remove include of <string.h>. Use __builtin_ forms of memset and strlen throughout.
David Malcolm committed -
gcc/testsuite/ChangeLog: * gcc.dg/analyzer/conditionals-2.c: Move to... * gcc.dg/analyzer/torture/conditionals-2.c: ...here, converting to a torture test. Remove redundant include.
David Malcolm committed -
PR analyzer/93356 reports an ICE handling __builtin_isnan due to a failing assertion: 674 gcc_assert (lhs_ec_id != rhs_ec_id); with op=UNORDERED_EXPR. when attempting to add an UNORDERED_EXPR constraint. This is an overzealous assertion, but underlying it are various forms of sloppiness regarding NaN within the analyzer: (a) the assumption in the constraint_manager that equivalence classes are reflexive (X == X), which isn't the case for NaN. (b) Hardcoding the "honor_nans" param to false when calling invert_tree_comparison throughout the analyzer. (c) Ignoring ORDERED_EXPR, UNORDERED_EXPR, and the UN-prefixed comparison codes. I wrote a patch for this which tracks the NaN-ness of floating-point values and uses this to address all of the above. However, to minimize changes in gcc 10 stage 4, here's a simpler patch which rejects attempts to query or add constraints on floating-point values, instead treating any floating-point comparison as "unknown", and silently dropping the constraints at edges. gcc/analyzer/ChangeLog: PR analyzer/93356 * region-model.cc (region_model::eval_condition): In both overloads, bail out immediately on floating-point types. (region_model::eval_condition_without_cm): Likewise. (region_model::add_constraint): Likewise. gcc/testsuite/ChangeLog: PR analyzer/93356 * gcc.dg/analyzer/conditionals-notrans.c (test_float_selfcmp): Add. * gcc.dg/analyzer/conditionals-trans.c: Mark floating point comparison test as failing. (test_float_selfcmp): Add. * gcc.dg/analyzer/data-model-1.c: Mark floating point comparison tests as failing. * gcc.dg/analyzer/torture/pr93356.c: New test. gcc/ChangeLog: PR analyzer/93356 * doc/analyzer.texi (Limitations): Note that constraints on floating-point values are currently ignored.
David Malcolm committed
-
- 30 Jan, 2020 22 commits
-
-
PR c/88660 * c-parser.c (c_parser_switch_statement): Make sure to request marking the switch expr as used. PR c/88660 * gcc.dg/pr88660.c: New test.
Jeff Law committed -
The following testcase FAILs on powerpc64le-linux with assembler errors, as we emit a call to bar.localalias, then .set bar.localalias, bar twice and then another call to bar.localalias. The problem is that bar.localalias can be created at various stages and e.g. ipa-pure-const can slightly adjust the original decl, so that the existing bar.localalias isn't considered usable (different flags_from_decl_or_type). In that case, we'd create another bar.localalias, which clashes with the existing name. Fixed by retrying with another name if it is already present. The various localalias aliases shouldn't be that many, from different partitions they would be lto_priv suffixed and in most cases they would already have the same type/flags/attributes. 2020-01-30 Jakub Jelinek <jakub@redhat.com> PR lto/93384 * symtab.c (symtab_node::noninterposable_alias): If localalias already exists, but is not usable, append numbers after it until a unique name is found. Formatting fix. * gcc.dg/lto/pr93384_0.c: New test. * gcc.dg/lto/pr93384_1.c: New file.
Jakub Jelinek committed -
What happens on this testcase is with the out of bounds rotate we get: Trying 13 -> 16: 13: r129:SI=r132:DI#0<-<0x20 REG_DEAD r132:DI 16: r123:DI=r129:SI<0 REG_DEAD r129:SI Successfully matched this instruction: (set (reg/v:DI 123 [ <retval> ]) (const_int 0 [0])) during combine. So, perhaps we could also change simplify-rtx.c to punt if it is out of bounds rather than trying to optimize anything. Or, but probably GCC11 material, if we decide that ROTATE/ROTATERT doesn't have out of bounds counts or introduce targetm.rotate_truncation_mask, we should truncate the argument instead of punting. Punting is better for backports though. 2020-01-30 Jakub Jelinek <jakub@redhat.com> PR middle-end/93505 * combine.c (simplify_comparison) <case ROTATE>: Punt on out of range rotate counts. * gcc.c-torture/compile/pr93505.c: New test.
Jakub Jelinek committed -
When instantiating a template tsubst_copy_and_build suppresses -Wtype-limits warnings about e.g. == always being false because it might not always be false for an instantiation with other template arguments. But we should warn if the operands don't depend on template arguments. PR c++/82521 * pt.c (tsubst_copy_and_build) [EQ_EXPR]: Only suppress warnings if the expression was dependent before substitution.
Jason Merrill committed -
PR fortran/87103 * expr.c (gfc_check_conformance): Check vsnprintf for truncation. * iresolve.c (gfc_get_string): Likewise. * symbol.c (gfc_new_symbol): Remove check for maximum symbol name length. Remove redundant 0 setting of new calloc()ed gfc_symbol.
Andrew Benson committed -
Fixes ICE in testcase gcc.dg/pr81228.c 2020-01-30 Andrew Stubbs <ams@codesourcery.com> gcc/ * config/gcn/gcn.c (print_operand): Handle LTGT. * config/gcn/predicates.md (gcn_fp_compare_operator): Allow ltgt.
Andrew Stubbs committed -
* gcc.dg/tree-ssa/ssa-dse-26.c: Make existing dg-final scan conditional on !c6x. Add dg-final scan pattern for c6x.
Jeff Law committed -
gcc/testsuite/ChangeLog: * gcc.dg/Warray-bounds-57.c: New test.
Martin Sebor committed -
This wraps { ... } in _Literal (type) for consumption by the GIMPLE FE. 2020-01-30 Richard Biener <rguenther@suse.de> * tree-pretty-print.c (dump_generic_node): Wrap VECTOR_CST and CONSTRUCTOR in _Literal (type) with TDF_GIMPLE.
Richard Biener committed -
PR analyzer/93450 reports an ICE trying to fold an EQ_EXPR comparison of (int)0 with (float)Inf. Most comparisons inside the analyzer come from gimple conditions, for which the necessary casts have already been added. This one is done inside constant_svalue::eval_condition as part of purging sm-state for an unknown function call, and fails to check the types being compared, leading to the ICE. sm_state_map::set_state calls region_model::eval_condition_without_cm in order to handle pointer equality (so that e.g. (void *)&r and (foo *)&r transition together), which leads to this code generating a bogus query to see if the two constants are equal. This patch fixes the ICE in two ways: - It avoids generating comparisons within constant_svalue::eval_condition unless the types are equal (thus for constants, but not for pointer values, which are handled by region_svalue). - It updates sm_state_map::set_state to bail immediately if the new state is the same as the old one, thus avoiding the above for the common case where an svalue_id has no sm-state (such as for the int and float constants in the reproducer), for which the above becomes a no-op. gcc/analyzer/ChangeLog: PR analyzer/93450 * program-state.cc (sm_state_map::set_state): For the overload taking an svalue_id, bail out if the set_state on the ec does nothing. Convert the latter's return type from void to bool, returning true if anything changed. (sm_state_map::impl_set_state): Convert the return type from void to bool, returning true if the state changed. * program-state.h (sm_state_map::set_state): Convert return type from void to bool. (sm_state_map::impl_set_state): Likewise. * region-model.cc (constant_svalue::eval_condition): Only call fold_build2 if the types are the same. gcc/testsuite/ChangeLog: PR analyzer/93450 * gcc.dg/analyzer/torture/pr93450.c: New test.
David Malcolm committed -
2020-01-30 John David Anglin <danglin@gcc.gnu.org> * config/pa/pa.c (pa_elf_select_rtx_section): Place function pointers without a DECL in .data.rel.ro.local.
John David Anglin committed -
uaddvdi4 expander has an optimization for the low 32-bits of the 2nd input operand known to be 0. Unfortunately, in that case it only emits copying of the low 32 bits to the low 32 bits of the destination, but doesn't emit the addition with overflow detection for the high 64 bits. Well, to be precise, it emits it, but into an RTL sequence returned by gen_uaddvsi4, but that sequence isn't emitted anywhere. 2020-01-30 Jakub Jelinek <jakub@redhat.com> PR target/93494 * config/arm/arm.md (uaddvdi4): Actually emit what gen_uaddvsi4 returned. * gcc.c-torture/execute/pr93494.c: New test.
Jakub Jelinek committed -
PR bootstrap/93409 * plugin/configfrag.ac (enable_offload_targets): Skip HSA and GCN plugin besides -m32 also for -mx32. * configure: Regenerate.
Tobias Burnus committed -
PR c++/90338 * g++.dg/pr90338.C: New.
Paolo Carlini committed -
Some time ago, patterns were added to optimize move mask followed by zero extension from 32 bits to 64 bit. As the testcase shows, the intrinsics actually return int, not unsigned int, so it will happen quite often that one actually needs sign extension instead of zero extension. Except for vpmovmskb with 256-bit operand, sign vs. zero extension doesn't make a difference, as we know the bit 31 will not be set (the source will have 2 or 4 doubles, 4 or 8 floats or 16 or 32 chars). So, for the floating point patterns, this patch just uses a code iterator so that we handle both zero extend and sign extend, and for the byte one adds a separate pattern for the 128-bit operand. 2020-01-30 Jakub Jelinek <jakub@redhat.com> PR target/91824 * config/i386/sse.md (*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_zext): Renamed to ... (*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_<u>ext): ... this. Use any_extend code iterator instead of always zero_extend. (*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_zext_lt): Renamed to ... (*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_<u>ext_lt): ... this. Use any_extend code iterator instead of always zero_extend. (*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_zext_shift): Renamed to ... (*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_<u>ext_shift): ... this. Use any_extend code iterator instead of always zero_extend. (*sse2_pmovmskb_ext): New define_insn. (*sse2_pmovmskb_ext_lt): New define_insn_and_split. * gcc.target/i386/pr91824-2.c: New test.
Jakub Jelinek committed -
Like any other instruction with 32-bit GPR destination operand in 64-bit mode, popcntl also clears the upper 32 bits of the register (and other bits too, it can return only 0 to 32 inclusive). During combine, the zero or sign extensions of it show up as paradoxical subreg of the popcount & 63, there 63 is the smallest power of two - 1 mask that can represent all the 0 to 32 inclusive values. 2020-01-30 Jakub Jelinek <jakub@redhat.com> PR target/91824 * config/i386/i386.md (*popcountsi2_zext): New define_insn_and_split. (*popcountsi2_zext_falsedep): New define_insn. * gcc.target/i386/pr91824-1.c: New test.
Jakub Jelinek committed -
This is something that has been discussed already a few months ago, but seems to have stalled. Here is Paul's patch from the PR except for the TREE_STATIC hunk which is wrong, and does the most conservative fn spec tweak for the problematic two builtins we are aware of (to repeat what is in the PR, both .wR and .ww are wrong for these builtins that transform one layout of an descriptor to another one; while the first pointer is properly marked that we only store to what it points to, from the second pointer we copy and reshuffle the content and store into the first one; if there wouldn't be any pointers, ".wr" would be just fine, but as there is a pointer and that pointer is copied to the area pointed by first argument, the pointer effectively leaks that way, so we e.g. can't optimize stores into what the data pointer in the descriptor points to). I haven't analyzed other fn spec attributes in the FE, but think it is better to fix at least this one we have analyzed. 2020-01-30 Paul Thomas <pault@gcc.gnu.org> Jakub Jelinek <jakub@redhat.com> PR fortran/92123 * trans-decl.c (gfc_get_symbol_decl): Call gfc_defer_symbol_init for CFI descs. (gfc_build_builtin_function_decls): Use ".w." instead of ".ww" or ".wR" for gfor_fndecl_{cfi_to_gfc,gfc_to_cfi}. (convert_CFI_desc): Handle references to CFI descriptors. Co-authored-by: Paul Thomas <pault@gcc.gnu.org>
Jakub Jelinek committed -
Commit 54b3d52c ("Emit .note.GNU-stack for hard-float linux targets.") was missing generated files. Add them now. gcc/ChangeLog: 2020-01-30 Dragan Mladjenovic <dmladjenovic@wavecomp.com> * config.in: Regenerated. * configure: Regenerated.
Dragan Mladjenovic committed -
By standard, coroutine body should be encapsulated in try-catch block as following: try { // coroutine body } catch(...) { promise.unhandled_exception(); } Given above try-catch block is implemented in the coroutine actor function called by coroutine ramp function, so the promise should be accessed via actor function's coroutine frame pointer argument, rather than the ramp function's coroutine frame variable. This patch cleans code a bit to make fix easy. gcc/cp * coroutines.cc (act_des_fn): New. (morph_fn_to_coro): Call act_des_fn to build actor/destroy decls. Access promise via actor function's frame pointer argument. (build_actor_fn, build_destroy_fn): Use frame pointer argument.
Bin Cheng committed -
Function co_await_expander expands CO_AWAIT_EXPR and inserts expanded code before result of co_await is used, however, it doesn't cover the type conversion case and leads to gimplify ICE. This patch fixes it. gcc/cp * coroutines.cc (co_await_expander): Handle type conversion case. gcc/testsuite * g++.dg/coroutines/co-await-syntax-09-convert.C: New test.
Bin Cheng committed -
Patch by Svante Signell. Updates PR go/93468 Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/216959
Ian Lance Taylor committed -
Patch from Svante Signell. Updates PR go/93468 Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/216958
Ian Lance Taylor committed
-