1. 04 Mar, 2019 1 commit
  2. 28 Feb, 2019 1 commit
    • AArch64: Have empty HWCAPs string ignored during native feature detection · 29c6debc
      This patch stops the native feature detection code for AArch64 GCC from adding
      features automatically when a feature has no hwcaps string to match against.
      
      This means that -mcpu=native no longer adds feature flags such as +profile.
      The behaviour wasn't noticed before because, at the time +profile was added, a
      bug prevented native detection from adding any feature bits at all.
      
      The loop has also been changed as Jakub specified in order to avoid a memory
      leak that was present in the existing code and to be slightly more efficient.
      
      gcc/ChangeLog:
      
      	PR target/88530
      	* config/aarch64/aarch64-option-extensions.def: Document it.
      	* config/aarch64/driver-aarch64.c (host_detect_local_cpu): Skip feature
      	if empty hwcaps.
      
      gcc/testsuite/ChangeLog:
      
      	PR target/88530
      	* gcc.target/aarch64/options_set_10.c: New test.
      
      From-SVN: r269276
      Tamar Christina committed
  3. 25 Feb, 2019 2 commits
    • AArch64: Fix command line options canonicalization version #2. (PR target/88530) · 4ca82fc9
      Command-line options on AArch64 don't get canonicalized into the smallest
      possible set before being output to the assembler.  This means that overlapping
      feature sets are emitted with superfluous parts.
      
      Normally this isn't an issue, but in the case of crypto we have retroactively
      split it into aes and sha2.  We need to emit only +crypto to the assembler
      so that old assemblers continue to work.
      
      Because of how -mcpu=native and -march=native work, they end up enabling all
      feature bits.  Instead we need to compute the smallest possible set, which also
      fixes the problem with older assemblers and the retroactive split.
      
      The function that handles this is called quite often: for every push/pop of
      options and for every attribute that changes the arch, cpu, etc.  In order to
      keep the search for the smallest set cheap, we sort the options based on the
      number of features (bits) they enable.  This allows us to process the list
      linearly instead of quadratically: once we have enabled a feature, we know that
      anything else that enables it can be ignored, and by sorting we get the biggest
      groups first and thus the smallest combination of command-line flags.
      
      The option-handling structures have been extended with a boolean that indicates
      whether the option is synthetic, by which I mean that the option flag only
      enables other features and does not denote a feature by itself.
      
      e.g. +crypto isn't an actual feature, it just enables other features, whereas
      options like +rdma enable multiple dependent features while also being features
      themselves.
      
      There are two ways to solve this.
      
      1) Have the options that are feature bits also turn themselves on, e.g.
         change rdma to turn on FP, SIMD and RDMA as dependency bits.
      
      2) Make a distinction between these two different types of features and have
         the framework handle it correctly.
      
      Even though it's more code, I went for the second approach, as it's the one
      that'll be less fragile (people can't forget to handle it) and gives the fewest
      surprises.
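      
      For reference, a rough sketch of the extended option record (only the new
      is_synthetic field comes from this patch's ChangeLog; the other field names
      and types are illustrative assumptions):
      
      #include <stdbool.h>
      #include <stdint.h>
      
      struct aarch64_option_extension
      {
        const char *name;     /* Extension name, e.g. "crypto" or "rdma".  */
        uint64_t flags_on;    /* Feature bits this option turns on.  */
        uint64_t flags_off;   /* Feature bits turned off by +no<name>.  */
        bool is_synthetic;    /* True for +crypto: it only groups other
                                 features and is not a feature itself.  */
      };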
      
      Effectively this patch changes the following:
      
      The values before the => show what the old compiler emitted and the values
      after the => show what the new compiler emits.
      
      -march=armv8.2-a+crypto+sha2 => -march=armv8.2-a+crypto
      -march=armv8.2-a+sha2+aes => -march=armv8.2-a+crypto
      
      The remaining behaviors stay the same.
      
      gcc/ChangeLog:
      
      	PR target/88530
      	* common/config/aarch64/aarch64-common.c
      	(struct aarch64_option_extension): Add is_synthetic.
      	(all_extensions): Use it.
      	(TARGET_OPTION_INIT_STRUCT): Define hook.
      	(struct gcc_targetm_common): Moved to end.
      	(all_extensions_by_on): New.
      	(opt_ext_cmp, typedef opt_ext): New.
      	(aarch64_option_init_struct): New.
      	(aarch64_contains_opt): New.
      	(aarch64_get_extension_string_for_isa_flags): Output smallest set.
      	* config/aarch64/aarch64-option-extensions.def
      	(AARCH64_OPT_EXTENSION): Explicitly include AES and SHA2 in crypto.
      	(fp, simd, crc, lse, fp16, rcpc, rdma, dotprod, aes, sha2, sha3,
      	sm4, fp16fml, sve, profile, rng, memtag, sb, ssbs, predres):
      	Set is_synthetic to false.
      	(crypto): Set is_synthetic to true.
      	* config/aarch64/driver-aarch64.c (AARCH64_OPT_EXTENSION): Add
      	SYNTHETIC.
      
      gcc/testsuite/ChangeLog:
      
      	PR target/88530
      	* gcc.target/aarch64/options_set_1.c: New test.
      	* gcc.target/aarch64/options_set_2.c: New test.
      	* gcc.target/aarch64/options_set_3.c: New test.
      	* gcc.target/aarch64/options_set_4.c: New test.
      	* gcc.target/aarch64/options_set_5.c: New test.
      	* gcc.target/aarch64/options_set_6.c: New test.
      	* gcc.target/aarch64/options_set_7.c: New test.
      	* gcc.target/aarch64/options_set_8.c: New test.
      	* gcc.target/aarch64/options_set_9.c: New test.
      
      From-SVN: r269193
      Tamar Christina committed
    • AArch64: Update Armv8.4-a's FP16 FML intrinsics · 9d04c986
      This patch updates the suffixes of the Armv8.4-a FP16 FML intrinsics from _u32
      to _f16 to be more consistent with the intrinsics naming convention.
      
      The specifications for these intrinsics have not been published yet so we do
      not need to maintain the old names.
      
      The patch was created with the following script:
      
      grep -lIE "(vfml[as].+)_u32" -r gcc/ | grep -iEv ".+Changelog.*" \
        | xargs sed -i -E -e "s/(vfml[as].+)_u32/\1_f16/g"
      
      gcc/ChangeLog:
      
      	* config/aarch64/arm_neon.h (vfmlal_low_u32, vfmlsl_low_u32,
      	vfmlalq_low_u32, vfmlslq_low_u32, vfmlal_high_u32, vfmlsl_high_u32,
      	vfmlalq_high_u32, vfmlslq_high_u32, vfmlal_lane_low_u32,
      	vfmlsl_lane_low_u32, vfmlal_laneq_low_u32, vfmlsl_laneq_low_u32,
      	vfmlalq_lane_low_u32, vfmlslq_lane_low_u32, vfmlalq_laneq_low_u32,
      	vfmlslq_laneq_low_u32, vfmlal_lane_high_u32, vfmlsl_lane_high_u32,
      	vfmlal_laneq_high_u32, vfmlsl_laneq_high_u32, vfmlalq_lane_high_u32,
      	vfmlslq_lane_high_u32, vfmlalq_laneq_high_u32, vfmlslq_laneq_high_u32):
      	Rename ...
      	(vfmlal_low_f16, vfmlsl_low_f16, vfmlalq_low_f16, vfmlslq_low_f16,
      	vfmlal_high_f16, vfmlsl_high_f16, vfmlalq_high_f16, vfmlslq_high_f16,
      	vfmlal_lane_low_f16, vfmlsl_lane_low_f16, vfmlal_laneq_low_f16,
      	vfmlsl_laneq_low_f16, vfmlalq_lane_low_f16, vfmlslq_lane_low_f16,
      	vfmlalq_laneq_low_f16, vfmlslq_laneq_low_f16, vfmlal_lane_high_f16,
      	vfmlsl_lane_high_f16, vfmlal_laneq_high_f16, vfmlsl_laneq_high_f16,
      	vfmlalq_lane_high_f16, vfmlslq_lane_high_f16, vfmlalq_laneq_high_f16,
      	vfmlslq_laneq_high_f16): ... To this.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/aarch64/fp16_fmul_high.h (test_vfmlal_high_u32,
      	test_vfmlalq_high_u32, test_vfmlsl_high_u32, test_vfmlslq_high_u32):
      	Rename ...
      	(test_vfmlal_high_f16, test_vfmlalq_high_f16, test_vfmlsl_high_f16,
      	test_vfmlslq_high_f16): ... To this.
      	* gcc.target/aarch64/fp16_fmul_lane_high.h (test_vfmlal_lane_high_u32,
      	tets_vfmlsl_lane_high_u32, test_vfmlal_laneq_high_u32,
      	test_vfmlsl_laneq_high_u32, test_vfmlalq_lane_high_u32,
      	test_vfmlslq_lane_high_u32, test_vfmlalq_laneq_high_u32,
      	test_vfmlslq_laneq_high_u32): Rename ...
      	(test_vfmlal_lane_high_f16, tets_vfmlsl_lane_high_f16,
      	test_vfmlal_laneq_high_f16, test_vfmlsl_laneq_high_f16,
      	test_vfmlalq_lane_high_f16, test_vfmlslq_lane_high_f16,
      	test_vfmlalq_laneq_high_f16, test_vfmlslq_laneq_high_f16): ... To this.
      	* gcc.target/aarch64/fp16_fmul_lane_low.h (test_vfmlal_lane_low_u32,
      	test_vfmlsl_lane_low_u32, test_vfmlal_laneq_low_u32,
      	test_vfmlsl_laneq_low_u32, test_vfmlalq_lane_low_u32,
      	test_vfmlslq_lane_low_u32, test_vfmlalq_laneq_low_u32,
      	test_vfmlslq_laneq_low_u32): Rename ...
      	(test_vfmlal_lane_low_f16, test_vfmlsl_lane_low_f16,
      	test_vfmlal_laneq_low_f16, test_vfmlsl_laneq_low_f16,
      	test_vfmlalq_lane_low_f16, test_vfmlslq_lane_low_f16,
      	test_vfmlalq_laneq_low_f16, test_vfmlslq_laneq_low_f16): ... To this.
      	* gcc.target/aarch64/fp16_fmul_low.h (test_vfmlal_low_u32,
      	test_vfmlalq_low_u32, test_vfmlsl_low_u32, test_vfmlslq_low_u32):
      	Rename ...
      	(test_vfmlal_low_f16, test_vfmlalq_low_f16, test_vfmlsl_low_f16,
      	test_vfmlslq_low_f16): ... To This.
      	* lib/target-supports.exp
      	(check_effective_target_arm_fp16fml_neon_ok_nocache): Update test.
      
      From-SVN: r269191
      Tamar Christina committed
  4. 14 Feb, 2019 1 commit
  5. 13 Feb, 2019 1 commit
    • AArch64: Allow any offset for SVE addressing modes before reload. · 0c63a8ee
      On AArch64, aarch64_classify_address has a non-strict case that allows it to
      accept any byte offset from a register when validating an address in a given
      addressing mode.
      
      It does this because reload would later make the address valid.  SVE, however,
      requires the address to always be valid, yet currently any address is accepted
      when a MEM + offset is used.  This causes an ICE because nothing later forces
      the address to be legitimate.
      
      The patch makes aarch64_emit_sve_pred_move go through expand_insn so that the
      addressing mode is valid for any loads/stores it creates, which follows the
      SVE way of handling address classification.
      
      gcc/ChangeLog:
      
      	PR target/88847
      	* config/aarch64/aarch64-sve.md (*pred_mov<mode>, pred_mov<mode>):
      	Expose as @aarch64_pred_mov.
      	* config/aarch64/aarch64.c (aarch64_classify_address):
      	Use expand_insn which legitimizes operands.
      
      gcc/testsuite/ChangeLog:
      
      	PR target/88847
      	* gcc.target/aarch64/sve/pr88847.c: New test.
      
      From-SVN: r268845
      Tamar Christina committed
  6. 07 Feb, 2019 1 commit
    • [AArch64] Change representation of SABD in RTL · 8544ed6e
      Richard raised a concern about the RTL we use to represent the AdvSIMD SABD
      (vector signed absolute difference) instruction.
      We currently represent it as ABS (MINUS op1 op2).
      
      This isn't exactly what SABD does.  ABS treats its input as a signed value
      and returns its absolute value.
      
      For example:
      (sabd:QI 64 -128) == 192 (unsigned) aka -64 (signed)
      whereas
      (minus:QI 64 -128) == 192 (unsigned) aka -64 (signed), (abs ...) of that is 64.
      
      A better way to describe the instruction is with MINUS (SMAX (op1 op2) SMIN (op1 op2)).
      This patch implements that, and also implements similar semantics for the UABD instruction
      that uses UMAX and UMIN.
      
      That way for the example above we'll have:
      (minus:QI (smax:QI (64 -128)) (smin:QI (64 -128))) == (minus:QI 64 -128) == 192 (or -64 signed) which matches
      what SABD does. 
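      
      As a scalar sketch (illustration only, not part of the patch), the difference
      between the two descriptions is:
      
      /* Hypothetical 8-bit illustration of the example above.  */
      unsigned char
      abd_via_abs_of_minus (signed char a, signed char b)
      {
        /* abs (64 - -128) gives 64, which is not what SABD produces.  */
        return __builtin_abs ((signed char) (a - b));
      }
      
      unsigned char
      abd_via_max_minus_min (signed char a, signed char b)
      {
        signed char mx = a > b ? a : b;
        signed char mn = a < b ? a : b;
        /* 64 - -128 gives 192 (-64 signed), matching SABD.  */
        return mx - mn;
      }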
      
      	* config/aarch64/iterators.md (max_opp): New code_attr.
      	(USMAX): New code iterator.
      	* config/aarch64/predicates.md (aarch64_smin): New predicate.
      	(aarch64_smax): Likewise.
      	* config/aarch64/aarch64-simd.md (abd<mode>_3): Rename to...
      	(*aarch64_<su>abd<mode>_3): ... Change RTL representation to
      	MINUS (MAX MIN).
      
      	* gcc.target/aarch64/abd_1.c: New test.
      	* gcc.dg/sabd_1.c: Likewise.
      
      From-SVN: r268658
      Kyrylo Tkachov committed
  7. 25 Jan, 2019 1 commit
    • This is pretty unlikely in real code... · c590597c
      This is pretty unlikely in real code but, similarly to Arm, the AArch64
      ABI has a bug in the handling of 128-bit bit-fields: if the bit-field
      dominates the overall alignment, the back-end code may end up passing the
      argument incorrectly.  This is a regression that started in gcc-6 when the
      ABI support code was updated to support overaligned types.  The fix is very
      similar in concept to the Arm fix.  128-bit bit-fields are fortunately
      extremely rare, so I'd be very surprised if anyone has been bitten by this.
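      
      A hypothetical sketch of the kind of type affected (the real testcases are the
      new aapcs64 test_align-10.c to test_align-12.c; this example is illustrative
      only):
      
      /* The 16-byte alignment of the bit-field's base type dominates the overall
         alignment of the struct, which is the case the corrected
         aarch64_function_arg_alignment now has to consider (and warn about when
         the passing convention changed in gcc-9).  */
      struct overaligned
      {
        unsigned __int128 x : 128;
        int tail;
      };
      
      extern void callee (int, struct overaligned);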
      
      PR target/88469
      gcc/
      	* config/aarch64/aarch64.c (aarch64_function_arg_alignment): Add new
      	argument ABI_BREAK.  Set to true if the calculated alignment has
      	changed in gcc-9.  Check bit-fields for their base type alignment.
      	(aarch64_layout_arg): Warn if argument passing has changed in gcc-9.
      	(aarch64_function_arg_boundary): Likewise.
      	(aarch64_gimplify_va_arg_expr): Likewise.
      
      gcc/testsuite/
      	* gcc.target/aarch64/aapcs64/test_align-10.c: New test.
      	* gcc.target/aarch64/aapcs64/test_align-11.c: New test.
      	* gcc.target/aarch64/aapcs64/test_align-12.c: New test.
      
      From-SVN: r268273
      Richard Earnshaw committed
  8. 17 Jan, 2019 1 commit
  9. 11 Jan, 2019 1 commit
  10. 10 Jan, 2019 4 commits
    • re PR rtl-optimization/87305 (Segfault in end_hard_regno in… · 7e4d17a8
      re PR rtl-optimization/87305 (Segfault in end_hard_regno in setup_live_pseudos_and_spill_after_risky_transforms on aarch64 big-endian)
      
      2019-01-10  Vladimir Makarov  <vmakarov@redhat.com>
      
      	PR rtl-optimization/87305
      	* lra-assigns.c
      	(setup_live_pseudos_and_spill_after_risky_transforms): Check
      	allocation for big endian pseudos used as paradoxical subregs and
      	spill them if it is wrong.
      	* lra-constraints.c (lra_constraints): Add a comment.
      
      2019-01-10  Vladimir Makarov  <vmakarov@redhat.com>
      
      	PR rtl-optimization/87305
      	* gcc.target/aarch64/pr87305.c: New.
      
      From-SVN: r267823
      Vladimir Makarov committed
    • [Committed, AArch64] Disable tests for ilp32. · 8b530f81
      Currently Return Address Signing is only supported in lp64.  Thus the
      tests that I added recently (which enable return address signing via the
      -mbranch-protection=standard option) should also be exempted from testing in
      ilp32.  This patch adds the needed dg-require-effective-target directive to
      the tests.
      
      *** gcc/testsuite/ChangeLog ***
      
      2019-01-10  Sudakshina Das  <sudi.das@arm.com>
      
      	* gcc.target/aarch64/bti-1.c: Exempt for ilp32.
      	* gcc.target/aarch64/bti-2.c: Likewise.
      	* gcc.target/aarch64/bti-3.c: Likewise.
      
      Committed as obvious.
      
      From-SVN: r267818
      Sudakshina Das committed
    • arm-builtins.c (enum arm_type_qualifiers): Add qualifier_lane_pair_index. · c2b7062d
      2019-01-10  Tamar Christina  <tamar.christina@arm.com>
      
      	* config/arm/arm-builtins.c
      	(enum arm_type_qualifiers): Add qualifier_lane_pair_index.
      	(MAC_LANE_PAIR_QUALIFIERS): New.
      	(arm_expand_builtin_args): Use it.
      	(arm_expand_builtin_1): Likewise.
      	* config/arm/arm-protos.h (neon_vcmla_lane_prepare_operands): New.
      	* config/arm/arm.c (neon_vcmla_lane_prepare_operands): New.
      	* config/arm/arm-c.c (arm_cpu_builtins): Add __ARM_FEATURE_COMPLEX.
      	* config/arm/arm_neon.h:
      	(vcadd_rot90_f16): New.
      	(vcaddq_rot90_f16): New.
      	(vcadd_rot270_f16): New.
      	(vcaddq_rot270_f16): New.
      	(vcmla_f16): New.
      	(vcmlaq_f16): New.
      	(vcmla_lane_f16): New.
      	(vcmla_laneq_f16): New.
      	(vcmlaq_lane_f16): New.
      	(vcmlaq_laneq_f16): New.
      	(vcmla_rot90_f16): New.
      	(vcmlaq_rot90_f16): New.
      	(vcmla_rot90_lane_f16): New.
      	(vcmla_rot90_laneq_f16): New.
      	(vcmlaq_rot90_lane_f16): New.
      	(vcmlaq_rot90_laneq_f16): New.
      	(vcmla_rot180_f16): New.
      	(vcmlaq_rot180_f16): New.
      	(vcmla_rot180_lane_f16): New.
      	(vcmla_rot180_laneq_f16): New.
      	(vcmlaq_rot180_lane_f16): New.
      	(vcmlaq_rot180_laneq_f16): New.
      	(vcmla_rot270_f16): New.
      	(vcmlaq_rot270_f16): New.
      	(vcmla_rot270_lane_f16): New.
      	(vcmla_rot270_laneq_f16): New.
      	(vcmlaq_rot270_lane_f16): New.
      	(vcmlaq_rot270_laneq_f16): New.
      	(vcadd_rot90_f32): New.
      	(vcaddq_rot90_f32): New.
      	(vcadd_rot270_f32): New.
      	(vcaddq_rot270_f32): New.
      	(vcmla_f32): New.
      	(vcmlaq_f32): New.
      	(vcmla_lane_f32): New.
      	(vcmla_laneq_f32): New.
      	(vcmlaq_lane_f32): New.
      	(vcmlaq_laneq_f32): New.
      	(vcmla_rot90_f32): New.
      	(vcmlaq_rot90_f32): New.
      	(vcmla_rot90_lane_f32): New.
      	(vcmla_rot90_laneq_f32): New.
      	(vcmlaq_rot90_lane_f32): New.
      	(vcmlaq_rot90_laneq_f32): New.
      	(vcmla_rot180_f32): New.
      	(vcmlaq_rot180_f32): New.
      	(vcmla_rot180_lane_f32): New.
      	(vcmla_rot180_laneq_f32): New.
      	(vcmlaq_rot180_lane_f32): New.
      	(vcmlaq_rot180_laneq_f32): New.
      	(vcmla_rot270_f32): New.
      	(vcmlaq_rot270_f32): New.
      	(vcmla_rot270_lane_f32): New.
      	(vcmla_rot270_laneq_f32): New.
      	(vcmlaq_rot270_lane_f32): New.
      	(vcmlaq_rot270_laneq_f32): New.
      	* config/arm/arm_neon_builtins.def (vcadd90, vcadd270, vcmla0, vcmla90,
      	vcmla180, vcmla270, vcmla_lane0, vcmla_lane90, vcmla_lane180, vcmla_lane270,
      	vcmla_laneq0, vcmla_laneq90, vcmla_laneq180, vcmla_laneq270,
      	vcmlaq_lane0, vcmlaq_lane90, vcmlaq_lane180, vcmlaq_lane270): New.
      	* config/arm/neon.md (neon_vcmla_lane<rot><mode>,
      	neon_vcmla_laneq<rot><mode>, neon_vcmlaq_lane<rot><mode>): New.
      	* config/arm/arm.c (arm_arch8_3, arm_arch8_4): New.
      	* config/arm/arm.h (TARGET_COMPLEX, arm_arch8_3, arm_arch8_4): New.
      	(arm_option_reconfigure_globals): Use them.
      	* config/arm/iterators.md (VDF, VQ_HSF): New.
      	(VCADD, VCMLA): New.
      	(VF_constraint, rot, rotsplit1, rotsplit2): Add V4HF and V8HF.
      	* config/arm/neon.md (neon_vcadd<rot><mode>, neon_vcmla<rot><mode>): New.
      	* config/arm/unspecs.md (UNSPEC_VCADD90, UNSPEC_VCADD270,
      	UNSPEC_VCMLA, UNSPEC_VCMLA90, UNSPEC_VCMLA180, UNSPEC_VCMLA270): New.
      
      gcc/testsuite/ChangeLog:
      
      2019-01-10  Tamar Christina  <tamar.christina@arm.com>
      
      	* gcc.target/aarch64/advsimd-intrinsics/vector-complex.c: Add AArch32 regexpr.
      	* gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c: Likewise.
      
      From-SVN: r267796
      Tamar Christina committed
    • aarch64-builtins.c (enum aarch64_type_qualifiers): Add qualifier_lane_pair_index. · 9d63f43b
      gcc/ChangeLog:
      
      2019-01-10  Tamar Christina  <tamar.christina@arm.com>
      
      	* config/aarch64/aarch64-builtins.c (enum aarch64_type_qualifiers): Add qualifier_lane_pair_index.
      	(emit-rtl.h): Include.
      	(TYPES_QUADOP_LANE_PAIR): New.
      	(aarch64_simd_expand_args): Use it.
      	(aarch64_simd_expand_builtin): Likewise.
      	(AARCH64_SIMD_FCMLA_LANEQ_BUILTINS, aarch64_fcmla_laneq_builtin_datum): New.
      	(FCMLA_LANEQ_BUILTIN, AARCH64_SIMD_FCMLA_LANEQ_BUILTIN_BASE,
      	AARCH64_SIMD_FCMLA_LANEQ_BUILTINS, aarch64_fcmla_lane_builtin_data,
      	aarch64_init_fcmla_laneq_builtins, aarch64_expand_fcmla_builtin): New.
      	(aarch64_init_builtins): Add aarch64_init_fcmla_laneq_builtins.
      	(aarch64_expand_builtin): Add AARCH64_SIMD_BUILTIN_FCMLA_LANEQ0_V2SF,
      	AARCH64_SIMD_BUILTIN_FCMLA_LANEQ90_V2SF, AARCH64_SIMD_BUILTIN_FCMLA_LANEQ180_V2SF,
      	AARCH64_SIMD_BUILTIN_FCMLA_LANEQ270_V2SF, AARCH64_SIMD_BUILTIN_FCMLA_LANEQ0_V4HF,
      	AARCH64_SIMD_BUILTIN_FCMLA_LANEQ90_V4HF, AARCH64_SIMD_BUILTIN_FCMLA_LANEQ180_V4HF,
      	AARCH64_SIMD_BUILTIN_FCMLA_LANEQ270_V4HF.
      	* config/aarch64/iterators.md (FCMLA_maybe_lane): New.
      	* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Add __ARM_FEATURE_COMPLEX.
      	* config/aarch64/aarch64-simd-builtins.def (fcadd90, fcadd270, fcmla0, fcmla90,
      	fcmla180, fcmla270, fcmla_lane0, fcmla_lane90, fcmla_lane180, fcmla_lane270,
      	fcmla_laneq0, fcmla_laneq90, fcmla_laneq180, fcmla_laneq270,
      	fcmlaq_lane0, fcmlaq_lane90, fcmlaq_lane180, fcmlaq_lane270): New.
      	* config/aarch64/aarch64-simd.md (aarch64_fcmla_lane<rot><mode>,
      	aarch64_fcmla_laneq<rot>v4hf, aarch64_fcmlaq_lane<rot><mode>,aarch64_fcadd<rot><mode>,
      	aarch64_fcmla<rot><mode>): New.
      	* config/aarch64/arm_neon.h:
      	(vcadd_rot90_f16): New.
      	(vcaddq_rot90_f16): New.
      	(vcadd_rot270_f16): New.
      	(vcaddq_rot270_f16): New.
      	(vcmla_f16): New.
      	(vcmlaq_f16): New.
      	(vcmla_lane_f16): New.
      	(vcmla_laneq_f16): New.
      	(vcmlaq_lane_f16): New.
      	(vcmlaq_rot90_lane_f16): New.
      	(vcmla_rot90_laneq_f16): New.
      	(vcmla_rot90_lane_f16): New.
      	(vcmlaq_rot90_f16): New.
      	(vcmla_rot90_f16): New.
      	(vcmlaq_laneq_f16): New.
      	(vcmla_rot180_laneq_f16): New.
      	(vcmla_rot180_lane_f16): New.
      	(vcmlaq_rot180_f16): New.
      	(vcmla_rot180_f16): New.
      	(vcmlaq_rot90_laneq_f16): New.
      	(vcmlaq_rot270_laneq_f16): New.
      	(vcmlaq_rot270_lane_f16): New.
      	(vcmla_rot270_laneq_f16): New.
      	(vcmlaq_rot270_f16): New.
      	(vcmla_rot270_f16): New.
      	(vcmlaq_rot180_laneq_f16): New.
      	(vcmlaq_rot180_lane_f16): New.
      	(vcmla_rot270_lane_f16): New.
      	(vcadd_rot90_f32): New.
      	(vcaddq_rot90_f32): New.
      	(vcaddq_rot90_f64): New.
      	(vcadd_rot270_f32): New.
      	(vcaddq_rot270_f32): New.
      	(vcaddq_rot270_f64): New.
      	(vcmla_f32): New.
      	(vcmlaq_f32): New.
      	(vcmlaq_f64): New.
      	(vcmla_lane_f32): New.
      	(vcmla_laneq_f32): New.
      	(vcmlaq_lane_f32): New.
      	(vcmlaq_laneq_f32): New.
      	(vcmla_rot90_f32): New.
      	(vcmlaq_rot90_f32): New.
      	(vcmlaq_rot90_f64): New.
      	(vcmla_rot90_lane_f32): New.
      	(vcmla_rot90_laneq_f32): New.
      	(vcmlaq_rot90_lane_f32): New.
      	(vcmlaq_rot90_laneq_f32): New.
      	(vcmla_rot180_f32): New.
      	(vcmlaq_rot180_f32): New.
      	(vcmlaq_rot180_f64): New.
      	(vcmla_rot180_lane_f32): New.
      	(vcmla_rot180_laneq_f32): New.
      	(vcmlaq_rot180_lane_f32): New.
      	(vcmlaq_rot180_laneq_f32): New.
      	(vcmla_rot270_f32): New.
      	(vcmlaq_rot270_f32): New.
      	(vcmlaq_rot270_f64): New.
      	(vcmla_rot270_lane_f32): New.
      	(vcmla_rot270_laneq_f32): New.
      	(vcmlaq_rot270_lane_f32): New.
      	(vcmlaq_rot270_laneq_f32): New.
      	* config/aarch64/aarch64.h (TARGET_COMPLEX): New.
      	* config/aarch64/iterators.md (UNSPEC_FCADD90, UNSPEC_FCADD270,
      	UNSPEC_FCMLA, UNSPEC_FCMLA90, UNSPEC_FCMLA180, UNSPEC_FCMLA270): New.
      	(FCADD, FCMLA): New.
      	(rot): New.
      	* config/arm/types.md (neon_fcadd, neon_fcmla): New.
      
      gcc/testsuite/ChangeLog:
      
      2019-01-10  Tamar Christina  <tamar.christina@arm.com>
      
      	* gcc.target/aarch64/advsimd-intrinsics/vector-complex.c: New test.
      	* gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c: New test.
      
      From-SVN: r267795
      Tamar Christina committed
  11. 09 Jan, 2019 4 commits
    • [AArch64, 6/6] Enable BTI: Add configure option. · c7ff4f0f
      This patch is part of a series that enables ARMv8.5-A in GCC and
      adds the Branch Target Identification Mechanism.
      
      This patch adds a new configure option for enabling BTI and
      Return Address Signing by default.
      
      *** gcc/ChangeLog ***
      
      2019-01-09  Sudakshina Das  <sudi.das@arm.com>
      
      	* config/aarch64/aarch64.c (aarch64_override_options): Add case to
      	check configure option to set BTI and Return Address Signing.
      	* configure.ac: Add --enable-standard-branch-protection and
      	--disable-standard-branch-protection.
      	* configure: Regenerated.
      	* doc/install.texi: Document the same.
      
      *** gcc/testsuite/ChangeLog ***
      
      2019-01-09  Sudakshina Das  <sudi.das@arm.com>
      
      	* gcc.target/aarch64/bti-1.c: Update test to not add the command-line
      	option when configured with bti.
      	* gcc.target/aarch64/bti-2.c: Likewise.
      	* lib/target-supports.exp
      	(check_effective_target_default_branch_protection):
      	Add configure check for --enable-standard-branch-protection.
      
      From-SVN: r267770
      Sudakshina Das committed
    • [AArch64, 5/6] Enable BTI : Add new pass for BTI. · b5f794b4
      This patch is part of a series that enables ARMv8.5-A in GCC and
      adds the Branch Target Identification Mechanism.
      
      This patch adds a new pass called "bti" which is triggered by the
      command-line argument -mbranch-protection whenever "bti" is turned on.
      
      The pass iterates through the instructions and adds the appropriate BTI
      instructions based on the following:
        * Add a new "BTI C" at the beginning of a function, unless it is already
          protected by a "PACIASP".  We exempt functions that are only called
          directly.
        * Add a new "BTI J" for every target of an indirect jump, jump-table
          targets, non-local goto targets, and labels that might be referenced
          by variables, constant pools, etc. (NOTE_INSN_DELETED_LABEL).
      
      Since we have already changed indirect tail calls to use only x16 and
      x17, we do not have to use "BTI JC" (see patch 3/6).
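      
      As a rough illustration (hypothetical example, not taken from the patch):
      
      /* Built with -mbranch-protection=bti (or =standard), "callee" gets a
         "BTI C" at its entry because its address escapes through "fp", so it
         can be reached by an indirect call; a jump table emitted for "dispatch"
         would get "BTI J" at each of its label targets.  */
      int callee (int x) { return x + 1; }
      int (*fp) (int) = callee;
      
      int
      dispatch (int x)
      {
        switch (x)
          {
          case 0: return 10;
          case 1: return 11;
          case 2: return 12;
          case 3: return 13;
          case 4: return 14;
          default: return fp (x);
          }
      }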
      
      *** gcc/ChangeLog ***
      
      2019-01-09  Sudakshina Das  <sudi.das@arm.com>
      	    Ramana Radhakrishnan  <ramana.radhakrishnan@arm.com>
      
      	* config.gcc (aarch64*-*-*): Add aarch64-bti-insert.o.
      	* gcc/config/aarch64/aarch64.h: Update comment for TRAMPOLINE_SIZE.
      	* config/aarch64/aarch64.c (aarch64_asm_trampoline_template): Update
      	if bti is enabled.
      	* config/aarch64/aarch64-bti-insert.c: New file.
      	* config/aarch64/aarch64-passes.def (INSERT_PASS_BEFORE): Insert bti
      	pass.
      	* config/aarch64/aarch64-protos.h (make_pass_insert_bti): Declare the
      	new bti pass.
      	* config/aarch64/aarch64.md (unspecv): Add UNSPECV_BTI_NOARG,
      	UNSPECV_BTI_C, UNSPECV_BTI_J and UNSPECV_BTI_JC.
      	(bti_noarg, bti_j, bti_c, bti_jc): New define_insns.
      	* config/aarch64/t-aarch64: Add rule for aarch64-bti-insert.o.
      
      *** gcc/testsuite/ChangeLog ***
      
      2019-01-09  Sudakshina Das  <sudi.das@arm.com>
      
      	* gcc.target/aarch64/bti-1.c: New test.
      	* gcc.target/aarch64/bti-2.c: New test.
      	* gcc.target/aarch64/bti-3.c: New test.
      	* lib/target-supports.exp
      	(check_effective_target_aarch64_bti_hw): Add new check for BTI hw.
      
      Co-Authored-By: Ramana Radhakrishnan <ramana.radhakrishnan@arm.com>
      
      From-SVN: r267769
      Sudakshina Das committed
    • [AArch64, 3/6] Restrict indirect tail calls to x16 and x17 · 901e66e0
      This patch is part of a series that enables ARMv8.5-A in GCC and
      adds the Branch Target Identification Mechanism.
      
      This patch changes the registers that are allowed for indirect tail calls.
      We are choosing to restrict these to x16 and x17 only.
      
      Indirect tail calls are special in that they convert a call statement
      (BLR instruction) into a jump statement (BR instruction).  To make the best
      possible use of the Branch Target Identification Mechanism, we would like
      to place a "BTI C" (call) at the beginning of the function, which is only
      compatible with BLRs and with BR X16/X17.  To make indirect tail calls
      compatible with this scenario, we are restricting TAILCALL_ADDR_REGS.
      
      In order to use x16/x17 for this purpose, we also had to change the use of
      these registers in the epilogue/prologue handling.  We now use x12 and x13,
      named EP0_REGNUM and EP1_REGNUM, as scratch registers for the epilogue and
      prologue.
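      
      A minimal sketch of an indirect tail call affected by this change
      (hypothetical example):
      
      /* When this compiles to a tail call, the jump is now emitted as
         "br x16" or "br x17" rather than through an arbitrary register, so it
         stays compatible with a "BTI C" landing pad at the target's entry.  */
      int
      tail_call (int (*fp) (int), int x)
      {
        return fp (x);
      }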
      
      *** gcc/ChangeLog***
      
      2019-01-09  Sudakshina Das  <sudi.das@arm.com>
      
      	* config/aarch64/aarch64.c (aarch64_expand_prologue): Use new
      	epilogue/prologue scratch registers EP0_REGNUM and EP1_REGNUM.
      	(aarch64_expand_epilogue): Likewise.
      	(aarch64_output_mi_thunk): Likewise
      	* config/aarch64/aarch64.h (REG_CLASS_CONTENTS): Change
      	TAILCALL_ADDR_REGS to x16 and x17.
      	* config/aarch64/aarch64.md: Define EP0_REGNUM and EP1_REGNUM.
      
      *** gcc/testsuite/ChangeLog ***
      
      2019-01-09  Sudakshina Das  <sudi.das@arm.com>
      
      	* gcc.target/aarch64/test_frame_17.c: Update to check for EP0_REGNUM
      	instead of IP0_REGNUM and add test case.
      
      From-SVN: r267767
      Sudakshina Das committed
    • [Aarch64][SVE] Add copysign and xorsign support · 6c9c7b73
      This patch adds support for the copysign and xorsign builtins to SVE.  With
      the new expanders, they can be vectorized using bitwise logical operations.
      
      I tested this patch on an aarch64 machine, bootstrapping the compiler and
      running the checks.
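      
      A sketch of the kind of loop the new expanders let the vectorizer handle
      (hypothetical example, similar in spirit to the new tests):
      
      /* With SVE enabled, the copysign below can be done with bitwise operations
         on the sign bit inside the vectorized loop instead of a scalar libcall.  */
      void
      set_signs (double *restrict r, const double *a, const double *b, int n)
      {
        for (int i = 0; i < n; i++)
          r[i] = __builtin_copysign (a[i], b[i]);
      }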
      
      2019-01-09  Alejandro Martinez  <alejandro.martinezvicente@arm.com>
      
      	* config/aarch64/aarch64-sve.md (copysign<mode>3): New define_expand.
      	(xorsign<mode>3): Likewise.
      
      2019-01-09  Alejandro Martinez  <alejandro.martinezvicente@arm.com>
      
      	* gcc.target/aarch64/sve/copysign_1.c: New test for SVE vectorized
      	copysign.
      	* gcc.target/aarch64/sve/copysign_1_run.c: Likewise.
      	* gcc.target/aarch64/sve/xorsign_1.c: New test for SVE vectorized
      	xorsign.
      	* gcc.target/aarch64/sve/xorsign_1_run.c: Likewise.
      
      From-SVN: r267764
      Alejandro Martinez committed
  12. 08 Jan, 2019 1 commit
    • [PATCH 2/3][GCC][AARCH64] Add new -mbranch-protection option to combine pointer signing and BTI · efac62a3
      gcc/ChangeLog:
      
      2019-01-08  Sam Tebbs  <sam.tebbs@arm.com>
      
      	* config/aarch64/aarch64.c (BRANCH_PROTECT_STR_MAX,
      	aarch64_parse_branch_protection,
      	struct aarch64_branch_protect_type,
      	aarch64_handle_no_branch_protection,
      	aarch64_handle_standard_branch_protection,
      	aarch64_validate_mbranch_protection,
      	aarch64_handle_pac_ret_protection,
      	aarch64_handle_attr_branch_protection,
      	accepted_branch_protection_string,
      	aarch64_pac_ret_subtypes,
      	aarch64_branch_protect_types,
      	aarch64_handle_pac_ret_leaf): Define.
      	(aarch64_override_options_after_change_1, aarch64_override_options):
      	Add check for accepted_branch_protection_string.
      	(aarch64_option_save): Save accepted_branch_protection_string.
      	(aarch64_option_restore): Save accepted_branch_protection_string.
      	* config/aarch64/aarch64.c (aarch64_attributes): Add branch-protection.
      	* config/aarch64/aarch64.opt: Add mbranch-protection. Deprecate
      	msign-return-address.
      	* doc/invoke.texi: Add mbranch-protection.
      
      gcc/testsuite/Changelog:
      
      2019-01-08  Sam Tebbs  <sam.tebbs@arm.com>
      
      	* gcc.target/aarch64/(return_address_sign_1.c,
      	return_address_sign_2.c, return_address_sign_3.c (__attribute__)):
      	Change option to -mbranch-protection.
      	* gcc.target/aarch64/(branch-protection-option.c,
      	branch-protection-option-2.c, branch-protection-attr.c,
      	branch-protection-attr-2.c): New file.
      
      From-SVN: r267717
      Sam Tebbs committed
  13. 07 Jan, 2019 1 commit
    • Investigating PR target/86891 revealed a number of issues with the way the... · a58fe3c5
      Investigating PR target/86891 revealed a number of issues with the way
      the AArch64 backend was handling overflow detection patterns.  Firstly,
      expansion is not the same for signed and unsigned types, since in one
      form the overflow is detected via the C flag and in the other it is done
      via the V flag in the PSR.  Secondly, particular care has to be taken
      when describing overflow of signed types: the comparison has to be
      performed conceptually on a value that cannot overflow, compared against
      a value that might have overflowed.
      
      It became apparent that some of the patterns were simply unmatchable
      (they collapse to NEG in the RTL rather than subtracting from zero)
      and a number of patterns were overly restrictive in terms of the
      immediate constants that they supported.  I've tried to address all of
      these issues as well.
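      
      For reference, the kind of source construct these overflow patterns expand
      (a hedged sketch, not taken from the PR):
      
      /* Signed subtraction overflow is detected via the V flag, unsigned borrow
         via the C flag, which is why the two expansions cannot share one form.  */
      int
      checked_sub (long a, long b, long *res)
      {
        return __builtin_sub_overflow (a, b, res);
      }
      
      int
      checked_usub (unsigned long a, unsigned long b, unsigned long *res)
      {
        return __builtin_sub_overflow (a, b, res);
      }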
      
      gcc:
      
      	PR target/86891
      	* config/aarch64/aarch64.c (aarch64_expand_subvti): New parameter
      	unsigned_p.  Handle signed and unsigned overflow correction as
      	required.
      	* config/aarch64/aarch64-protos.h (aarch64_expand_subvti): Update
      	prototype.
      	* config/aarch64/aarch64.md (addv<mode>4): Use aarch64_plus_operand
      	for operand 2.
      	(add<mode>3_compareV_imm): Make this callable for expanding.
      	(subv<GPI:mode>4): Use register_operand for operand 1.  Use
      	aarch64_plus_operand for operand 2.
      	(subv<GPI:mode>_insn): New insn pattern.
      	(subv<GPI:mode>_imm): Likewise.
      	(negv<GPI:mode>3): New expand pattern.
      	(negv<GPI:mode>_insn): New insn pattern.
      	(negv<GPI:mode>_cmp_only): Likewise.
      	(cmpv<GPI:mode>_insn): Likewise.
      	(subvti4): Use register_operand for operand 1.  Update call to
      	aarch64_expand_subvti.
      	(usubvti4): Likewise.
      	(negvti3): New expand pattern.
      	(negdi_carryout): New insn pattern.
      	(negvdi_carryinV): New insn pattern.
      	(sub<mode3>_compare1_imm): Delete named insn pattern, make anonymous
      	version the named version.
      	(peepholes to convert to sub<mode3>_compare1_imm): Adjust order of
      	operands.
      	(usub<GPI:mode>3_carryinC, usub<GPI:mode>3_carryinC_z1): New insn
      	patterns.
      	(usub<GPI:mode>3_carryinC_z2, usub<GPI:mode>3_carryinC): New insn
      	patterns.
      	(sub<mode>3_carryinCV, sub<mode>3_carryinCV_z1_z2): Delete.
      	(sub<mode>3_carryinCV_z1, sub<mode>3_carryinCV_z2): Delete.
      	(sub<mode>3_carryinCV): Delete.
      	(sub<GPI:mode>3_carryinV): New expand pattern.
      	(sub<mode>3_carryinV, sub<mode>3_carryinV_z2): New insn patterns.
      
      testsuite:
      
      	* gcc.target/aarch64/subs_compare_2.c: Make '#' immediate prefix
      	optional in scan pattern.
      
      From-SVN: r267650
      Richard Earnshaw committed
  14. 04 Jan, 2019 1 commit
  15. 01 Jan, 2019 1 commit
  16. 20 Dec, 2018 2 commits
    • [AArch64][SVE] Add ABS support · 69c5fdcf
      For some reason we missed ABS out of the list of supported integer
      operations when adding the SVE port initially.
      
      2018-12-20  Richard Sandiford  <richard.sandiford@arm.com>
      
      gcc/
      	* config/aarch64/iterators.md (SVE_INT_UNARY, fp_int_op): Add abs.
      	(SVE_FP_UNARY): Sort.
      
      gcc/testsuite/
      	* gcc.target/aarch64/pr64946.c: Force nosve.
      	* gcc.target/aarch64/ssadv16qi.c: Likewise.
      	* gcc.target/aarch64/usadv16qi.c: Likewise.
      	* gcc.target/aarch64/vect-abs-compile.c: Likewise.
      	* gcc.target/aarch64/sve/abs_1.c: New test.
      
      From-SVN: r267304
      Richard Sandiford committed
    • [AArch64][SVE] Fix IFN_COND_FMLA movprfx alternative · 7abc36cc
      This patch fixes a cut-&-pasto in the (match_dup 4) version of
      "cond_<SVE_COND_FP_TERNARY:optab><SVE_F:mode>".  (It's a shame
      that there's so much cut-&-paste in these patterns, but it's hard
      to avoid without more infrastructure.)
      
      2018-12-20  Richard Sandiford  <richard.sandiford@arm.com>
      
      gcc/
      	* config/aarch64/aarch64-sve.md (*cond_<optab><mode>_4): Use
      	sve_fmla_op rather than sve_fmad_op for the movprfx alternative.
      
      gcc/testsuite/
      	* gcc.target/aarch64/sve/fmla_2.c: New test.
      	* gcc.target/aarch64/sve/fmla_2_run.c: Likewise
      
      From-SVN: r267303
      Richard Sandiford committed
  17. 17 Dec, 2018 1 commit
    • aarch64-torture.exp: New file. · ba1a78ff
      2018-12-17  Steve Ellcey  <sellcey@cavium.com>
      
      	* gcc.target/aarch64/torture/aarch64-torture.exp: New file.
      	* gcc.target/aarch64/torture/simd-abi-1.c: New test.
      	* gcc.target/aarch64/torture/simd-abi-2.c: Ditto.
      	* gcc.target/aarch64/torture/simd-abi-3.c: Ditto.
      	* gcc.target/aarch64/torture/simd-abi-4.c: Ditto.
      	* gcc.target/aarch64/torture/simd-abi-5.c: Ditto.
      	* gcc.target/aarch64/torture/simd-abi-6.c: Ditto.
      	* gcc.target/aarch64/torture/simd-abi-7.c: Ditto.
      
      From-SVN: r267209
      Steve Ellcey committed
  18. 07 Dec, 2018 3 commits
    • [AArch64][2/2] Add sve_width -moverride tunable · 886f092f
      On top of the previous patch that implements TARGET_ESTIMATED_POLY_VALUE
      and adds an sve_width tuning field to the CPU structs, this patch implements
      an -moverride knob to adjust this sve_width field to allow for
      experimentation.  As a reminder, this only has an effect when compiling for
      VLA SVE, that is, without -msve-vector-bits=<foo>.  It just adjusts tuning
      heuristics in the compiler, such as profitability thresholds for vectorised
      versioned loops.
      
      It can be used, for example, as -moverride=sve_width=256 to set the sve_width
      tuning field to 256.  Widths outside the accepted SVE range [128, 2048] are
      rejected, as you'd expect.
      
          * config/aarch64/aarch64.c (aarch64_tuning_override_functions): Add
          sve_width entry.
          (aarch64_parse_sve_width_string): Define.
      
      
          * gcc.target/aarch64/sve/override_sve_width_1.c: New test.
      
      From-SVN: r266898
      Kyrylo Tkachov committed
    • [AArch64][SVE] Remove unnecessary PTRUEs from integer arithmetic · 26004f51
      When using the unpredicated immediate forms of MUL, LSL, LSR and ASR,
      the rtl patterns would still have the predicate operand we created for
      the other forms.  This patch splits the patterns after reload in order
      to get rid of the predicate, like we already do for WHILE.
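      
      A sketch of the kind of loop affected (hypothetical example): the vector body
      can use the unpredicated shift-by-immediate form, so the leftover PTRUE that
      fed the predicated pattern can be split away after reload:
      
      void
      shift_all (unsigned int *x, int n)
      {
        for (int i = 0; i < n; i++)
          x[i] <<= 3;   /* LSL by an immediate needs no governing predicate.  */
      }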
      
      2018-12-07  Richard Sandiford  <richard.sandiford@arm.com>
      
      gcc/
      	* config/aarch64/aarch64-sve.md (*mul<mode>3, *v<optab><mode>3):
      	Split the patterns after reload if we don't need the predicate
      	operand.
      	(*post_ra_mul<mode>3, *post_ra_v<optab><mode>3): New patterns.
      
      gcc/testsuite/
      	* gcc.target/aarch64/sve/pred_elim_2.c: New test.
      
      From-SVN: r266892
      Richard Sandiford committed
    • [AArch64][SVE] Remove unnecessary PTRUEs from FP arithmetic · 740c1ed7
      When using the unpredicated all-register forms of FADD, FSUB and FMUL,
      the rtl patterns would still have the predicate operand we created for
      the other forms.  This patch splits the patterns after reload in order
      to get rid of the predicate, like we already do for WHILE.
      
      2018-12-07  Richard Sandiford  <richard.sandiford@arm.com>
      
      gcc/
      	* config/aarch64/iterators.md (SVE_UNPRED_FP_BINARY): New code
      	iterator.
      	(sve_fp_op): Handle minus and mult.
      	* config/aarch64/aarch64-sve.md (*add<mode>3, *sub<mode>3)
      	(*mul<mode>3): Split the patterns after reload if we don't
      	need the predicate operand.
      	(*post_ra_<sve_fp_op><mode>3): New pattern.
      
      gcc/testsuite/
      	* gcc.target/aarch64/sve/pred_elim_1.c: New test.
      
      From-SVN: r266891
      Richard Sandiford committed
  19. 06 Dec, 2018 1 commit
  20. 29 Nov, 2018 1 commit
    • PR c/88172 - attribute aligned of zero silently accepted but ignored · 673670da
      PR c/88172 - attribute aligned of zero silently accepted but ignored
      PR testsuite/88208 - new test case c-c++-common/builtin-has-attribute-3.c in r266335 has multiple excess errors
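      
      An illustration of the construct in question (hypothetical example):
      
      /* Previously accepted silently and ignored; with this change the alignment
         is not lowered below what the target requires and a warning can
         optionally be issued.  */
      int zero_aligned __attribute__ ((aligned (0)));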
      
      gcc/ChangeLog:
      
      	PR c/88172
      	PR testsuite/88208
      	* doc/extend.texi (attribute constructor): Clarify.
      
      gcc/c/ChangeLog:
      
      	PR c/88172
      	PR testsuite/88208
      	* c-decl.c (declspec_add_alignas): Adjust call to check_user_alignment.
      
      gcc/c-family/ChangeLog:
      
      	PR c/88172
      	PR testsuite/88208
      	* c-attribs.c (common_handle_aligned_attribute): Silently avoid setting
      	alignments to values less than the target requires.
      	(has_attribute): For attribute aligned consider both the attribute
      	and the alignment bits.
      	* c-common.c (c_init_attributes): Optionally issue a warning for
      	zero alignment.
      
      gcc/testsuite/ChangeLog:
      
      	PR c/88172
      	PR testsuite/88208
      	* gcc.dg/attr-aligned-2.c: New test.
      	* gcc.dg/builtin-has-attribute.c: Adjust.
      	* c-c++-common/builtin-has-attribute-2.c: Same.
      	* c-c++-common/builtin-has-attribute-3.c: Same.
      	* c-c++-common/builtin-has-attribute-4.c: Same.
      	* c-c++-common/builtin-has-attribute-5.c: New test.
      	* gcc.target/aarch64/attr-aligned.c: Same.
      	* gcc.target/i386/attr-aligned.c: Same.
      	* gcc.target/powerpc/attr-aligned.c: Same.
      	* gcc.target/sparc/attr-aligned.c: Same.
      
      From-SVN: r266633
      Martin Sebor committed
  21. 21 Nov, 2018 1 commit
  22. 19 Nov, 2018 1 commit
  23. 15 Nov, 2018 1 commit
  24. 14 Nov, 2018 1 commit
    • [AArch64] Fix PR62178 testcase failures · ff4d8480
      The testcase for PR62178 has been failing for a while due to the pass
      conditions being too tight, resulting in failures with -mcmodel=tiny:
      
      	ldr	q2, [x0], 124
      	ld1r	{v1.4s}, [x1], 4
      	cmp	x0, x2
      	mla	v0.4s, v2.4s, v1.4s
      	bne	.L7
      
      -mcmodel=small generates the slightly different:
      
      	ldr	q1, [x0], 124
      	ldr	s2, [x1, 4]!
      	cmp	x0, x2
      	mla	v0.4s, v1.4s, v2.s[0]
      	bne	.L7
      
      This is due to combine merging a DUP instruction with either a load or an
      MLA - we can't force it to prefer one over the other.  However, the
      generated vector loop is fast either way since it generates MLA and merges
      the DUP with either a load or an MLA.  So relax the conditions slightly and
      check that we still generate MLA and that there is no DUP or FMOV.
      
      The testcase now passes - committed as obvious.
      
          testsuite/
      	* gcc.target/aarch64/pr62178.c: Relax scan-assembler checks.
      
      From-SVN: r266139
      Wilco Dijkstra committed
  25. 12 Nov, 2018 2 commits
    • re PR target/86677 (popcount builtin detection is breaking some kernel build) · 06a6b46a
      gcc/ChangeLog:
      
      2018-11-13  Kugan Vivekanandarajah  <kuganv@linaro.org>
      
      	PR middle-end/86677
      	PR middle-end/87528
      	* tree-scalar-evolution.c (expression_expensive_p): Make BUILTIN POPCOUNT
      	as expensive when backend does not define it.
      
      gcc/testsuite/ChangeLog:
      
      2018-11-13  Kugan Vivekanandarajah  <kuganv@linaro.org>
      
      	PR middle-end/86677
      	PR middle-end/87528
      	* g++.dg/tree-ssa/pr86544.C: Run only for target supporting popcount
      	pattern.
      	* gcc.dg/tree-ssa/popcount.c: Likewise.
      	* gcc.dg/tree-ssa/popcount2.c: Likewise.
      	* gcc.dg/tree-ssa/popcount3.c: Likewise.
      	* gcc.target/aarch64/popcount4.c: New test.
      	* lib/target-supports.exp (check_effective_target_popcountl): New.
      
      From-SVN: r266039
      Kugan Vivekanandarajah committed
    • [PR87815]Don't generate shift sequence for load replacement in DSE when the mode… · e6575643
      [PR87815]Don't generate shift sequence for load replacement in DSE when the mode size is not compile-time constant
      
      The patch adds a check that the gap is a compile-time constant.
      
      This happens when DSE decides to replace the load with the previous store
      value.  The problem is that the shift sequence cannot accept a compile-time
      non-constant mode operand.
      
      gcc/
      
      2018-11-12  Renlin Li  <renlin.li@arm.com>
      
      	PR target/87815
      	* dse.c (get_stored_val): Add check for compile-time
      	constantness of gap.
      
      gcc/testsuite/
      
      2018-11-12  Renlin Li  <renlin.li@arm.com>
      
      	PR target/87815
      	* gcc.target/aarch64/sve/pr87815.c: New.
      
      From-SVN: r266033
      Renlin Li committed
  26. 31 Oct, 2018 1 commit
    • Provide extension hint for aarch64 target (PR driver/83193). · c7887347
      2018-10-31  Martin Liska  <mliska@suse.cz>
      
      	PR driver/83193
      	* common/config/aarch64/aarch64-common.c (aarch64_parse_extension):
      	Add new argument invalid_extension.
      	(aarch64_get_all_extension_candidates): New function.
      	(aarch64_rewrite_selected_cpu): Add NULL to function call.
      	* config/aarch64/aarch64-protos.h (aarch64_parse_extension): Add
      	new argument.
      	(aarch64_get_all_extension_candidates): New function.
      	* config/aarch64/aarch64.c (aarch64_parse_arch): Add new
      	argument invalid_extension.
      	(aarch64_parse_cpu): Likewise.
      	(aarch64_print_hint_for_extensions): New function.
      	(aarch64_validate_mcpu): Provide hint about invalid extension.
      	(aarch64_validate_march): Likewise.
      	(aarch64_handle_attr_arch): Pass new argument.
      	(aarch64_handle_attr_cpu): Provide hint about invalid extension.
      	(aarch64_handle_attr_isa_flags): Likewise.
      2018-10-31  Martin Liska  <mliska@suse.cz>
      
      	PR driver/83193
      	* gcc.target/aarch64/spellcheck_7.c: New test.
      	* gcc.target/aarch64/spellcheck_8.c: New test.
      	* gcc.target/aarch64/spellcheck_9.c: New test.
      
      From-SVN: r265686
      Martin Liska committed
  27. 15 Oct, 2018 1 commit
    • [PR87563][AARCH64-SVE]: Don't keep ifcvt loop when COND_<OP> ifn could not be vectorized. · 41241199
      ifcvt creates a versioned loop and permissively generates scalar
      COND_<OP> internal functions in it.
      
      If, in the loop vectorize pass, a COND_<OP> cannot be vectorized, the
      if-converted loop should be abandoned when the target doesn't support
      such an ifn.
      
      
      gcc/
      
      2018-10-12  Renlin Li  <renlin.li@arm.com>
      
      	PR target/87563
      	* tree-vectorizer.c (try_vectorize_loop_1): Don't use
      	if-conversioned loop when it contains ifn with types not
      	supported by backend.
      	* internal-fn.c (expand_direct_optab_fn): Add an assert.
      	(direct_internal_fn_supported_p): New helper function.
      	* internal-fn.h (direct_internal_fn_supported_p): Declare.
      
      gcc/testsuite/
      
      2018-10-12  Renlin Li  <renlin.li@arm.com>
      
      	PR target/87563
      	* gcc.target/aarch64/sve/pr87563.c: New.
      
      From-SVN: r265172
      Renlin Li committed
  28. 12 Oct, 2018 1 commit
    • [AArch64] Support zero-extended move to FP register · 0cfc095c
      The popcount expansion uses SIMD instructions acting on 64-bit values.
      As a result a popcount of a 32-bit integer requires zero-extension before 
      moving the zero-extended value into an FP register.  This patch adds
      support for zero-extended int->FP moves to avoid the redundant uxtw.
      Similarly, add support for 32-bit zero-extending load->FP register
      and 32-bit zero-extending FP->FP and FP->int moves.
      Add a missing 'fp' arch attribute to the related 8/16-bit pattern and
      fix an incorrect type attribute.
      
      To complete zero-extended load support, add a new alternative to 
      load_pair_zero_extendsidi2_aarch64 to support LDP into FP registers too.
      
      int f (int a)
      {
        return __builtin_popcount (a);
      }
      
      Before:
      	uxtw	x0, w0
      	fmov	d0, x0
      	cnt	v0.8b, v0.8b
      	addv	b0, v0.8b
      	fmov	w0, s0
      	ret
      
      After:
      	fmov	s0, w0
      	cnt	v0.8b, v0.8b
      	addv	b0, v0.8b
      	fmov	w0, s0
      	ret
      
      Passes regress & bootstrap on AArch64.
      
          gcc/
      	* config/aarch64/aarch64.md (zero_extendsidi2_aarch64): Add alternatives
      	to zero-extend between int and floating-point registers.
      	(load_pair_zero_extendsidi2_aarch64): Add alternative for zero-extended
      	ldp into floating-point registers.  Add type and arch attributes.
      	(zero_extend<SHORT:mode><GPI:mode>2_aarch64): Add arch attribute.
      	Use f_loads for type attribute.
      
          testsuite/
      	* gcc.target/aarch64/popcnt.c: Test zero-extended popcount.
      	* gcc.target/aarch64/vec_zeroextend.c: Test zero-extended vectors.
      
      From-SVN: r265079
      Wilco Dijkstra committed
  29. 11 Oct, 2018 1 commit
    • [AArch64] Fix PR87511 · 1b6acf23
      As mentioned in PR87511, the shift used in aarch64_mask_and_shift_for_ubfiz_p
      should be evaluated as a HOST_WIDE_INT rather than int.
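      
      A minimal sketch of the underlying issue (not the compiler code itself):
      
      /* The literal 1 has type int, so for shift counts of 32 or more the shift
         is undefined and the 64-bit mask comes out wrong.  */
      unsigned long long bad_mask  (int shift) { return (1 << shift) - 1; }
      
      /* Shifting a 64-bit one (HOST_WIDE_INT_1U inside the compiler) is correct.  */
      unsigned long long good_mask (int shift) { return (1ULL << shift) - 1; }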
      
      Passes bootstrap & regress.
      
          gcc/
      	PR target/87511
      	* config/aarch64/aarch64.c (aarch64_mask_and_shift_for_ubfiz_p):
      	Use HOST_WIDE_INT_1U for shift.
      
          testsuite/
      	PR target/87511
      	* gcc.target/aarch64/pr87511.c: Add new test.
      
      From-SVN: r265058
      Wilco Dijkstra committed