1. 04 Mar, 2019 1 commit
  2. 28 Feb, 2019 1 commit
    • AArch64: Have empty HWCAPs string ignored during native feature detection · 29c6debc
      This patch stops the native feature detection code for AArch64 GCC from adding
      features automatically when a feature has no hwcaps string to match against.
      
      This means that -mcpu=native no longer adds feature flags such as +profile.
      The behaviour wasn't noticed before because, at the time +profile was added, a
      bug prevented native detection from adding any feature bits at all.
      
      The loop has also been changed as Jakub specified in order to avoid a memory
      leak that was present in the existing code and to be slightly more efficient.
      
      gcc/ChangeLog:
      
      	PR target/88530
      	* config/aarch64/aarch64-option-extensions.def: Document it.
      	* config/aarch64/driver-aarch64.c (host_detect_local_cpu): Skip feature
      	if empty hwcaps.
      
      gcc/testsuite/ChangeLog:
      
      	PR target/88530
      	* gcc.target/aarch64/options_set_10.c: New test.
      
      From-SVN: r269276
      Tamar Christina committed
  3. 25 Feb, 2019 2 commits
    • AArch64: Fix command line options canonicalization version #2. (PR target/88530) · 4ca82fc9
      Command-line options on AArch64 don't get canonicalized into the smallest
      possible set before being output to the assembler.  This means that overlapping
      feature sets are emitted with superfluous parts.
      
      Normally this isn't an issue, but in the case of crypto we have retroactively
      split it into aes and sha2.  We need to emit only +crypto to the assembler
      so that old assemblers continue to work.
      
      Because of how -mcpu=native and -march=native work, they end up enabling all
      feature bits.  Instead we need to compute the smallest possible set, which also
      fixes the problem with older assemblers and the retroactive split.
      
      The function that handles this is called quite often: for every push/pop of
      options and for every attribute that changes the arch, cpu, etc.  In order to
      keep the search for the smallest set cheap, we sort the options based on the
      number of features (bits) they enable.  This allows us to process the list
      linearly instead of quadratically: once we have enabled a feature, we know that
      anything else that enables it can be ignored, and by sorting we get the biggest
      groups first and thus the smallest combination of command-line flags.
      
      The option-handling structures have been extended with a boolean that indicates
      whether the option is synthetic, by which I mean that the option flag only
      enables other features and does not denote a feature by itself.
      
      e.g. +crypto isn't an actual feature, it just enables other features, whereas
      options like +rdma enable multiple dependent features while also being features
      themselves.
      
      There are two ways to solve this.
      
      1) Have the options that are feature bits also turn themselves on, e.g.
         change rdma to turn on FP, SIMD and RDMA as dependency bits.
      
      2) Make a distinction between these two different types of features and have
         the framework handle it correctly.
      
      Even though it's more code, I went for the second approach, as it's the one
      that'll be less fragile (people can't forget to handle it) and gives the fewest
      surprises.
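      
      For reference, a rough sketch of the extended option record (only the new
      is_synthetic field comes from this patch's ChangeLog; the other field names
      and types are illustrative assumptions):
      
      #include <stdbool.h>
      #include <stdint.h>
      
      struct aarch64_option_extension
      {
        const char *name;     /* Extension name, e.g. "crypto" or "rdma".  */
        uint64_t flags_on;    /* Feature bits this option turns on.  */
        uint64_t flags_off;   /* Feature bits turned off by +no<name>.  */
        bool is_synthetic;    /* True for +crypto: it only groups other
                                 features and is not a feature itself.  */
      };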
      
      Effectively this patch changes the following:
      
      The values before the => show what the old compiler emitted and the values
      after the => show what the new compiler emits.
      
      -march=armv8.2-a+crypto+sha2 => -march=armv8.2-a+crypto
      -march=armv8.2-a+sha2+aes => -march=armv8.2-a+crypto
      
      The remaining behaviors stay the same.
      
      gcc/ChangeLog:
      
      	PR target/88530
      	* common/config/aarch64/aarch64-common.c
      	(struct aarch64_option_extension): Add is_synthetic.
      	(all_extensions): Use it.
      	(TARGET_OPTION_INIT_STRUCT): Define hook.
      	(struct gcc_targetm_common): Moved to end.
      	(all_extensions_by_on): New.
      	(opt_ext_cmp, typedef opt_ext): New.
      	(aarch64_option_init_struct): New.
      	(aarch64_contains_opt): New.
      	(aarch64_get_extension_string_for_isa_flags): Output smallest set.
      	* config/aarch64/aarch64-option-extensions.def
      	(AARCH64_OPT_EXTENSION): Explicitly include AES and SHA2 in crypto.
      	(fp, simd, crc, lse, fp16, rcpc, rdma, dotprod, aes, sha2, sha3,
      	sm4, fp16fml, sve, profile, rng, memtag, sb, ssbs, predres):
      	Set is_synthetic to false.
      	(crypto): Set is_synthetic to true.
      	* config/aarch64/driver-aarch64.c (AARCH64_OPT_EXTENSION): Add
      	SYNTHETIC.
      
      gcc/testsuite/ChangeLog:
      
      	PR target/88530
      	* gcc.target/aarch64/options_set_1.c: New test.
      	* gcc.target/aarch64/options_set_2.c: New test.
      	* gcc.target/aarch64/options_set_3.c: New test.
      	* gcc.target/aarch64/options_set_4.c: New test.
      	* gcc.target/aarch64/options_set_5.c: New test.
      	* gcc.target/aarch64/options_set_6.c: New test.
      	* gcc.target/aarch64/options_set_7.c: New test.
      	* gcc.target/aarch64/options_set_8.c: New test.
      	* gcc.target/aarch64/options_set_9.c: New test.
      
      From-SVN: r269193
      Tamar Christina committed
    • AArch64: Update Armv8.4-a's FP16 FML intrinsics · 9d04c986
      This patch updates the suffixes of the Armv8.4-a FP16 FML intrinsics from _u32
      to _f16 to be more consistent with the intrinsics naming convention.
      
      The specifications for these intrinsics have not been published yet so we do
      not need to maintain the old names.
      
      The patch was created with the following script:
      
      grep -lIE "(vfml[as].+)_u32" -r gcc/ | grep -iEv ".+Changelog.*" \
        | xargs sed -i -E -e "s/(vfml[as].+)_u32/\1_f16/g"
      
      gcc/ChangeLog:
      
      	* config/aarch64/arm_neon.h (vfmlal_low_u32, vfmlsl_low_u32,
      	vfmlalq_low_u32, vfmlslq_low_u32, vfmlal_high_u32, vfmlsl_high_u32,
      	vfmlalq_high_u32, vfmlslq_high_u32, vfmlal_lane_low_u32,
      	vfmlsl_lane_low_u32, vfmlal_laneq_low_u32, vfmlsl_laneq_low_u32,
      	vfmlalq_lane_low_u32, vfmlslq_lane_low_u32, vfmlalq_laneq_low_u32,
      	vfmlslq_laneq_low_u32, vfmlal_lane_high_u32, vfmlsl_lane_high_u32,
      	vfmlal_laneq_high_u32, vfmlsl_laneq_high_u32, vfmlalq_lane_high_u32,
      	vfmlslq_lane_high_u32, vfmlalq_laneq_high_u32, vfmlslq_laneq_high_u32):
      	Rename ...
      	(vfmlal_low_f16, vfmlsl_low_f16, vfmlalq_low_f16, vfmlslq_low_f16,
      	vfmlal_high_f16, vfmlsl_high_f16, vfmlalq_high_f16, vfmlslq_high_f16,
      	vfmlal_lane_low_f16, vfmlsl_lane_low_f16, vfmlal_laneq_low_f16,
      	vfmlsl_laneq_low_f16, vfmlalq_lane_low_f16, vfmlslq_lane_low_f16,
      	vfmlalq_laneq_low_f16, vfmlslq_laneq_low_f16, vfmlal_lane_high_f16,
      	vfmlsl_lane_high_f16, vfmlal_laneq_high_f16, vfmlsl_laneq_high_f16,
      	vfmlalq_lane_high_f16, vfmlslq_lane_high_f16, vfmlalq_laneq_high_f16,
      	vfmlslq_laneq_high_f16): ... To this.
      
      gcc/testsuite/ChangeLog:
      
      	* gcc.target/aarch64/fp16_fmul_high.h (test_vfmlal_high_u32,
      	test_vfmlalq_high_u32, test_vfmlsl_high_u32, test_vfmlslq_high_u32):
      	Rename ...
      	(test_vfmlal_high_f16, test_vfmlalq_high_f16, test_vfmlsl_high_f16,
      	test_vfmlslq_high_f16): ... To this.
      	* gcc.target/aarch64/fp16_fmul_lane_high.h (test_vfmlal_lane_high_u32,
      	tets_vfmlsl_lane_high_u32, test_vfmlal_laneq_high_u32,
      	test_vfmlsl_laneq_high_u32, test_vfmlalq_lane_high_u32,
      	test_vfmlslq_lane_high_u32, test_vfmlalq_laneq_high_u32,
      	test_vfmlslq_laneq_high_u32): Rename ...
      	(test_vfmlal_lane_high_f16, tets_vfmlsl_lane_high_f16,
      	test_vfmlal_laneq_high_f16, test_vfmlsl_laneq_high_f16,
      	test_vfmlalq_lane_high_f16, test_vfmlslq_lane_high_f16,
      	test_vfmlalq_laneq_high_f16, test_vfmlslq_laneq_high_f16): ... To this.
      	* gcc.target/aarch64/fp16_fmul_lane_low.h (test_vfmlal_lane_low_u32,
      	test_vfmlsl_lane_low_u32, test_vfmlal_laneq_low_u32,
      	test_vfmlsl_laneq_low_u32, test_vfmlalq_lane_low_u32,
      	test_vfmlslq_lane_low_u32, test_vfmlalq_laneq_low_u32,
      	test_vfmlslq_laneq_low_u32): Rename ...
      	(test_vfmlal_lane_low_f16, test_vfmlsl_lane_low_f16,
      	test_vfmlal_laneq_low_f16, test_vfmlsl_laneq_low_f16,
      	test_vfmlalq_lane_low_f16, test_vfmlslq_lane_low_f16,
      	test_vfmlalq_laneq_low_f16, test_vfmlslq_laneq_low_f16): ... To this.
      	* gcc.target/aarch64/fp16_fmul_low.h (test_vfmlal_low_u32,
      	test_vfmlalq_low_u32, test_vfmlsl_low_u32, test_vfmlslq_low_u32):
      	Rename ...
      	(test_vfmlal_low_f16, test_vfmlalq_low_f16, test_vfmlsl_low_f16,
      	test_vfmlslq_low_f16): ... To This.
      	* lib/target-supports.exp
      	(check_effective_target_arm_fp16fml_neon_ok_nocache): Update test.
      
      From-SVN: r269191
      Tamar Christina committed
  4. 14 Feb, 2019 1 commit
  5. 13 Feb, 2019 1 commit
    • AArch64: Allow any offset for SVE addressing modes before reload. · 0c63a8ee
      On AArch64, aarch64_classify_address has a non-strict case that allows it to
      accept any byte offset from a register when validating an address in a given
      addressing mode.
      
      It does this because reload would later make the address valid.  SVE, however,
      requires the address to always be valid, yet currently any address is accepted
      when a MEM + offset is used.  This causes an ICE because nothing later forces
      the address to be legitimate.
      
      The patch makes aarch64_emit_sve_pred_move go through expand_insn so that the
      addressing mode is valid for any loads/stores it creates, which follows the
      SVE way of handling address classification.
      
      gcc/ChangeLog:
      
      	PR target/88847
      	* config/aarch64/aarch64-sve.md (*pred_mov<mode>, pred_mov<mode>):
      	Expose as @aarch64_pred_mov.
      	* config/aarch64/aarch64.c (aarch64_classify_address):
      	Use expand_insn which legitimizes operands.
      
      gcc/testsuite/ChangeLog:
      
      	PR target/88847
      	* gcc.target/aarch64/sve/pr88847.c: New test.
      
      From-SVN: r268845
      Tamar Christina committed
  6. 07 Feb, 2019 1 commit
    • [AArch64] Change representation of SABD in RTL · 8544ed6e
      Richard raised a concern about the RTL we use to represent the AdvSIMD SABD
      (vector signed absolute difference) instruction.
      We currently represent it as ABS (MINUS op1 op2).
      
      This isn't exactly what SABD does.  ABS treats its input as a signed value
      and returns its absolute value.
      
      For example:
      (sabd:QI 64 -128) == 192 (unsigned) aka -64 (signed)
      whereas
      (minus:QI 64 -128) == 192 (unsigned) aka -64 (signed), (abs ...) of that is 64.
      
      A better way to describe the instruction is with MINUS (SMAX (op1 op2) SMIN (op1 op2)).
      This patch implements that, and also implements similar semantics for the UABD instruction
      that uses UMAX and UMIN.
      
      That way for the example above we'll have:
      (minus:QI (smax:QI (64 -128)) (smin:QI (64 -128))) == (minus:QI 64 -128) == 192 (or -64 signed) which matches
      what SABD does. 
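      
      As a scalar sketch (illustration only, not part of the patch), the difference
      between the two descriptions is:
      
      /* Hypothetical 8-bit illustration of the example above.  */
      unsigned char
      abd_via_abs_of_minus (signed char a, signed char b)
      {
        /* abs (64 - -128) gives 64, which is not what SABD produces.  */
        return __builtin_abs ((signed char) (a - b));
      }
      
      unsigned char
      abd_via_max_minus_min (signed char a, signed char b)
      {
        signed char mx = a > b ? a : b;
        signed char mn = a < b ? a : b;
        /* 64 - -128 gives 192 (-64 signed), matching SABD.  */
        return mx - mn;
      }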
      
      	* config/aarch64/iterators.md (max_opp): New code_attr.
      	(USMAX): New code iterator.
      	* config/aarch64/predicates.md (aarch64_smin): New predicate.
      	(aarch64_smax): Likewise.
      	* config/aarch64/aarch64-simd.md (abd<mode>_3): Rename to...
      	(*aarch64_<su>abd<mode>_3): ... Change RTL representation to
      	MINUS (MAX MIN).
      
      	* gcc.target/aarch64/abd_1.c: New test.
      	* gcc.dg/sabd_1.c: Likewise.
      
      From-SVN: r268658
      Kyrylo Tkachov committed
  7. 25 Jan, 2019 1 commit
    • This is pretty unlikely in real code... · c590597c
      This is pretty unlikely in real code but, similarly to Arm, the AArch64
      ABI has a bug in the handling of 128-bit bit-fields: if the bit-field
      dominates the overall alignment, the back-end code may end up passing the
      argument incorrectly.  This is a regression that started in gcc-6 when the
      ABI support code was updated to support overaligned types.  The fix is very
      similar in concept to the Arm fix.  128-bit bit-fields are fortunately
      extremely rare, so I'd be very surprised if anyone has been bitten by this.
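      
      A hypothetical sketch of the kind of type affected (the real testcases are the
      new aapcs64 test_align-10.c to test_align-12.c; this example is illustrative
      only):
      
      /* The 16-byte alignment of the bit-field's base type dominates the overall
         alignment of the struct, which is the case the corrected
         aarch64_function_arg_alignment now has to consider (and warn about when
         the passing convention changed in gcc-9).  */
      struct overaligned
      {
        unsigned __int128 x : 128;
        int tail;
      };
      
      extern void callee (int, struct overaligned);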
      
      PR target/88469
      gcc/
      	* config/aarch64/aarch64.c (aarch64_function_arg_alignment): Add new
      	argument ABI_BREAK.  Set to true if the calculated alignment has
      	changed in gcc-9.  Check bit-fields for their base type alignment.
      	(aarch64_layout_arg): Warn if argument passing has changed in gcc-9.
      	(aarch64_function_arg_boundary): Likewise.
      	(aarch64_gimplify_va_arg_expr): Likewise.
      
      gcc/testsuite/
      	* gcc.target/aarch64/aapcs64/test_align-10.c: New test.
      	* gcc.target/aarch64/aapcs64/test_align-11.c: New test.
      	* gcc.target/aarch64/aapcs64/test_align-12.c: New test.
      
      From-SVN: r268273
      Richard Earnshaw committed
  8. 17 Jan, 2019 1 commit
  9. 11 Jan, 2019 1 commit
  10. 10 Jan, 2019 4 commits
    • re PR rtl-optimization/87305 (Segfault in end_hard_regno in… · 7e4d17a8
      re PR rtl-optimization/87305 (Segfault in end_hard_regno in setup_live_pseudos_and_spill_after_risky_transforms on aarch64 big-endian)
      
      2019-01-10  Vladimir Makarov  <vmakarov@redhat.com>
      
      	PR rtl-optimization/87305
      	* lra-assigns.c
      	(setup_live_pseudos_and_spill_after_risky_transforms): Check
      	allocation for big endian pseudos used as paradoxical subregs and
      	spill them if it is wrong.
      	* lra-constraints.c (lra_constraints): Add a comment.
      
      2019-01-10  Vladimir Makarov  <vmakarov@redhat.com>
      
      	PR rtl-optimization/87305
      	* gcc.target/aarch64/pr87305.c: New.
      
      From-SVN: r267823
      Vladimir Makarov committed
    • [Committed, AArch64] Disable tests for ilp32. · 8b530f81
      Currently Return Address Signing is only supported in lp64.  Thus the
      tests that I added recently (which enable return address signing via the
      -mbranch-protection=standard option) should also be exempted from testing in
      ilp32.  This patch adds the needed dg-require-effective-target directive to
      the tests.
      
      *** gcc/testsuite/ChangeLog ***
      
      2019-01-10  Sudakshina Das  <sudi.das@arm.com>
      
      	* gcc.target/aarch64/bti-1.c: Exempt for ilp32.
      	* gcc.target/aarch64/bti-2.c: Likewise.
      	* gcc.target/aarch64/bti-3.c: Likewise.
      
      Committed as obvious.
      
      From-SVN: r267818
      Sudakshina Das committed
    • arm-builtins.c (enum arm_type_qualifiers): Add qualifier_lane_pair_index. · c2b7062d
      2019-01-10  Tamar Christina  <tamar.christina@arm.com>
      
      	* config/arm/arm-builtins.c
      	(enum arm_type_qualifiers): Add qualifier_lane_pair_index.
      	(MAC_LANE_PAIR_QUALIFIERS): New.
      	(arm_expand_builtin_args): Use it.
      	(arm_expand_builtin_1): Likewise.
      	* config/arm/arm-protos.h (neon_vcmla_lane_prepare_operands): New.
      	* config/arm/arm.c (neon_vcmla_lane_prepare_operands): New.
      	* config/arm/arm-c.c (arm_cpu_builtins): Add __ARM_FEATURE_COMPLEX.
      	* config/arm/arm_neon.h:
      	(vcadd_rot90_f16): New.
      	(vcaddq_rot90_f16): New.
      	(vcadd_rot270_f16): New.
      	(vcaddq_rot270_f16): New.
      	(vcmla_f16): New.
      	(vcmlaq_f16): New.
      	(vcmla_lane_f16): New.
      	(vcmla_laneq_f16): New.
      	(vcmlaq_lane_f16): New.
      	(vcmlaq_laneq_f16): New.
      	(vcmla_rot90_f16): New.
      	(vcmlaq_rot90_f16): New.
      	(vcmla_rot90_lane_f16): New.
      	(vcmla_rot90_laneq_f16): New.
      	(vcmlaq_rot90_lane_f16): New.
      	(vcmlaq_rot90_laneq_f16): New.
      	(vcmla_rot180_f16): New.
      	(vcmlaq_rot180_f16): New.
      	(vcmla_rot180_lane_f16): New.
      	(vcmla_rot180_laneq_f16): New.
      	(vcmlaq_rot180_lane_f16): New.
      	(vcmlaq_rot180_laneq_f16): New.
      	(vcmla_rot270_f16): New.
      	(vcmlaq_rot270_f16): New.
      	(vcmla_rot270_lane_f16): New.
      	(vcmla_rot270_laneq_f16): New.
      	(vcmlaq_rot270_lane_f16): New.
      	(vcmlaq_rot270_laneq_f16): New.
      	(vcadd_rot90_f32): New.
      	(vcaddq_rot90_f32): New.
      	(vcadd_rot270_f32): New.
      	(vcaddq_rot270_f32): New.
      	(vcmla_f32): New.
      	(vcmlaq_f32): New.
      	(vcmla_lane_f32): New.
      	(vcmla_laneq_f32): New.
      	(vcmlaq_lane_f32): New.
      	(vcmlaq_laneq_f32): New.
      	(vcmla_rot90_f32): New.
      	(vcmlaq_rot90_f32): New.
      	(vcmla_rot90_lane_f32): New.
      	(vcmla_rot90_laneq_f32): New.
      	(vcmlaq_rot90_lane_f32): New.
      	(vcmlaq_rot90_laneq_f32): New.
      	(vcmla_rot180_f32): New.
      	(vcmlaq_rot180_f32): New.
      	(vcmla_rot180_lane_f32): New.
      	(vcmla_rot180_laneq_f32): New.
      	(vcmlaq_rot180_lane_f32): New.
      	(vcmlaq_rot180_laneq_f32): New.
      	(vcmla_rot270_f32): New.
      	(vcmlaq_rot270_f32): New.
      	(vcmla_rot270_lane_f32): New.
      	(vcmla_rot270_laneq_f32): New.
      	(vcmlaq_rot270_lane_f32): New.
      	(vcmlaq_rot270_laneq_f32): New.
      	* config/arm/arm_neon_builtins.def (vcadd90, vcadd270, vcmla0, vcmla90,
      	vcmla180, vcmla270, vcmla_lane0, vcmla_lane90, vcmla_lane180, vcmla_lane270,
      	vcmla_laneq0, vcmla_laneq90, vcmla_laneq180, vcmla_laneq270,
      	vcmlaq_lane0, vcmlaq_lane90, vcmlaq_lane180, vcmlaq_lane270): New.
      	* config/arm/neon.md (neon_vcmla_lane<rot><mode>,
      	neon_vcmla_laneq<rot><mode>, neon_vcmlaq_lane<rot><mode>): New.
      	* config/arm/arm.c (arm_arch8_3, arm_arch8_4): New.
      	* config/arm/arm.h (TARGET_COMPLEX, arm_arch8_3, arm_arch8_4): New.
      	(arm_option_reconfigure_globals): Use them.
      	* config/arm/iterators.md (VDF, VQ_HSF): New.
      	(VCADD, VCMLA): New.
      	(VF_constraint, rot, rotsplit1, rotsplit2): Add V4HF and V8HF.
      	* config/arm/neon.md (neon_vcadd<rot><mode>, neon_vcmla<rot><mode>): New.
      	* config/arm/unspecs.md (UNSPEC_VCADD90, UNSPEC_VCADD270,
      	UNSPEC_VCMLA, UNSPEC_VCMLA90, UNSPEC_VCMLA180, UNSPEC_VCMLA270): New.
      
      gcc/testsuite/ChangeLog:
      
      2019-01-10  Tamar Christina  <tamar.christina@arm.com>
      
      	* gcc.target/aarch64/advsimd-intrinsics/vector-complex.c: Add AArch32 regexpr.
      	* gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c: Likewise.
      
      From-SVN: r267796
      Tamar Christina committed
    • aarch64-builtins.c (enum aarch64_type_qualifiers): Add qualifier_lane_pair_index. · 9d63f43b
      gcc/ChangeLog:
      
      2019-01-10  Tamar Christina  <tamar.christina@arm.com>
      
      	* config/aarch64/aarch64-builtins.c (enum aarch64_type_qualifiers): Add qualifier_lane_pair_index.
      	(emit-rtl.h): Include.
      	(TYPES_QUADOP_LANE_PAIR): New.
      	(aarch64_simd_expand_args): Use it.
      	(aarch64_simd_expand_builtin): Likewise.
      	(AARCH64_SIMD_FCMLA_LANEQ_BUILTINS, aarch64_fcmla_laneq_builtin_datum): New.
      	(FCMLA_LANEQ_BUILTIN, AARCH64_SIMD_FCMLA_LANEQ_BUILTIN_BASE,
      	AARCH64_SIMD_FCMLA_LANEQ_BUILTINS, aarch64_fcmla_lane_builtin_data,
      	aarch64_init_fcmla_laneq_builtins, aarch64_expand_fcmla_builtin): New.
      	(aarch64_init_builtins): Add aarch64_init_fcmla_laneq_builtins.
      	(aarch64_expand_builtin): Add AARCH64_SIMD_BUILTIN_FCMLA_LANEQ0_V2SF,
      	AARCH64_SIMD_BUILTIN_FCMLA_LANEQ90_V2SF, AARCH64_SIMD_BUILTIN_FCMLA_LANEQ180_V2SF,
      	AARCH64_SIMD_BUILTIN_FCMLA_LANEQ270_V2SF, AARCH64_SIMD_BUILTIN_FCMLA_LANEQ0_V4HF,
      	AARCH64_SIMD_BUILTIN_FCMLA_LANEQ90_V4HF, AARCH64_SIMD_BUILTIN_FCMLA_LANEQ180_V4HF,
      	AARCH64_SIMD_BUILTIN_FCMLA_LANEQ270_V4HF.
      	* config/aarch64/iterators.md (FCMLA_maybe_lane): New.
      	* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Add __ARM_FEATURE_COMPLEX.
      	* config/aarch64/aarch64-simd-builtins.def (fcadd90, fcadd270, fcmla0, fcmla90,
      	fcmla180, fcmla270, fcmla_lane0, fcmla_lane90, fcmla_lane180, fcmla_lane270,
      	fcmla_laneq0, fcmla_laneq90, fcmla_laneq180, fcmla_laneq270,
      	fcmlaq_lane0, fcmlaq_lane90, fcmlaq_lane180, fcmlaq_lane270): New.
      	* config/aarch64/aarch64-simd.md (aarch64_fcmla_lane<rot><mode>,
      	aarch64_fcmla_laneq<rot>v4hf, aarch64_fcmlaq_lane<rot><mode>,aarch64_fcadd<rot><mode>,
      	aarch64_fcmla<rot><mode>): New.
      	* config/aarch64/arm_neon.h:
      	(vcadd_rot90_f16): New.
      	(vcaddq_rot90_f16): New.
      	(vcadd_rot270_f16): New.
      	(vcaddq_rot270_f16): New.
      	(vcmla_f16): New.
      	(vcmlaq_f16): New.
      	(vcmla_lane_f16): New.
      	(vcmla_laneq_f16): New.
      	(vcmlaq_lane_f16): New.
      	(vcmlaq_rot90_lane_f16): New.
      	(vcmla_rot90_laneq_f16): New.
      	(vcmla_rot90_lane_f16): New.
      	(vcmlaq_rot90_f16): New.
      	(vcmla_rot90_f16): New.
      	(vcmlaq_laneq_f16): New.
      	(vcmla_rot180_laneq_f16): New.
      	(vcmla_rot180_lane_f16): New.
      	(vcmlaq_rot180_f16): New.
      	(vcmla_rot180_f16): New.
      	(vcmlaq_rot90_laneq_f16): New.
      	(vcmlaq_rot270_laneq_f16): New.
      	(vcmlaq_rot270_lane_f16): New.
      	(vcmla_rot270_laneq_f16): New.
      	(vcmlaq_rot270_f16): New.
      	(vcmla_rot270_f16): New.
      	(vcmlaq_rot180_laneq_f16): New.
      	(vcmlaq_rot180_lane_f16): New.
      	(vcmla_rot270_lane_f16): New.
      	(vcadd_rot90_f32): New.
      	(vcaddq_rot90_f32): New.
      	(vcaddq_rot90_f64): New.
      	(vcadd_rot270_f32): New.
      	(vcaddq_rot270_f32): New.
      	(vcaddq_rot270_f64): New.
      	(vcmla_f32): New.
      	(vcmlaq_f32): New.
      	(vcmlaq_f64): New.
      	(vcmla_lane_f32): New.
      	(vcmla_laneq_f32): New.
      	(vcmlaq_lane_f32): New.
      	(vcmlaq_laneq_f32): New.
      	(vcmla_rot90_f32): New.
      	(vcmlaq_rot90_f32): New.
      	(vcmlaq_rot90_f64): New.
      	(vcmla_rot90_lane_f32): New.
      	(vcmla_rot90_laneq_f32): New.
      	(vcmlaq_rot90_lane_f32): New.
      	(vcmlaq_rot90_laneq_f32): New.
      	(vcmla_rot180_f32): New.
      	(vcmlaq_rot180_f32): New.
      	(vcmlaq_rot180_f64): New.
      	(vcmla_rot180_lane_f32): New.
      	(vcmla_rot180_laneq_f32): New.
      	(vcmlaq_rot180_lane_f32): New.
      	(vcmlaq_rot180_laneq_f32): New.
      	(vcmla_rot270_f32): New.
      	(vcmlaq_rot270_f32): New.
      	(vcmlaq_rot270_f64): New.
      	(vcmla_rot270_lane_f32): New.
      	(vcmla_rot270_laneq_f32): New.
      	(vcmlaq_rot270_lane_f32): New.
      	(vcmlaq_rot270_laneq_f32): New.
      	* config/aarch64/aarch64.h (TARGET_COMPLEX): New.
      	* config/aarch64/iterators.md (UNSPEC_FCADD90, UNSPEC_FCADD270,
      	UNSPEC_FCMLA, UNSPEC_FCMLA90, UNSPEC_FCMLA180, UNSPEC_FCMLA270): New.
      	(FCADD, FCMLA): New.
      	(rot): New.
      	* config/arm/types.md (neon_fcadd, neon_fcmla): New.
      
      gcc/testsuite/ChangeLog:
      
      2019-01-10  Tamar Christina  <tamar.christina@arm.com>
      
      	* gcc.target/aarch64/advsimd-intrinsics/vector-complex.c: New test.
      	* gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c: New test.
      
      From-SVN: r267795
      Tamar Christina committed
  11. 09 Jan, 2019 4 commits
    • [AArch64, 6/6] Enable BTI: Add configure option. · c7ff4f0f
      This patch is part of a series that enables ARMv8.5-A in GCC and
      adds the Branch Target Identification Mechanism.
      
      This patch adds a new configure option for enabling BTI and
      Return Address Signing by default.
      
      *** gcc/ChangeLog ***
      
      2019-01-09  Sudakshina Das  <sudi.das@arm.com>
      
      	* config/aarch64/aarch64.c (aarch64_override_options): Add case to
      	check configure option to set BTI and Return Address Signing.
      	* configure.ac: Add --enable-standard-branch-protection and
      	--disable-standard-branch-protection.
      	* configure: Regenerated.
      	* doc/install.texi: Document the same.
      
      *** gcc/testsuite/ChangeLog ***
      
      2019-01-09  Sudakshina Das  <sudi.das@arm.com>
      
      	* gcc.target/aarch64/bti-1.c: Update test to not add the command-line
      	option when configured with bti.
      	* gcc.target/aarch64/bti-2.c: Likewise.
      	* lib/target-supports.exp
      	(check_effective_target_default_branch_protection):
      	Add configure check for --enable-standard-branch-protection.
      
      From-SVN: r267770
      Sudakshina Das committed
    • [AArch64, 5/6] Enable BTI : Add new pass for BTI. · b5f794b4
      This patch is part of a series that enables ARMv8.5-A in GCC and
      adds the Branch Target Identification Mechanism.
      
      This patch adds a new pass called "bti" which is triggered by the
      command-line argument -mbranch-protection whenever "bti" is turned on.
      
      The pass iterates through the instructions and adds the appropriate BTI
      instructions based on the following:
        * Add a new "BTI C" at the beginning of a function, unless it is already
          protected by a "PACIASP".  We exempt functions that are only called
          directly.
        * Add a new "BTI J" for every target of an indirect jump, jump-table
          targets, non-local goto targets, and labels that might be referenced
          by variables, constant pools, etc. (NOTE_INSN_DELETED_LABEL).
      
      Since we have already changed indirect tail calls to use only x16 and
      x17, we do not have to use "BTI JC" (see patch 3/6).
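      
      As a rough illustration (hypothetical example, not taken from the patch):
      
      /* Built with -mbranch-protection=bti (or =standard), "callee" gets a
         "BTI C" at its entry because its address escapes through "fp", so it
         can be reached by an indirect call; a jump table emitted for "dispatch"
         would get "BTI J" at each of its label targets.  */
      int callee (int x) { return x + 1; }
      int (*fp) (int) = callee;
      
      int
      dispatch (int x)
      {
        switch (x)
          {
          case 0: return 10;
          case 1: return 11;
          case 2: return 12;
          case 3: return 13;
          case 4: return 14;
          default: return fp (x);
          }
      }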
      
      *** gcc/ChangeLog ***
      
      2019-01-09  Sudakshina Das  <sudi.das@arm.com>
      	    Ramana Radhakrishnan  <ramana.radhakrishnan@arm.com>
      
      	* config.gcc (aarch64*-*-*): Add aarch64-bti-insert.o.
      	* gcc/config/aarch64/aarch64.h: Update comment for TRAMPOLINE_SIZE.
      	* config/aarch64/aarch64.c (aarch64_asm_trampoline_template): Update
      	if bti is enabled.
      	* config/aarch64/aarch64-bti-insert.c: New file.
      	* config/aarch64/aarch64-passes.def (INSERT_PASS_BEFORE): Insert bti
      	pass.
      	* config/aarch64/aarch64-protos.h (make_pass_insert_bti): Declare the
      	new bti pass.
      	* config/aarch64/aarch64.md (unspecv): Add UNSPECV_BTI_NOARG,
      	UNSPECV_BTI_C, UNSPECV_BTI_J and UNSPECV_BTI_JC.
      	(bti_noarg, bti_j, bti_c, bti_jc): New define_insns.
      	* config/aarch64/t-aarch64: Add rule for aarch64-bti-insert.o.
      
      *** gcc/testsuite/ChangeLog ***
      
      2019-01-09  Sudakshina Das  <sudi.das@arm.com>
      
      	* gcc.target/aarch64/bti-1.c: New test.
      	* gcc.target/aarch64/bti-2.c: New test.
      	* gcc.target/aarch64/bti-3.c: New test.
      	* lib/target-supports.exp
      	(check_effective_target_aarch64_bti_hw): Add new check for BTI hw.
      
      Co-Authored-By: Ramana Radhakrishnan <ramana.radhakrishnan@arm.com>
      
      From-SVN: r267769
      Sudakshina Das committed
    • [AArch64, 3/6] Restrict indirect tail calls to x16 and x17 · 901e66e0
      This patch is part of a series that enables ARMv8.5-A in GCC and
      adds the Branch Target Identification Mechanism.
      
      This patch changes the registers that are allowed for indirect tail calls.
      We are choosing to restrict these to x16 and x17 only.
      
      Indirect tail calls are special in that they convert a call statement
      (BLR instruction) into a jump statement (BR instruction).  To make the best
      possible use of the Branch Target Identification Mechanism, we would like
      to place a "BTI C" (call) at the beginning of the function, which is only
      compatible with BLRs and with BR X16/X17.  To make indirect tail calls
      compatible with this scenario, we are restricting TAILCALL_ADDR_REGS.
      
      In order to use x16/x17 for this purpose, we also had to change the use of
      these registers in the epilogue/prologue handling.  We now use x12 and x13,
      named EP0_REGNUM and EP1_REGNUM, as scratch registers for the epilogue and
      prologue.
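      
      A minimal sketch of an indirect tail call affected by this change
      (hypothetical example):
      
      /* When this compiles to a tail call, the jump is now emitted as
         "br x16" or "br x17" rather than through an arbitrary register, so it
         stays compatible with a "BTI C" landing pad at the target's entry.  */
      int
      tail_call (int (*fp) (int), int x)
      {
        return fp (x);
      }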
      
      *** gcc/ChangeLog***
      
      2019-01-09  Sudakshina Das  <sudi.das@arm.com>
      
      	* config/aarch64/aarch64.c (aarch64_expand_prologue): Use new
      	epilogue/prologue scratch registers EP0_REGNUM and EP1_REGNUM.
      	(aarch64_expand_epilogue): Likewise.
      	(aarch64_output_mi_thunk): Likewise
      	* config/aarch64/aarch64.h (REG_CLASS_CONTENTS): Change
      	TAILCALL_ADDR_REGS to x16 and x17.
      	* config/aarch64/aarch64.md: Define EP0_REGNUM and EP1_REGNUM.
      
      *** gcc/testsuite/ChangeLog ***
      
      2019-01-09  Sudakshina Das  <sudi.das@arm.com>
      
      	* gcc.target/aarch64/test_frame_17.c: Update to check for EP0_REGNUM
      	instead of IP0_REGNUM and add test case.
      
      From-SVN: r267767
      Sudakshina Das committed
    • [Aarch64][SVE] Add copysign and xorsign support · 6c9c7b73
      This patch adds support for the copysign and xorsign builtins to SVE.  With
      the new expanders, they can be vectorized using bitwise logical operations.
      
      I tested this patch on an aarch64 machine, bootstrapping the compiler and
      running the checks.
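      
      A sketch of the kind of loop the new expanders let the vectorizer handle
      (hypothetical example, similar in spirit to the new tests):
      
      /* With SVE enabled, the copysign below can be done with bitwise operations
         on the sign bit inside the vectorized loop instead of a scalar libcall.  */
      void
      set_signs (double *restrict r, const double *a, const double *b, int n)
      {
        for (int i = 0; i < n; i++)
          r[i] = __builtin_copysign (a[i], b[i]);
      }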
      
      2019-01-09  Alejandro Martinez  <alejandro.martinezvicente@arm.com>
      
      	* config/aarch64/aarch64-sve.md (copysign<mode>3): New define_expand.
      	(xorsign<mode>3): Likewise.
      
      2019-01-09  Alejandro Martinez  <alejandro.martinezvicente@arm.com>
      
      	* gcc.target/aarch64/sve/copysign_1.c: New test for SVE vectorized
      	copysign.
      	* gcc.target/aarch64/sve/copysign_1_run.c: Likewise.
      	* gcc.target/aarch64/sve/xorsign_1.c: New test for SVE vectorized
      	xorsign.
      	* gcc.target/aarch64/sve/xorsign_1_run.c: Likewise.
      
      From-SVN: r267764
      Alejandro Martinez committed
  12. 08 Jan, 2019 1 commit
    • [PATCH 2/3][GCC][AARCH64] Add new -mbranch-protection option to combine pointer signing and BTI · efac62a3
      gcc/ChangeLog:
      
      2019-01-08  Sam Tebbs  <sam.tebbs@arm.com>
      
      	* config/aarch64/aarch64.c (BRANCH_PROTECT_STR_MAX,
      	aarch64_parse_branch_protection,
      	struct aarch64_branch_protect_type,
      	aarch64_handle_no_branch_protection,
      	aarch64_handle_standard_branch_protection,
      	aarch64_validate_mbranch_protection,
      	aarch64_handle_pac_ret_protection,
      	aarch64_handle_attr_branch_protection,
      	accepted_branch_protection_string,
      	aarch64_pac_ret_subtypes,
      	aarch64_branch_protect_types,
      	aarch64_handle_pac_ret_leaf): Define.
      	(aarch64_override_options_after_change_1, aarch64_override_options):
      	Add check for accepted_branch_protection_string.
      	(aarch64_option_save): Save accepted_branch_protection_string.
      	(aarch64_option_restore): Save accepted_branch_protection_string.
      	* config/aarch64/aarch64.c (aarch64_attributes): Add branch-protection.
      	* config/aarch64/aarch64.opt: Add mbranch-protection. Deprecate
      	msign-return-address.
      	* doc/invoke.texi: Add mbranch-protection.
      
      gcc/testsuite/Changelog:
      
      2019-01-08  Sam Tebbs  <sam.tebbs@arm.com>
      
      	* gcc.target/aarch64/(return_address_sign_1.c,
      	return_address_sign_2.c, return_address_sign_3.c (__attribute__)):
      	Change option to -mbranch-protection.
      	* gcc.target/aarch64/(branch-protection-option.c,
      	branch-protection-option-2.c, branch-protection-attr.c,
      	branch-protection-attr-2.c): New file.
      
      From-SVN: r267717
      Sam Tebbs committed
  13. 07 Jan, 2019 1 commit
    • Investigating PR target/86891 revealed a number of issues with the way the... · a58fe3c5
      Investigating PR target/86891 revealed a number of issues with the way
      the AArch64 backend was handling overflow detection patterns.  Firstly,
      expansion is not the same for signed and unsigned types, since in one
      form the overflow is detected via the C flag and in the other it is done
      via the V flag in the PSR.  Secondly, particular care has to be taken
      when describing overflow of signed types: the comparison has to be
      performed conceptually on a value that cannot overflow, compared against
      a value that might have overflowed.
      
      It became apparent that some of the patterns were simply unmatchable
      (they collapse to NEG in the RTL rather than subtracting from zero)
      and a number of patterns were overly restrictive in terms of the
      immediate constants that they supported.  I've tried to address all of
      these issues as well.
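      
      For reference, the kind of source construct these overflow patterns expand
      (a hedged sketch, not taken from the PR):
      
      /* Signed subtraction overflow is detected via the V flag, unsigned borrow
         via the C flag, which is why the two expansions cannot share one form.  */
      int
      checked_sub (long a, long b, long *res)
      {
        return __builtin_sub_overflow (a, b, res);
      }
      
      int
      checked_usub (unsigned long a, unsigned long b, unsigned long *res)
      {
        return __builtin_sub_overflow (a, b, res);
      }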
      
      gcc:
      
      	PR target/86891
      	* config/aarch64/aarch64.c (aarch64_expand_subvti): New parameter
      	unsigned_p.  Handle signed and unsigned overflow correction as
      	required.
      	* config/aarch64/aarch64-protos.h (aarch64_expand_subvti): Update
      	prototype.
      	* config/aarch64/aarch64.md (addv<mode>4): Use aarch64_plus_operand
      	for operand 2.
      	(add<mode>3_compareV_imm): Make this callable for expanding.
      	(subv<GPI:mode>4): Use register_operand for operand 1.  Use
      	aarch64_plus_operand for operand 2.
      	(subv<GPI:mode>_insn): New insn pattern.
      	(subv<GPI:mode>_imm): Likewise.
      	(negv<GPI:mode>3): New expand pattern.
      	(negv<GPI:mode>_insn): New insn pattern.
      	(negv<GPI:mode>_cmp_only): Likewise.
      	(cmpv<GPI:mode>_insn): Likewise.
      	(subvti4): Use register_operand for operand 1.  Update call to
      	aarch64_expand_subvti.
      	(usubvti4): Likewise.
      	(negvti3): New expand pattern.
      	(negdi_carryout): New insn pattern.
      	(negvdi_carryinV): New insn pattern.
      	(sub<mode3>_compare1_imm): Delete named insn pattern, make anonymous
      	version the named version.
      	(peepholes to convert to sub<mode3>_compare1_imm): Adjust order of
      	operands.
      	(usub<GPI:mode>3_carryinC, usub<GPI:mode>3_carryinC_z1): New insn
      	patterns.
      	(usub<GPI:mode>3_carryinC_z2, usub<GPI:mode>3_carryinC): New insn
      	patterns.
      	(sub<mode>3_carryinCV, sub<mode>3_carryinCV_z1_z2): Delete.
      	(sub<mode>3_carryinCV_z1, sub<mode>3_carryinCV_z2): Delete.
      	(sub<mode>3_carryinCV): Delete.
      	(sub<GPI:mode>3_carryinV): New expand pattern.
      	(sub<mode>3_carryinV, sub<mode>3_carryinV_z2): New insn patterns.
      
      testsuite:
      
      	* gcc.target/aarch64/subs_compare_2.c: Make '#' immediate prefix
      	optional in scan pattern.
      
      From-SVN: r267650
      Richard Earnshaw committed
  14. 04 Jan, 2019 1 commit
  15. 01 Jan, 2019 1 commit
  16. 20 Dec, 2018 2 commits
    • [AArch64][SVE] Add ABS support · 69c5fdcf
      For some reason we missed ABS out of the list of supported integer
      operations when adding the SVE port initially.
      
      2018-12-20  Richard Sandiford  <richard.sandiford@arm.com>
      
      gcc/
      	* config/aarch64/iterators.md (SVE_INT_UNARY, fp_int_op): Add abs.
      	(SVE_FP_UNARY): Sort.
      
      gcc/testsuite/
      	* gcc.target/aarch64/pr64946.c: Force nosve.
      	* gcc.target/aarch64/ssadv16qi.c: Likewise.
      	* gcc.target/aarch64/usadv16qi.c: Likewise.
      	* gcc.target/aarch64/vect-abs-compile.c: Likewise.
      	* gcc.target/aarch64/sve/abs_1.c: New test.
      
      From-SVN: r267304
      Richard Sandiford committed
    • [AArch64][SVE] Fix IFN_COND_FMLA movprfx alternative · 7abc36cc
      This patch fixes a cut-&-pasto in the (match_dup 4) version of
      "cond_<SVE_COND_FP_TERNARY:optab><SVE_F:mode>".  (It's a shame
      that there's so much cut-&-paste in these patterns, but it's hard
      to avoid without more infrastructure.)
      
      2018-12-20  Richard Sandiford  <richard.sandiford@arm.com>
      
      gcc/
      	* config/aarch64/aarch64-sve.md (*cond_<optab><mode>_4): Use
      	sve_fmla_op rather than sve_fmad_op for the movprfx alternative.
      
      gcc/testsuite/
      	* gcc.target/aarch64/sve/fmla_2.c: New test.
      	* gcc.target/aarch64/sve/fmla_2_run.c: Likewise
      
      From-SVN: r267303
      Richard Sandiford committed
  17. 17 Dec, 2018 1 commit
    • aarch64-torture.exp: New file. · ba1a78ff
      2018-12-17  Steve Ellcey  <sellcey@cavium.com>
      
      	* gcc.target/aarch64/torture/aarch64-torture.exp: New file.
      	* gcc.target/aarch64/torture/simd-abi-1.c: New test.
      	* gcc.target/aarch64/torture/simd-abi-2.c: Ditto.
      	* gcc.target/aarch64/torture/simd-abi-3.c: Ditto.
      	* gcc.target/aarch64/torture/simd-abi-4.c: Ditto.
      	* gcc.target/aarch64/torture/simd-abi-5.c: Ditto.
      	* gcc.target/aarch64/torture/simd-abi-6.c: Ditto.
      	* gcc.target/aarch64/torture/simd-abi-7.c: Ditto.
      
      From-SVN: r267209
      Steve Ellcey committed
  18. 07 Dec, 2018 3 commits
    • [AArch64][2/2] Add sve_width -moverride tunable · 886f092f
      On top of the previous patch that implements TARGET_ESTIMATED_POLY_VALUE
      and adds an sve_width tuning field to the CPU structs, this patch implements
      an -moverride knob to adjust this sve_width field to allow for
      experimentation.  As a reminder, this only has an effect when compiling for
      VLA SVE, that is, without -msve-vector-bits=<foo>.  It just adjusts tuning
      heuristics in the compiler, such as profitability thresholds for vectorised
      versioned loops.
      
      It can be used, for example, as -moverride=sve_width=256 to set the sve_width
      tuning field to 256.  Widths outside the accepted SVE range [128, 2048] are
      rejected, as you'd expect.
      
          * config/aarch64/aarch64.c (aarch64_tuning_override_functions): Add
          sve_width entry.
          (aarch64_parse_sve_width_string): Define.
      
      
          * gcc.target/aarch64/sve/override_sve_width_1.c: New test.
      
      From-SVN: r266898
      Kyrylo Tkachov committed
    • [AArch64][SVE] Remove unnecessary PTRUEs from integer arithmetic · 26004f51
      When using the unpredicated immediate forms of MUL, LSL, LSR and ASR,
      the rtl patterns would still have the predicate operand we created for
      the other forms.  This patch splits the patterns after reload in order
      to get rid of the predicate, like we already do for WHILE.
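      
      A sketch of the kind of loop affected (hypothetical example): the vector body
      can use the unpredicated shift-by-immediate form, so the leftover PTRUE that
      fed the predicated pattern can be split away after reload:
      
      void
      shift_all (unsigned int *x, int n)
      {
        for (int i = 0; i < n; i++)
          x[i] <<= 3;   /* LSL by an immediate needs no governing predicate.  */
      }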
      
      2018-12-07  Richard Sandiford  <richard.sandiford@arm.com>
      
      gcc/
      	* config/aarch64/aarch64-sve.md (*mul<mode>3, *v<optab><mode>3):
      	Split the patterns after reload if we don't need the predicate
      	operand.
      	(*post_ra_mul<mode>3, *post_ra_v<optab><mode>3): New patterns.
      
      gcc/testsuite/
      	* gcc.target/aarch64/sve/pred_elim_2.c: New test.
      
      From-SVN: r266892
      Richard Sandiford committed
    • [AArch64][SVE] Remove unnecessary PTRUEs from FP arithmetic · 740c1ed7
      When using the unpredicated all-register forms of FADD, FSUB and FMUL,
      the rtl patterns would still have the predicate operand we created for
      the other forms.  This patch splits the patterns after reload in order
      to get rid of the predicate, like we already do for WHILE.
      
      2018-12-07  Richard Sandiford  <richard.sandiford@arm.com>
      
      gcc/
      	* config/aarch64/iterators.md (SVE_UNPRED_FP_BINARY): New code
      	iterator.
      	(sve_fp_op): Handle minus and mult.
      	* config/aarch64/aarch64-sve.md (*add<mode>3, *sub<mode>3)
      	(*mul<mode>3): Split the patterns after reload if we don't
      	need the predicate operand.
      	(*post_ra_<sve_fp_op><mode>3): New pattern.
      
      gcc/testsuite/
      	* gcc.target/aarch64/sve/pred_elim_1.c: New test.
      
      From-SVN: r266891
      Richard Sandiford committed
  19. 06 Dec, 2018 1 commit
  20. 29 Nov, 2018 1 commit
    • PR c/88172 - attribute aligned of zero silently accepted but ignored · 673670da
      PR c/88172 - attribute aligned of zero silently accepted but ignored
      PR testsuite/88208 - new test case c-c++-common/builtin-has-attribute-3.c in r266335 has multiple excess errors
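      
      An illustration of the construct in question (hypothetical example):
      
      /* Previously accepted silently and ignored; with this change the alignment
         is not lowered below what the target requires and a warning can
         optionally be issued.  */
      int zero_aligned __attribute__ ((aligned (0)));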
      
      gcc/ChangeLog:
      
      	PR c/88172
      	PR testsuite/88208
      	* doc/extend.texi (attribute constructor): Clarify.
      
      gcc/c/ChangeLog:
      
      	PR c/88172
      	PR testsuite/88208
      	* c-decl.c (declspec_add_alignas): Adjust call to check_user_alignment.
      
      gcc/c-family/ChangeLog:
      
      	PR c/88172
      	PR testsuite/88208
      	* c-attribs.c (common_handle_aligned_attribute): Silently avoid setting
      	alignments to values less than the target requires.
      	(has_attribute): For attribute aligned consider both the attribute
      	and the alignment bits.
      	* c-common.c (c_init_attributes): Optionally issue a warning for
      	zero alignment.
      
      gcc/testsuite/ChangeLog:
      
      	PR c/88172
      	PR testsuite/88208
      	* gcc.dg/attr-aligned-2.c: New test.
      	* gcc.dg/builtin-has-attribute.c: Adjust.
      	* c-c++-common/builtin-has-attribute-2.c: Same.
      	* c-c++-common/builtin-has-attribute-3.c: Same.
      	* c-c++-common/builtin-has-attribute-4.c: Same.
      	* c-c++-common/builtin-has-attribute-5.c: New test.
      	* gcc.target/aarch64/attr-aligned.c: Same.
      	* gcc.target/i386/attr-aligned.c: Same.
      	* gcc.target/powerpc/attr-aligned.c: Same.
      	* gcc.target/sparc/attr-aligned.c: Same.
      
      From-SVN: r266633
      Martin Sebor committed
  21. 21 Nov, 2018 1 commit
  22. 19 Nov, 2018 1 commit
  23. 15 Nov, 2018 1 commit
  24. 14 Nov, 2018 1 commit
    • [AArch64] Fix PR62178 testcase failures · ff4d8480
      The testcase for PR62178 has been failing for a while due to the pass
      conditions being too tight, resulting in failures with -mcmodel=tiny:
      
      	ldr	q2, [x0], 124
      	ld1r	{v1.4s}, [x1], 4
      	cmp	x0, x2
      	mla	v0.4s, v2.4s, v1.4s
      	bne	.L7
      
      -mcmodel=small generates the slightly different:
      
      	ldr	q1, [x0], 124
      	ldr	s2, [x1, 4]!
      	cmp	x0, x2
      	mla	v0.4s, v1.4s, v2.s[0]
      	bne	.L7
      
      This is due to combine merging a DUP instruction with either a load or an
      MLA - we can't force it to prefer one over the other.  However, the
      generated vector loop is fast either way since it generates MLA and merges
      the DUP with either a load or an MLA.  So relax the conditions slightly and
      check that we still generate MLA and that there is no DUP or FMOV.
      
      The testcase now passes - committed as obvious.
      
          testsuite/
      	* gcc.target/aarch64/pr62178.c: Relax scan-assembler checks.
      
      From-SVN: r266139
      Wilco Dijkstra committed
  25. 12 Nov, 2018 2 commits
    • re PR target/86677 (popcount builtin detection is breaking some kernel build) · 06a6b46a
      gcc/ChangeLog:
      
      2018-11-13  Kugan Vivekanandarajah  <kuganv@linaro.org>
      
      	PR middle-end/86677
      	PR middle-end/87528
      	* tree-scalar-evolution.c (expression_expensive_p): Make BUILTIN POPCOUNT
      	as expensive when backend does not define it.
      
      gcc/testsuite/ChangeLog:
      
      2018-11-13  Kugan Vivekanandarajah  <kuganv@linaro.org>
      
      	PR middle-end/86677
      	PR middle-end/87528
      	* g++.dg/tree-ssa/pr86544.C: Run only for target supporting popcount
      	pattern.
      	* gcc.dg/tree-ssa/popcount.c: Likewise.
      	* gcc.dg/tree-ssa/popcount2.c: Likewise.
      	* gcc.dg/tree-ssa/popcount3.c: Likewise.
      	* gcc.target/aarch64/popcount4.c: New test.
      	* lib/target-supports.exp (check_effective_target_popcountl): New.
      
      From-SVN: r266039
      Kugan Vivekanandarajah committed
    • [PR87815]Don't generate shift sequence for load replacement in DSE when the mode… · e6575643
      [PR87815]Don't generate shift sequence for load replacement in DSE when the mode size is not compile-time constant
      
      The patch adds a check that the gap is a compile-time constant.
      
      This happens when DSE decides to replace the load with the previous store
      value.  The problem is that the shift sequence cannot accept a compile-time
      non-constant mode operand.
      
      gcc/
      
      2018-11-12  Renlin Li  <renlin.li@arm.com>
      
      	PR target/87815
      	* dse.c (get_stored_val): Add check for compile-time
      	constantness of gap.
      
      gcc/testsuite/
      
      2018-11-12  Renlin Li  <renlin.li@arm.com>
      
      	PR target/87815
      	* gcc.target/aarch64/sve/pr87815.c: New.
      
      From-SVN: r266033
      Renlin Li committed
  26. 31 Oct, 2018 1 commit
    • Provide extension hint for aarch64 target (PR driver/83193). · c7887347
      2018-10-31  Martin Liska  <mliska@suse.cz>
      
      	PR driver/83193
      	* common/config/aarch64/aarch64-common.c (aarch64_parse_extension):
      	Add new argument invalid_extension.
      	(aarch64_get_all_extension_candidates): New function.
      	(aarch64_rewrite_selected_cpu): Add NULL to function call.
      	* config/aarch64/aarch64-protos.h (aarch64_parse_extension): Add
      	new argument.
      	(aarch64_get_all_extension_candidates): New function.
      	* config/aarch64/aarch64.c (aarch64_parse_arch): Add new
      	argument invalid_extension.
      	(aarch64_parse_cpu): Likewise.
      	(aarch64_print_hint_for_extensions): New function.
      	(aarch64_validate_mcpu): Provide hint about invalid extension.
      	(aarch64_validate_march): Likewise.
      	(aarch64_handle_attr_arch): Pass new argument.
      	(aarch64_handle_attr_cpu): Provide hint about invalid extension.
      	(aarch64_handle_attr_isa_flags): Likewise.
      2018-10-31  Martin Liska  <mliska@suse.cz>
      
      	PR driver/83193
      	* gcc.target/aarch64/spellcheck_7.c: New test.
      	* gcc.target/aarch64/spellcheck_8.c: New test.
      	* gcc.target/aarch64/spellcheck_9.c: New test.
      
      From-SVN: r265686
      Martin Liska committed
  27. 15 Oct, 2018 1 commit
    • [PR87563][AARCH64-SVE]: Don't keep ifcvt loop when COND_<OP> ifn could not be vectorized. · 41241199
      ifcvt creates a versioned loop and permissively generates scalar
      COND_<OP> internal functions in it.
      
      If, in the loop vectorize pass, a COND_<OP> cannot be vectorized, the
      if-converted loop should be abandoned when the target doesn't support
      such an ifn.
      
      
      gcc/
      
      2018-10-12  Renlin Li  <renlin.li@arm.com>
      
      	PR target/87563
      	* tree-vectorizer.c (try_vectorize_loop_1): Don't use
      	if-conversioned loop when it contains ifn with types not
      	supported by backend.
      	* internal-fn.c (expand_direct_optab_fn): Add an assert.
      	(direct_internal_fn_supported_p): New helper function.
      	* internal-fn.h (direct_internal_fn_supported_p): Declare.
      
      gcc/testsuite/
      
      2018-10-12  Renlin Li  <renlin.li@arm.com>
      
      	PR target/87563
      	* gcc.target/aarch64/sve/pr87563.c: New.
      
      From-SVN: r265172
      Renlin Li committed
  28. 12 Oct, 2018 1 commit
    • [AArch64] Support zero-extended move to FP register · 0cfc095c
      The popcount expansion uses SIMD instructions acting on 64-bit values.
      As a result a popcount of a 32-bit integer requires zero-extension before 
      moving the zero-extended value into an FP register.  This patch adds
      support for zero-extended int->FP moves to avoid the redundant uxtw.
      Similarly, add support for 32-bit zero-extending load->FP register
      and 32-bit zero-extending FP->FP and FP->int moves.
      Add a missing 'fp' arch attribute to the related 8/16-bit pattern and
      fix an incorrect type attribute.
      
      To complete zero-extended load support, add a new alternative to 
      load_pair_zero_extendsidi2_aarch64 to support LDP into FP registers too.
      
      int f (int a)
      {
        return __builtin_popcount (a);
      }
      
      Before:
      	uxtw	x0, w0
      	fmov	d0, x0
      	cnt	v0.8b, v0.8b
      	addv	b0, v0.8b
      	fmov	w0, s0
      	ret
      
      After:
      	fmov	s0, w0
      	cnt	v0.8b, v0.8b
      	addv	b0, v0.8b
      	fmov	w0, s0
      	ret
      
      Passes regress & bootstrap on AArch64.
      
          gcc/
      	* config/aarch64/aarch64.md (zero_extendsidi2_aarch64): Add alternatives
      	to zero-extend between int and floating-point registers.
      	(load_pair_zero_extendsidi2_aarch64): Add alternative for zero-extended
      	ldp into floating-point registers.  Add type and arch attributes.
      	(zero_extend<SHORT:mode><GPI:mode>2_aarch64): Add arch attribute.
      	Use f_loads for type attribute.
      
          testsuite/
      	* gcc.target/aarch64/popcnt.c: Test zero-extended popcount.
      	* gcc.target/aarch64/vec_zeroextend.c: Test zero-extended vectors.
      
      From-SVN: r265079
      Wilco Dijkstra committed
  29. 11 Oct, 2018 1 commit
    • [AArch64] Fix PR87511 · 1b6acf23
      As mentioned in PR87511, the shift used in aarch64_mask_and_shift_for_ubfiz_p
      should be evaluated as a HOST_WIDE_INT rather than int.
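      
      A minimal sketch of the underlying issue (not the compiler code itself):
      
      /* The literal 1 has type int, so for shift counts of 32 or more the shift
         is undefined and the 64-bit mask comes out wrong.  */
      unsigned long long bad_mask  (int shift) { return (1 << shift) - 1; }
      
      /* Shifting a 64-bit one (HOST_WIDE_INT_1U inside the compiler) is correct.  */
      unsigned long long good_mask (int shift) { return (1ULL << shift) - 1; }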
      
      Passes bootstrap & regress.
      
          gcc/
      	PR target/87511
      	* config/aarch64/aarch64.c (aarch64_mask_and_shift_for_ubfiz_p):
      	Use HOST_WIDE_INT_1U for shift.
      
          testsuite/
      	PR target/87511
      	* gcc.target/aarch64/pr87511.c: Add new test.
      
      From-SVN: r265058
      Wilco Dijkstra committed