Commit cc68f7c2 by Richard Sandiford, committed by Richard Sandiford

[AArch64] Add autovec support for partial SVE vectors

This patch adds the bare minimum needed to support autovectorisation of
partial SVE vectors, namely moves and integer addition.  Later patches
add more interesting cases.
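As an illustration (not part of the patch itself), the kind of loop this
enables is the one exercised by the new mixed_size_1.c test below, which
mixes 64-bit and 32-bit element sizes.  With partial ("unpacked") vectors
the 32-bit data can be kept in a mode such as VNx2SI, one element per
64-bit container, so both statements advance by the same number of
elements per iteration; the test checks this by expecting ld1w/st1w with
a .d container suffix for the narrow accesses.

    #include <stdint.h>

    /* Example only, mirroring the new mixed_size_1.c test: the uint32_t
       accesses can be vectorised with a partial mode such as VNx2SI,
       one 32-bit element per 64-bit container, matching the uint64_t
       vectors used for dst1/src1.  */
    void
    f (uint64_t *restrict dst1, uint64_t *restrict src1,
       uint32_t *restrict dst2, uint32_t *restrict src2, int n)
    {
      for (int i = 0; i < n; ++i)
        {
          dst1[i] += src1[i];
          dst2[i] = src2[i];
        }
    }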

2019-11-16  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* config/aarch64/aarch64-modes.def: Define partial SVE vector
	float modes.
	* config/aarch64/aarch64-protos.h (aarch64_sve_pred_mode): New
	function.
	* config/aarch64/aarch64.c (aarch64_classify_vector_mode): Handle the
	new vector float modes.
	(aarch64_sve_container_bits): New function.
	(aarch64_sve_pred_mode): Likewise.
	(aarch64_get_mask_mode): Use it.
	(aarch64_sve_element_int_mode): Handle structure modes and partial
	modes.
	(aarch64_sve_container_int_mode): New function.
	(aarch64_vectorize_related_mode): Return SVE modes when given
	SVE modes.  Handle partial modes, taking the preferred number
	of units from the size of the given mode.
	(aarch64_hard_regno_mode_ok): Allow partial modes to be stored
	in registers.
	(aarch64_expand_sve_ld1rq): Use the mode form of aarch64_sve_pred_mode.
	(aarch64_expand_sve_const_vector): Handle partial SVE vectors.
	(aarch64_split_sve_subreg_move): Use the mode form of
	aarch64_sve_pred_mode.
	(aarch64_secondary_reload): Handle partial modes in the same way
	as full big-endian vectors.
	(aarch64_vector_mode_supported_p): Allow partial SVE vectors.
	(aarch64_autovectorize_vector_modes): Try unpacked SVE vectors,
	merging with the Advanced SIMD modes.  If two modes have the
	same size, try the Advanced SIMD mode first.
	(aarch64_simd_valid_immediate): Use the container rather than
	the element mode for INDEX constants.
	(aarch64_simd_vector_alignment): Make the alignment of partial
	SVE vector modes the same as their minimum size.
	(aarch64_evpc_sel): Use the mode form of aarch64_sve_pred_mode.
	* config/aarch64/aarch64-sve.md (mov<SVE_FULL:mode>): Extend to...
	(mov<SVE_ALL:mode>): ...this.
	(movmisalign<SVE_FULL:mode>): Extend to...
	(movmisalign<SVE_ALL:mode>): ...this.
	(*aarch64_sve_mov<mode>_le): Rename to...
	(*aarch64_sve_mov<mode>_ldr_str): ...this.
	(*aarch64_sve_mov<SVE_FULL:mode>_be): Rename and extend to...
	(*aarch64_sve_mov<SVE_ALL:mode>_no_ldr_str): ...this.  Handle
	partial modes regardless of endianness.
	(aarch64_sve_reload_be): Rename to...
	(aarch64_sve_reload_mem): ...this and enable for little-endian.
	Use aarch64_sve_pred_mode to get the appropriate predicate mode.
	(@aarch64_pred_mov<SVE_FULL:mode>): Extend to...
	(@aarch64_pred_mov<SVE_ALL:mode>): ...this.
	(*aarch64_sve_mov<SVE_FULL:mode>_subreg_be): Extend to...
	(*aarch64_sve_mov<SVE_ALL:mode>_subreg_be): ...this.
	(@aarch64_sve_reinterpret<SVE_FULL:mode>): Extend to...
	(@aarch64_sve_reinterpret<SVE_ALL:mode>): ...this.
	(*aarch64_sve_reinterpret<SVE_FULL:mode>): Extend to...
	(*aarch64_sve_reinterpret<SVE_ALL:mode>): ...this.
	(maskload<SVE_FULL:mode><vpred>): Extend to...
	(maskload<SVE_ALL:mode><vpred>): ...this.
	(maskstore<SVE_FULL:mode><vpred>): Extend to...
	(maskstore<SVE_ALL:mode><vpred>): ...this.
	(vec_duplicate<SVE_FULL:mode>): Extend to...
	(vec_duplicate<SVE_ALL:mode>): ...this.
	(*vec_duplicate<SVE_FULL:mode>_reg): Extend to...
	(*vec_duplicate<SVE_ALL:mode>_reg): ...this.
	(sve_ld1r<SVE_FULL:mode>): Extend to...
	(sve_ld1r<SVE_ALL:mode>): ...this.
	(vec_series<SVE_FULL_I:mode>): Extend to...
	(vec_series<SVE_I:mode>): ...this.
	(*vec_series<SVE_FULL_I:mode>_plus): Extend to...
	(*vec_series<SVE_I:mode>_plus): ...this.
	(@aarch64_pred_sxt<SVE_FULL_HSDI:mode><SVE_PARTIAL_I:mode>): Avoid
	new VPRED ambiguity.
	(@aarch64_cond_sxt<SVE_FULL_HSDI:mode><SVE_PARTIAL_I:mode>): Likewise.
	(add<SVE_FULL_I:mode>3): Extend to...
	(add<SVE_I:mode>3): ...this.
	* config/aarch64/iterators.md (SVE_ALL, SVE_I): New mode iterators.
	(Vetype, Vesize, VEL, Vel, vwcore): Handle partial SVE vector modes.
	(VPRED, vpred): Likewise.
	(Vctype): New iterator.
	(vw): Remove SVE modes.

gcc/testsuite/
	* gcc.target/aarch64/sve/mixed_size_1.c: New test.
	* gcc.target/aarch64/sve/mixed_size_2.c: Likewise.
	* gcc.target/aarch64/sve/mixed_size_3.c: Likewise.
	* gcc.target/aarch64/sve/mixed_size_4.c: Likewise.
	* gcc.target/aarch64/sve/mixed_size_5.c: Likewise.

From-SVN: r278341
gcc/config/aarch64/aarch64-modes.def
@@ -123,13 +123,18 @@ SVE_MODES (4, VNx64, VNx32, VNx16, VNx8)
 VECTOR_MODES_WITH_PREFIX (VNx, INT, 2, 1);
 VECTOR_MODES_WITH_PREFIX (VNx, INT, 4, 1);
 VECTOR_MODES_WITH_PREFIX (VNx, INT, 8, 1);
+VECTOR_MODES_WITH_PREFIX (VNx, FLOAT, 4, 1);
+VECTOR_MODES_WITH_PREFIX (VNx, FLOAT, 8, 1);
 
 ADJUST_NUNITS (VNx2QI, aarch64_sve_vg);
 ADJUST_NUNITS (VNx2HI, aarch64_sve_vg);
 ADJUST_NUNITS (VNx2SI, aarch64_sve_vg);
+ADJUST_NUNITS (VNx2HF, aarch64_sve_vg);
+ADJUST_NUNITS (VNx2SF, aarch64_sve_vg);
 
 ADJUST_NUNITS (VNx4QI, aarch64_sve_vg * 2);
 ADJUST_NUNITS (VNx4HI, aarch64_sve_vg * 2);
+ADJUST_NUNITS (VNx4HF, aarch64_sve_vg * 2);
 
 ADJUST_NUNITS (VNx8QI, aarch64_sve_vg * 4);
@@ -139,8 +144,11 @@ ADJUST_ALIGNMENT (VNx8QI, 1);
 ADJUST_ALIGNMENT (VNx2HI, 2);
 ADJUST_ALIGNMENT (VNx4HI, 2);
+ADJUST_ALIGNMENT (VNx2HF, 2);
+ADJUST_ALIGNMENT (VNx4HF, 2);
 ADJUST_ALIGNMENT (VNx2SI, 4);
+ADJUST_ALIGNMENT (VNx2SF, 4);
 
 /* Quad float: 128-bit floating mode for long doubles.  */
 FLOAT_MODE (TF, 16, ieee_quad_format);
gcc/config/aarch64/aarch64-protos.h
@@ -512,6 +512,7 @@ bool aarch64_zero_extend_const_eq (machine_mode, rtx, machine_mode, rtx);
 bool aarch64_move_imm (HOST_WIDE_INT, machine_mode);
 machine_mode aarch64_sve_int_mode (machine_mode);
 opt_machine_mode aarch64_sve_pred_mode (unsigned int);
+machine_mode aarch64_sve_pred_mode (machine_mode);
 opt_machine_mode aarch64_sve_data_mode (scalar_mode, poly_uint64);
 bool aarch64_sve_mode_p (machine_mode);
 HOST_WIDE_INT aarch64_fold_sve_cnt_pat (aarch64_svpattern, unsigned int);
gcc/config/aarch64/iterators.md
@@ -344,6 +344,21 @@
                             VNx4HI VNx2HI
                             VNx2SI])
 
+;; All SVE vector modes.
+(define_mode_iterator SVE_ALL [VNx16QI VNx8QI VNx4QI VNx2QI
+                               VNx8HI VNx4HI VNx2HI
+                               VNx8HF VNx4HF VNx2HF
+                               VNx4SI VNx2SI
+                               VNx4SF VNx2SF
+                               VNx2DI
+                               VNx2DF])
+
+;; All SVE integer vector modes.
+(define_mode_iterator SVE_I [VNx16QI VNx8QI VNx4QI VNx2QI
+                             VNx8HI VNx4HI VNx2HI
+                             VNx4SI VNx2SI
+                             VNx2DI])
+
 ;; Modes involved in extending or truncating SVE data, for 8 elements per
 ;; 128-bit block.
 (define_mode_iterator VNx8_NARROW [VNx8QI])
@@ -776,28 +791,37 @@
                          (HI "")])
 
 ;; Mode-to-individual element type mapping.
-(define_mode_attr Vetype [(V8QI "b") (V16QI "b") (VNx16QI "b") (VNx16BI "b")
-                          (V4HI "h") (V8HI "h") (VNx8HI "h") (VNx8BI "h")
-                          (V2SI "s") (V4SI "s") (VNx4SI "s") (VNx4BI "s")
-                          (V2DI "d") (VNx2DI "d") (VNx2BI "d")
-                          (V4HF "h") (V8HF "h") (VNx8HF "h")
-                          (V2SF "s") (V4SF "s") (VNx4SF "s")
-                          (V2DF "d") (VNx2DF "d")
-                          (HF "h")
-                          (SF "s") (DF "d")
-                          (QI "b") (HI "h")
-                          (SI "s") (DI "d")])
+(define_mode_attr Vetype [(V8QI "b") (V16QI "b")
+                          (V4HI "h") (V8HI "h")
+                          (V2SI "s") (V4SI "s")
+                          (V2DI "d")
+                          (V4HF "h") (V8HF "h")
+                          (V2SF "s") (V4SF "s")
+                          (V2DF "d")
+                          (VNx16BI "b") (VNx8BI "h") (VNx4BI "s") (VNx2BI "d")
+                          (VNx16QI "b") (VNx8QI "b") (VNx4QI "b") (VNx2QI "b")
+                          (VNx8HI "h") (VNx4HI "h") (VNx2HI "h")
+                          (VNx8HF "h") (VNx4HF "h") (VNx2HF "h")
+                          (VNx4SI "s") (VNx2SI "s")
+                          (VNx4SF "s") (VNx2SF "s")
+                          (VNx2DI "d")
+                          (VNx2DF "d")
+                          (HF "h")
+                          (SF "s") (DF "d")
+                          (QI "b") (HI "h")
+                          (SI "s") (DI "d")])
 
 ;; Like Vetype, but map to types that are a quarter of the element size.
 (define_mode_attr Vetype_fourth [(VNx4SI "b") (VNx2DI "h")])
 
 ;; Equivalent of "size" for a vector element.
-(define_mode_attr Vesize [(VNx16QI "b") (VNx8QI "b")
-                          (VNx4QI "b") (VNx2QI "b")
-                          (VNx8HI "h") (VNx4HI "h")
-                          (VNx2HI "h") (VNx8HF "h")
-                          (VNx4SI "w") (VNx2SI "w") (VNx4SF "w")
-                          (VNx2DI "d") (VNx2DF "d")
+(define_mode_attr Vesize [(VNx16QI "b") (VNx8QI "b") (VNx4QI "b") (VNx2QI "b")
+                          (VNx8HI "h") (VNx4HI "h") (VNx2HI "h")
+                          (VNx8HF "h") (VNx4HF "h") (VNx2HF "h")
+                          (VNx4SI "w") (VNx2SI "w")
+                          (VNx4SF "w") (VNx2SF "w")
+                          (VNx2DI "d")
+                          (VNx2DF "d")
                           (VNx32QI "b") (VNx48QI "b") (VNx64QI "b")
                           (VNx16HI "h") (VNx24HI "h") (VNx32HI "h")
                           (VNx16HF "h") (VNx24HF "h") (VNx32HF "h")
@@ -806,6 +830,16 @@
                           (VNx4DI "d") (VNx6DI "d") (VNx8DI "d")
                           (VNx4DF "d") (VNx6DF "d") (VNx8DF "d")])
 
+;; The Z register suffix for an SVE mode's element container, i.e. the
+;; Vetype of full SVE modes that have the same number of elements.
+(define_mode_attr Vctype [(VNx16QI "b") (VNx8QI "h") (VNx4QI "s") (VNx2QI "d")
+                          (VNx8HI "h") (VNx4HI "s") (VNx2HI "d")
+                          (VNx8HF "h") (VNx4HF "s") (VNx2HF "d")
+                          (VNx4SI "s") (VNx2SI "d")
+                          (VNx4SF "s") (VNx2SF "d")
+                          (VNx2DI "d")
+                          (VNx2DF "d")])
+
 ;; Vetype is used everywhere in scheduling type and assembly output,
 ;; sometimes they are not the same, for example HF modes on some
 ;; instructions.  stype is defined to represent scheduling type
@@ -827,26 +861,40 @@
                          (SI "8b") (SF "8b")])
 
 ;; Define element mode for each vector mode.
-(define_mode_attr VEL [(V8QI "QI") (V16QI "QI") (VNx16QI "QI")
-                       (V4HI "HI") (V8HI "HI") (VNx8HI "HI")
-                       (V2SI "SI") (V4SI "SI") (VNx4SI "SI")
-                       (DI "DI") (V2DI "DI") (VNx2DI "DI")
-                       (V4HF "HF") (V8HF "HF") (VNx8HF "HF")
-                       (V2SF "SF") (V4SF "SF") (VNx4SF "SF")
-                       (DF "DF") (V2DF "DF") (VNx2DF "DF")
-                       (SI "SI") (HI "HI")
-                       (QI "QI")])
+(define_mode_attr VEL [(V8QI "QI") (V16QI "QI")
+                       (V4HI "HI") (V8HI "HI")
+                       (V2SI "SI") (V4SI "SI")
+                       (DI "DI") (V2DI "DI")
+                       (V4HF "HF") (V8HF "HF")
+                       (V2SF "SF") (V4SF "SF")
+                       (DF "DF") (V2DF "DF")
+                       (SI "SI") (HI "HI")
+                       (QI "QI")
+                       (VNx16QI "QI") (VNx8QI "QI") (VNx4QI "QI") (VNx2QI "QI")
+                       (VNx8HI "HI") (VNx4HI "HI") (VNx2HI "HI")
+                       (VNx8HF "HF") (VNx4HF "HF") (VNx2HF "HF")
+                       (VNx4SI "SI") (VNx2SI "SI")
+                       (VNx4SF "SF") (VNx2SF "SF")
+                       (VNx2DI "DI")
+                       (VNx2DF "DF")])
 
 ;; Define element mode for each vector mode (lower case).
-(define_mode_attr Vel [(V8QI "qi") (V16QI "qi") (VNx16QI "qi")
-                       (V4HI "hi") (V8HI "hi") (VNx8HI "hi")
-                       (V2SI "si") (V4SI "si") (VNx4SI "si")
-                       (DI "di") (V2DI "di") (VNx2DI "di")
-                       (V4HF "hf") (V8HF "hf") (VNx8HF "hf")
-                       (V2SF "sf") (V4SF "sf") (VNx4SF "sf")
-                       (V2DF "df") (DF "df") (VNx2DF "df")
-                       (SI "si") (HI "hi")
-                       (QI "qi")])
+(define_mode_attr Vel [(V8QI "qi") (V16QI "qi")
+                       (V4HI "hi") (V8HI "hi")
+                       (V2SI "si") (V4SI "si")
+                       (DI "di") (V2DI "di")
+                       (V4HF "hf") (V8HF "hf")
+                       (V2SF "sf") (V4SF "sf")
+                       (V2DF "df") (DF "df")
+                       (SI "si") (HI "hi")
+                       (QI "qi")
+                       (VNx16QI "qi") (VNx8QI "qi") (VNx4QI "qi") (VNx2QI "qi")
+                       (VNx8HI "hi") (VNx4HI "hi") (VNx2HI "hi")
+                       (VNx8HF "hf") (VNx4HF "hf") (VNx2HF "hf")
+                       (VNx4SI "si") (VNx2SI "si")
+                       (VNx4SF "sf") (VNx2SF "sf")
+                       (VNx2DI "di")
+                       (VNx2DF "df")])
 
 ;; Element mode with floating-point values replaced by like-sized integers.
 (define_mode_attr VEL_INT [(VNx16QI "QI")
@@ -994,23 +1042,29 @@
                        (V4SF "2s")])
 
 ;; Define corresponding core/FP element mode for each vector mode.
-(define_mode_attr vw [(V8QI "w") (V16QI "w") (VNx16QI "w")
-                      (V4HI "w") (V8HI "w") (VNx8HI "w")
-                      (V2SI "w") (V4SI "w") (VNx4SI "w")
-                      (DI "x") (V2DI "x") (VNx2DI "x")
-                      (VNx8HF "h")
-                      (V2SF "s") (V4SF "s") (VNx4SF "s")
-                      (V2DF "d") (VNx2DF "d")])
+(define_mode_attr vw [(V8QI "w") (V16QI "w")
+                      (V4HI "w") (V8HI "w")
+                      (V2SI "w") (V4SI "w")
+                      (DI "x") (V2DI "x")
+                      (V2SF "s") (V4SF "s")
+                      (V2DF "d")])
 
 ;; Corresponding core element mode for each vector mode.  This is a
 ;; variation on <vw> mapping FP modes to GP regs.
-(define_mode_attr vwcore [(V8QI "w") (V16QI "w") (VNx16QI "w")
-                          (V4HI "w") (V8HI "w") (VNx8HI "w")
-                          (V2SI "w") (V4SI "w") (VNx4SI "w")
-                          (DI "x") (V2DI "x") (VNx2DI "x")
-                          (V4HF "w") (V8HF "w") (VNx8HF "w")
-                          (V2SF "w") (V4SF "w") (VNx4SF "w")
-                          (V2DF "x") (VNx2DF "x")])
+(define_mode_attr vwcore [(V8QI "w") (V16QI "w")
+                          (V4HI "w") (V8HI "w")
+                          (V2SI "w") (V4SI "w")
+                          (DI "x") (V2DI "x")
+                          (V4HF "w") (V8HF "w")
+                          (V2SF "w") (V4SF "w")
+                          (V2DF "x")
+                          (VNx16QI "w") (VNx8QI "w") (VNx4QI "w") (VNx2QI "w")
+                          (VNx8HI "w") (VNx4HI "w") (VNx2HI "w")
+                          (VNx8HF "w") (VNx4HF "w") (VNx2HF "w")
+                          (VNx4SI "w") (VNx2SI "w")
+                          (VNx4SF "w") (VNx2SF "w")
+                          (VNx2DI "x")
+                          (VNx2DF "x")])
 
 ;; Double vector types for ALLX.
 (define_mode_attr Vallxd [(QI "8b") (HI "4h") (SI "2s")])
@@ -1248,10 +1302,14 @@
 ;; The predicate mode associated with an SVE data mode.  For structure modes
 ;; this is equivalent to the <VPRED> of the subvector mode.
-(define_mode_attr VPRED [(VNx16QI "VNx16BI")
-                         (VNx8HI "VNx8BI") (VNx8HF "VNx8BI")
-                         (VNx4SI "VNx4BI") (VNx4SF "VNx4BI")
-                         (VNx2DI "VNx2BI") (VNx2DF "VNx2BI")
+(define_mode_attr VPRED [(VNx16QI "VNx16BI") (VNx8QI "VNx8BI")
+                         (VNx4QI "VNx4BI") (VNx2QI "VNx2BI")
+                         (VNx8HI "VNx8BI") (VNx4HI "VNx4BI") (VNx2HI "VNx2BI")
+                         (VNx8HF "VNx8BI") (VNx4HF "VNx4BI") (VNx2HF "VNx2BI")
+                         (VNx4SI "VNx4BI") (VNx2SI "VNx2BI")
+                         (VNx4SF "VNx4BI") (VNx2SF "VNx2BI")
+                         (VNx2DI "VNx2BI")
+                         (VNx2DF "VNx2BI")
                          (VNx32QI "VNx16BI")
                          (VNx16HI "VNx8BI") (VNx16HF "VNx8BI")
                          (VNx8SI "VNx4BI") (VNx8SF "VNx4BI")
@@ -1266,10 +1324,14 @@
                          (VNx8DI "VNx2BI") (VNx8DF "VNx2BI")])
 
 ;; ...and again in lower case.
-(define_mode_attr vpred [(VNx16QI "vnx16bi")
-                         (VNx8HI "vnx8bi") (VNx8HF "vnx8bi")
-                         (VNx4SI "vnx4bi") (VNx4SF "vnx4bi")
-                         (VNx2DI "vnx2bi") (VNx2DF "vnx2bi")
+(define_mode_attr vpred [(VNx16QI "vnx16bi") (VNx8QI "vnx8bi")
+                         (VNx4QI "vnx4bi") (VNx2QI "vnx2bi")
+                         (VNx8HI "vnx8bi") (VNx4HI "vnx4bi") (VNx2HI "vnx2bi")
+                         (VNx8HF "vnx8bi") (VNx4HF "vnx4bi") (VNx2HF "vnx2bi")
+                         (VNx4SI "vnx4bi") (VNx2SI "vnx2bi")
+                         (VNx4SF "vnx4bi") (VNx2SF "vnx2bi")
+                         (VNx2DI "vnx2bi")
+                         (VNx2DF "vnx2bi")
                          (VNx32QI "vnx16bi")
                          (VNx16HI "vnx8bi") (VNx16HF "vnx8bi")
                          (VNx8SI "vnx4bi") (VNx8SF "vnx4bi")
gcc/testsuite/gcc.target/aarch64/sve/mixed_size_1.c

/* { dg-options "-O2 -ftree-vectorize -fno-tree-loop-distribute-patterns" } */

#include <stdint.h>

#define TEST_LOOP(TYPE1, TYPE2)                                        \
  void                                                                 \
  f_##TYPE1##_##TYPE2 (TYPE1 *restrict dst1, TYPE1 *restrict src1,     \
                       TYPE2 *restrict dst2, TYPE2 *restrict src2,     \
                       int n)                                          \
  {                                                                    \
    for (int i = 0; i < n; ++i)                                        \
      {                                                                \
        dst1[i] += src1[i];                                            \
        dst2[i] = src2[i];                                             \
      }                                                                \
  }

#define TEST_ALL(T) \
  T (uint16_t, uint8_t) \
  T (uint32_t, uint16_t) \
  T (uint32_t, _Float16) \
  T (uint64_t, uint32_t) \
  T (uint64_t, float)

TEST_ALL (TEST_LOOP)

/* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.h,} 1 } } */
/* { dg-final { scan-assembler-times {\tst1b\tz[0-9]+\.h,} 1 } } */
/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.s,} 2 } } */
/* { dg-final { scan-assembler-times {\tst1h\tz[0-9]+\.s,} 2 } } */
/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d,} 2 } } */
/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.d,} 2 } } */
/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.h,} 2 } } */
/* { dg-final { scan-assembler-times {\tst1h\tz[0-9]+\.h,} 1 } } */
/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s,} 4 } } */
/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s,} 2 } } */
/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d,} 4 } } */
/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d,} 2 } } */
gcc/testsuite/gcc.target/aarch64/sve/mixed_size_2.c

/* { dg-options "-O2 -ftree-vectorize -fno-tree-loop-distribute-patterns" } */

#include <stdint.h>

#define TEST_LOOP(TYPE1, TYPE2)                                        \
  void                                                                 \
  f_##TYPE1##_##TYPE2 (TYPE1 *restrict dst1, TYPE1 *restrict src1,     \
                       TYPE2 *restrict dst2, int n)                    \
  {                                                                    \
    for (int i = 0; i < n; ++i)                                        \
      {                                                                \
        dst1[i] += src1[i];                                            \
        dst2[i] = 1;                                                   \
      }                                                                \
  }

#define TEST_ALL(T) \
  T (uint16_t, uint8_t) \
  T (uint32_t, uint16_t) \
  T (uint32_t, _Float16) \
  T (uint64_t, uint32_t) \
  T (uint64_t, float)

TEST_ALL (TEST_LOOP)

/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.b, #1\n} 1 } } */
/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.h, #1\n} 1 } } */
/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.s, #1\n} 1 } } */
/* { dg-final { scan-assembler-times {\tfmov\tz[0-9]+\.h, #1\.0} 1 } } */
/* { dg-final { scan-assembler-times {\tfmov\tz[0-9]+\.s, #1\.0} 1 } } */
/* { dg-final { scan-assembler-times {\tst1b\tz[0-9]+\.h,} 1 } } */
/* { dg-final { scan-assembler-times {\tst1h\tz[0-9]+\.s,} 2 } } */
/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.d,} 2 } } */
/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.h,} 2 } } */
/* { dg-final { scan-assembler-times {\tst1h\tz[0-9]+\.h,} 1 } } */
/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s,} 4 } } */
/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s,} 2 } } */
/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d,} 4 } } */
/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d,} 2 } } */
gcc/testsuite/gcc.target/aarch64/sve/mixed_size_3.c

/* { dg-options "-O2 -ftree-vectorize -fno-tree-loop-distribute-patterns" } */

#include <stdint.h>

#define TEST_LOOP(TYPE1, TYPE2)                                        \
  void                                                                 \
  f_##TYPE1##_##TYPE2 (TYPE1 *restrict dst1, TYPE1 *restrict src1,     \
                       TYPE2 *restrict dst2, TYPE2 src2, int n)        \
  {                                                                    \
    for (int i = 0; i < n; ++i)                                        \
      {                                                                \
        dst1[i] += src1[i];                                            \
        dst2[i] = src2;                                                \
      }                                                                \
  }

#define TEST_ALL(T) \
  T (uint16_t, uint8_t) \
  T (uint32_t, uint16_t) \
  T (uint32_t, _Float16) \
  T (uint64_t, uint32_t) \
  T (uint64_t, float)

TEST_ALL (TEST_LOOP)

/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.b, w3\n} 1 } } */
/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.h, w3\n} 1 } } */
/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.s, w3\n} 1 } } */
/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.h, h0\n} 1 } } */
/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.s, s0\n} 1 } } */
/* { dg-final { scan-assembler-times {\tst1b\tz[0-9]+\.h,} 1 } } */
/* { dg-final { scan-assembler-times {\tst1h\tz[0-9]+\.s,} 2 } } */
/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.d,} 2 } } */
/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.h,} 2 } } */
/* { dg-final { scan-assembler-times {\tst1h\tz[0-9]+\.h,} 1 } } */
/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s,} 4 } } */
/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s,} 2 } } */
/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d,} 4 } } */
/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d,} 2 } } */
gcc/testsuite/gcc.target/aarch64/sve/mixed_size_4.c

/* { dg-options "-O2 -ftree-vectorize -fno-tree-loop-distribute-patterns" } */

#include <stdint.h>

#define TEST_LOOP(TYPE1, TYPE2)                                        \
  void                                                                 \
  f_##TYPE1##_##TYPE2 (TYPE1 *restrict dst1, TYPE1 *restrict src1,     \
                       TYPE2 *restrict dst2, TYPE2 n)                  \
  {                                                                    \
    for (TYPE2 i = 0; i < n; ++i)                                      \
      {                                                                \
        dst1[i] += src1[i];                                            \
        dst2[i] = i;                                                   \
      }                                                                \
  }

#define TEST_ALL(T) \
  T (uint16_t, uint8_t) \
  T (uint32_t, uint16_t) \
  T (uint64_t, uint32_t)

TEST_ALL (TEST_LOOP)

/* { dg-final { scan-assembler-not {\tindex\tz[0-9]+\.b,} } } */
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.h, #0, #1\n} 1 } } */
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.s, #0, #1\n} 1 } } */
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.d, #0, #1\n} 1 } } */
/* { dg-final { scan-assembler-not {\tcntb\t} } } */
/* { dg-final { scan-assembler-times {\tcnth\t} 1 } } */
/* { dg-final { scan-assembler-times {\tcntw\t} 1 } } */
/* { dg-final { scan-assembler-times {\tcntd\t} 1 } } */
/* { dg-final { scan-assembler-times {\tst1b\tz[0-9]+\.h,} 1 } } */
/* { dg-final { scan-assembler-times {\tst1h\tz[0-9]+\.s,} 1 } } */
/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.d,} 1 } } */
/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.h,} 2 } } */
/* { dg-final { scan-assembler-times {\tst1h\tz[0-9]+\.h,} 1 } } */
/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s,} 2 } } */
/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s,} 1 } } */
/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d,} 2 } } */
/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d,} 1 } } */
gcc/testsuite/gcc.target/aarch64/sve/mixed_size_5.c

/* { dg-options "-O2 -ftree-vectorize -fno-tree-loop-distribute-patterns -msve-vector-bits=512" } */

#include <stdint.h>

#define TEST_LOOP(TYPE1, TYPE2)                                        \
  void                                                                 \
  f_##TYPE1##_##TYPE2 (TYPE1 *restrict dst1, TYPE1 *restrict src1,     \
                       TYPE2 *restrict dst2, TYPE2 *restrict src2,     \
                       int n)                                          \
  {                                                                    \
    for (int i = 0; i < n; ++i)                                        \
      {                                                                \
        dst1[i * 2] = src1[i * 2] + 1;                                 \
        dst1[i * 2 + 1] = src1[i * 2 + 1] + 1;                         \
        dst2[i * 2] = 2;                                               \
        dst2[i * 2 + 1] = 3;                                           \
      }                                                                \
  }

#define TEST_ALL(T) \
  T (uint16_t, uint8_t) \
  T (uint32_t, uint16_t) \
  T (uint32_t, _Float16) \
  T (uint64_t, uint32_t) \
  T (uint64_t, float)

TEST_ALL (TEST_LOOP)

/* { dg-final { scan-assembler-times {\tld1rw\tz[0-9]+\.s,} 1 } } */
/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d,} 2 } } */
/* { dg-final { scan-assembler-times {\tld1rqw\tz[0-9]+\.s,} 2 } } */
/* { dg-final { scan-assembler-times {\tst1b\tz[0-9]+\.h,} 1 } } */
/* { dg-final { scan-assembler-times {\tst1h\tz[0-9]+\.s,} 2 } } */
/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.d,} 2 } } */
/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.h,} 1 } } */
/* { dg-final { scan-assembler-times {\tst1h\tz[0-9]+\.h,} 1 } } */
/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s,} 2 } } */
/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s,} 2 } } */
/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d,} 2 } } */
/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d,} 2 } } */