Commit cc68f7c2 by Richard Sandiford, committed by Richard Sandiford

[AArch64] Add autovec support for partial SVE vectors

This patch adds the bare minimum needed to support autovectorisation of
partial SVE vectors, namely moves and integer addition.  Later patches
add more interesting cases.
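As an illustration (not part of the patch itself), the kind of loop this
enables is the one exercised by the new mixed_size_1.c test below, which
mixes 64-bit and 32-bit element sizes.  With partial ("unpacked") vectors
the 32-bit data can be kept in a mode such as VNx2SI, one element per
64-bit container, so both statements advance by the same number of
elements per iteration; the test checks this by expecting ld1w/st1w with
a .d container suffix for the narrow accesses.

    #include <stdint.h>

    /* Example only, mirroring the new mixed_size_1.c test: the uint32_t
       accesses can be vectorised with a partial mode such as VNx2SI,
       one 32-bit element per 64-bit container, matching the uint64_t
       vectors used for dst1/src1.  */
    void
    f (uint64_t *restrict dst1, uint64_t *restrict src1,
       uint32_t *restrict dst2, uint32_t *restrict src2, int n)
    {
      for (int i = 0; i < n; ++i)
        {
          dst1[i] += src1[i];
          dst2[i] = src2[i];
        }
    }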

2019-11-16  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* config/aarch64/aarch64-modes.def: Define partial SVE vector
	float modes.
	* config/aarch64/aarch64-protos.h (aarch64_sve_pred_mode): New
	function.
	* config/aarch64/aarch64.c (aarch64_classify_vector_mode): Handle the
	new vector float modes.
	(aarch64_sve_container_bits): New function.
	(aarch64_sve_pred_mode): Likewise.
	(aarch64_get_mask_mode): Use it.
	(aarch64_sve_element_int_mode): Handle structure modes and partial
	modes.
	(aarch64_sve_container_int_mode): New function.
	(aarch64_vectorize_related_mode): Return SVE modes when given
	SVE modes.  Handle partial modes, taking the preferred number
	of units from the size of the given mode.
	(aarch64_hard_regno_mode_ok): Allow partial modes to be stored
	in registers.
	(aarch64_expand_sve_ld1rq): Use the mode form of aarch64_sve_pred_mode.
	(aarch64_expand_sve_const_vector): Handle partial SVE vectors.
	(aarch64_split_sve_subreg_move): Use the mode form of
	aarch64_sve_pred_mode.
	(aarch64_secondary_reload): Handle partial modes in the same way
	as full big-endian vectors.
	(aarch64_vector_mode_supported_p): Allow partial SVE vectors.
	(aarch64_autovectorize_vector_modes): Try unpacked SVE vectors,
	merging with the Advanced SIMD modes.  If two modes have the
	same size, try the Advanced SIMD mode first.
	(aarch64_simd_valid_immediate): Use the container rather than
	the element mode for INDEX constants.
	(aarch64_simd_vector_alignment): Make the alignment of partial
	SVE vector modes the same as their minimum size.
	(aarch64_evpc_sel): Use the mode form of aarch64_sve_pred_mode.
	* config/aarch64/aarch64-sve.md (mov<SVE_FULL:mode>): Extend to...
	(mov<SVE_ALL:mode>): ...this.
	(movmisalign<SVE_FULL:mode>): Extend to...
	(movmisalign<SVE_ALL:mode>): ...this.
	(*aarch64_sve_mov<mode>_le): Rename to...
	(*aarch64_sve_mov<mode>_ldr_str): ...this.
	(*aarch64_sve_mov<SVE_FULL:mode>_be): Rename and extend to...
	(*aarch64_sve_mov<SVE_ALL:mode>_no_ldr_str): ...this.  Handle
	partial modes regardless of endianness.
	(aarch64_sve_reload_be): Rename to...
	(aarch64_sve_reload_mem): ...this and enable for little-endian.
	Use aarch64_sve_pred_mode to get the appropriate predicate mode.
	(@aarch64_pred_mov<SVE_FULL:mode>): Extend to...
	(@aarch64_pred_mov<SVE_ALL:mode>): ...this.
	(*aarch64_sve_mov<SVE_FULL:mode>_subreg_be): Extend to...
	(*aarch64_sve_mov<SVE_ALL:mode>_subreg_be): ...this.
	(@aarch64_sve_reinterpret<SVE_FULL:mode>): Extend to...
	(@aarch64_sve_reinterpret<SVE_ALL:mode>): ...this.
	(*aarch64_sve_reinterpret<SVE_FULL:mode>): Extend to...
	(*aarch64_sve_reinterpret<SVE_ALL:mode>): ...this.
	(maskload<SVE_FULL:mode><vpred>): Extend to...
	(maskload<SVE_ALL:mode><vpred>): ...this.
	(maskstore<SVE_FULL:mode><vpred>): Extend to...
	(maskstore<SVE_ALL:mode><vpred>): ...this.
	(vec_duplicate<SVE_FULL:mode>): Extend to...
	(vec_duplicate<SVE_ALL:mode>): ...this.
	(*vec_duplicate<SVE_FULL:mode>_reg): Extend to...
	(*vec_duplicate<SVE_ALL:mode>_reg): ...this.
	(sve_ld1r<SVE_FULL:mode>): Extend to...
	(sve_ld1r<SVE_ALL:mode>): ...this.
	(vec_series<SVE_FULL_I:mode>): Extend to...
	(vec_series<SVE_I:mode>): ...this.
	(*vec_series<SVE_FULL_I:mode>_plus): Extend to...
	(*vec_series<SVE_I:mode>_plus): ...this.
	(@aarch64_pred_sxt<SVE_FULL_HSDI:mode><SVE_PARTIAL_I:mode>): Avoid
	new VPRED ambiguity.
	(@aarch64_cond_sxt<SVE_FULL_HSDI:mode><SVE_PARTIAL_I:mode>): Likewise.
	(add<SVE_FULL_I:mode>3): Extend to...
	(add<SVE_I:mode>3): ...this.
	* config/aarch64/iterators.md (SVE_ALL, SVE_I): New mode iterators.
	(Vetype, Vesize, VEL, Vel, vwcore): Handle partial SVE vector modes.
	(VPRED, vpred): Likewise.
	(Vctype): New iterator.
	(vw): Remove SVE modes.

gcc/testsuite/
	* gcc.target/aarch64/sve/mixed_size_1.c: New test.
	* gcc.target/aarch64/sve/mixed_size_2.c: Likewise.
	* gcc.target/aarch64/sve/mixed_size_3.c: Likewise.
	* gcc.target/aarch64/sve/mixed_size_4.c: Likewise.
	* gcc.target/aarch64/sve/mixed_size_5.c: Likewise.

From-SVN: r278341
gcc/config/aarch64/aarch64-modes.def
@@ -123,13 +123,18 @@ SVE_MODES (4, VNx64, VNx32, VNx16, VNx8)
 VECTOR_MODES_WITH_PREFIX (VNx, INT, 2, 1);
 VECTOR_MODES_WITH_PREFIX (VNx, INT, 4, 1);
 VECTOR_MODES_WITH_PREFIX (VNx, INT, 8, 1);
+VECTOR_MODES_WITH_PREFIX (VNx, FLOAT, 4, 1);
+VECTOR_MODES_WITH_PREFIX (VNx, FLOAT, 8, 1);
 
 ADJUST_NUNITS (VNx2QI, aarch64_sve_vg);
 ADJUST_NUNITS (VNx2HI, aarch64_sve_vg);
 ADJUST_NUNITS (VNx2SI, aarch64_sve_vg);
+ADJUST_NUNITS (VNx2HF, aarch64_sve_vg);
+ADJUST_NUNITS (VNx2SF, aarch64_sve_vg);
 
 ADJUST_NUNITS (VNx4QI, aarch64_sve_vg * 2);
 ADJUST_NUNITS (VNx4HI, aarch64_sve_vg * 2);
+ADJUST_NUNITS (VNx4HF, aarch64_sve_vg * 2);
 
 ADJUST_NUNITS (VNx8QI, aarch64_sve_vg * 4);
@@ -139,8 +144,11 @@ ADJUST_ALIGNMENT (VNx8QI, 1);
 ADJUST_ALIGNMENT (VNx2HI, 2);
 ADJUST_ALIGNMENT (VNx4HI, 2);
+ADJUST_ALIGNMENT (VNx2HF, 2);
+ADJUST_ALIGNMENT (VNx4HF, 2);
 ADJUST_ALIGNMENT (VNx2SI, 4);
+ADJUST_ALIGNMENT (VNx2SF, 4);
 
 /* Quad float: 128-bit floating mode for long doubles.  */
 FLOAT_MODE (TF, 16, ieee_quad_format);
gcc/config/aarch64/aarch64-protos.h
@@ -512,6 +512,7 @@ bool aarch64_zero_extend_const_eq (machine_mode, rtx, machine_mode, rtx);
 bool aarch64_move_imm (HOST_WIDE_INT, machine_mode);
 machine_mode aarch64_sve_int_mode (machine_mode);
 opt_machine_mode aarch64_sve_pred_mode (unsigned int);
+machine_mode aarch64_sve_pred_mode (machine_mode);
 opt_machine_mode aarch64_sve_data_mode (scalar_mode, poly_uint64);
 bool aarch64_sve_mode_p (machine_mode);
 HOST_WIDE_INT aarch64_fold_sve_cnt_pat (aarch64_svpattern, unsigned int);
gcc/config/aarch64/iterators.md
@@ -344,6 +344,21 @@
                             VNx4HI VNx2HI
                             VNx2SI])
 
+;; All SVE vector modes.
+(define_mode_iterator SVE_ALL [VNx16QI VNx8QI VNx4QI VNx2QI
+                               VNx8HI VNx4HI VNx2HI
+                               VNx8HF VNx4HF VNx2HF
+                               VNx4SI VNx2SI
+                               VNx4SF VNx2SF
+                               VNx2DI
+                               VNx2DF])
+
+;; All SVE integer vector modes.
+(define_mode_iterator SVE_I [VNx16QI VNx8QI VNx4QI VNx2QI
+                             VNx8HI VNx4HI VNx2HI
+                             VNx4SI VNx2SI
+                             VNx2DI])
+
 ;; Modes involved in extending or truncating SVE data, for 8 elements per
 ;; 128-bit block.
 (define_mode_iterator VNx8_NARROW [VNx8QI])
@@ -776,28 +791,37 @@
                          (HI "")])
 
 ;; Mode-to-individual element type mapping.
-(define_mode_attr Vetype [(V8QI "b") (V16QI "b") (VNx16QI "b") (VNx16BI "b")
-                          (V4HI "h") (V8HI "h") (VNx8HI "h") (VNx8BI "h")
-                          (V2SI "s") (V4SI "s") (VNx4SI "s") (VNx4BI "s")
-                          (V2DI "d") (VNx2DI "d") (VNx2BI "d")
-                          (V4HF "h") (V8HF "h") (VNx8HF "h")
-                          (V2SF "s") (V4SF "s") (VNx4SF "s")
-                          (V2DF "d") (VNx2DF "d")
-                          (HF "h")
-                          (SF "s") (DF "d")
-                          (QI "b") (HI "h")
-                          (SI "s") (DI "d")])
+(define_mode_attr Vetype [(V8QI "b") (V16QI "b")
+                          (V4HI "h") (V8HI "h")
+                          (V2SI "s") (V4SI "s")
+                          (V2DI "d")
+                          (V4HF "h") (V8HF "h")
+                          (V2SF "s") (V4SF "s")
+                          (V2DF "d")
+                          (VNx16BI "b") (VNx8BI "h") (VNx4BI "s") (VNx2BI "d")
+                          (VNx16QI "b") (VNx8QI "b") (VNx4QI "b") (VNx2QI "b")
+                          (VNx8HI "h") (VNx4HI "h") (VNx2HI "h")
+                          (VNx8HF "h") (VNx4HF "h") (VNx2HF "h")
+                          (VNx4SI "s") (VNx2SI "s")
+                          (VNx4SF "s") (VNx2SF "s")
+                          (VNx2DI "d")
+                          (VNx2DF "d")
+                          (HF "h")
+                          (SF "s") (DF "d")
+                          (QI "b") (HI "h")
+                          (SI "s") (DI "d")])
 
 ;; Like Vetype, but map to types that are a quarter of the element size.
 (define_mode_attr Vetype_fourth [(VNx4SI "b") (VNx2DI "h")])
 
 ;; Equivalent of "size" for a vector element.
-(define_mode_attr Vesize [(VNx16QI "b") (VNx8QI "b")
-                          (VNx4QI "b") (VNx2QI "b")
-                          (VNx8HI "h") (VNx4HI "h")
-                          (VNx2HI "h") (VNx8HF "h")
-                          (VNx4SI "w") (VNx2SI "w") (VNx4SF "w")
-                          (VNx2DI "d") (VNx2DF "d")
+(define_mode_attr Vesize [(VNx16QI "b") (VNx8QI "b") (VNx4QI "b") (VNx2QI "b")
+                          (VNx8HI "h") (VNx4HI "h") (VNx2HI "h")
+                          (VNx8HF "h") (VNx4HF "h") (VNx2HF "h")
+                          (VNx4SI "w") (VNx2SI "w")
+                          (VNx4SF "w") (VNx2SF "w")
+                          (VNx2DI "d")
+                          (VNx2DF "d")
                           (VNx32QI "b") (VNx48QI "b") (VNx64QI "b")
                           (VNx16HI "h") (VNx24HI "h") (VNx32HI "h")
                           (VNx16HF "h") (VNx24HF "h") (VNx32HF "h")
@@ -806,6 +830,16 @@
                           (VNx4DI "d") (VNx6DI "d") (VNx8DI "d")
                           (VNx4DF "d") (VNx6DF "d") (VNx8DF "d")])
 
+;; The Z register suffix for an SVE mode's element container, i.e. the
+;; Vetype of full SVE modes that have the same number of elements.
+(define_mode_attr Vctype [(VNx16QI "b") (VNx8QI "h") (VNx4QI "s") (VNx2QI "d")
+                          (VNx8HI "h") (VNx4HI "s") (VNx2HI "d")
+                          (VNx8HF "h") (VNx4HF "s") (VNx2HF "d")
+                          (VNx4SI "s") (VNx2SI "d")
+                          (VNx4SF "s") (VNx2SF "d")
+                          (VNx2DI "d")
+                          (VNx2DF "d")])
+
 ;; Vetype is used everywhere in scheduling type and assembly output,
 ;; sometimes they are not the same, for example HF modes on some
 ;; instructions.  stype is defined to represent scheduling type
@@ -827,26 +861,40 @@
                          (SI "8b") (SF "8b")])
 
 ;; Define element mode for each vector mode.
-(define_mode_attr VEL [(V8QI "QI") (V16QI "QI") (VNx16QI "QI")
-                       (V4HI "HI") (V8HI "HI") (VNx8HI "HI")
-                       (V2SI "SI") (V4SI "SI") (VNx4SI "SI")
-                       (DI "DI") (V2DI "DI") (VNx2DI "DI")
-                       (V4HF "HF") (V8HF "HF") (VNx8HF "HF")
-                       (V2SF "SF") (V4SF "SF") (VNx4SF "SF")
-                       (DF "DF") (V2DF "DF") (VNx2DF "DF")
-                       (SI "SI") (HI "HI")
-                       (QI "QI")])
+(define_mode_attr VEL [(V8QI "QI") (V16QI "QI")
+                       (V4HI "HI") (V8HI "HI")
+                       (V2SI "SI") (V4SI "SI")
+                       (DI "DI") (V2DI "DI")
+                       (V4HF "HF") (V8HF "HF")
+                       (V2SF "SF") (V4SF "SF")
+                       (DF "DF") (V2DF "DF")
+                       (SI "SI") (HI "HI")
+                       (QI "QI")
+                       (VNx16QI "QI") (VNx8QI "QI") (VNx4QI "QI") (VNx2QI "QI")
+                       (VNx8HI "HI") (VNx4HI "HI") (VNx2HI "HI")
+                       (VNx8HF "HF") (VNx4HF "HF") (VNx2HF "HF")
+                       (VNx4SI "SI") (VNx2SI "SI")
+                       (VNx4SF "SF") (VNx2SF "SF")
+                       (VNx2DI "DI")
+                       (VNx2DF "DF")])
 
 ;; Define element mode for each vector mode (lower case).
-(define_mode_attr Vel [(V8QI "qi") (V16QI "qi") (VNx16QI "qi")
-                       (V4HI "hi") (V8HI "hi") (VNx8HI "hi")
-                       (V2SI "si") (V4SI "si") (VNx4SI "si")
-                       (DI "di") (V2DI "di") (VNx2DI "di")
-                       (V4HF "hf") (V8HF "hf") (VNx8HF "hf")
-                       (V2SF "sf") (V4SF "sf") (VNx4SF "sf")
-                       (V2DF "df") (DF "df") (VNx2DF "df")
-                       (SI "si") (HI "hi")
-                       (QI "qi")])
+(define_mode_attr Vel [(V8QI "qi") (V16QI "qi")
+                       (V4HI "hi") (V8HI "hi")
+                       (V2SI "si") (V4SI "si")
+                       (DI "di") (V2DI "di")
+                       (V4HF "hf") (V8HF "hf")
+                       (V2SF "sf") (V4SF "sf")
+                       (V2DF "df") (DF "df")
+                       (SI "si") (HI "hi")
+                       (QI "qi")
+                       (VNx16QI "qi") (VNx8QI "qi") (VNx4QI "qi") (VNx2QI "qi")
+                       (VNx8HI "hi") (VNx4HI "hi") (VNx2HI "hi")
+                       (VNx8HF "hf") (VNx4HF "hf") (VNx2HF "hf")
+                       (VNx4SI "si") (VNx2SI "si")
+                       (VNx4SF "sf") (VNx2SF "sf")
+                       (VNx2DI "di")
+                       (VNx2DF "df")])
 
 ;; Element mode with floating-point values replaced by like-sized integers.
 (define_mode_attr VEL_INT [(VNx16QI "QI")
@@ -994,23 +1042,29 @@
                        (V4SF "2s")])
 
 ;; Define corresponding core/FP element mode for each vector mode.
-(define_mode_attr vw [(V8QI "w") (V16QI "w") (VNx16QI "w")
-                      (V4HI "w") (V8HI "w") (VNx8HI "w")
-                      (V2SI "w") (V4SI "w") (VNx4SI "w")
-                      (DI "x") (V2DI "x") (VNx2DI "x")
-                      (VNx8HF "h")
-                      (V2SF "s") (V4SF "s") (VNx4SF "s")
-                      (V2DF "d") (VNx2DF "d")])
+(define_mode_attr vw [(V8QI "w") (V16QI "w")
+                      (V4HI "w") (V8HI "w")
+                      (V2SI "w") (V4SI "w")
+                      (DI "x") (V2DI "x")
+                      (V2SF "s") (V4SF "s")
+                      (V2DF "d")])
 
 ;; Corresponding core element mode for each vector mode.  This is a
 ;; variation on <vw> mapping FP modes to GP regs.
-(define_mode_attr vwcore [(V8QI "w") (V16QI "w") (VNx16QI "w")
-                          (V4HI "w") (V8HI "w") (VNx8HI "w")
-                          (V2SI "w") (V4SI "w") (VNx4SI "w")
-                          (DI "x") (V2DI "x") (VNx2DI "x")
-                          (V4HF "w") (V8HF "w") (VNx8HF "w")
-                          (V2SF "w") (V4SF "w") (VNx4SF "w")
-                          (V2DF "x") (VNx2DF "x")])
+(define_mode_attr vwcore [(V8QI "w") (V16QI "w")
+                          (V4HI "w") (V8HI "w")
+                          (V2SI "w") (V4SI "w")
+                          (DI "x") (V2DI "x")
+                          (V4HF "w") (V8HF "w")
+                          (V2SF "w") (V4SF "w")
+                          (V2DF "x")
+                          (VNx16QI "w") (VNx8QI "w") (VNx4QI "w") (VNx2QI "w")
+                          (VNx8HI "w") (VNx4HI "w") (VNx2HI "w")
+                          (VNx8HF "w") (VNx4HF "w") (VNx2HF "w")
+                          (VNx4SI "w") (VNx2SI "w")
+                          (VNx4SF "w") (VNx2SF "w")
+                          (VNx2DI "x")
+                          (VNx2DF "x")])
 
 ;; Double vector types for ALLX.
 (define_mode_attr Vallxd [(QI "8b") (HI "4h") (SI "2s")])
@@ -1248,10 +1302,14 @@
 ;; The predicate mode associated with an SVE data mode.  For structure modes
 ;; this is equivalent to the <VPRED> of the subvector mode.
-(define_mode_attr VPRED [(VNx16QI "VNx16BI")
-                         (VNx8HI "VNx8BI") (VNx8HF "VNx8BI")
-                         (VNx4SI "VNx4BI") (VNx4SF "VNx4BI")
-                         (VNx2DI "VNx2BI") (VNx2DF "VNx2BI")
+(define_mode_attr VPRED [(VNx16QI "VNx16BI") (VNx8QI "VNx8BI")
+                         (VNx4QI "VNx4BI") (VNx2QI "VNx2BI")
+                         (VNx8HI "VNx8BI") (VNx4HI "VNx4BI") (VNx2HI "VNx2BI")
+                         (VNx8HF "VNx8BI") (VNx4HF "VNx4BI") (VNx2HF "VNx2BI")
+                         (VNx4SI "VNx4BI") (VNx2SI "VNx2BI")
+                         (VNx4SF "VNx4BI") (VNx2SF "VNx2BI")
+                         (VNx2DI "VNx2BI")
+                         (VNx2DF "VNx2BI")
                          (VNx32QI "VNx16BI")
                          (VNx16HI "VNx8BI") (VNx16HF "VNx8BI")
                          (VNx8SI "VNx4BI") (VNx8SF "VNx4BI")
@@ -1266,10 +1324,14 @@
                          (VNx8DI "VNx2BI") (VNx8DF "VNx2BI")])
 
 ;; ...and again in lower case.
-(define_mode_attr vpred [(VNx16QI "vnx16bi")
-                         (VNx8HI "vnx8bi") (VNx8HF "vnx8bi")
-                         (VNx4SI "vnx4bi") (VNx4SF "vnx4bi")
-                         (VNx2DI "vnx2bi") (VNx2DF "vnx2bi")
+(define_mode_attr vpred [(VNx16QI "vnx16bi") (VNx8QI "vnx8bi")
+                         (VNx4QI "vnx4bi") (VNx2QI "vnx2bi")
+                         (VNx8HI "vnx8bi") (VNx4HI "vnx4bi") (VNx2HI "vnx2bi")
+                         (VNx8HF "vnx8bi") (VNx4HF "vnx4bi") (VNx2HF "vnx2bi")
+                         (VNx4SI "vnx4bi") (VNx2SI "vnx2bi")
+                         (VNx4SF "vnx4bi") (VNx2SF "vnx2bi")
+                         (VNx2DI "vnx2bi")
+                         (VNx2DF "vnx2bi")
                          (VNx32QI "vnx16bi")
                          (VNx16HI "vnx8bi") (VNx16HF "vnx8bi")
                          (VNx8SI "vnx4bi") (VNx8SF "vnx4bi")
gcc/testsuite/gcc.target/aarch64/sve/mixed_size_1.c

/* { dg-options "-O2 -ftree-vectorize -fno-tree-loop-distribute-patterns" } */

#include <stdint.h>

#define TEST_LOOP(TYPE1, TYPE2)                                        \
  void                                                                 \
  f_##TYPE1##_##TYPE2 (TYPE1 *restrict dst1, TYPE1 *restrict src1,     \
                       TYPE2 *restrict dst2, TYPE2 *restrict src2,     \
                       int n)                                          \
  {                                                                    \
    for (int i = 0; i < n; ++i)                                        \
      {                                                                \
        dst1[i] += src1[i];                                            \
        dst2[i] = src2[i];                                             \
      }                                                                \
  }

#define TEST_ALL(T) \
  T (uint16_t, uint8_t) \
  T (uint32_t, uint16_t) \
  T (uint32_t, _Float16) \
  T (uint64_t, uint32_t) \
  T (uint64_t, float)

TEST_ALL (TEST_LOOP)

/* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.h,} 1 } } */
/* { dg-final { scan-assembler-times {\tst1b\tz[0-9]+\.h,} 1 } } */
/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.s,} 2 } } */
/* { dg-final { scan-assembler-times {\tst1h\tz[0-9]+\.s,} 2 } } */
/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d,} 2 } } */
/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.d,} 2 } } */
/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.h,} 2 } } */
/* { dg-final { scan-assembler-times {\tst1h\tz[0-9]+\.h,} 1 } } */
/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s,} 4 } } */
/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s,} 2 } } */
/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d,} 4 } } */
/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d,} 2 } } */
gcc/testsuite/gcc.target/aarch64/sve/mixed_size_2.c

/* { dg-options "-O2 -ftree-vectorize -fno-tree-loop-distribute-patterns" } */

#include <stdint.h>

#define TEST_LOOP(TYPE1, TYPE2)                                        \
  void                                                                 \
  f_##TYPE1##_##TYPE2 (TYPE1 *restrict dst1, TYPE1 *restrict src1,     \
                       TYPE2 *restrict dst2, int n)                    \
  {                                                                    \
    for (int i = 0; i < n; ++i)                                        \
      {                                                                \
        dst1[i] += src1[i];                                            \
        dst2[i] = 1;                                                   \
      }                                                                \
  }

#define TEST_ALL(T) \
  T (uint16_t, uint8_t) \
  T (uint32_t, uint16_t) \
  T (uint32_t, _Float16) \
  T (uint64_t, uint32_t) \
  T (uint64_t, float)

TEST_ALL (TEST_LOOP)

/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.b, #1\n} 1 } } */
/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.h, #1\n} 1 } } */
/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.s, #1\n} 1 } } */
/* { dg-final { scan-assembler-times {\tfmov\tz[0-9]+\.h, #1\.0} 1 } } */
/* { dg-final { scan-assembler-times {\tfmov\tz[0-9]+\.s, #1\.0} 1 } } */
/* { dg-final { scan-assembler-times {\tst1b\tz[0-9]+\.h,} 1 } } */
/* { dg-final { scan-assembler-times {\tst1h\tz[0-9]+\.s,} 2 } } */
/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.d,} 2 } } */
/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.h,} 2 } } */
/* { dg-final { scan-assembler-times {\tst1h\tz[0-9]+\.h,} 1 } } */
/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s,} 4 } } */
/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s,} 2 } } */
/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d,} 4 } } */
/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d,} 2 } } */
gcc/testsuite/gcc.target/aarch64/sve/mixed_size_3.c

/* { dg-options "-O2 -ftree-vectorize -fno-tree-loop-distribute-patterns" } */

#include <stdint.h>

#define TEST_LOOP(TYPE1, TYPE2)                                        \
  void                                                                 \
  f_##TYPE1##_##TYPE2 (TYPE1 *restrict dst1, TYPE1 *restrict src1,     \
                       TYPE2 *restrict dst2, TYPE2 src2, int n)        \
  {                                                                    \
    for (int i = 0; i < n; ++i)                                        \
      {                                                                \
        dst1[i] += src1[i];                                            \
        dst2[i] = src2;                                                \
      }                                                                \
  }

#define TEST_ALL(T) \
  T (uint16_t, uint8_t) \
  T (uint32_t, uint16_t) \
  T (uint32_t, _Float16) \
  T (uint64_t, uint32_t) \
  T (uint64_t, float)

TEST_ALL (TEST_LOOP)

/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.b, w3\n} 1 } } */
/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.h, w3\n} 1 } } */
/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.s, w3\n} 1 } } */
/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.h, h0\n} 1 } } */
/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.s, s0\n} 1 } } */
/* { dg-final { scan-assembler-times {\tst1b\tz[0-9]+\.h,} 1 } } */
/* { dg-final { scan-assembler-times {\tst1h\tz[0-9]+\.s,} 2 } } */
/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.d,} 2 } } */
/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.h,} 2 } } */
/* { dg-final { scan-assembler-times {\tst1h\tz[0-9]+\.h,} 1 } } */
/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s,} 4 } } */
/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s,} 2 } } */
/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d,} 4 } } */
/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d,} 2 } } */
gcc/testsuite/gcc.target/aarch64/sve/mixed_size_4.c

/* { dg-options "-O2 -ftree-vectorize -fno-tree-loop-distribute-patterns" } */

#include <stdint.h>

#define TEST_LOOP(TYPE1, TYPE2)                                        \
  void                                                                 \
  f_##TYPE1##_##TYPE2 (TYPE1 *restrict dst1, TYPE1 *restrict src1,     \
                       TYPE2 *restrict dst2, TYPE2 n)                  \
  {                                                                    \
    for (TYPE2 i = 0; i < n; ++i)                                      \
      {                                                                \
        dst1[i] += src1[i];                                            \
        dst2[i] = i;                                                   \
      }                                                                \
  }

#define TEST_ALL(T) \
  T (uint16_t, uint8_t) \
  T (uint32_t, uint16_t) \
  T (uint64_t, uint32_t)

TEST_ALL (TEST_LOOP)

/* { dg-final { scan-assembler-not {\tindex\tz[0-9]+\.b,} } } */
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.h, #0, #1\n} 1 } } */
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.s, #0, #1\n} 1 } } */
/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.d, #0, #1\n} 1 } } */
/* { dg-final { scan-assembler-not {\tcntb\t} } } */
/* { dg-final { scan-assembler-times {\tcnth\t} 1 } } */
/* { dg-final { scan-assembler-times {\tcntw\t} 1 } } */
/* { dg-final { scan-assembler-times {\tcntd\t} 1 } } */
/* { dg-final { scan-assembler-times {\tst1b\tz[0-9]+\.h,} 1 } } */
/* { dg-final { scan-assembler-times {\tst1h\tz[0-9]+\.s,} 1 } } */
/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.d,} 1 } } */
/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.h,} 2 } } */
/* { dg-final { scan-assembler-times {\tst1h\tz[0-9]+\.h,} 1 } } */
/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s,} 2 } } */
/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s,} 1 } } */
/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d,} 2 } } */
/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d,} 1 } } */
gcc/testsuite/gcc.target/aarch64/sve/mixed_size_5.c

/* { dg-options "-O2 -ftree-vectorize -fno-tree-loop-distribute-patterns -msve-vector-bits=512" } */

#include <stdint.h>

#define TEST_LOOP(TYPE1, TYPE2)                                        \
  void                                                                 \
  f_##TYPE1##_##TYPE2 (TYPE1 *restrict dst1, TYPE1 *restrict src1,     \
                       TYPE2 *restrict dst2, TYPE2 *restrict src2,     \
                       int n)                                          \
  {                                                                    \
    for (int i = 0; i < n; ++i)                                        \
      {                                                                \
        dst1[i * 2] = src1[i * 2] + 1;                                 \
        dst1[i * 2 + 1] = src1[i * 2 + 1] + 1;                         \
        dst2[i * 2] = 2;                                               \
        dst2[i * 2 + 1] = 3;                                           \
      }                                                                \
  }

#define TEST_ALL(T) \
  T (uint16_t, uint8_t) \
  T (uint32_t, uint16_t) \
  T (uint32_t, _Float16) \
  T (uint64_t, uint32_t) \
  T (uint64_t, float)

TEST_ALL (TEST_LOOP)

/* { dg-final { scan-assembler-times {\tld1rw\tz[0-9]+\.s,} 1 } } */
/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d,} 2 } } */
/* { dg-final { scan-assembler-times {\tld1rqw\tz[0-9]+\.s,} 2 } } */
/* { dg-final { scan-assembler-times {\tst1b\tz[0-9]+\.h,} 1 } } */
/* { dg-final { scan-assembler-times {\tst1h\tz[0-9]+\.s,} 2 } } */
/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.d,} 2 } } */
/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.h,} 1 } } */
/* { dg-final { scan-assembler-times {\tst1h\tz[0-9]+\.h,} 1 } } */
/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s,} 2 } } */
/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s,} 2 } } */
/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d,} 2 } } */
/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d,} 2 } } */