Commit c600df9a by Richard Sandiford

[AArch64] Add support for the SVE PCS

The AAPCS64 specifies that if a function takes arguments in SVE
registers or returns them in SVE registers, it must preserve all
of Z8-Z23 and all of P4-P11.  (Normal functions only preserve the
low 64 bits of Z8-Z15 and clobber all of the predicate registers.)

This variation is known informally as the "SVE PCS" and functions
that use it are known informally as "SVE functions".  SVE functions
interoperate freely with functions that follow the standard AAPCS64
rules and with functions that use the aarch64_vector_pcs attribute.
(Note that it's an error to use the attribute on SVE functions
themselves.)
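
As a minimal sketch of the distinction (the function names here are
made up for the example):

#include <arm_sve.h>

/* An "SVE function": it takes and returns SVE types, so its callers
   may assume that Z8-Z23 and P4-P11 are preserved across the call.  */
svint32_t
double_it (svbool_t pg, svint32_t x)
{
  return svadd_s32_x (pg, x, x);
}

/* Using the attribute on a non-SVE function is still fine...  */
void vector_fn (void) __attribute__ ((aarch64_vector_pcs));

/* ...but combining it with the SVE PCS is rejected:
   __attribute__ ((aarch64_vector_pcs)) svint32_t
   bad (svbool_t, svint32_t);  */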

One complication -- although it's not really that complicated --
is that SVE registers need to be saved at a VL-dependent offset while
other registers need to be saved at a constant offset.  The easiest
way of handling this seemed to be to group the SVE registers together
below the hard frame pointer.  The frame pointer is then usually an
easy-to-compute VL multiple above the stack pointer and a constant
amount below the incoming stack pointer.
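
For example, clobbering Z8 in an SVE function forces a callee save at
a VL-dependent offset below the hard frame pointer.  A plausible
frame-pointer-based prologue shape (illustrative, not verbatim
compiler output):

#include <arm_sve.h>

void
clobbers_z8 (svbool_t pg)
{
  /* Force a callee save of Z8, in the style of the saves_* tests.
     The prologue might then look like:

	stp	x29, x30, [sp, -16]!	// frame chain at a constant
	mov	x29, sp			//   offset below the incoming SP
	addvl	sp, sp, #-1		// SVE save area: a VL multiple
	str	z8, [sp]		//   below the hard frame pointer  */
  register svint8_t z8 asm ("z8") = svindex_s8 (0, 1);
  asm volatile ("" :: "w" (z8));
}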

A bigger complication is that, because the base AAPCS64 specifies that
only the low 64 bits of V8-V15 are preserved by calls, the associated
DWARF frame registers are also treated as 64 bits by the unwinder.
The 64 bits must also have the same layout as they would for a base
AAPCS64 function, otherwise unwinding won't work correctly.  (This is
actually a problem for the existing aarch64_vector_pcs support too,
but I'll fix that separately.)

This falls out naturally for little-endian targets but not for
big-endian targets.  The easiest way of meeting the requirement for them
was to use ST1D and LD1D to save and restore Z8-Z15, which also has the
nice property of storing the 64 bits at the start of the slot.  However,
using ST1D and LD1D requires a spare predicate register, and since all
of P0-P7 are either argument registers or call-preserved, we may need
to spill P4 in order to save the vector registers, even if P4 wouldn't
need to be saved otherwise.
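
A sketch of what that means for a big-endian save of Z8 (illustrative;
the real sequences are generated by aarch64_save_callee_saves and
aarch64_restore_callee_saves):

#include <arm_sve.h>

void
clobbers_z8_be (svbool_t pg)
{
  /* On a big-endian target the save might look like:

	str	p4, [sp]		// spill P4 first if it is live
	ptrue	p4.b			// spare predicate for the saves
	st1d	z8.d, p4, [sp, #1, mul vl]

     ST1D stores elements in memory order, so the first 64 bits of the
     slot match the layout of a base-AAPCS64 save of D8; a plain STR of
     Z8 would not leave those bits in that layout on big-endian.  */
  register svint8_t z8 asm ("z8") = svdup_s8 (1);
  asm volatile ("" :: "w" (z8));
}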

Since Z16-Z23 are fully clobbered by base AAPCS64 functions, we don't
need to emit frame information for them at all.  This avoids having
to decide whether the registers should be treated as having 64 bits
(as for Z8-Z15), 128 bits (for Advanced SIMD) or the full SVE width.

There are two ways of dealing with stack-clash protection when
saving SVE registers (a sketch of case (2) follows the list):

(1) If the area between the hard frame pointer and the incoming stack
    pointer is allocated via a store with writeback (callee_adjust != 0),
    the SVE save area is allocated separately and becomes the "initial"
    allocation as far as stack-clash protection goes.  In this case
    the store with writeback acts as a probe at the hard frame pointer
    position.

(2) If the area between the hard frame pointer and the incoming stack
    pointer is allocated via aarch64_allocate_and_probe_stack_space,
    the SVE save area is added to this initial allocation, so that the
    SP ends up pointing at the SVE register saves.  It's then necessary
    to use a temporary base register to save the non-SVE registers.
    Setting up this temporary register requires a single instruction
    only and so should be more efficient than doing two allocations
    and probes.
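
As an illustrative sketch of case (2), with stack-clash protection
enabled and a frame large enough to need probing (the exact register
choices and offsets below are assumptions, not compiler output):

#include <arm_sve.h>

void
stack_clash_sketch (svbool_t pg)
{
  /* The prologue might look something like:

	sub	sp, sp, #4096		// initial allocation, including
	...				//   the SVE area, probed as it grows
	str	z8, [sp]		// lowest SVE save acts as a probe at SP
	addvl	x11, sp, #1		// one instruction to form a base...
	stp	x29, x30, [x11]		// ...for the non-SVE saves  */
  volatile char big[4096];
  register svint8_t z8 asm ("z8") = svdup_s8 (2);
  asm volatile ("" :: "w" (z8));
  big[0] = 0;
}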

When SVE registers need to be saved, saving them below the frame
pointer makes it harder to rely on the LR save as a stack probe,
since the LR save's offset won't usually be a compile-time constant.
The patch copes with that by using the lowest SVE register save as a
stack probe too, and therefore prevents that save from being
shrink-wrapped when stack-clash protection is enabled.

The changelog describes the low-level details.

2019-10-29  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* calls.c (pass_by_reference): Leave the target to decide whether
	POLY_INT_CST-sized arguments should be passed by value or reference,
	rather than forcing them to be passed by reference.
	(must_pass_in_stack_var_size): Likewise.
	* config/aarch64/aarch64.md (LAST_SAVED_REGNUM): Redefine from
	V31_REGNUM to P15_REGNUM.
	* config/aarch64/aarch64-protos.h (aarch64_init_cumulative_args):
	Take an extra "silent_p" parameter, defaulting to false.
	(aarch64_sve::svbool_type_p): Declare.
	(aarch64_sve::nvectors_if_data_type): Likewise.
	* config/aarch64/aarch64.h (NUM_PR_ARG_REGS): New macro.
	(aarch64_frame::reg_offset): Turn into poly_int64s.
	(aarch64_frame::saved_regs_size): Likewise.
	(aarch64_frame::below_hard_fp_saved_regs_size): New field.
	(aarch64_frame::sve_callee_adjust): Likewise.
	(aarch64_frame::spare_pred_reg): Likewise.
	(ARM_PCS_SVE): New arm_pcs value.
	(CUMULATIVE_ARGS::aapcs_nprn): New field.
	(CUMULATIVE_ARGS::aapcs_nextnprn): Likewise.
	(CUMULATIVE_ARGS::silent_p): Likewise.
	(BITS_PER_SVE_PRED): New macro.
	* config/aarch64/aarch64.c (handle_aarch64_vector_pcs_attribute): New
	function.  Reject aarch64_vector_pcs attributes on SVE functions.
	(aarch64_attribute_table): Use the above handler.
	(aarch64_sve_abi): New function.
	(aarch64_sve_argument_p): Likewise.
	(aarch64_returns_value_in_sve_regs_p): Likewise.
	(aarch64_takes_arguments_in_sve_regs_p): Likewise.
	(aarch64_fntype_abi): Check for SVE functions and return the SVE PCS
	descriptor for them.
	(aarch64_simd_decl_p): Delete.
	(aarch64_emit_cfi_for_reg_p): New function.
	(aarch64_reg_save_mode): Remove the fndecl argument and instead use
	crtl->abi to choose the mode for FP registers.  Handle the SVE PCS.
	(aarch64_hard_regno_call_part_clobbered): Do not treat FP registers
	as partly clobbered for the SVE PCS.
	(aarch64_function_ok_for_sibcall): Check whether the two functions
	use the same ABI, rather than checking specifically for whether
	they're aarch64_vector_pcs functions.
	(aarch64_pass_by_reference): Raise an error for attempts to pass
	SVE arguments when SVE is disabled.  Pass SVE arguments by reference
	if there are not enough free registers left, or if the argument is
	variadic.
	(aarch64_function_value): Handle SVE predicates, vectors and tuples.
	(aarch64_return_in_memory): Do not return SVE predicates, vectors and
	tuples in memory.
	(aarch64_layout_arg): Take a function_arg_info rather than
	individual properties.  Handle SVE predicates, vectors and tuples.
	Raise an error if they are passed to unprototyped functions.
	(aarch64_function_arg): If the silent_p flag is set, suppress the
	usual error about using float registers without TARGET_FLOAT.
	(aarch64_init_cumulative_args): Take a silent_p parameter and store
	it in the cumulative_args structure.  Initialize aapcs_nprn and
	aapcs_nextnprn.  If the silent_p flag is set, suppress the usual
	error about using float registers without TARGET_FLOAT.
	If the silent_p flag is not set, also raise an error about
	using SVE functions when SVE is disabled.
	(aarch64_function_arg_advance): Update the call to aarch64_layout_arg,
	and call it for SVE functions too.  Update aapcs_nprn similarly
	to the other register counts.
	(aarch64_layout_frame): If a big-endian function needs to save
	and restore Z8-Z15, search for a spare predicate that it can use.
	Store SVE predicates at the bottom of the register save area,
	followed by SVE vectors, then followed by the normal slots.
	Keep pointing the hard frame pointer at the base of the normal slots,
	above the SVE vectors.  Update the various frame creation and
	tear-down strategies for the new layout, initializing the new
	sve_callee_adjust field.  Add an additional layout for frames
	whose saved registers are all SVE registers.
	(aarch64_register_saved_on_entry): Cope with poly_int64 reg_offsets.
	(aarch64_return_address_signing_enabled): Likewise.
	(aarch64_push_regs, aarch64_pop_regs): Update calls to
	aarch64_reg_save_mode.
	(aarch64_adjust_sve_callee_save_base): New function.
	(aarch64_add_cfa_expression): Move earlier in file.  Take the
	saved register as an rtx rather than a register number and use
	its mode for the MEM slot.
	(aarch64_save_callee_saves): Remove the mode argument and instead
	use aarch64_reg_save_mode to get the mode of each save slot.
	Add a hard_fp_valid_p parameter.  Cope with poly_int64 register
	offsets.  Allow GP registers to be saved at a VL-based offset from
	the stack, handling this case using the frame pointer if available
	or a temporary register otherwise.  Use ST1D to save Z8-Z15 for
	big-endian SVE functions; use normal moves for other SVE saves.
	Only mark the save as frame-related if aarch64_emit_cfi_for_reg_p
	returns true.  Add explicit CFA notes when not storing via the
	stack pointer.  Do not try to pair SVE saves.
	(aarch64_restore_callee_saves): Cope with poly_int64 register
	offsets.  Use LD1D to restore Z8-Z15 for big-endian SVE functions;
	use normal moves for other SVE restores.  Only add CFA restore notes
	if aarch64_emit_cfi_for_reg_p returns true.  Do not try to pair
	SVE restores.
	(aarch64_get_separate_components): Always keep the first SVE save
	in the prologue if we need to use it as a stack probe.  Don't allow
	Z8-Z15 saves and loads to be shrink-wrapped for big-endian targets.
	Likewise the spare predicate register that they need.  Update the
	offset calculation to account for the SVE save area.  Use the
	appropriate range check for SVE LDR and STR instructions.
	(aarch64_components_for_bb): Cope with poly_int64 reg_offsets.
	(aarch64_process_components): Likewise.  Update the offset
	calculation to account for the SVE save area.  Only mark the
	save as frame-related if aarch64_emit_cfi_for_reg_p returns true.
	Do not try to pair SVE saves.
	(aarch64_allocate_and_probe_stack_space): Cope with poly_int64
	reg_offsets.  When handling the final allocation, expect the
	first SVE register save to be part of the initial allocation
	and for it to act as a probe at SP.  Account for the SVE callee
	save area in the dump information.
	(aarch64_expand_prologue): Update the frame diagram.  Fold the
	SVE callee allocation into the initial allocation if stack clash
	protection is enabled.  Use new variables to track the offset
	of the frame chain (and hard frame pointer) from the current
	stack pointer, and likewise the offset of the bottom of the
	register save area.  Update calls to aarch64_save_callee_saves
	and aarch64_add_cfa_expression.  Apply sve_callee_adjust before
	saving the FP&SIMD registers.  Save the predicate registers.
	(aarch64_expand_epilogue): Take below_hard_fp_saved_regs_size
	into account when setting the stack pointer from the frame pointer,
	and when deciding whether we can inherit the initial adjustment
	amount from the prologue.  Restore the predicate registers after
	the vector registers, then apply sve_callee_adjust, then restore
	the general registers.
	(aarch64_secondary_reload): Don't use secondary SVE reloads
	for VNx16BImode.
	(aapcs_vfp_sub_candidate): Assert that the type is not an SVE type.
	(aarch64_short_vector_p): Return false for SVE types.
	(aarch64_vfp_is_call_or_return_candidate): Initialize *is_ha
	at the start of the function.  Return false for SVE types.
	(aarch64_asm_output_variant_pcs): Output .variant_pcs for SVE
	functions too.
	(TARGET_STRICT_ARGUMENT_NAMING): Redefine to request strict naming.
	* config/aarch64/aarch64-sve.md (*aarch64_sve_mov<mode>_le): Extend
	to big-endian targets for bytewise moves.
	(*aarch64_sve_mov<mode>_be): Exclude the bytewise case.

gcc/testsuite/
	* gcc.target/aarch64/sve/pcs/aarch64-sve-pcs.exp: New file.
	* gcc.target/aarch64/sve/pcs/annotate_1.c: New test.
	* gcc.target/aarch64/sve/pcs/annotate_2.c: Likewise.
	* gcc.target/aarch64/sve/pcs/annotate_3.c: Likewise.
	* gcc.target/aarch64/sve/pcs/annotate_4.c: Likewise.
	* gcc.target/aarch64/sve/pcs/annotate_5.c: Likewise.
	* gcc.target/aarch64/sve/pcs/annotate_6.c: Likewise.
	* gcc.target/aarch64/sve/pcs/annotate_7.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_1.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_10.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_11_nosc.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_11_sc.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_2.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_3.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_4.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_be_f16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_be_f32.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_be_f64.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_be_s16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_be_s32.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_be_s64.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_be_s8.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_be_u16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_be_u32.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_be_u64.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_be_u8.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_le_f16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_le_f32.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_le_f64.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_le_s16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_le_s32.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_le_s64.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_le_s8.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_le_u16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_le_u32.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_le_u64.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_le_u8.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_be_f16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_be_f32.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_be_f64.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_be_s16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_be_s32.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_be_s64.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_be_s8.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_be_u16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_be_u32.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_be_u64.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_be_u8.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_le_f16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_le_f32.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_le_f64.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_le_s16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_le_s32.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_le_s64.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_le_s8.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_le_u16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_le_u32.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_le_u64.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_le_u8.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_7.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_8.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_9.c: Likewise.
	* gcc.target/aarch64/sve/pcs/nosve_1.c: Likewise.
	* gcc.target/aarch64/sve/pcs/nosve_2.c: Likewise.
	* gcc.target/aarch64/sve/pcs/nosve_3.c: Likewise.
	* gcc.target/aarch64/sve/pcs/nosve_4.c: Likewise.
	* gcc.target/aarch64/sve/pcs/nosve_5.c: Likewise.
	* gcc.target/aarch64/sve/pcs/nosve_6.c: Likewise.
	* gcc.target/aarch64/sve/pcs/nosve_7.c: Likewise.
	* gcc.target/aarch64/sve/pcs/nosve_8.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_1.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_1_1024.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_1_2048.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_1_256.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_1_512.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_2.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_3.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_4.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_4_1024.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_4_2048.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_4_256.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_4_512.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_5.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_5_1024.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_5_2048.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_5_256.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_5_512.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_6.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_6_1024.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_6_2048.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_6_256.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_6_512.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_7.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_8.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_9.c: Likewise.
	* gcc.target/aarch64/sve/pcs/saves_1_be_nowrap.c: Likewise.
	* gcc.target/aarch64/sve/pcs/saves_1_be_wrap.c: Likewise.
	* gcc.target/aarch64/sve/pcs/saves_1_le_nowrap.c: Likewise.
	* gcc.target/aarch64/sve/pcs/saves_1_le_wrap.c: Likewise.
	* gcc.target/aarch64/sve/pcs/saves_2_be_nowrap.c: Likewise.
	* gcc.target/aarch64/sve/pcs/saves_2_be_wrap.c: Likewise.
	* gcc.target/aarch64/sve/pcs/saves_2_le_nowrap.c: Likewise.
	* gcc.target/aarch64/sve/pcs/saves_2_le_wrap.c: Likewise.
	* gcc.target/aarch64/sve/pcs/saves_3.c: Likewise.
	* gcc.target/aarch64/sve/pcs/saves_4_be.c: Likewise.
	* gcc.target/aarch64/sve/pcs/saves_4_le.c: Likewise.
	* gcc.target/aarch64/sve/pcs/saves_5_be.c: Likewise.
	* gcc.target/aarch64/sve/pcs/saves_5_le.c: Likewise.
	* gcc.target/aarch64/sve/pcs/stack_clash_1.c: Likewise.
	* gcc.target/aarch64/sve/pcs/stack_clash_1_256.c: Likewise.
	* gcc.target/aarch64/sve/pcs/stack_clash_1_512.c: Likewise.
	* gcc.target/aarch64/sve/pcs/stack_clash_1_1024.c: Likewise.
	* gcc.target/aarch64/sve/pcs/stack_clash_1_2048.c: Likewise.
	* gcc.target/aarch64/sve/pcs/stack_clash_2.c: Likewise.
	* gcc.target/aarch64/sve/pcs/stack_clash_2_256.c: Likewise.
	* gcc.target/aarch64/sve/pcs/stack_clash_2_512.c: Likewise.
	* gcc.target/aarch64/sve/pcs/stack_clash_2_1024.c: Likewise.
	* gcc.target/aarch64/sve/pcs/stack_clash_2_2048.c: Likewise.
	* gcc.target/aarch64/sve/pcs/stack_clash_3.c: Likewise.
	* gcc.target/aarch64/sve/pcs/unprototyped_1.c: Likewise.
	* gcc.target/aarch64/sve/pcs/varargs_1.c: Likewise.
	* gcc.target/aarch64/sve/pcs/varargs_2_f16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/varargs_2_f32.c: Likewise.
	* gcc.target/aarch64/sve/pcs/varargs_2_f64.c: Likewise.
	* gcc.target/aarch64/sve/pcs/varargs_2_s16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/varargs_2_s32.c: Likewise.
	* gcc.target/aarch64/sve/pcs/varargs_2_s64.c: Likewise.
	* gcc.target/aarch64/sve/pcs/varargs_2_s8.c: Likewise.
	* gcc.target/aarch64/sve/pcs/varargs_2_u16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/varargs_2_u32.c: Likewise.
	* gcc.target/aarch64/sve/pcs/varargs_2_u64.c: Likewise.
	* gcc.target/aarch64/sve/pcs/varargs_2_u8.c: Likewise.
	* gcc.target/aarch64/sve/pcs/varargs_3_nosc.c: Likewise.
	* gcc.target/aarch64/sve/pcs/varargs_3_sc.c: Likewise.
	* gcc.target/aarch64/sve/pcs/vpcs_1.c: Likewise.
	* g++.target/aarch64/sve/catch_7.C: Likewise.

From-SVN: r277564
gcc/calls.c:

@@ -911,7 +911,7 @@ pass_by_reference (CUMULATIVE_ARGS *ca, function_arg_info arg)
     return true;
 
   /* GCC post 3.4 passes *all* variable sized types by reference.  */
-  if (!TYPE_SIZE (type) || TREE_CODE (TYPE_SIZE (type)) != INTEGER_CST)
+  if (!TYPE_SIZE (type) || !poly_int_tree_p (TYPE_SIZE (type)))
     return true;
 
   /* If a record type should be passed the same as its first (and only)
@@ -5878,7 +5878,7 @@ must_pass_in_stack_var_size (const function_arg_info &arg)
     return false;
 
   /* If the type has variable size...  */
-  if (TREE_CODE (TYPE_SIZE (arg.type)) != INTEGER_CST)
+  if (!poly_int_tree_p (TYPE_SIZE (arg.type)))
     return true;
 
   /* If the type is marked as addressable (it is required
gcc/config/aarch64/aarch64-protos.h:

@@ -617,7 +617,7 @@ void aarch64_expand_prologue (void);
 void aarch64_expand_vector_init (rtx, rtx);
 void aarch64_sve_expand_vector_init (rtx, rtx);
 void aarch64_init_cumulative_args (CUMULATIVE_ARGS *, const_tree, rtx,
-				   const_tree, unsigned);
+				   const_tree, unsigned, bool = false);
 void aarch64_init_expanders (void);
 void aarch64_init_simd_builtins (void);
 void aarch64_emit_call_insn (rtx);
@@ -705,6 +705,8 @@ namespace aarch64_sve {
 void handle_arm_sve_h ();
 tree builtin_decl (unsigned, bool);
 bool builtin_type_p (const_tree);
+bool svbool_type_p (const_tree);
+unsigned int nvectors_if_data_type (const_tree);
 const char *mangle_builtin_type (const_tree);
 tree resolve_overloaded_builtin (location_t, unsigned int,
 				 vec<tree, va_gc> *);
gcc/config/aarch64/aarch64-sve.md:

@@ -586,14 +586,14 @@
   }
 )
 
-;; Unpredicated moves (little-endian).  Only allow memory operations
-;; during and after RA; before RA we want the predicated load and
-;; store patterns to be used instead.
+;; Unpredicated moves (bytes or little-endian).  Only allow memory operations
+;; during and after RA; before RA we want the predicated load and store
+;; patterns to be used instead.
 (define_insn "*aarch64_sve_mov<mode>_le"
   [(set (match_operand:SVE_ALL 0 "aarch64_sve_nonimmediate_operand" "=w, Utr, w, w")
 	(match_operand:SVE_ALL 1 "aarch64_sve_general_operand" "Utr, w, w, Dn"))]
   "TARGET_SVE
-   && !BYTES_BIG_ENDIAN
+   && (<MODE>mode == VNx16QImode || !BYTES_BIG_ENDIAN)
    && ((lra_in_progress || reload_completed)
        || (register_operand (operands[0], <MODE>mode)
 	   && nonmemory_operand (operands[1], <MODE>mode)))"
@@ -604,12 +604,12 @@
    * return aarch64_output_sve_mov_immediate (operands[1]);"
 )
 
-;; Unpredicated moves (big-endian).  Memory accesses require secondary
+;; Unpredicated moves (non-byte big-endian).  Memory accesses require secondary
 ;; reloads.
 (define_insn "*aarch64_sve_mov<mode>_be"
   [(set (match_operand:SVE_ALL 0 "register_operand" "=w, w")
 	(match_operand:SVE_ALL 1 "aarch64_nonmemory_operand" "w, Dn"))]
-  "TARGET_SVE && BYTES_BIG_ENDIAN"
+  "TARGET_SVE && BYTES_BIG_ENDIAN && <MODE>mode != VNx16QImode"
   "@
    mov\t%0.d, %1.d
    * return aarch64_output_sve_mov_immediate (operands[1]);"
gcc/config/aarch64/aarch64.h:

@@ -479,9 +479,10 @@ extern unsigned aarch64_architecture_version;
 #define ARG_POINTER_REGNUM	AP_REGNUM
 #define FIRST_PSEUDO_REGISTER	(FFRT_REGNUM + 1)
 
-/* The number of (integer) argument register available.  */
+/* The number of argument registers available for each class.  */
 #define NUM_ARG_REGS		8
 #define NUM_FP_ARG_REGS		8
+#define NUM_PR_ARG_REGS		4
 
 /* A Homogeneous Floating-Point or Short-Vector Aggregate may have at most
    four members.  */
@@ -725,7 +726,7 @@ extern enum aarch64_processor aarch64_tune;
 #ifdef HAVE_POLY_INT_H
 struct GTY (()) aarch64_frame
 {
-  HOST_WIDE_INT reg_offset[FIRST_PSEUDO_REGISTER];
+  poly_int64 reg_offset[LAST_SAVED_REGNUM + 1];
 
   /* The number of extra stack bytes taken up by register varargs.
      This area is allocated by the callee at the very top of the
@@ -733,9 +734,12 @@ struct GTY (()) aarch64_frame
      STACK_BOUNDARY.  */
   HOST_WIDE_INT saved_varargs_size;
 
-  /* The size of the saved callee-save int/FP registers.  */
+  /* The size of the callee-save registers with a slot in REG_OFFSET.  */
+  poly_int64 saved_regs_size;
 
-  HOST_WIDE_INT saved_regs_size;
+  /* The size of the callee-save registers with a slot in REG_OFFSET that
+     are saved below the hard frame pointer.  */
+  poly_int64 below_hard_fp_saved_regs_size;
 
   /* Offset from the base of the frame (incomming SP) to the
      top of the locals area.  This value is always a multiple of
@@ -763,6 +767,10 @@ struct GTY (()) aarch64_frame
      It may be non-zero if no push is used (ie. callee_adjust == 0).  */
   poly_int64 callee_offset;
 
+  /* The size of the stack adjustment before saving or after restoring
+     SVE registers.  */
+  poly_int64 sve_callee_adjust;
+
   /* The size of the stack adjustment after saving callee-saves.  */
   poly_int64 final_adjust;
 
@@ -772,6 +780,11 @@ struct GTY (()) aarch64_frame
   unsigned wb_candidate1;
   unsigned wb_candidate2;
 
+  /* Big-endian SVE frames need a spare predicate register in order
+     to save vector registers in the correct layout for unwinding.
+     This is the register they should use.  */
+  unsigned spare_pred_reg;
+
   bool laid_out;
 };
 
@@ -800,6 +813,8 @@ enum arm_pcs
 {
   ARM_PCS_AAPCS64,		/* Base standard AAPCS for 64 bit.  */
   ARM_PCS_SIMD,			/* For aarch64_vector_pcs functions.  */
+  ARM_PCS_SVE,			/* For functions that pass or return
+				   values in SVE registers.  */
   ARM_PCS_TLSDESC,		/* For targets of tlsdesc calls.  */
   ARM_PCS_UNKNOWN
 };
@@ -827,6 +842,8 @@ typedef struct
   int aapcs_nextncrn;		/* Next next core register number.  */
   int aapcs_nvrn;		/* Next Vector register number.  */
   int aapcs_nextnvrn;		/* Next Next Vector register number.  */
+  int aapcs_nprn;		/* Next Predicate register number.  */
+  int aapcs_nextnprn;		/* Next Next Predicate register number.  */
   rtx aapcs_reg;		/* Register assigned to this argument.  This
 				   is NULL_RTX if this parameter goes on
 				   the stack.  */
@@ -837,6 +854,8 @@ typedef struct
 				   aapcs_reg == NULL_RTX.  */
   int aapcs_stack_size;		/* The total size (in words, per 8 byte) of the
 				   stack arg area so far.  */
+  bool silent_p;		/* True if we should act silently, rather than
+				   raise an error for invalid calls.  */
 } CUMULATIVE_ARGS;
 #endif
@@ -1144,7 +1163,8 @@ extern poly_uint16 aarch64_sve_vg;
 #define BITS_PER_SVE_VECTOR (poly_uint16 (aarch64_sve_vg * 64))
 #define BYTES_PER_SVE_VECTOR (poly_uint16 (aarch64_sve_vg * 8))
 
-/* The number of bytes in an SVE predicate.  */
+/* The number of bits and bytes in an SVE predicate.  */
+#define BITS_PER_SVE_PRED BYTES_PER_SVE_VECTOR
 #define BYTES_PER_SVE_PRED aarch64_sve_vg
 
 /* The SVE mode for a vector of bytes.  */
...@@ -85,7 +85,6 @@ ...@@ -85,7 +85,6 @@
(V29_REGNUM 61) (V29_REGNUM 61)
(V30_REGNUM 62) (V30_REGNUM 62)
(V31_REGNUM 63) (V31_REGNUM 63)
(LAST_SAVED_REGNUM 63)
(SFP_REGNUM 64) (SFP_REGNUM 64)
(AP_REGNUM 65) (AP_REGNUM 65)
(CC_REGNUM 66) (CC_REGNUM 66)
...@@ -107,6 +106,7 @@ ...@@ -107,6 +106,7 @@
(P13_REGNUM 81) (P13_REGNUM 81)
(P14_REGNUM 82) (P14_REGNUM 82)
(P15_REGNUM 83) (P15_REGNUM 83)
(LAST_SAVED_REGNUM 83)
(FFR_REGNUM 84) (FFR_REGNUM 84)
;; "FFR token": a fake register used for representing the scheduling ;; "FFR token": a fake register used for representing the scheduling
;; restrictions on FFR-related operations. ;; restrictions on FFR-related operations.
......
g++.target/aarch64/sve/catch_7.C:

/* { dg-do run { target aarch64_sve_hw } } */
/* { dg-options "-O" } */
#include <arm_sve.h>
void __attribute__ ((noipa))
f1 (void)
{
throw 1;
}
void __attribute__ ((noipa))
f2 (svbool_t)
{
register svint8_t z8 asm ("z8") = svindex_s8 (11, 1);
asm volatile ("" :: "w" (z8));
f1 ();
}
void __attribute__ ((noipa))
f3 (int n)
{
register double d8 asm ("v8") = 42.0;
for (int i = 0; i < n; ++i)
{
asm volatile ("" : "=w" (d8) : "w" (d8));
try { f2 (svptrue_b8 ()); } catch (int) { break; }
}
if (d8 != 42.0)
__builtin_abort ();
}
int
main (void)
{
f3 (100);
return 0;
}
gcc.target/aarch64/sve/pcs/aarch64-sve-pcs.exp:

# Specific regression driver for AArch64 SVE.
# Copyright (C) 2009-2019 Free Software Foundation, Inc.
# Contributed by ARM Ltd.
#
# This file is part of GCC.
#
# GCC is free software; you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 3, or (at your option)
# any later version.
#
# GCC is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with GCC; see the file COPYING3. If not see
# <http://www.gnu.org/licenses/>. */
# GCC testsuite that uses the `dg.exp' driver.
# Exit immediately if this isn't an AArch64 target.
if {![istarget aarch64*-*-*] } then {
return
}
# Load support procs.
load_lib gcc-dg.exp
# If a testcase doesn't have special options, use these.
global DEFAULT_CFLAGS
if ![info exists DEFAULT_CFLAGS] then {
set DEFAULT_CFLAGS " -ansi -pedantic-errors"
}
# Initialize `dg'.
dg-init
# Force SVE if we're not testing it already.
if { [check_effective_target_aarch64_sve] } {
set sve_flags ""
} else {
set sve_flags "-march=armv8.2-a+sve"
}
# Main loop.
dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.c]] \
$sve_flags $DEFAULT_CFLAGS
# All done.
dg-finish
gcc.target/aarch64/sve/pcs/annotate_1.c:

/* { dg-do compile } */
#include <arm_sve.h>
svbool_t ret_b (void) { return svptrue_b8 (); }
svint8_t ret_s8 (void) { return svdup_s8 (0); }
svint16_t ret_s16 (void) { return svdup_s16 (0); }
svint32_t ret_s32 (void) { return svdup_s32 (0); }
svint64_t ret_s64 (void) { return svdup_s64 (0); }
svuint8_t ret_u8 (void) { return svdup_u8 (0); }
svuint16_t ret_u16 (void) { return svdup_u16 (0); }
svuint32_t ret_u32 (void) { return svdup_u32 (0); }
svuint64_t ret_u64 (void) { return svdup_u64 (0); }
svfloat16_t ret_f16 (void) { return svdup_f16 (0); }
svfloat32_t ret_f32 (void) { return svdup_f32 (0); }
svfloat64_t ret_f64 (void) { return svdup_f64 (0); }
svint8x2_t ret_s8x2 (void) { return svundef2_s8 (); }
svint16x2_t ret_s16x2 (void) { return svundef2_s16 (); }
svint32x2_t ret_s32x2 (void) { return svundef2_s32 (); }
svint64x2_t ret_s64x2 (void) { return svundef2_s64 (); }
svuint8x2_t ret_u8x2 (void) { return svundef2_u8 (); }
svuint16x2_t ret_u16x2 (void) { return svundef2_u16 (); }
svuint32x2_t ret_u32x2 (void) { return svundef2_u32 (); }
svuint64x2_t ret_u64x2 (void) { return svundef2_u64 (); }
svfloat16x2_t ret_f16x2 (void) { return svundef2_f16 (); }
svfloat32x2_t ret_f32x2 (void) { return svundef2_f32 (); }
svfloat64x2_t ret_f64x2 (void) { return svundef2_f64 (); }
svint8x3_t ret_s8x3 (void) { return svundef3_s8 (); }
svint16x3_t ret_s16x3 (void) { return svundef3_s16 (); }
svint32x3_t ret_s32x3 (void) { return svundef3_s32 (); }
svint64x3_t ret_s64x3 (void) { return svundef3_s64 (); }
svuint8x3_t ret_u8x3 (void) { return svundef3_u8 (); }
svuint16x3_t ret_u16x3 (void) { return svundef3_u16 (); }
svuint32x3_t ret_u32x3 (void) { return svundef3_u32 (); }
svuint64x3_t ret_u64x3 (void) { return svundef3_u64 (); }
svfloat16x3_t ret_f16x3 (void) { return svundef3_f16 (); }
svfloat32x3_t ret_f32x3 (void) { return svundef3_f32 (); }
svfloat64x3_t ret_f64x3 (void) { return svundef3_f64 (); }
svint8x4_t ret_s8x4 (void) { return svundef4_s8 (); }
svint16x4_t ret_s16x4 (void) { return svundef4_s16 (); }
svint32x4_t ret_s32x4 (void) { return svundef4_s32 (); }
svint64x4_t ret_s64x4 (void) { return svundef4_s64 (); }
svuint8x4_t ret_u8x4 (void) { return svundef4_u8 (); }
svuint16x4_t ret_u16x4 (void) { return svundef4_u16 (); }
svuint32x4_t ret_u32x4 (void) { return svundef4_u32 (); }
svuint64x4_t ret_u64x4 (void) { return svundef4_u64 (); }
svfloat16x4_t ret_f16x4 (void) { return svundef4_f16 (); }
svfloat32x4_t ret_f32x4 (void) { return svundef4_f32 (); }
svfloat64x4_t ret_f64x4 (void) { return svundef4_f64 (); }
/* { dg-final { scan-assembler {\t\.variant_pcs\tret_b\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tret_s8\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tret_s16\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tret_s32\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tret_s64\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tret_u8\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tret_u16\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tret_u32\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tret_u64\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tret_f16\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tret_f32\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tret_f64\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tret_s8x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tret_s16x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tret_s32x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tret_s64x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tret_u8x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tret_u16x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tret_u32x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tret_u64x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tret_f16x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tret_f32x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tret_f64x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tret_s8x3\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tret_s16x3\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tret_s32x3\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tret_s64x3\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tret_u8x3\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tret_u16x3\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tret_u32x3\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tret_u64x3\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tret_f16x3\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tret_f32x3\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tret_f64x3\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tret_s8x4\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tret_s16x4\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tret_s32x4\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tret_s64x4\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tret_u8x4\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tret_u16x4\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tret_u32x4\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tret_u64x4\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tret_f16x4\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tret_f32x4\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tret_f64x4\n} } } */
gcc.target/aarch64/sve/pcs/annotate_2.c:

/* { dg-do compile } */
#include <arm_sve.h>
void fn_b (svbool_t x) {}
void fn_s8 (svint8_t x) {}
void fn_s16 (svint16_t x) {}
void fn_s32 (svint32_t x) {}
void fn_s64 (svint64_t x) {}
void fn_u8 (svuint8_t x) {}
void fn_u16 (svuint16_t x) {}
void fn_u32 (svuint32_t x) {}
void fn_u64 (svuint64_t x) {}
void fn_f16 (svfloat16_t x) {}
void fn_f32 (svfloat32_t x) {}
void fn_f64 (svfloat64_t x) {}
void fn_s8x2 (svint8x2_t x) {}
void fn_s16x2 (svint16x2_t x) {}
void fn_s32x2 (svint32x2_t x) {}
void fn_s64x2 (svint64x2_t x) {}
void fn_u8x2 (svuint8x2_t x) {}
void fn_u16x2 (svuint16x2_t x) {}
void fn_u32x2 (svuint32x2_t x) {}
void fn_u64x2 (svuint64x2_t x) {}
void fn_f16x2 (svfloat16x2_t x) {}
void fn_f32x2 (svfloat32x2_t x) {}
void fn_f64x2 (svfloat64x2_t x) {}
void fn_s8x3 (svint8x3_t x) {}
void fn_s16x3 (svint16x3_t x) {}
void fn_s32x3 (svint32x3_t x) {}
void fn_s64x3 (svint64x3_t x) {}
void fn_u8x3 (svuint8x3_t x) {}
void fn_u16x3 (svuint16x3_t x) {}
void fn_u32x3 (svuint32x3_t x) {}
void fn_u64x3 (svuint64x3_t x) {}
void fn_f16x3 (svfloat16x3_t x) {}
void fn_f32x3 (svfloat32x3_t x) {}
void fn_f64x3 (svfloat64x3_t x) {}
void fn_s8x4 (svint8x4_t x) {}
void fn_s16x4 (svint16x4_t x) {}
void fn_s32x4 (svint32x4_t x) {}
void fn_s64x4 (svint64x4_t x) {}
void fn_u8x4 (svuint8x4_t x) {}
void fn_u16x4 (svuint16x4_t x) {}
void fn_u32x4 (svuint32x4_t x) {}
void fn_u64x4 (svuint64x4_t x) {}
void fn_f16x4 (svfloat16x4_t x) {}
void fn_f32x4 (svfloat32x4_t x) {}
void fn_f64x4 (svfloat64x4_t x) {}
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_b\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s8\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s16\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s32\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s64\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u8\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u16\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u32\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u64\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f16\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f32\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f64\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s8x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s16x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s32x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s64x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u8x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u16x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u32x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u64x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f16x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f32x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f64x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s8x3\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s16x3\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s32x3\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s64x3\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u8x3\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u16x3\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u32x3\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u64x3\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f16x3\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f32x3\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f64x3\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s8x4\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s16x4\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s32x4\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s64x4\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u8x4\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u16x4\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u32x4\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u64x4\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f16x4\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f32x4\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f64x4\n} } } */
/* { dg-do compile } */
#include <arm_sve.h>
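/* d0-d3 occupy v0-v3, leaving z4-z7 free, so even a four-vector SVE
   tuple is still passed in registers; every function below should be
   marked with .variant_pcs.  */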
void fn_s8 (float d0, float d1, float d2, float d3, svint8_t x) {}
void fn_s16 (float d0, float d1, float d2, float d3, svint16_t x) {}
void fn_s32 (float d0, float d1, float d2, float d3, svint32_t x) {}
void fn_s64 (float d0, float d1, float d2, float d3, svint64_t x) {}
void fn_u8 (float d0, float d1, float d2, float d3, svuint8_t x) {}
void fn_u16 (float d0, float d1, float d2, float d3, svuint16_t x) {}
void fn_u32 (float d0, float d1, float d2, float d3, svuint32_t x) {}
void fn_u64 (float d0, float d1, float d2, float d3, svuint64_t x) {}
void fn_f16 (float d0, float d1, float d2, float d3, svfloat16_t x) {}
void fn_f32 (float d0, float d1, float d2, float d3, svfloat32_t x) {}
void fn_f64 (float d0, float d1, float d2, float d3, svfloat64_t x) {}
void fn_s8x2 (float d0, float d1, float d2, float d3, svint8x2_t x) {}
void fn_s16x2 (float d0, float d1, float d2, float d3, svint16x2_t x) {}
void fn_s32x2 (float d0, float d1, float d2, float d3, svint32x2_t x) {}
void fn_s64x2 (float d0, float d1, float d2, float d3, svint64x2_t x) {}
void fn_u8x2 (float d0, float d1, float d2, float d3, svuint8x2_t x) {}
void fn_u16x2 (float d0, float d1, float d2, float d3, svuint16x2_t x) {}
void fn_u32x2 (float d0, float d1, float d2, float d3, svuint32x2_t x) {}
void fn_u64x2 (float d0, float d1, float d2, float d3, svuint64x2_t x) {}
void fn_f16x2 (float d0, float d1, float d2, float d3, svfloat16x2_t x) {}
void fn_f32x2 (float d0, float d1, float d2, float d3, svfloat32x2_t x) {}
void fn_f64x2 (float d0, float d1, float d2, float d3, svfloat64x2_t x) {}
void fn_s8x3 (float d0, float d1, float d2, float d3, svint8x3_t x) {}
void fn_s16x3 (float d0, float d1, float d2, float d3, svint16x3_t x) {}
void fn_s32x3 (float d0, float d1, float d2, float d3, svint32x3_t x) {}
void fn_s64x3 (float d0, float d1, float d2, float d3, svint64x3_t x) {}
void fn_u8x3 (float d0, float d1, float d2, float d3, svuint8x3_t x) {}
void fn_u16x3 (float d0, float d1, float d2, float d3, svuint16x3_t x) {}
void fn_u32x3 (float d0, float d1, float d2, float d3, svuint32x3_t x) {}
void fn_u64x3 (float d0, float d1, float d2, float d3, svuint64x3_t x) {}
void fn_f16x3 (float d0, float d1, float d2, float d3, svfloat16x3_t x) {}
void fn_f32x3 (float d0, float d1, float d2, float d3, svfloat32x3_t x) {}
void fn_f64x3 (float d0, float d1, float d2, float d3, svfloat64x3_t x) {}
void fn_s8x4 (float d0, float d1, float d2, float d3, svint8x4_t x) {}
void fn_s16x4 (float d0, float d1, float d2, float d3, svint16x4_t x) {}
void fn_s32x4 (float d0, float d1, float d2, float d3, svint32x4_t x) {}
void fn_s64x4 (float d0, float d1, float d2, float d3, svint64x4_t x) {}
void fn_u8x4 (float d0, float d1, float d2, float d3, svuint8x4_t x) {}
void fn_u16x4 (float d0, float d1, float d2, float d3, svuint16x4_t x) {}
void fn_u32x4 (float d0, float d1, float d2, float d3, svuint32x4_t x) {}
void fn_u64x4 (float d0, float d1, float d2, float d3, svuint64x4_t x) {}
void fn_f16x4 (float d0, float d1, float d2, float d3, svfloat16x4_t x) {}
void fn_f32x4 (float d0, float d1, float d2, float d3, svfloat32x4_t x) {}
void fn_f64x4 (float d0, float d1, float d2, float d3, svfloat64x4_t x) {}
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s8\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s16\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s32\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s64\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u8\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u16\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u32\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u64\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f16\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f32\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f64\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s8x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s16x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s32x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s64x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u8x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u16x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u32x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u64x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f16x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f32x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f64x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s8x3\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s16x3\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s32x3\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s64x3\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u8x3\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u16x3\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u32x3\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u64x3\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f16x3\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f32x3\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f64x3\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s8x4\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s16x4\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s32x4\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s64x4\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u8x4\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u16x4\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u32x4\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u64x4\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f16x4\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f32x4\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f64x4\n} } } */
/* { dg-do compile } */
#include <arm_sve.h>
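/* d0-d4 occupy v0-v4, so a four-vector tuple would need z5-z8 and is
   instead passed by reference.  The x4 functions therefore take no SVE
   arguments in registers and revert to the base PCS (no .variant_pcs);
   everything else still fits in z5-z7.  */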
void fn_s8 (float d0, float d1, float d2, float d3,
float d4, svint8_t x) {}
void fn_s16 (float d0, float d1, float d2, float d3,
float d4, svint16_t x) {}
void fn_s32 (float d0, float d1, float d2, float d3,
float d4, svint32_t x) {}
void fn_s64 (float d0, float d1, float d2, float d3,
float d4, svint64_t x) {}
void fn_u8 (float d0, float d1, float d2, float d3,
float d4, svuint8_t x) {}
void fn_u16 (float d0, float d1, float d2, float d3,
float d4, svuint16_t x) {}
void fn_u32 (float d0, float d1, float d2, float d3,
float d4, svuint32_t x) {}
void fn_u64 (float d0, float d1, float d2, float d3,
float d4, svuint64_t x) {}
void fn_f16 (float d0, float d1, float d2, float d3,
float d4, svfloat16_t x) {}
void fn_f32 (float d0, float d1, float d2, float d3,
float d4, svfloat32_t x) {}
void fn_f64 (float d0, float d1, float d2, float d3,
float d4, svfloat64_t x) {}
void fn_s8x2 (float d0, float d1, float d2, float d3,
float d4, svint8x2_t x) {}
void fn_s16x2 (float d0, float d1, float d2, float d3,
float d4, svint16x2_t x) {}
void fn_s32x2 (float d0, float d1, float d2, float d3,
float d4, svint32x2_t x) {}
void fn_s64x2 (float d0, float d1, float d2, float d3,
float d4, svint64x2_t x) {}
void fn_u8x2 (float d0, float d1, float d2, float d3,
float d4, svuint8x2_t x) {}
void fn_u16x2 (float d0, float d1, float d2, float d3,
float d4, svuint16x2_t x) {}
void fn_u32x2 (float d0, float d1, float d2, float d3,
float d4, svuint32x2_t x) {}
void fn_u64x2 (float d0, float d1, float d2, float d3,
float d4, svuint64x2_t x) {}
void fn_f16x2 (float d0, float d1, float d2, float d3,
float d4, svfloat16x2_t x) {}
void fn_f32x2 (float d0, float d1, float d2, float d3,
float d4, svfloat32x2_t x) {}
void fn_f64x2 (float d0, float d1, float d2, float d3,
float d4, svfloat64x2_t x) {}
void fn_s8x3 (float d0, float d1, float d2, float d3,
float d4, svint8x3_t x) {}
void fn_s16x3 (float d0, float d1, float d2, float d3,
float d4, svint16x3_t x) {}
void fn_s32x3 (float d0, float d1, float d2, float d3,
float d4, svint32x3_t x) {}
void fn_s64x3 (float d0, float d1, float d2, float d3,
float d4, svint64x3_t x) {}
void fn_u8x3 (float d0, float d1, float d2, float d3,
float d4, svuint8x3_t x) {}
void fn_u16x3 (float d0, float d1, float d2, float d3,
float d4, svuint16x3_t x) {}
void fn_u32x3 (float d0, float d1, float d2, float d3,
float d4, svuint32x3_t x) {}
void fn_u64x3 (float d0, float d1, float d2, float d3,
float d4, svuint64x3_t x) {}
void fn_f16x3 (float d0, float d1, float d2, float d3,
float d4, svfloat16x3_t x) {}
void fn_f32x3 (float d0, float d1, float d2, float d3,
float d4, svfloat32x3_t x) {}
void fn_f64x3 (float d0, float d1, float d2, float d3,
float d4, svfloat64x3_t x) {}
void fn_s8x4 (float d0, float d1, float d2, float d3,
float d4, svint8x4_t x) {}
void fn_s16x4 (float d0, float d1, float d2, float d3,
float d4, svint16x4_t x) {}
void fn_s32x4 (float d0, float d1, float d2, float d3,
float d4, svint32x4_t x) {}
void fn_s64x4 (float d0, float d1, float d2, float d3,
float d4, svint64x4_t x) {}
void fn_u8x4 (float d0, float d1, float d2, float d3,
float d4, svuint8x4_t x) {}
void fn_u16x4 (float d0, float d1, float d2, float d3,
float d4, svuint16x4_t x) {}
void fn_u32x4 (float d0, float d1, float d2, float d3,
float d4, svuint32x4_t x) {}
void fn_u64x4 (float d0, float d1, float d2, float d3,
float d4, svuint64x4_t x) {}
void fn_f16x4 (float d0, float d1, float d2, float d3,
float d4, svfloat16x4_t x) {}
void fn_f32x4 (float d0, float d1, float d2, float d3,
float d4, svfloat32x4_t x) {}
void fn_f64x4 (float d0, float d1, float d2, float d3,
float d4, svfloat64x4_t x) {}
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s8\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s16\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s32\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s64\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u8\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u16\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u32\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u64\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f16\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f32\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f64\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s8x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s16x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s32x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s64x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u8x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u16x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u32x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u64x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f16x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f32x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f64x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s8x3\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s16x3\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s32x3\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s64x3\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u8x3\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u16x3\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u32x3\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u64x3\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f16x3\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f32x3\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f64x3\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s8x4\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s16x4\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s32x4\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s64x4\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u8x4\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u16x4\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u32x4\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u64x4\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f16x4\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f32x4\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f64x4\n} } } */
/* { dg-do compile } */
#include <arm_sve.h>
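/* d0-d5 occupy v0-v5, so only single vectors and two-vector tuples
   still fit in z6-z7; three- and four-vector tuples are passed by
   reference and those functions revert to the base PCS.  */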
void fn_s8 (float d0, float d1, float d2, float d3,
float d4, float d5, svint8_t x) {}
void fn_s16 (float d0, float d1, float d2, float d3,
float d4, float d5, svint16_t x) {}
void fn_s32 (float d0, float d1, float d2, float d3,
float d4, float d5, svint32_t x) {}
void fn_s64 (float d0, float d1, float d2, float d3,
float d4, float d5, svint64_t x) {}
void fn_u8 (float d0, float d1, float d2, float d3,
float d4, float d5, svuint8_t x) {}
void fn_u16 (float d0, float d1, float d2, float d3,
float d4, float d5, svuint16_t x) {}
void fn_u32 (float d0, float d1, float d2, float d3,
float d4, float d5, svuint32_t x) {}
void fn_u64 (float d0, float d1, float d2, float d3,
float d4, float d5, svuint64_t x) {}
void fn_f16 (float d0, float d1, float d2, float d3,
float d4, float d5, svfloat16_t x) {}
void fn_f32 (float d0, float d1, float d2, float d3,
float d4, float d5, svfloat32_t x) {}
void fn_f64 (float d0, float d1, float d2, float d3,
float d4, float d5, svfloat64_t x) {}
void fn_s8x2 (float d0, float d1, float d2, float d3,
float d4, float d5, svint8x2_t x) {}
void fn_s16x2 (float d0, float d1, float d2, float d3,
float d4, float d5, svint16x2_t x) {}
void fn_s32x2 (float d0, float d1, float d2, float d3,
float d4, float d5, svint32x2_t x) {}
void fn_s64x2 (float d0, float d1, float d2, float d3,
float d4, float d5, svint64x2_t x) {}
void fn_u8x2 (float d0, float d1, float d2, float d3,
float d4, float d5, svuint8x2_t x) {}
void fn_u16x2 (float d0, float d1, float d2, float d3,
float d4, float d5, svuint16x2_t x) {}
void fn_u32x2 (float d0, float d1, float d2, float d3,
float d4, float d5, svuint32x2_t x) {}
void fn_u64x2 (float d0, float d1, float d2, float d3,
float d4, float d5, svuint64x2_t x) {}
void fn_f16x2 (float d0, float d1, float d2, float d3,
float d4, float d5, svfloat16x2_t x) {}
void fn_f32x2 (float d0, float d1, float d2, float d3,
float d4, float d5, svfloat32x2_t x) {}
void fn_f64x2 (float d0, float d1, float d2, float d3,
float d4, float d5, svfloat64x2_t x) {}
void fn_s8x3 (float d0, float d1, float d2, float d3,
float d4, float d5, svint8x3_t x) {}
void fn_s16x3 (float d0, float d1, float d2, float d3,
float d4, float d5, svint16x3_t x) {}
void fn_s32x3 (float d0, float d1, float d2, float d3,
float d4, float d5, svint32x3_t x) {}
void fn_s64x3 (float d0, float d1, float d2, float d3,
float d4, float d5, svint64x3_t x) {}
void fn_u8x3 (float d0, float d1, float d2, float d3,
float d4, float d5, svuint8x3_t x) {}
void fn_u16x3 (float d0, float d1, float d2, float d3,
float d4, float d5, svuint16x3_t x) {}
void fn_u32x3 (float d0, float d1, float d2, float d3,
float d4, float d5, svuint32x3_t x) {}
void fn_u64x3 (float d0, float d1, float d2, float d3,
float d4, float d5, svuint64x3_t x) {}
void fn_f16x3 (float d0, float d1, float d2, float d3,
float d4, float d5, svfloat16x3_t x) {}
void fn_f32x3 (float d0, float d1, float d2, float d3,
float d4, float d5, svfloat32x3_t x) {}
void fn_f64x3 (float d0, float d1, float d2, float d3,
float d4, float d5, svfloat64x3_t x) {}
void fn_s8x4 (float d0, float d1, float d2, float d3,
float d4, float d5, svint8x4_t x) {}
void fn_s16x4 (float d0, float d1, float d2, float d3,
float d4, float d5, svint16x4_t x) {}
void fn_s32x4 (float d0, float d1, float d2, float d3,
float d4, float d5, svint32x4_t x) {}
void fn_s64x4 (float d0, float d1, float d2, float d3,
float d4, float d5, svint64x4_t x) {}
void fn_u8x4 (float d0, float d1, float d2, float d3,
float d4, float d5, svuint8x4_t x) {}
void fn_u16x4 (float d0, float d1, float d2, float d3,
float d4, float d5, svuint16x4_t x) {}
void fn_u32x4 (float d0, float d1, float d2, float d3,
float d4, float d5, svuint32x4_t x) {}
void fn_u64x4 (float d0, float d1, float d2, float d3,
float d4, float d5, svuint64x4_t x) {}
void fn_f16x4 (float d0, float d1, float d2, float d3,
float d4, float d5, svfloat16x4_t x) {}
void fn_f32x4 (float d0, float d1, float d2, float d3,
float d4, float d5, svfloat32x4_t x) {}
void fn_f64x4 (float d0, float d1, float d2, float d3,
float d4, float d5, svfloat64x4_t x) {}
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s8\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s16\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s32\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s64\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u8\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u16\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u32\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u64\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f16\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f32\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f64\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s8x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s16x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s32x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s64x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u8x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u16x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u32x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u64x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f16x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f32x2\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f64x2\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s8x3\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s16x3\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s32x3\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s64x3\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u8x3\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u16x3\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u32x3\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u64x3\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f16x3\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f32x3\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f64x3\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s8x4\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s16x4\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s32x4\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s64x4\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u8x4\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u16x4\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u32x4\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u64x4\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f16x4\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f32x4\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f64x4\n} } } */
/* { dg-do compile } */
#include <arm_sve.h>
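/* d0-d6 occupy v0-v6, leaving only z7, so single vectors are the last
   arguments still passed in an SVE register; all tuples are passed by
   reference and those functions revert to the base PCS.  */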
void fn_s8 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, svint8_t x) {}
void fn_s16 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, svint16_t x) {}
void fn_s32 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, svint32_t x) {}
void fn_s64 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, svint64_t x) {}
void fn_u8 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, svuint8_t x) {}
void fn_u16 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, svuint16_t x) {}
void fn_u32 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, svuint32_t x) {}
void fn_u64 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, svuint64_t x) {}
void fn_f16 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, svfloat16_t x) {}
void fn_f32 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, svfloat32_t x) {}
void fn_f64 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, svfloat64_t x) {}
void fn_s8x2 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, svint8x2_t x) {}
void fn_s16x2 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, svint16x2_t x) {}
void fn_s32x2 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, svint32x2_t x) {}
void fn_s64x2 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, svint64x2_t x) {}
void fn_u8x2 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, svuint8x2_t x) {}
void fn_u16x2 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, svuint16x2_t x) {}
void fn_u32x2 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, svuint32x2_t x) {}
void fn_u64x2 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, svuint64x2_t x) {}
void fn_f16x2 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, svfloat16x2_t x) {}
void fn_f32x2 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, svfloat32x2_t x) {}
void fn_f64x2 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, svfloat64x2_t x) {}
void fn_s8x3 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, svint8x3_t x) {}
void fn_s16x3 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, svint16x3_t x) {}
void fn_s32x3 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, svint32x3_t x) {}
void fn_s64x3 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, svint64x3_t x) {}
void fn_u8x3 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, svuint8x3_t x) {}
void fn_u16x3 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, svuint16x3_t x) {}
void fn_u32x3 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, svuint32x3_t x) {}
void fn_u64x3 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, svuint64x3_t x) {}
void fn_f16x3 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, svfloat16x3_t x) {}
void fn_f32x3 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, svfloat32x3_t x) {}
void fn_f64x3 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, svfloat64x3_t x) {}
void fn_s8x4 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, svint8x4_t x) {}
void fn_s16x4 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, svint16x4_t x) {}
void fn_s32x4 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, svint32x4_t x) {}
void fn_s64x4 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, svint64x4_t x) {}
void fn_u8x4 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, svuint8x4_t x) {}
void fn_u16x4 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, svuint16x4_t x) {}
void fn_u32x4 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, svuint32x4_t x) {}
void fn_u64x4 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, svuint64x4_t x) {}
void fn_f16x4 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, svfloat16x4_t x) {}
void fn_f32x4 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, svfloat32x4_t x) {}
void fn_f64x4 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, svfloat64x4_t x) {}
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s8\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s16\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s32\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s64\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u8\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u16\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u32\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u64\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f16\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f32\n} } } */
/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f64\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s8x2\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s16x2\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s32x2\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s64x2\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u8x2\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u16x2\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u32x2\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u64x2\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f16x2\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f32x2\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f64x2\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s8x3\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s16x3\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s32x3\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s64x3\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u8x3\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u16x3\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u32x3\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u64x3\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f16x3\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f32x3\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f64x3\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s8x4\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s16x4\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s32x4\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s64x4\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u8x4\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u16x4\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u32x4\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u64x4\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f16x4\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f32x4\n} } } */
/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f64x4\n} } } */
/* { dg-do compile } */
#include <arm_sve.h>
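/* d0-d7 occupy all of v0-v7, so every SVE argument is passed by
   reference and no function here should be marked with .variant_pcs.  */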
void fn_s8 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, float d7, svint8_t x) {}
void fn_s16 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, float d7, svint16_t x) {}
void fn_s32 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, float d7, svint32_t x) {}
void fn_s64 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, float d7, svint64_t x) {}
void fn_u8 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, float d7, svuint8_t x) {}
void fn_u16 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, float d7, svuint16_t x) {}
void fn_u32 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, float d7, svuint32_t x) {}
void fn_u64 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, float d7, svuint64_t x) {}
void fn_f16 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, float d7, svfloat16_t x) {}
void fn_f32 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, float d7, svfloat32_t x) {}
void fn_f64 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, float d7, svfloat64_t x) {}
void fn_s8x2 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, float d7, svint8x2_t x) {}
void fn_s16x2 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, float d7, svint16x2_t x) {}
void fn_s32x2 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, float d7, svint32x2_t x) {}
void fn_s64x2 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, float d7, svint64x2_t x) {}
void fn_u8x2 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, float d7, svuint8x2_t x) {}
void fn_u16x2 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, float d7, svuint16x2_t x) {}
void fn_u32x2 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, float d7, svuint32x2_t x) {}
void fn_u64x2 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, float d7, svuint64x2_t x) {}
void fn_f16x2 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, float d7, svfloat16x2_t x) {}
void fn_f32x2 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, float d7, svfloat32x2_t x) {}
void fn_f64x2 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, float d7, svfloat64x2_t x) {}
void fn_s8x3 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, float d7, svint8x3_t x) {}
void fn_s16x3 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, float d7, svint16x3_t x) {}
void fn_s32x3 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, float d7, svint32x3_t x) {}
void fn_s64x3 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, float d7, svint64x3_t x) {}
void fn_u8x3 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, float d7, svuint8x3_t x) {}
void fn_u16x3 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, float d7, svuint16x3_t x) {}
void fn_u32x3 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, float d7, svuint32x3_t x) {}
void fn_u64x3 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, float d7, svuint64x3_t x) {}
void fn_f16x3 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, float d7, svfloat16x3_t x) {}
void fn_f32x3 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, float d7, svfloat32x3_t x) {}
void fn_f64x3 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, float d7, svfloat64x3_t x) {}
void fn_s8x4 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, float d7, svint8x4_t x) {}
void fn_s16x4 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, float d7, svint16x4_t x) {}
void fn_s32x4 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, float d7, svint32x4_t x) {}
void fn_s64x4 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, float d7, svint64x4_t x) {}
void fn_u8x4 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, float d7, svuint8x4_t x) {}
void fn_u16x4 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, float d7, svuint16x4_t x) {}
void fn_u32x4 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, float d7, svuint32x4_t x) {}
void fn_u64x4 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, float d7, svuint64x4_t x) {}
void fn_f16x4 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, float d7, svfloat16x4_t x) {}
void fn_f32x4 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, float d7, svfloat32x4_t x) {}
void fn_f64x4 (float d0, float d1, float d2, float d3,
float d4, float d5, float d6, float d7, svfloat64x4_t x) {}
/* { dg-final { scan-assembler-not {\t\.variant_pcs\t} } } */
/* { dg-do compile } */
/* { dg-options "-O -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#include <arm_sve.h>
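/* The first four predicate arguments are passed in p0-p3; mem0 and
   mem1 overflow and are passed by reference, hence the loads from
   x0 and x1 in the expected sequence below.  */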
/*
** callee_pred:
** ldr (p[0-9]+), \[x0\]
** ldr (p[0-9]+), \[x1\]
** brkpa (p[0-7])\.b, p0/z, p1\.b, p2\.b
** brkpb (p[0-7])\.b, \3/z, p3\.b, \1\.b
** brka p0\.b, \4/z, \2\.b
** ret
*/
__SVBool_t __attribute__((noipa))
callee_pred (__SVBool_t p0, __SVBool_t p1, __SVBool_t p2, __SVBool_t p3,
__SVBool_t mem0, __SVBool_t mem1)
{
p0 = svbrkpa_z (p0, p1, p2);
p0 = svbrkpb_z (p0, p3, mem0);
return svbrka_z (p0, mem1);
}
/*
** caller_pred:
** ...
** ptrue (p[0-9]+)\.b, vl5
** str \1, \[x0\]
** ...
** ptrue (p[0-9]+)\.h, vl6
** str \2, \[x1\]
** ptrue p3\.d, vl4
** ptrue p2\.s, vl3
** ptrue p1\.h, vl2
** ptrue p0\.b, vl1
** bl callee_pred
** ...
*/
__SVBool_t __attribute__((noipa))
caller_pred (void)
{
return callee_pred (svptrue_pat_b8 (SV_VL1),
svptrue_pat_b16 (SV_VL2),
svptrue_pat_b32 (SV_VL3),
svptrue_pat_b64 (SV_VL4),
svptrue_pat_b8 (SV_VL5),
svptrue_pat_b16 (SV_VL6));
}
/* { dg-do compile } */
/* { dg-options "-O -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#include <arm_sve.h>
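/* s0 and d1 take v0 and v1 and the float32x4_t tuple takes z2-z5.
   The float64x4_t tuple would need z6-z9, so it is passed by
   reference; s6 and d7 then land in v6 and v7 as usual.  */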
/*
** callee:
** fadd s0, (s0, s6|s6, s0)
** ret
*/
float __attribute__((noipa))
callee (float s0, double d1, svfloat32x4_t z2, svfloat64x4_t stack1,
float s6, double d7)
{
return s0 + s6;
}
float __attribute__((noipa))
caller (float32_t *x0, float64_t *x1)
{
return callee (0.0f, 1.0,
svld4 (svptrue_b8 (), x0),
svld4 (svptrue_b8 (), x1),
6.0f, 7.0);
}
/* { dg-final { scan-assembler {\tld4w\t{z2\.s - z5\.s}, p[0-7]/z, \[x0\]\n} } } */
/* { dg-final { scan-assembler {\tld4d\t{z[0-9]+\.d - z[0-9]+\.d}, p[0-7]/z, \[x1\]\n} } } */
/* { dg-final { scan-assembler {\tmovi\tv0\.[24]s, #0\n} } } */
/* { dg-final { scan-assembler {\tfmov\td1, #?1\.0} } } */
/* { dg-final { scan-assembler {\tfmov\ts6, #?6\.0} } } */
/* { dg-final { scan-assembler {\tfmov\td7, #?7\.0} } } */
/* { dg-do run { target aarch64_sve_hw } } */
/* { dg-options "-O0 -g" } */
#include <arm_sve.h>
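/* Execution test: check that predicate, single-vector and tuple
   arguments all arrive intact through an SVE PCS call at -O0.  */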
void __attribute__((noipa))
callee (svbool_t p, svint8_t s8, svuint16x4_t u16, svfloat32x3_t f32,
svint64x2_t s64)
{
svbool_t pg;
pg = svptrue_b8 ();
if (svptest_any (pg, sveor_z (pg, p, svptrue_pat_b8 (SV_VL7))))
__builtin_abort ();
if (svptest_any (pg, svcmpne (pg, s8, svindex_s8 (1, 2))))
__builtin_abort ();
if (svptest_any (pg, svcmpne (pg, svget4 (u16, 0), svindex_u16 (2, 3))))
__builtin_abort ();
if (svptest_any (pg, svcmpne (pg, svget4 (u16, 1), svindex_u16 (3, 4))))
__builtin_abort ();
if (svptest_any (pg, svcmpne (pg, svget4 (u16, 2), svindex_u16 (4, 5))))
__builtin_abort ();
if (svptest_any (pg, svcmpne (pg, svget4 (u16, 3), svindex_u16 (5, 6))))
__builtin_abort ();
if (svptest_any (pg, svcmpne (pg, svget3 (f32, 0), svdup_f32 (1.0))))
__builtin_abort ();
if (svptest_any (pg, svcmpne (pg, svget3 (f32, 1), svdup_f32 (2.0))))
__builtin_abort ();
if (svptest_any (pg, svcmpne (pg, svget3 (f32, 2), svdup_f32 (3.0))))
__builtin_abort ();
if (svptest_any (pg, svcmpne (pg, svget2 (s64, 0), svindex_s64 (6, 7))))
__builtin_abort ();
if (svptest_any (pg, svcmpne (pg, svget2 (s64, 1), svindex_s64 (7, 8))))
__builtin_abort ();
}
int __attribute__((noipa))
main (void)
{
callee (svptrue_pat_b8 (SV_VL7),
svindex_s8 (1, 2),
svcreate4 (svindex_u16 (2, 3),
svindex_u16 (3, 4),
svindex_u16 (4, 5),
svindex_u16 (5, 6)),
svcreate3 (svdup_f32 (1.0),
svdup_f32 (2.0),
svdup_f32 (3.0)),
svcreate2 (svindex_s64 (6, 7),
svindex_s64 (7, 8)));
}
/* { dg-do run { target aarch64_sve_hw } } */
/* { dg-options "-O0 -fstack-clash-protection -g" } */
#include <arm_sve.h>
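/* Same execution test as above, but compiled with
   -fstack-clash-protection so that the probing forms of the prologue
   and epilogue are exercised as well.  */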
void __attribute__((noipa))
callee (svbool_t p, svint8_t s8, svuint16x4_t u16, svfloat32x3_t f32,
svint64x2_t s64)
{
svbool_t pg;
pg = svptrue_b8 ();
if (svptest_any (pg, sveor_z (pg, p, svptrue_pat_b8 (SV_VL7))))
__builtin_abort ();
if (svptest_any (pg, svcmpne (pg, s8, svindex_s8 (1, 2))))
__builtin_abort ();
if (svptest_any (pg, svcmpne (pg, svget4 (u16, 0), svindex_u16 (2, 3))))
__builtin_abort ();
if (svptest_any (pg, svcmpne (pg, svget4 (u16, 1), svindex_u16 (3, 4))))
__builtin_abort ();
if (svptest_any (pg, svcmpne (pg, svget4 (u16, 2), svindex_u16 (4, 5))))
__builtin_abort ();
if (svptest_any (pg, svcmpne (pg, svget4 (u16, 3), svindex_u16 (5, 6))))
__builtin_abort ();
if (svptest_any (pg, svcmpne (pg, svget3 (f32, 0), svdup_f32 (1.0))))
__builtin_abort ();
if (svptest_any (pg, svcmpne (pg, svget3 (f32, 1), svdup_f32 (2.0))))
__builtin_abort ();
if (svptest_any (pg, svcmpne (pg, svget3 (f32, 2), svdup_f32 (3.0))))
__builtin_abort ();
if (svptest_any (pg, svcmpne (pg, svget2 (s64, 0), svindex_s64 (6, 7))))
__builtin_abort ();
if (svptest_any (pg, svcmpne (pg, svget2 (s64, 1), svindex_s64 (7, 8))))
__builtin_abort ();
}
int __attribute__((noipa))
main (void)
{
callee (svptrue_pat_b8 (SV_VL7),
svindex_s8 (1, 2),
svcreate4 (svindex_u16 (2, 3),
svindex_u16 (3, 4),
svindex_u16 (4, 5),
svindex_u16 (5, 6)),
svcreate3 (svdup_f32 (1.0),
svdup_f32 (2.0),
svdup_f32 (3.0)),
svcreate2 (svindex_s64 (6, 7),
svindex_s64 (7, 8)));
}
/* { dg-do compile } */
/* { dg-options "-O -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#include <arm_sve.h>
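/* z0-z7 take the first eight vector arguments, so z8 is passed by
   reference: the caller materializes it on the stack and passes its
   address in x4, and the callee reloads it using a ptrue in p3,
   which is free because only p0-p2 carry predicate arguments.  */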
/*
** callee_int:
** ptrue p3\.b, all
** ld1b (z(?:2[4-9]|3[0-1])\.b), p3/z, \[x4\]
** st1b \1, p2, \[x0\]
** st1b z4\.b, p1, \[x0\]
** st1h z5\.h, p1, \[x1\]
** st1w z6\.s, p1, \[x2\]
** st1d z7\.d, p1, \[x3\]
** st1b z0\.b, p0, \[x0\]
** st1h z1\.h, p0, \[x1\]
** st1w z2\.s, p0, \[x2\]
** st1d z3\.d, p0, \[x3\]
** ret
*/
void __attribute__((noipa))
callee_int (int8_t *x0, int16_t *x1, int32_t *x2, int64_t *x3,
svint8_t z0, svint16_t z1, svint32_t z2, svint64_t z3,
svint8_t z4, svint16_t z5, svint32_t z6, svint64_t z7,
svint8_t z8,
svbool_t p0, svbool_t p1, svbool_t p2)
{
svst1 (p2, x0, z8);
svst1 (p1, x0, z4);
svst1 (p1, x1, z5);
svst1 (p1, x2, z6);
svst1 (p1, x3, z7);
svst1 (p0, x0, z0);
svst1 (p0, x1, z1);
svst1 (p0, x2, z2);
svst1 (p0, x3, z3);
}
void __attribute__((noipa))
caller_int (int8_t *x0, int16_t *x1, int32_t *x2, int64_t *x3)
{
callee_int (x0, x1, x2, x3,
svdup_s8 (0),
svdup_s16 (1),
svdup_s32 (2),
svdup_s64 (3),
svdup_s8 (4),
svdup_s16 (5),
svdup_s32 (6),
svdup_s64 (7),
svdup_s8 (8),
svptrue_pat_b8 (SV_VL1),
svptrue_pat_b16 (SV_VL2),
svptrue_pat_b32 (SV_VL3));
}
/* { dg-final { scan-assembler {\tmov\tz0\.b, #0\n} } } */
/* { dg-final { scan-assembler {\tmov\tz1\.h, #1\n} } } */
/* { dg-final { scan-assembler {\tmov\tz2\.s, #2\n} } } */
/* { dg-final { scan-assembler {\tmov\tz3\.d, #3\n} } } */
/* { dg-final { scan-assembler {\tmov\tz4\.b, #4\n} } } */
/* { dg-final { scan-assembler {\tmov\tz5\.h, #5\n} } } */
/* { dg-final { scan-assembler {\tmov\tz6\.s, #6\n} } } */
/* { dg-final { scan-assembler {\tmov\tz7\.d, #7\n} } } */
/* { dg-final { scan-assembler {\tmov\tx4, sp\n} } } */
/* { dg-final { scan-assembler {\tmov\t(z[0-9]+\.b), #8\n.*\tst1b\t\1, p[0-7], \[x4\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#include <arm_sve.h>
/*
** callee_uint:
** ptrue p3\.b, all
** ld1b (z(?:2[4-9]|3[0-1])\.b), p3/z, \[x4\]
** st1b \1, p2, \[x0\]
** st1b z4\.b, p1, \[x0\]
** st1h z5\.h, p1, \[x1\]
** st1w z6\.s, p1, \[x2\]
** st1d z7\.d, p1, \[x3\]
** st1b z0\.b, p0, \[x0\]
** st1h z1\.h, p0, \[x1\]
** st1w z2\.s, p0, \[x2\]
** st1d z3\.d, p0, \[x3\]
** ret
*/
void __attribute__((noipa))
callee_uint (uint8_t *x0, uint16_t *x1, uint32_t *x2, uint64_t *x3,
svuint8_t z0, svuint16_t z1, svuint32_t z2, svuint64_t z3,
svuint8_t z4, svuint16_t z5, svuint32_t z6, svuint64_t z7,
svuint8_t z8,
svbool_t p0, svbool_t p1, svbool_t p2)
{
svst1 (p2, x0, z8);
svst1 (p1, x0, z4);
svst1 (p1, x1, z5);
svst1 (p1, x2, z6);
svst1 (p1, x3, z7);
svst1 (p0, x0, z0);
svst1 (p0, x1, z1);
svst1 (p0, x2, z2);
svst1 (p0, x3, z3);
}
void __attribute__((noipa))
caller_uint (uint8_t *x0, uint16_t *x1, uint32_t *x2, uint64_t *x3)
{
callee_uint (x0, x1, x2, x3,
svdup_u8 (0),
svdup_u16 (1),
svdup_u32 (2),
svdup_u64 (3),
svdup_u8 (4),
svdup_u16 (5),
svdup_u32 (6),
svdup_u64 (7),
svdup_u8 (8),
svptrue_pat_b8 (SV_VL1),
svptrue_pat_b16 (SV_VL2),
svptrue_pat_b32 (SV_VL3));
}
/* { dg-final { scan-assembler {\tmov\tz0\.b, #0\n} } } */
/* { dg-final { scan-assembler {\tmov\tz1\.h, #1\n} } } */
/* { dg-final { scan-assembler {\tmov\tz2\.s, #2\n} } } */
/* { dg-final { scan-assembler {\tmov\tz3\.d, #3\n} } } */
/* { dg-final { scan-assembler {\tmov\tz4\.b, #4\n} } } */
/* { dg-final { scan-assembler {\tmov\tz5\.h, #5\n} } } */
/* { dg-final { scan-assembler {\tmov\tz6\.s, #6\n} } } */
/* { dg-final { scan-assembler {\tmov\tz7\.d, #7\n} } } */
/* { dg-final { scan-assembler {\tmov\tx4, sp\n} } } */
/* { dg-final { scan-assembler {\tmov\t(z[0-9]+\.b), #8\n.*\tst1b\t\1, p[0-7], \[x4\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#include <arm_sve.h>
/*
** callee_float:
** ptrue p3\.b, all
** ld1h (z(?:2[4-9]|3[0-1])\.h), p3/z, \[x4\]
** st1h \1, p2, \[x0\]
** st1h z4\.h, p1, \[x0\]
** st1h z5\.h, p1, \[x1\]
** st1w z6\.s, p1, \[x2\]
** st1d z7\.d, p1, \[x3\]
** st1h z0\.h, p0, \[x0\]
** st1h z1\.h, p0, \[x1\]
** st1w z2\.s, p0, \[x2\]
** st1d z3\.d, p0, \[x3\]
** ret
*/
void __attribute__((noipa))
callee_float (float16_t *x0, float16_t *x1, float32_t *x2, float64_t *x3,
svfloat16_t z0, svfloat16_t z1, svfloat32_t z2, svfloat64_t z3,
svfloat16_t z4, svfloat16_t z5, svfloat32_t z6, svfloat64_t z7,
svfloat16_t z8,
svbool_t p0, svbool_t p1, svbool_t p2)
{
svst1 (p2, x0, z8);
svst1 (p1, x0, z4);
svst1 (p1, x1, z5);
svst1 (p1, x2, z6);
svst1 (p1, x3, z7);
svst1 (p0, x0, z0);
svst1 (p0, x1, z1);
svst1 (p0, x2, z2);
svst1 (p0, x3, z3);
}
void __attribute__((noipa))
caller_float (float16_t *x0, float16_t *x1, float32_t *x2, float64_t *x3)
{
callee_float (x0, x1, x2, x3,
svdup_f16 (0),
svdup_f16 (1),
svdup_f32 (2),
svdup_f64 (3),
svdup_f16 (4),
svdup_f16 (5),
svdup_f32 (6),
svdup_f64 (7),
svdup_f16 (8),
svptrue_pat_b8 (SV_VL1),
svptrue_pat_b16 (SV_VL2),
svptrue_pat_b32 (SV_VL3));
}
/* { dg-final { scan-assembler {\tmov\tz0\.[bhsd], #0\n} } } */
/* { dg-final { scan-assembler {\tfmov\tz1\.h, #1\.0} } } */
/* { dg-final { scan-assembler {\tfmov\tz2\.s, #2\.0} } } */
/* { dg-final { scan-assembler {\tfmov\tz3\.d, #3\.0} } } */
/* { dg-final { scan-assembler {\tfmov\tz4\.h, #4\.0} } } */
/* { dg-final { scan-assembler {\tfmov\tz5\.h, #5\.0} } } */
/* { dg-final { scan-assembler {\tfmov\tz6\.s, #6\.0} } } */
/* { dg-final { scan-assembler {\tfmov\tz7\.d, #7\.0} } } */
/* { dg-final { scan-assembler {\tmov\tx4, sp\n} } } */
/* { dg-final { scan-assembler {\tfmov\t(z[0-9]+\.h), #8\.0.*\tst1h\t\1, p[0-7], \[x4\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#pragma GCC aarch64 "arm_sve.h"
/*
** callee:
** addvl sp, sp, #-1
** str p4, \[sp\]
** ptrue p4\.b, all
** (
** ld1h (z[0-9]+\.h), p4/z, \[x1, #1, mul vl\]
** ld1h (z[0-9]+\.h), p4/z, \[x1\]
** st2h {\2 - \1}, p0, \[x0\]
** |
** ld1h (z[0-9]+\.h), p4/z, \[x1\]
** ld1h (z[0-9]+\.h), p4/z, \[x1, #1, mul vl\]
** st2h {\3 - \4}, p0, \[x0\]
** )
** st4h {z0\.h - z3\.h}, p1, \[x0\]
** st3h {z4\.h - z6\.h}, p2, \[x0\]
** st1h z7\.h, p3, \[x0\]
** ldr p4, \[sp\]
** addvl sp, sp, #1
** ret
*/
void __attribute__((noipa))
callee (void *x0, svfloat16x4_t z0, svfloat16x3_t z4, svfloat16x2_t stack,
svfloat16_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
{
svst2 (p0, x0, stack);
svst4 (p1, x0, z0);
svst3 (p2, x0, z4);
svst1_f16 (p3, x0, z7);
}
void __attribute__((noipa))
caller (void *x0)
{
svbool_t pg;
pg = svptrue_b8 ();
callee (x0,
svld4_vnum_f16 (pg, x0, -8),
svld3_vnum_f16 (pg, x0, -3),
svld2_vnum_f16 (pg, x0, 0),
svld1_vnum_f16 (pg, x0, 2),
svptrue_pat_b8 (SV_VL1),
svptrue_pat_b16 (SV_VL2),
svptrue_pat_b32 (SV_VL3),
svptrue_pat_b64 (SV_VL4));
}
/* { dg-final { scan-assembler {\tld4h\t{z0\.h - z3\.h}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld3h\t{z4\.h - z6\.h}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld1h\tz7\.h, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
/* { dg-final { scan-assembler {\tld2h\t{(z[0-9]+\.h) - z[0-9]+\.h}.*\tst1h\t\1, p[0-7], \[x1\]\n} } } */
/* { dg-final { scan-assembler {\tld2h\t{z[0-9]+\.h - (z[0-9]+\.h)}.*\tst1h\t\1, p[0-7], \[x1, #1, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#pragma GCC aarch64 "arm_sve.h"
/*
** callee:
** addvl sp, sp, #-1
** str p4, \[sp\]
** ptrue p4\.b, all
** (
** ld1w (z[0-9]+\.s), p4/z, \[x1, #1, mul vl\]
** ld1w (z[0-9]+\.s), p4/z, \[x1\]
** st2w {\2 - \1}, p0, \[x0\]
** |
** ld1w (z[0-9]+\.s), p4/z, \[x1\]
** ld1w (z[0-9]+\.s), p4/z, \[x1, #1, mul vl\]
** st2w {\3 - \4}, p0, \[x0\]
** )
** st4w {z0\.s - z3\.s}, p1, \[x0\]
** st3w {z4\.s - z6\.s}, p2, \[x0\]
** st1w z7\.s, p3, \[x0\]
** ldr p4, \[sp\]
** addvl sp, sp, #1
** ret
*/
void __attribute__((noipa))
callee (void *x0, svfloat32x4_t z0, svfloat32x3_t z4, svfloat32x2_t stack,
svfloat32_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
{
svst2 (p0, x0, stack);
svst4 (p1, x0, z0);
svst3 (p2, x0, z4);
svst1_f32 (p3, x0, z7);
}
void __attribute__((noipa))
caller (void *x0)
{
svbool_t pg;
pg = svptrue_b8 ();
callee (x0,
svld4_vnum_f32 (pg, x0, -8),
svld3_vnum_f32 (pg, x0, -3),
svld2_vnum_f32 (pg, x0, 0),
svld1_vnum_f32 (pg, x0, 2),
svptrue_pat_b8 (SV_VL1),
svptrue_pat_b16 (SV_VL2),
svptrue_pat_b32 (SV_VL3),
svptrue_pat_b64 (SV_VL4));
}
/* { dg-final { scan-assembler {\tld4w\t{z0\.s - z3\.s}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld3w\t{z4\.s - z6\.s}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld1w\tz7\.s, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
/* { dg-final { scan-assembler {\tld2w\t{(z[0-9]+\.s) - z[0-9]+\.s}.*\tst1w\t\1, p[0-7], \[x1\]\n} } } */
/* { dg-final { scan-assembler {\tld2w\t{z[0-9]+\.s - (z[0-9]+\.s)}.*\tst1w\t\1, p[0-7], \[x1, #1, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#pragma GCC aarch64 "arm_sve.h"
/*
** callee:
** addvl sp, sp, #-1
** str p4, \[sp\]
** ptrue p4\.b, all
** (
** ld1d (z[0-9]+\.d), p4/z, \[x1, #1, mul vl\]
** ld1d (z[0-9]+\.d), p4/z, \[x1\]
** st2d {\2 - \1}, p0, \[x0\]
** |
** ld1d (z[0-9]+\.d), p4/z, \[x1\]
** ld1d (z[0-9]+\.d), p4/z, \[x1, #1, mul vl\]
** st2d {\3 - \4}, p0, \[x0\]
** )
** st4d {z0\.d - z3\.d}, p1, \[x0\]
** st3d {z4\.d - z6\.d}, p2, \[x0\]
** st1d z7\.d, p3, \[x0\]
** ldr p4, \[sp\]
** addvl sp, sp, #1
** ret
*/
void __attribute__((noipa))
callee (void *x0, svfloat64x4_t z0, svfloat64x3_t z4, svfloat64x2_t stack,
svfloat64_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
{
svst2 (p0, x0, stack);
svst4 (p1, x0, z0);
svst3 (p2, x0, z4);
svst1_f64 (p3, x0, z7);
}
void __attribute__((noipa))
caller (void *x0)
{
svbool_t pg;
pg = svptrue_b8 ();
callee (x0,
svld4_vnum_f64 (pg, x0, -8),
svld3_vnum_f64 (pg, x0, -3),
svld2_vnum_f64 (pg, x0, 0),
svld1_vnum_f64 (pg, x0, 2),
svptrue_pat_b8 (SV_VL1),
svptrue_pat_b16 (SV_VL2),
svptrue_pat_b32 (SV_VL3),
svptrue_pat_b64 (SV_VL4));
}
/* { dg-final { scan-assembler {\tld4d\t{z0\.d - z3\.d}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld3d\t{z4\.d - z6\.d}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld1d\tz7\.d, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
/* { dg-final { scan-assembler {\tld2d\t{(z[0-9]+\.d) - z[0-9]+\.d}.*\tst1d\t\1, p[0-7], \[x1\]\n} } } */
/* { dg-final { scan-assembler {\tld2d\t{z[0-9]+\.d - (z[0-9]+\.d)}.*\tst1d\t\1, p[0-7], \[x1, #1, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#pragma GCC aarch64 "arm_sve.h"
/*
** callee:
** addvl sp, sp, #-1
** str p4, \[sp\]
** ptrue p4\.b, all
** (
** ld1h (z[0-9]+\.h), p4/z, \[x1, #1, mul vl\]
** ld1h (z[0-9]+\.h), p4/z, \[x1\]
** st2h {\2 - \1}, p0, \[x0\]
** |
** ld1h (z[0-9]+\.h), p4/z, \[x1\]
** ld1h (z[0-9]+\.h), p4/z, \[x1, #1, mul vl\]
** st2h {\3 - \4}, p0, \[x0\]
** )
** st4h {z0\.h - z3\.h}, p1, \[x0\]
** st3h {z4\.h - z6\.h}, p2, \[x0\]
** st1h z7\.h, p3, \[x0\]
** ldr p4, \[sp\]
** addvl sp, sp, #1
** ret
*/
void __attribute__((noipa))
callee (void *x0, svint16x4_t z0, svint16x3_t z4, svint16x2_t stack,
svint16_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
{
svst2 (p0, x0, stack);
svst4 (p1, x0, z0);
svst3 (p2, x0, z4);
svst1_s16 (p3, x0, z7);
}
void __attribute__((noipa))
caller (void *x0)
{
svbool_t pg;
pg = svptrue_b8 ();
callee (x0,
svld4_vnum_s16 (pg, x0, -8),
svld3_vnum_s16 (pg, x0, -3),
svld2_vnum_s16 (pg, x0, 0),
svld1_vnum_s16 (pg, x0, 2),
svptrue_pat_b8 (SV_VL1),
svptrue_pat_b16 (SV_VL2),
svptrue_pat_b32 (SV_VL3),
svptrue_pat_b64 (SV_VL4));
}
/* { dg-final { scan-assembler {\tld4h\t{z0\.h - z3\.h}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld3h\t{z4\.h - z6\.h}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld1h\tz7\.h, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
/* { dg-final { scan-assembler {\tld2h\t{(z[0-9]+\.h) - z[0-9]+\.h}.*\tst1h\t\1, p[0-7], \[x1\]\n} } } */
/* { dg-final { scan-assembler {\tld2h\t{z[0-9]+\.h - (z[0-9]+\.h)}.*\tst1h\t\1, p[0-7], \[x1, #1, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#pragma GCC aarch64 "arm_sve.h"
/*
** callee:
** addvl sp, sp, #-1
** str p4, \[sp\]
** ptrue p4\.b, all
** (
** ld1w (z[0-9]+\.s), p4/z, \[x1, #1, mul vl\]
** ld1w (z[0-9]+\.s), p4/z, \[x1\]
** st2w {\2 - \1}, p0, \[x0\]
** |
** ld1w (z[0-9]+\.s), p4/z, \[x1\]
** ld1w (z[0-9]+\.s), p4/z, \[x1, #1, mul vl\]
** st2w {\3 - \4}, p0, \[x0\]
** )
** st4w {z0\.s - z3\.s}, p1, \[x0\]
** st3w {z4\.s - z6\.s}, p2, \[x0\]
** st1w z7\.s, p3, \[x0\]
** ldr p4, \[sp\]
** addvl sp, sp, #1
** ret
*/
void __attribute__((noipa))
callee (void *x0, svint32x4_t z0, svint32x3_t z4, svint32x2_t stack,
svint32_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
{
svst2 (p0, x0, stack);
svst4 (p1, x0, z0);
svst3 (p2, x0, z4);
svst1_s32 (p3, x0, z7);
}
void __attribute__((noipa))
caller (void *x0)
{
svbool_t pg;
pg = svptrue_b8 ();
callee (x0,
svld4_vnum_s32 (pg, x0, -8),
svld3_vnum_s32 (pg, x0, -3),
svld2_vnum_s32 (pg, x0, 0),
svld1_vnum_s32 (pg, x0, 2),
svptrue_pat_b8 (SV_VL1),
svptrue_pat_b16 (SV_VL2),
svptrue_pat_b32 (SV_VL3),
svptrue_pat_b64 (SV_VL4));
}
/* { dg-final { scan-assembler {\tld4w\t{z0\.s - z3\.s}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld3w\t{z4\.s - z6\.s}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld1w\tz7\.s, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
/* { dg-final { scan-assembler {\tld2w\t{(z[0-9]+\.s) - z[0-9]+\.s}.*\tst1w\t\1, p[0-7], \[x1\]\n} } } */
/* { dg-final { scan-assembler {\tld2w\t{z[0-9]+\.s - (z[0-9]+\.s)}.*\tst1w\t\1, p[0-7], \[x1, #1, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#pragma GCC aarch64 "arm_sve.h"
/*
** callee:
** addvl sp, sp, #-1
** str p4, \[sp\]
** ptrue p4\.b, all
** (
** ld1d (z[0-9]+\.d), p4/z, \[x1, #1, mul vl\]
** ld1d (z[0-9]+\.d), p4/z, \[x1\]
** st2d {\2 - \1}, p0, \[x0\]
** |
** ld1d (z[0-9]+\.d), p4/z, \[x1\]
** ld1d (z[0-9]+\.d), p4/z, \[x1, #1, mul vl\]
** st2d {\3 - \4}, p0, \[x0\]
** )
** st4d {z0\.d - z3\.d}, p1, \[x0\]
** st3d {z4\.d - z6\.d}, p2, \[x0\]
** st1d z7\.d, p3, \[x0\]
** ldr p4, \[sp\]
** addvl sp, sp, #1
** ret
*/
void __attribute__((noipa))
callee (void *x0, svint64x4_t z0, svint64x3_t z4, svint64x2_t stack,
svint64_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
{
svst2 (p0, x0, stack);
svst4 (p1, x0, z0);
svst3 (p2, x0, z4);
svst1_s64 (p3, x0, z7);
}
void __attribute__((noipa))
caller (void *x0)
{
svbool_t pg;
pg = svptrue_b8 ();
callee (x0,
svld4_vnum_s64 (pg, x0, -8),
svld3_vnum_s64 (pg, x0, -3),
svld2_vnum_s64 (pg, x0, 0),
svld1_vnum_s64 (pg, x0, 2),
svptrue_pat_b8 (SV_VL1),
svptrue_pat_b16 (SV_VL2),
svptrue_pat_b32 (SV_VL3),
svptrue_pat_b64 (SV_VL4));
}
/* { dg-final { scan-assembler {\tld4d\t{z0\.d - z3\.d}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld3d\t{z4\.d - z6\.d}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld1d\tz7\.d, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
/* { dg-final { scan-assembler {\tld2d\t{(z[0-9]+\.d) - z[0-9]+\.d}.*\tst1d\t\1, p[0-7], \[x1\]\n} } } */
/* { dg-final { scan-assembler {\tld2d\t{z[0-9]+\.d - (z[0-9]+\.d)}.*\tst1d\t\1, p[0-7], \[x1, #1, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#pragma GCC aarch64 "arm_sve.h"
/*
** callee:
** addvl sp, sp, #-1
** str p4, \[sp\]
** ptrue p4\.b, all
** (
** ld1b (z[0-9]+\.b), p4/z, \[x1, #1, mul vl\]
** ld1b (z[0-9]+\.b), p4/z, \[x1\]
** st2b {\2 - \1}, p0, \[x0\]
** |
** ld1b (z[0-9]+\.b), p4/z, \[x1\]
** ld1b (z[0-9]+\.b), p4/z, \[x1, #1, mul vl\]
** st2b {\3 - \4}, p0, \[x0\]
** )
** st4b {z0\.b - z3\.b}, p1, \[x0\]
** st3b {z4\.b - z6\.b}, p2, \[x0\]
** st1b z7\.b, p3, \[x0\]
** ldr p4, \[sp\]
** addvl sp, sp, #1
** ret
*/
void __attribute__((noipa))
callee (void *x0, svint8x4_t z0, svint8x3_t z4, svint8x2_t stack,
svint8_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
{
svst2 (p0, x0, stack);
svst4 (p1, x0, z0);
svst3 (p2, x0, z4);
svst1_s8 (p3, x0, z7);
}
void __attribute__((noipa))
caller (void *x0)
{
svbool_t pg;
pg = svptrue_b8 ();
callee (x0,
svld4_vnum_s8 (pg, x0, -8),
svld3_vnum_s8 (pg, x0, -3),
svld2_vnum_s8 (pg, x0, 0),
svld1_vnum_s8 (pg, x0, 2),
svptrue_pat_b8 (SV_VL1),
svptrue_pat_b16 (SV_VL2),
svptrue_pat_b32 (SV_VL3),
svptrue_pat_b64 (SV_VL4));
}
/* { dg-final { scan-assembler {\tld4b\t{z0\.b - z3\.b}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld3b\t{z4\.b - z6\.b}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld1b\tz7\.b, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
/* { dg-final { scan-assembler {\tld2b\t{(z[0-9]+\.b) - z[0-9]+\.b}.*\tst1b\t\1, p[0-7], \[x1\]\n} } } */
/* { dg-final { scan-assembler {\tld2b\t{z[0-9]+\.b - (z[0-9]+\.b)}.*\tst1b\t\1, p[0-7], \[x1, #1, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#pragma GCC aarch64 "arm_sve.h"
/*
** callee:
** addvl sp, sp, #-1
** str p4, \[sp\]
** ptrue p4\.b, all
** (
** ld1h (z[0-9]+\.h), p4/z, \[x1, #1, mul vl\]
** ld1h (z[0-9]+\.h), p4/z, \[x1\]
** st2h {\2 - \1}, p0, \[x0\]
** |
** ld1h (z[0-9]+\.h), p4/z, \[x1\]
** ld1h (z[0-9]+\.h), p4/z, \[x1, #1, mul vl\]
** st2h {\3 - \4}, p0, \[x0\]
** )
** st4h {z0\.h - z3\.h}, p1, \[x0\]
** st3h {z4\.h - z6\.h}, p2, \[x0\]
** st1h z7\.h, p3, \[x0\]
** ldr p4, \[sp\]
** addvl sp, sp, #1
** ret
*/
void __attribute__((noipa))
callee (void *x0, svuint16x4_t z0, svuint16x3_t z4, svuint16x2_t stack,
svuint16_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
{
svst2 (p0, x0, stack);
svst4 (p1, x0, z0);
svst3 (p2, x0, z4);
svst1_u16 (p3, x0, z7);
}
void __attribute__((noipa))
caller (void *x0)
{
svbool_t pg;
pg = svptrue_b8 ();
callee (x0,
svld4_vnum_u16 (pg, x0, -8),
svld3_vnum_u16 (pg, x0, -3),
svld2_vnum_u16 (pg, x0, 0),
svld1_vnum_u16 (pg, x0, 2),
svptrue_pat_b8 (SV_VL1),
svptrue_pat_b16 (SV_VL2),
svptrue_pat_b32 (SV_VL3),
svptrue_pat_b64 (SV_VL4));
}
/* { dg-final { scan-assembler {\tld4h\t{z0\.h - z3\.h}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld3h\t{z4\.h - z6\.h}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld1h\tz7\.h, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
/* { dg-final { scan-assembler {\tld2h\t{(z[0-9]+\.h) - z[0-9]+\.h}.*\tst1h\t\1, p[0-7], \[x1\]\n} } } */
/* { dg-final { scan-assembler {\tld2h\t{z[0-9]+\.h - (z[0-9]+\.h)}.*\tst1h\t\1, p[0-7], \[x1, #1, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#pragma GCC aarch64 "arm_sve.h"
/*
** callee:
** addvl sp, sp, #-1
** str p4, \[sp\]
** ptrue p4\.b, all
** (
** ld1w (z[0-9]+\.s), p4/z, \[x1, #1, mul vl\]
** ld1w (z[0-9]+\.s), p4/z, \[x1\]
** st2w {\2 - \1}, p0, \[x0\]
** |
** ld1w (z[0-9]+\.s), p4/z, \[x1\]
** ld1w (z[0-9]+\.s), p4/z, \[x1, #1, mul vl\]
** st2w {\3 - \4}, p0, \[x0\]
** )
** st4w {z0\.s - z3\.s}, p1, \[x0\]
** st3w {z4\.s - z6\.s}, p2, \[x0\]
** st1w z7\.s, p3, \[x0\]
** ldr p4, \[sp\]
** addvl sp, sp, #1
** ret
*/
void __attribute__((noipa))
callee (void *x0, svuint32x4_t z0, svuint32x3_t z4, svuint32x2_t stack,
svuint32_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
{
svst2 (p0, x0, stack);
svst4 (p1, x0, z0);
svst3 (p2, x0, z4);
svst1_u32 (p3, x0, z7);
}
void __attribute__((noipa))
caller (void *x0)
{
svbool_t pg;
pg = svptrue_b8 ();
callee (x0,
svld4_vnum_u32 (pg, x0, -8),
svld3_vnum_u32 (pg, x0, -3),
svld2_vnum_u32 (pg, x0, 0),
svld1_vnum_u32 (pg, x0, 2),
svptrue_pat_b8 (SV_VL1),
svptrue_pat_b16 (SV_VL2),
svptrue_pat_b32 (SV_VL3),
svptrue_pat_b64 (SV_VL4));
}
/* { dg-final { scan-assembler {\tld4w\t{z0\.s - z3\.s}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld3w\t{z4\.s - z6\.s}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld1w\tz7\.s, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
/* { dg-final { scan-assembler {\tld2w\t{(z[0-9]+\.s) - z[0-9]+\.s}.*\tst1w\t\1, p[0-7], \[x1\]\n} } } */
/* { dg-final { scan-assembler {\tld2w\t{z[0-9]+\.s - (z[0-9]+\.s)}.*\tst1w\t\1, p[0-7], \[x1, #1, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#pragma GCC aarch64 "arm_sve.h"
/*
** callee:
** addvl sp, sp, #-1
** str p4, \[sp\]
** ptrue p4\.b, all
** (
** ld1d (z[0-9]+\.d), p4/z, \[x1, #1, mul vl\]
** ld1d (z[0-9]+\.d), p4/z, \[x1\]
** st2d {\2 - \1}, p0, \[x0\]
** |
** ld1d (z[0-9]+\.d), p4/z, \[x1\]
** ld1d (z[0-9]+\.d), p4/z, \[x1, #1, mul vl\]
** st2d {\3 - \4}, p0, \[x0\]
** )
** st4d {z0\.d - z3\.d}, p1, \[x0\]
** st3d {z4\.d - z6\.d}, p2, \[x0\]
** st1d z7\.d, p3, \[x0\]
** ldr p4, \[sp\]
** addvl sp, sp, #1
** ret
*/
void __attribute__((noipa))
callee (void *x0, svuint64x4_t z0, svuint64x3_t z4, svuint64x2_t stack,
svuint64_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
{
svst2 (p0, x0, stack);
svst4 (p1, x0, z0);
svst3 (p2, x0, z4);
svst1_u64 (p3, x0, z7);
}
void __attribute__((noipa))
caller (void *x0)
{
svbool_t pg;
pg = svptrue_b8 ();
callee (x0,
svld4_vnum_u64 (pg, x0, -8),
svld3_vnum_u64 (pg, x0, -3),
svld2_vnum_u64 (pg, x0, 0),
svld1_vnum_u64 (pg, x0, 2),
svptrue_pat_b8 (SV_VL1),
svptrue_pat_b16 (SV_VL2),
svptrue_pat_b32 (SV_VL3),
svptrue_pat_b64 (SV_VL4));
}
/* { dg-final { scan-assembler {\tld4d\t{z0\.d - z3\.d}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld3d\t{z4\.d - z6\.d}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld1d\tz7\.d, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
/* { dg-final { scan-assembler {\tld2d\t{(z[0-9]+\.d) - z[0-9]+\.d}.*\tst1d\t\1, p[0-7], \[x1\]\n} } } */
/* { dg-final { scan-assembler {\tld2d\t{z[0-9]+\.d - (z[0-9]+\.d)}.*\tst1d\t\1, p[0-7], \[x1, #1, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#pragma GCC aarch64 "arm_sve.h"
/*
** callee:
** addvl sp, sp, #-1
** str p4, \[sp\]
** ptrue p4\.b, all
** (
** ld1b (z[0-9]+\.b), p4/z, \[x1, #1, mul vl\]
** ld1b (z[0-9]+\.b), p4/z, \[x1\]
** st2b {\2 - \1}, p0, \[x0\]
** |
** ld1b (z[0-9]+\.b), p4/z, \[x1\]
** ld1b (z[0-9]+\.b), p4/z, \[x1, #1, mul vl\]
** st2b {\3 - \4}, p0, \[x0\]
** )
** st4b {z0\.b - z3\.b}, p1, \[x0\]
** st3b {z4\.b - z6\.b}, p2, \[x0\]
** st1b z7\.b, p3, \[x0\]
** ldr p4, \[sp\]
** addvl sp, sp, #1
** ret
*/
void __attribute__((noipa))
callee (void *x0, svuint8x4_t z0, svuint8x3_t z4, svuint8x2_t stack,
svuint8_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
{
svst2 (p0, x0, stack);
svst4 (p1, x0, z0);
svst3 (p2, x0, z4);
svst1_u8 (p3, x0, z7);
}
void __attribute__((noipa))
caller (void *x0)
{
svbool_t pg;
pg = svptrue_b8 ();
callee (x0,
svld4_vnum_u8 (pg, x0, -8),
svld3_vnum_u8 (pg, x0, -3),
svld2_vnum_u8 (pg, x0, 0),
svld1_vnum_u8 (pg, x0, 2),
svptrue_pat_b8 (SV_VL1),
svptrue_pat_b16 (SV_VL2),
svptrue_pat_b32 (SV_VL3),
svptrue_pat_b64 (SV_VL4));
}
/* { dg-final { scan-assembler {\tld4b\t{z0\.b - z3\.b}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld3b\t{z4\.b - z6\.b}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld1b\tz7\.b, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
/* { dg-final { scan-assembler {\tld2b\t{(z[0-9]+\.b) - z[0-9]+\.b}.*\tst1b\t\1, p[0-7], \[x1\]\n} } } */
/* { dg-final { scan-assembler {\tld2b\t{z[0-9]+\.b - (z[0-9]+\.b)}.*\tst1b\t\1, p[0-7], \[x1, #1, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
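/* Little-endian variants of the same test.  Here the in-register and
   in-memory layouts agree, so the stack-passed tuple can be loaded with
   unpredicated ldr instructions and the callee needs no predicate spill:
   the frame stays untouched and the function reduces to the stores plus
   a ret.  */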
/* { dg-do compile } */
/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#pragma GCC aarch64 "arm_sve.h"
/*
** callee:
** (
** ldr (z[0-9]+), \[x1, #1, mul vl\]
** ldr (z[0-9]+), \[x1\]
** st2h {\2\.h - \1\.h}, p0, \[x0\]
** |
** ldr (z[0-9]+), \[x1\]
** ldr (z[0-9]+), \[x1, #1, mul vl\]
** st2h {\3\.h - \4\.h}, p0, \[x0\]
** )
** st4h {z0\.h - z3\.h}, p1, \[x0\]
** st3h {z4\.h - z6\.h}, p2, \[x0\]
** st1h z7\.h, p3, \[x0\]
** ret
*/
void __attribute__((noipa))
callee (void *x0, svfloat16x4_t z0, svfloat16x3_t z4, svfloat16x2_t stack,
svfloat16_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
{
svst2 (p0, x0, stack);
svst4 (p1, x0, z0);
svst3 (p2, x0, z4);
svst1_f16 (p3, x0, z7);
}
void __attribute__((noipa))
caller (void *x0)
{
svbool_t pg;
pg = svptrue_b8 ();
callee (x0,
svld4_vnum_f16 (pg, x0, -8),
svld3_vnum_f16 (pg, x0, -3),
svld2_vnum_f16 (pg, x0, 0),
svld1_vnum_f16 (pg, x0, 2),
svptrue_pat_b8 (SV_VL1),
svptrue_pat_b16 (SV_VL2),
svptrue_pat_b32 (SV_VL3),
svptrue_pat_b64 (SV_VL4));
}
/* { dg-final { scan-assembler {\tld4h\t{z0\.h - z3\.h}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld3h\t{z4\.h - z6\.h}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld1h\tz7\.h, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
/* { dg-final { scan-assembler {\tld2h\t{(z[0-9]+)\.h - z[0-9]+\.h}.*\tstr\t\1, \[x1\]\n} } } */
/* { dg-final { scan-assembler {\tld2h\t{z[0-9]+\.h - (z[0-9]+)\.h}.*\tstr\t\1, \[x1, #1, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#pragma GCC aarch64 "arm_sve.h"
/*
** callee:
** (
** ldr (z[0-9]+), \[x1, #1, mul vl\]
** ldr (z[0-9]+), \[x1\]
** st2w {\2\.s - \1\.s}, p0, \[x0\]
** |
** ldr (z[0-9]+), \[x1\]
** ldr (z[0-9]+), \[x1, #1, mul vl\]
** st2w {\3\.s - \4\.s}, p0, \[x0\]
** )
** st4w {z0\.s - z3\.s}, p1, \[x0\]
** st3w {z4\.s - z6\.s}, p2, \[x0\]
** st1w z7\.s, p3, \[x0\]
** ret
*/
void __attribute__((noipa))
callee (void *x0, svfloat32x4_t z0, svfloat32x3_t z4, svfloat32x2_t stack,
svfloat32_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
{
svst2 (p0, x0, stack);
svst4 (p1, x0, z0);
svst3 (p2, x0, z4);
svst1_f32 (p3, x0, z7);
}
void __attribute__((noipa))
caller (void *x0)
{
svbool_t pg;
pg = svptrue_b8 ();
callee (x0,
svld4_vnum_f32 (pg, x0, -8),
svld3_vnum_f32 (pg, x0, -3),
svld2_vnum_f32 (pg, x0, 0),
svld1_vnum_f32 (pg, x0, 2),
svptrue_pat_b8 (SV_VL1),
svptrue_pat_b16 (SV_VL2),
svptrue_pat_b32 (SV_VL3),
svptrue_pat_b64 (SV_VL4));
}
/* { dg-final { scan-assembler {\tld4w\t{z0\.s - z3\.s}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld3w\t{z4\.s - z6\.s}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld1w\tz7\.s, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
/* { dg-final { scan-assembler {\tld2w\t{(z[0-9]+)\.s - z[0-9]+\.s}.*\tstr\t\1, \[x1\]\n} } } */
/* { dg-final { scan-assembler {\tld2w\t{z[0-9]+\.s - (z[0-9]+)\.s}.*\tstr\t\1, \[x1, #1, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#pragma GCC aarch64 "arm_sve.h"
/*
** callee:
** (
** ldr (z[0-9]+), \[x1, #1, mul vl\]
** ldr (z[0-9]+), \[x1\]
** st2d {\2\.d - \1\.d}, p0, \[x0\]
** |
** ldr (z[0-9]+), \[x1\]
** ldr (z[0-9]+), \[x1, #1, mul vl\]
** st2d {\3\.d - \4\.d}, p0, \[x0\]
** )
** st4d {z0\.d - z3\.d}, p1, \[x0\]
** st3d {z4\.d - z6\.d}, p2, \[x0\]
** st1d z7\.d, p3, \[x0\]
** ret
*/
void __attribute__((noipa))
callee (void *x0, svfloat64x4_t z0, svfloat64x3_t z4, svfloat64x2_t stack,
svfloat64_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
{
svst2 (p0, x0, stack);
svst4 (p1, x0, z0);
svst3 (p2, x0, z4);
svst1_f64 (p3, x0, z7);
}
void __attribute__((noipa))
caller (void *x0)
{
svbool_t pg;
pg = svptrue_b8 ();
callee (x0,
svld4_vnum_f64 (pg, x0, -8),
svld3_vnum_f64 (pg, x0, -3),
svld2_vnum_f64 (pg, x0, 0),
svld1_vnum_f64 (pg, x0, 2),
svptrue_pat_b8 (SV_VL1),
svptrue_pat_b16 (SV_VL2),
svptrue_pat_b32 (SV_VL3),
svptrue_pat_b64 (SV_VL4));
}
/* { dg-final { scan-assembler {\tld4d\t{z0\.d - z3\.d}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld3d\t{z4\.d - z6\.d}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld1d\tz7\.d, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
/* { dg-final { scan-assembler {\tld2d\t{(z[0-9]+)\.d - z[0-9]+\.d}.*\tstr\t\1, \[x1\]\n} } } */
/* { dg-final { scan-assembler {\tld2d\t{z[0-9]+\.d - (z[0-9]+)\.d}.*\tstr\t\1, \[x1, #1, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#pragma GCC aarch64 "arm_sve.h"
/*
** callee:
** (
** ldr (z[0-9]+), \[x1, #1, mul vl\]
** ldr (z[0-9]+), \[x1\]
** st2h {\2\.h - \1\.h}, p0, \[x0\]
** |
** ldr (z[0-9]+), \[x1\]
** ldr (z[0-9]+), \[x1, #1, mul vl\]
** st2h {\3\.h - \4\.h}, p0, \[x0\]
** )
** st4h {z0\.h - z3\.h}, p1, \[x0\]
** st3h {z4\.h - z6\.h}, p2, \[x0\]
** st1h z7\.h, p3, \[x0\]
** ret
*/
void __attribute__((noipa))
callee (void *x0, svint16x4_t z0, svint16x3_t z4, svint16x2_t stack,
svint16_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
{
svst2 (p0, x0, stack);
svst4 (p1, x0, z0);
svst3 (p2, x0, z4);
svst1_s16 (p3, x0, z7);
}
void __attribute__((noipa))
caller (void *x0)
{
svbool_t pg;
pg = svptrue_b8 ();
callee (x0,
svld4_vnum_s16 (pg, x0, -8),
svld3_vnum_s16 (pg, x0, -3),
svld2_vnum_s16 (pg, x0, 0),
svld1_vnum_s16 (pg, x0, 2),
svptrue_pat_b8 (SV_VL1),
svptrue_pat_b16 (SV_VL2),
svptrue_pat_b32 (SV_VL3),
svptrue_pat_b64 (SV_VL4));
}
/* { dg-final { scan-assembler {\tld4h\t{z0\.h - z3\.h}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld3h\t{z4\.h - z6\.h}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld1h\tz7\.h, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
/* { dg-final { scan-assembler {\tld2h\t{(z[0-9]+)\.h - z[0-9]+\.h}.*\tstr\t\1, \[x1\]\n} } } */
/* { dg-final { scan-assembler {\tld2h\t{z[0-9]+\.h - (z[0-9]+)\.h}.*\tstr\t\1, \[x1, #1, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#pragma GCC aarch64 "arm_sve.h"
/*
** callee:
** (
** ldr (z[0-9]+), \[x1, #1, mul vl\]
** ldr (z[0-9]+), \[x1\]
** st2w {\2\.s - \1\.s}, p0, \[x0\]
** |
** ldr (z[0-9]+), \[x1\]
** ldr (z[0-9]+), \[x1, #1, mul vl\]
** st2w {\3\.s - \4\.s}, p0, \[x0\]
** )
** st4w {z0\.s - z3\.s}, p1, \[x0\]
** st3w {z4\.s - z6\.s}, p2, \[x0\]
** st1w z7\.s, p3, \[x0\]
** ret
*/
void __attribute__((noipa))
callee (void *x0, svint32x4_t z0, svint32x3_t z4, svint32x2_t stack,
svint32_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
{
svst2 (p0, x0, stack);
svst4 (p1, x0, z0);
svst3 (p2, x0, z4);
svst1_s32 (p3, x0, z7);
}
void __attribute__((noipa))
caller (void *x0)
{
svbool_t pg;
pg = svptrue_b8 ();
callee (x0,
svld4_vnum_s32 (pg, x0, -8),
svld3_vnum_s32 (pg, x0, -3),
svld2_vnum_s32 (pg, x0, 0),
svld1_vnum_s32 (pg, x0, 2),
svptrue_pat_b8 (SV_VL1),
svptrue_pat_b16 (SV_VL2),
svptrue_pat_b32 (SV_VL3),
svptrue_pat_b64 (SV_VL4));
}
/* { dg-final { scan-assembler {\tld4w\t{z0\.s - z3\.s}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld3w\t{z4\.s - z6\.s}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld1w\tz7\.s, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
/* { dg-final { scan-assembler {\tld2w\t{(z[0-9]+)\.s - z[0-9]+\.s}.*\tstr\t\1, \[x1\]\n} } } */
/* { dg-final { scan-assembler {\tld2w\t{z[0-9]+\.s - (z[0-9]+)\.s}.*\tstr\t\1, \[x1, #1, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#pragma GCC aarch64 "arm_sve.h"
/*
** callee:
** (
** ldr (z[0-9]+), \[x1, #1, mul vl\]
** ldr (z[0-9]+), \[x1\]
** st2d {\2\.d - \1\.d}, p0, \[x0\]
** |
** ldr (z[0-9]+), \[x1\]
** ldr (z[0-9]+), \[x1, #1, mul vl\]
** st2d {\3\.d - \4\.d}, p0, \[x0\]
** )
** st4d {z0\.d - z3\.d}, p1, \[x0\]
** st3d {z4\.d - z6\.d}, p2, \[x0\]
** st1d z7\.d, p3, \[x0\]
** ret
*/
void __attribute__((noipa))
callee (void *x0, svint64x4_t z0, svint64x3_t z4, svint64x2_t stack,
svint64_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
{
svst2 (p0, x0, stack);
svst4 (p1, x0, z0);
svst3 (p2, x0, z4);
svst1_s64 (p3, x0, z7);
}
void __attribute__((noipa))
caller (void *x0)
{
svbool_t pg;
pg = svptrue_b8 ();
callee (x0,
svld4_vnum_s64 (pg, x0, -8),
svld3_vnum_s64 (pg, x0, -3),
svld2_vnum_s64 (pg, x0, 0),
svld1_vnum_s64 (pg, x0, 2),
svptrue_pat_b8 (SV_VL1),
svptrue_pat_b16 (SV_VL2),
svptrue_pat_b32 (SV_VL3),
svptrue_pat_b64 (SV_VL4));
}
/* { dg-final { scan-assembler {\tld4d\t{z0\.d - z3\.d}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld3d\t{z4\.d - z6\.d}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld1d\tz7\.d, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
/* { dg-final { scan-assembler {\tld2d\t{(z[0-9]+)\.d - z[0-9]+\.d}.*\tstr\t\1, \[x1\]\n} } } */
/* { dg-final { scan-assembler {\tld2d\t{z[0-9]+\.d - (z[0-9]+)\.d}.*\tstr\t\1, \[x1, #1, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#pragma GCC aarch64 "arm_sve.h"
/*
** callee:
** (
** ldr (z[0-9]+), \[x1, #1, mul vl\]
** ldr (z[0-9]+), \[x1\]
** st2b {\2\.b - \1\.b}, p0, \[x0\]
** |
** ldr (z[0-9]+), \[x1\]
** ldr (z[0-9]+), \[x1, #1, mul vl\]
** st2b {\3\.b - \4\.b}, p0, \[x0\]
** )
** st4b {z0\.b - z3\.b}, p1, \[x0\]
** st3b {z4\.b - z6\.b}, p2, \[x0\]
** st1b z7\.b, p3, \[x0\]
** ret
*/
void __attribute__((noipa))
callee (void *x0, svint8x4_t z0, svint8x3_t z4, svint8x2_t stack,
svint8_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
{
svst2 (p0, x0, stack);
svst4 (p1, x0, z0);
svst3 (p2, x0, z4);
svst1_s8 (p3, x0, z7);
}
void __attribute__((noipa))
caller (void *x0)
{
svbool_t pg;
pg = svptrue_b8 ();
callee (x0,
svld4_vnum_s8 (pg, x0, -8),
svld3_vnum_s8 (pg, x0, -3),
svld2_vnum_s8 (pg, x0, 0),
svld1_vnum_s8 (pg, x0, 2),
svptrue_pat_b8 (SV_VL1),
svptrue_pat_b16 (SV_VL2),
svptrue_pat_b32 (SV_VL3),
svptrue_pat_b64 (SV_VL4));
}
/* { dg-final { scan-assembler {\tld4b\t{z0\.b - z3\.b}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld3b\t{z4\.b - z6\.b}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld1b\tz7\.b, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
/* { dg-final { scan-assembler {\tld2b\t{(z[0-9]+)\.b - z[0-9]+\.b}.*\tstr\t\1, \[x1\]\n} } } */
/* { dg-final { scan-assembler {\tld2b\t{z[0-9]+\.b - (z[0-9]+)\.b}.*\tstr\t\1, \[x1, #1, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#pragma GCC aarch64 "arm_sve.h"
/*
** callee:
** (
** ldr (z[0-9]+), \[x1, #1, mul vl\]
** ldr (z[0-9]+), \[x1\]
** st2h {\2\.h - \1\.h}, p0, \[x0\]
** |
** ldr (z[0-9]+), \[x1\]
** ldr (z[0-9]+), \[x1, #1, mul vl\]
** st2h {\3\.h - \4\.h}, p0, \[x0\]
** )
** st4h {z0\.h - z3\.h}, p1, \[x0\]
** st3h {z4\.h - z6\.h}, p2, \[x0\]
** st1h z7\.h, p3, \[x0\]
** ret
*/
void __attribute__((noipa))
callee (void *x0, svuint16x4_t z0, svuint16x3_t z4, svuint16x2_t stack,
svuint16_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
{
svst2 (p0, x0, stack);
svst4 (p1, x0, z0);
svst3 (p2, x0, z4);
svst1_u16 (p3, x0, z7);
}
void __attribute__((noipa))
caller (void *x0)
{
svbool_t pg;
pg = svptrue_b8 ();
callee (x0,
svld4_vnum_u16 (pg, x0, -8),
svld3_vnum_u16 (pg, x0, -3),
svld2_vnum_u16 (pg, x0, 0),
svld1_vnum_u16 (pg, x0, 2),
svptrue_pat_b8 (SV_VL1),
svptrue_pat_b16 (SV_VL2),
svptrue_pat_b32 (SV_VL3),
svptrue_pat_b64 (SV_VL4));
}
/* { dg-final { scan-assembler {\tld4h\t{z0\.h - z3\.h}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld3h\t{z4\.h - z6\.h}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld1h\tz7\.h, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
/* { dg-final { scan-assembler {\tld2h\t{(z[0-9]+)\.h - z[0-9]+\.h}.*\tstr\t\1, \[x1\]\n} } } */
/* { dg-final { scan-assembler {\tld2h\t{z[0-9]+\.h - (z[0-9]+)\.h}.*\tstr\t\1, \[x1, #1, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#pragma GCC aarch64 "arm_sve.h"
/*
** callee:
** (
** ldr (z[0-9]+), \[x1, #1, mul vl\]
** ldr (z[0-9]+), \[x1\]
** st2w {\2\.s - \1\.s}, p0, \[x0\]
** |
** ldr (z[0-9]+), \[x1\]
** ldr (z[0-9]+), \[x1, #1, mul vl\]
** st2w {\3\.s - \4\.s}, p0, \[x0\]
** )
** st4w {z0\.s - z3\.s}, p1, \[x0\]
** st3w {z4\.s - z6\.s}, p2, \[x0\]
** st1w z7\.s, p3, \[x0\]
** ret
*/
void __attribute__((noipa))
callee (void *x0, svuint32x4_t z0, svuint32x3_t z4, svuint32x2_t stack,
svuint32_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
{
svst2 (p0, x0, stack);
svst4 (p1, x0, z0);
svst3 (p2, x0, z4);
svst1_u32 (p3, x0, z7);
}
void __attribute__((noipa))
caller (void *x0)
{
svbool_t pg;
pg = svptrue_b8 ();
callee (x0,
svld4_vnum_u32 (pg, x0, -8),
svld3_vnum_u32 (pg, x0, -3),
svld2_vnum_u32 (pg, x0, 0),
svld1_vnum_u32 (pg, x0, 2),
svptrue_pat_b8 (SV_VL1),
svptrue_pat_b16 (SV_VL2),
svptrue_pat_b32 (SV_VL3),
svptrue_pat_b64 (SV_VL4));
}
/* { dg-final { scan-assembler {\tld4w\t{z0\.s - z3\.s}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld3w\t{z4\.s - z6\.s}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld1w\tz7\.s, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
/* { dg-final { scan-assembler {\tld2w\t{(z[0-9]+)\.s - z[0-9]+\.s}.*\tstr\t\1, \[x1\]\n} } } */
/* { dg-final { scan-assembler {\tld2w\t{z[0-9]+\.s - (z[0-9]+)\.s}.*\tstr\t\1, \[x1, #1, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#pragma GCC aarch64 "arm_sve.h"
/*
** callee:
** (
** ldr (z[0-9]+), \[x1, #1, mul vl\]
** ldr (z[0-9]+), \[x1\]
** st2d {\2\.d - \1\.d}, p0, \[x0\]
** |
** ldr (z[0-9]+), \[x1\]
** ldr (z[0-9]+), \[x1, #1, mul vl\]
** st2d {\3\.d - \4\.d}, p0, \[x0\]
** )
** st4d {z0\.d - z3\.d}, p1, \[x0\]
** st3d {z4\.d - z6\.d}, p2, \[x0\]
** st1d z7\.d, p3, \[x0\]
** ret
*/
void __attribute__((noipa))
callee (void *x0, svuint64x4_t z0, svuint64x3_t z4, svuint64x2_t stack,
svuint64_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
{
svst2 (p0, x0, stack);
svst4 (p1, x0, z0);
svst3 (p2, x0, z4);
svst1_u64 (p3, x0, z7);
}
void __attribute__((noipa))
caller (void *x0)
{
svbool_t pg;
pg = svptrue_b8 ();
callee (x0,
svld4_vnum_u64 (pg, x0, -8),
svld3_vnum_u64 (pg, x0, -3),
svld2_vnum_u64 (pg, x0, 0),
svld1_vnum_u64 (pg, x0, 2),
svptrue_pat_b8 (SV_VL1),
svptrue_pat_b16 (SV_VL2),
svptrue_pat_b32 (SV_VL3),
svptrue_pat_b64 (SV_VL4));
}
/* { dg-final { scan-assembler {\tld4d\t{z0\.d - z3\.d}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld3d\t{z4\.d - z6\.d}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld1d\tz7\.d, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
/* { dg-final { scan-assembler {\tld2d\t{(z[0-9]+)\.d - z[0-9]+\.d}.*\tstr\t\1, \[x1\]\n} } } */
/* { dg-final { scan-assembler {\tld2d\t{z[0-9]+\.d - (z[0-9]+)\.d}.*\tstr\t\1, \[x1, #1, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#pragma GCC aarch64 "arm_sve.h"
/*
** callee:
** (
** ldr (z[0-9]+), \[x1, #1, mul vl\]
** ldr (z[0-9]+), \[x1\]
** st2b {\2\.b - \1\.b}, p0, \[x0\]
** |
** ldr (z[0-9]+), \[x1\]
** ldr (z[0-9]+), \[x1, #1, mul vl\]
** st2b {\3\.b - \4\.b}, p0, \[x0\]
** )
** st4b {z0\.b - z3\.b}, p1, \[x0\]
** st3b {z4\.b - z6\.b}, p2, \[x0\]
** st1b z7\.b, p3, \[x0\]
** ret
*/
void __attribute__((noipa))
callee (void *x0, svuint8x4_t z0, svuint8x3_t z4, svuint8x2_t stack,
svuint8_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
{
svst2 (p0, x0, stack);
svst4 (p1, x0, z0);
svst3 (p2, x0, z4);
svst1_u8 (p3, x0, z7);
}
void __attribute__((noipa))
caller (void *x0)
{
svbool_t pg;
pg = svptrue_b8 ();
callee (x0,
svld4_vnum_u8 (pg, x0, -8),
svld3_vnum_u8 (pg, x0, -3),
svld2_vnum_u8 (pg, x0, 0),
svld1_vnum_u8 (pg, x0, 2),
svptrue_pat_b8 (SV_VL1),
svptrue_pat_b16 (SV_VL2),
svptrue_pat_b32 (SV_VL3),
svptrue_pat_b64 (SV_VL4));
}
/* { dg-final { scan-assembler {\tld4b\t{z0\.b - z3\.b}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld3b\t{z4\.b - z6\.b}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld1b\tz7\.b, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
/* { dg-final { scan-assembler {\tld2b\t{(z[0-9]+)\.b - z[0-9]+\.b}.*\tstr\t\1, \[x1\]\n} } } */
/* { dg-final { scan-assembler {\tld2b\t{z[0-9]+\.b - (z[0-9]+)\.b}.*\tstr\t\1, \[x1, #1, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
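/* The tests below pass two arguments on the stack: a four-vector tuple
   (stack1, referenced through x1) and a single vector (stack2,
   referenced through x2).  callee1 consumes the tuple and callee2 the
   single vector, so between them they check that both by-reference
   argument slots are addressed correctly.  Each pair is instantiated
   for every element type, first big-endian, then little-endian.  */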
/* { dg-do compile } */
/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#pragma GCC aarch64 "arm_sve.h"
/*
** callee1:
** ptrue p3\.b, all
** ...
** ld1h (z[0-9]+\.h), p3/z, \[x1, #3, mul vl\]
** ...
** st4h {z[0-9]+\.h - \1}, p0, \[x0\]
** st2h {z3\.h - z4\.h}, p1, \[x0\]
** st3h {z5\.h - z7\.h}, p2, \[x0\]
** ret
*/
void __attribute__((noipa))
callee1 (void *x0, svfloat16x3_t z0, svfloat16x2_t z3, svfloat16x3_t z5,
svfloat16x4_t stack1, svfloat16_t stack2, svbool_t p0,
svbool_t p1, svbool_t p2)
{
svst4_f16 (p0, x0, stack1);
svst2_f16 (p1, x0, z3);
svst3_f16 (p2, x0, z5);
}
/*
** callee2:
** ptrue p3\.b, all
** ld1h (z[0-9]+\.h), p3/z, \[x2\]
** st1h \1, p0, \[x0\]
** st2h {z3\.h - z4\.h}, p1, \[x0\]
** st3h {z0\.h - z2\.h}, p2, \[x0\]
** ret
*/
void __attribute__((noipa))
callee2 (void *x0, svfloat16x3_t z0, svfloat16x2_t z3, svfloat16x3_t z5,
svfloat16x4_t stack1, svfloat16_t stack2, svbool_t p0,
svbool_t p1, svbool_t p2)
{
svst1_f16 (p0, x0, stack2);
svst2_f16 (p1, x0, z3);
svst3_f16 (p2, x0, z0);
}
void __attribute__((noipa))
caller (void *x0)
{
svbool_t pg;
pg = svptrue_b8 ();
callee1 (x0,
svld3_vnum_f16 (pg, x0, -9),
svld2_vnum_f16 (pg, x0, -2),
svld3_vnum_f16 (pg, x0, 0),
svld4_vnum_f16 (pg, x0, 8),
svld1_vnum_f16 (pg, x0, 5),
svptrue_pat_b8 (SV_VL1),
svptrue_pat_b16 (SV_VL2),
svptrue_pat_b32 (SV_VL3));
}
/* { dg-final { scan-assembler {\tld3h\t{z0\.h - z2\.h}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld2h\t{z3\.h - z4\.h}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld3h\t{z5\.h - z7\.h}, p[0-7]/z, \[x0\]\n} } } */
/* { dg-final { scan-assembler {\tld4h\t{(z[0-9]+\.h) - z[0-9]+\.h}.*\tst1h\t\1, p[0-7], \[x1\]\n} } } */
/* { dg-final { scan-assembler {\tld4h\t{z[0-9]+\.h - (z[0-9]+\.h)}.*\tst1h\t\1, p[0-7], \[x1, #3, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld1h\t(z[0-9]+\.h), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1h\t\1, p[0-7], \[x2\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#pragma GCC aarch64 "arm_sve.h"
/*
** callee1:
** ptrue p3\.b, all
** ...
** ld1w (z[0-9]+\.s), p3/z, \[x1, #3, mul vl\]
** ...
** st4w {z[0-9]+\.s - \1}, p0, \[x0\]
** st2w {z3\.s - z4\.s}, p1, \[x0\]
** st3w {z5\.s - z7\.s}, p2, \[x0\]
** ret
*/
void __attribute__((noipa))
callee1 (void *x0, svfloat32x3_t z0, svfloat32x2_t z3, svfloat32x3_t z5,
svfloat32x4_t stack1, svfloat32_t stack2, svbool_t p0,
svbool_t p1, svbool_t p2)
{
svst4_f32 (p0, x0, stack1);
svst2_f32 (p1, x0, z3);
svst3_f32 (p2, x0, z5);
}
/*
** callee2:
** ptrue p3\.b, all
** ld1w (z[0-9]+\.s), p3/z, \[x2\]
** st1w \1, p0, \[x0\]
** st2w {z3\.s - z4\.s}, p1, \[x0\]
** st3w {z0\.s - z2\.s}, p2, \[x0\]
** ret
*/
void __attribute__((noipa))
callee2 (void *x0, svfloat32x3_t z0, svfloat32x2_t z3, svfloat32x3_t z5,
svfloat32x4_t stack1, svfloat32_t stack2, svbool_t p0,
svbool_t p1, svbool_t p2)
{
svst1_f32 (p0, x0, stack2);
svst2_f32 (p1, x0, z3);
svst3_f32 (p2, x0, z0);
}
void __attribute__((noipa))
caller (void *x0)
{
svbool_t pg;
pg = svptrue_b8 ();
callee1 (x0,
svld3_vnum_f32 (pg, x0, -9),
svld2_vnum_f32 (pg, x0, -2),
svld3_vnum_f32 (pg, x0, 0),
svld4_vnum_f32 (pg, x0, 8),
svld1_vnum_f32 (pg, x0, 5),
svptrue_pat_b8 (SV_VL1),
svptrue_pat_b16 (SV_VL2),
svptrue_pat_b32 (SV_VL3));
}
/* { dg-final { scan-assembler {\tld3w\t{z0\.s - z2\.s}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld2w\t{z3\.s - z4\.s}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld3w\t{z5\.s - z7\.s}, p[0-7]/z, \[x0\]\n} } } */
/* { dg-final { scan-assembler {\tld4w\t{(z[0-9]+\.s) - z[0-9]+\.s}.*\tst1w\t\1, p[0-7], \[x1\]\n} } } */
/* { dg-final { scan-assembler {\tld4w\t{z[0-9]+\.s - (z[0-9]+\.s)}.*\tst1w\t\1, p[0-7], \[x1, #3, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld1w\t(z[0-9]+\.s), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1w\t\1, p[0-7], \[x2\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#pragma GCC aarch64 "arm_sve.h"
/*
** callee1:
** ptrue p3\.b, all
** ...
** ld1d (z[0-9]+\.d), p3/z, \[x1, #3, mul vl\]
** ...
** st4d {z[0-9]+\.d - \1}, p0, \[x0\]
** st2d {z3\.d - z4\.d}, p1, \[x0\]
** st3d {z5\.d - z7\.d}, p2, \[x0\]
** ret
*/
void __attribute__((noipa))
callee1 (void *x0, svfloat64x3_t z0, svfloat64x2_t z3, svfloat64x3_t z5,
svfloat64x4_t stack1, svfloat64_t stack2, svbool_t p0,
svbool_t p1, svbool_t p2)
{
svst4_f64 (p0, x0, stack1);
svst2_f64 (p1, x0, z3);
svst3_f64 (p2, x0, z5);
}
/*
** callee2:
** ptrue p3\.b, all
** ld1d (z[0-9]+\.d), p3/z, \[x2\]
** st1d \1, p0, \[x0\]
** st2d {z3\.d - z4\.d}, p1, \[x0\]
** st3d {z0\.d - z2\.d}, p2, \[x0\]
** ret
*/
void __attribute__((noipa))
callee2 (void *x0, svfloat64x3_t z0, svfloat64x2_t z3, svfloat64x3_t z5,
svfloat64x4_t stack1, svfloat64_t stack2, svbool_t p0,
svbool_t p1, svbool_t p2)
{
svst1_f64 (p0, x0, stack2);
svst2_f64 (p1, x0, z3);
svst3_f64 (p2, x0, z0);
}
void __attribute__((noipa))
caller (void *x0)
{
svbool_t pg;
pg = svptrue_b8 ();
callee1 (x0,
svld3_vnum_f64 (pg, x0, -9),
svld2_vnum_f64 (pg, x0, -2),
svld3_vnum_f64 (pg, x0, 0),
svld4_vnum_f64 (pg, x0, 8),
svld1_vnum_f64 (pg, x0, 5),
svptrue_pat_b8 (SV_VL1),
svptrue_pat_b16 (SV_VL2),
svptrue_pat_b32 (SV_VL3));
}
/* { dg-final { scan-assembler {\tld3d\t{z0\.d - z2\.d}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld2d\t{z3\.d - z4\.d}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld3d\t{z5\.d - z7\.d}, p[0-7]/z, \[x0\]\n} } } */
/* { dg-final { scan-assembler {\tld4d\t{(z[0-9]+\.d) - z[0-9]+\.d}.*\tst1d\t\1, p[0-7], \[x1\]\n} } } */
/* { dg-final { scan-assembler {\tld4d\t{z[0-9]+\.d - (z[0-9]+\.d)}.*\tst1d\t\1, p[0-7], \[x1, #3, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld1d\t(z[0-9]+\.d), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1d\t\1, p[0-7], \[x2\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#pragma GCC aarch64 "arm_sve.h"
/*
** callee1:
** ptrue p3\.b, all
** ...
** ld1h (z[0-9]+\.h), p3/z, \[x1, #3, mul vl\]
** ...
** st4h {z[0-9]+\.h - \1}, p0, \[x0\]
** st2h {z3\.h - z4\.h}, p1, \[x0\]
** st3h {z5\.h - z7\.h}, p2, \[x0\]
** ret
*/
void __attribute__((noipa))
callee1 (void *x0, svint16x3_t z0, svint16x2_t z3, svint16x3_t z5,
svint16x4_t stack1, svint16_t stack2, svbool_t p0,
svbool_t p1, svbool_t p2)
{
svst4_s16 (p0, x0, stack1);
svst2_s16 (p1, x0, z3);
svst3_s16 (p2, x0, z5);
}
/*
** callee2:
** ptrue p3\.b, all
** ld1h (z[0-9]+\.h), p3/z, \[x2\]
** st1h \1, p0, \[x0\]
** st2h {z3\.h - z4\.h}, p1, \[x0\]
** st3h {z0\.h - z2\.h}, p2, \[x0\]
** ret
*/
void __attribute__((noipa))
callee2 (void *x0, svint16x3_t z0, svint16x2_t z3, svint16x3_t z5,
svint16x4_t stack1, svint16_t stack2, svbool_t p0,
svbool_t p1, svbool_t p2)
{
svst1_s16 (p0, x0, stack2);
svst2_s16 (p1, x0, z3);
svst3_s16 (p2, x0, z0);
}
void __attribute__((noipa))
caller (void *x0)
{
svbool_t pg;
pg = svptrue_b8 ();
callee1 (x0,
svld3_vnum_s16 (pg, x0, -9),
svld2_vnum_s16 (pg, x0, -2),
svld3_vnum_s16 (pg, x0, 0),
svld4_vnum_s16 (pg, x0, 8),
svld1_vnum_s16 (pg, x0, 5),
svptrue_pat_b8 (SV_VL1),
svptrue_pat_b16 (SV_VL2),
svptrue_pat_b32 (SV_VL3));
}
/* { dg-final { scan-assembler {\tld3h\t{z0\.h - z2\.h}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld2h\t{z3\.h - z4\.h}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld3h\t{z5\.h - z7\.h}, p[0-7]/z, \[x0\]\n} } } */
/* { dg-final { scan-assembler {\tld4h\t{(z[0-9]+\.h) - z[0-9]+\.h}.*\tst1h\t\1, p[0-7], \[x1\]\n} } } */
/* { dg-final { scan-assembler {\tld4h\t{z[0-9]+\.h - (z[0-9]+\.h)}.*\tst1h\t\1, p[0-7], \[x1, #3, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld1h\t(z[0-9]+\.h), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1h\t\1, p[0-7], \[x2\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#pragma GCC aarch64 "arm_sve.h"
/*
** callee1:
** ptrue p3\.b, all
** ...
** ld1w (z[0-9]+\.s), p3/z, \[x1, #3, mul vl\]
** ...
** st4w {z[0-9]+\.s - \1}, p0, \[x0\]
** st2w {z3\.s - z4\.s}, p1, \[x0\]
** st3w {z5\.s - z7\.s}, p2, \[x0\]
** ret
*/
void __attribute__((noipa))
callee1 (void *x0, svint32x3_t z0, svint32x2_t z3, svint32x3_t z5,
svint32x4_t stack1, svint32_t stack2, svbool_t p0,
svbool_t p1, svbool_t p2)
{
svst4_s32 (p0, x0, stack1);
svst2_s32 (p1, x0, z3);
svst3_s32 (p2, x0, z5);
}
/*
** callee2:
** ptrue p3\.b, all
** ld1w (z[0-9]+\.s), p3/z, \[x2\]
** st1w \1, p0, \[x0\]
** st2w {z3\.s - z4\.s}, p1, \[x0\]
** st3w {z0\.s - z2\.s}, p2, \[x0\]
** ret
*/
void __attribute__((noipa))
callee2 (void *x0, svint32x3_t z0, svint32x2_t z3, svint32x3_t z5,
svint32x4_t stack1, svint32_t stack2, svbool_t p0,
svbool_t p1, svbool_t p2)
{
svst1_s32 (p0, x0, stack2);
svst2_s32 (p1, x0, z3);
svst3_s32 (p2, x0, z0);
}
void __attribute__((noipa))
caller (void *x0)
{
svbool_t pg;
pg = svptrue_b8 ();
callee1 (x0,
svld3_vnum_s32 (pg, x0, -9),
svld2_vnum_s32 (pg, x0, -2),
svld3_vnum_s32 (pg, x0, 0),
svld4_vnum_s32 (pg, x0, 8),
svld1_vnum_s32 (pg, x0, 5),
svptrue_pat_b8 (SV_VL1),
svptrue_pat_b16 (SV_VL2),
svptrue_pat_b32 (SV_VL3));
}
/* { dg-final { scan-assembler {\tld3w\t{z0\.s - z2\.s}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld2w\t{z3\.s - z4\.s}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld3w\t{z5\.s - z7\.s}, p[0-7]/z, \[x0\]\n} } } */
/* { dg-final { scan-assembler {\tld4w\t{(z[0-9]+\.s) - z[0-9]+\.s}.*\tst1w\t\1, p[0-7], \[x1\]\n} } } */
/* { dg-final { scan-assembler {\tld4w\t{z[0-9]+\.s - (z[0-9]+\.s)}.*\tst1w\t\1, p[0-7], \[x1, #3, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld1w\t(z[0-9]+\.s), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1w\t\1, p[0-7], \[x2\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#pragma GCC aarch64 "arm_sve.h"
/*
** callee1:
** ptrue p3\.b, all
** ...
** ld1d (z[0-9]+\.d), p3/z, \[x1, #3, mul vl\]
** ...
** st4d {z[0-9]+\.d - \1}, p0, \[x0\]
** st2d {z3\.d - z4\.d}, p1, \[x0\]
** st3d {z5\.d - z7\.d}, p2, \[x0\]
** ret
*/
void __attribute__((noipa))
callee1 (void *x0, svint64x3_t z0, svint64x2_t z3, svint64x3_t z5,
svint64x4_t stack1, svint64_t stack2, svbool_t p0,
svbool_t p1, svbool_t p2)
{
svst4_s64 (p0, x0, stack1);
svst2_s64 (p1, x0, z3);
svst3_s64 (p2, x0, z5);
}
/*
** callee2:
** ptrue p3\.b, all
** ld1d (z[0-9]+\.d), p3/z, \[x2\]
** st1d \1, p0, \[x0\]
** st2d {z3\.d - z4\.d}, p1, \[x0\]
** st3d {z0\.d - z2\.d}, p2, \[x0\]
** ret
*/
void __attribute__((noipa))
callee2 (void *x0, svint64x3_t z0, svint64x2_t z3, svint64x3_t z5,
svint64x4_t stack1, svint64_t stack2, svbool_t p0,
svbool_t p1, svbool_t p2)
{
svst1_s64 (p0, x0, stack2);
svst2_s64 (p1, x0, z3);
svst3_s64 (p2, x0, z0);
}
void __attribute__((noipa))
caller (void *x0)
{
svbool_t pg;
pg = svptrue_b8 ();
callee1 (x0,
svld3_vnum_s64 (pg, x0, -9),
svld2_vnum_s64 (pg, x0, -2),
svld3_vnum_s64 (pg, x0, 0),
svld4_vnum_s64 (pg, x0, 8),
svld1_vnum_s64 (pg, x0, 5),
svptrue_pat_b8 (SV_VL1),
svptrue_pat_b16 (SV_VL2),
svptrue_pat_b32 (SV_VL3));
}
/* { dg-final { scan-assembler {\tld3d\t{z0\.d - z2\.d}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld2d\t{z3\.d - z4\.d}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld3d\t{z5\.d - z7\.d}, p[0-7]/z, \[x0\]\n} } } */
/* { dg-final { scan-assembler {\tld4d\t{(z[0-9]+\.d) - z[0-9]+\.d}.*\tst1d\t\1, p[0-7], \[x1\]\n} } } */
/* { dg-final { scan-assembler {\tld4d\t{z[0-9]+\.d - (z[0-9]+\.d)}.*\tst1d\t\1, p[0-7], \[x1, #3, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld1d\t(z[0-9]+\.d), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1d\t\1, p[0-7], \[x2\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#pragma GCC aarch64 "arm_sve.h"
/*
** callee1:
** ptrue p3\.b, all
** ...
** ld1b (z[0-9]+\.b), p3/z, \[x1, #3, mul vl\]
** ...
** st4b {z[0-9]+\.b - \1}, p0, \[x0\]
** st2b {z3\.b - z4\.b}, p1, \[x0\]
** st3b {z5\.b - z7\.b}, p2, \[x0\]
** ret
*/
void __attribute__((noipa))
callee1 (void *x0, svint8x3_t z0, svint8x2_t z3, svint8x3_t z5,
svint8x4_t stack1, svint8_t stack2, svbool_t p0,
svbool_t p1, svbool_t p2)
{
svst4_s8 (p0, x0, stack1);
svst2_s8 (p1, x0, z3);
svst3_s8 (p2, x0, z5);
}
/*
** callee2:
** ptrue p3\.b, all
** ld1b (z[0-9]+\.b), p3/z, \[x2\]
** st1b \1, p0, \[x0\]
** st2b {z3\.b - z4\.b}, p1, \[x0\]
** st3b {z0\.b - z2\.b}, p2, \[x0\]
** ret
*/
void __attribute__((noipa))
callee2 (void *x0, svint8x3_t z0, svint8x2_t z3, svint8x3_t z5,
svint8x4_t stack1, svint8_t stack2, svbool_t p0,
svbool_t p1, svbool_t p2)
{
svst1_s8 (p0, x0, stack2);
svst2_s8 (p1, x0, z3);
svst3_s8 (p2, x0, z0);
}
void __attribute__((noipa))
caller (void *x0)
{
svbool_t pg;
pg = svptrue_b8 ();
callee1 (x0,
svld3_vnum_s8 (pg, x0, -9),
svld2_vnum_s8 (pg, x0, -2),
svld3_vnum_s8 (pg, x0, 0),
svld4_vnum_s8 (pg, x0, 8),
svld1_vnum_s8 (pg, x0, 5),
svptrue_pat_b8 (SV_VL1),
svptrue_pat_b16 (SV_VL2),
svptrue_pat_b32 (SV_VL3));
}
/* { dg-final { scan-assembler {\tld3b\t{z0\.b - z2\.b}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld2b\t{z3\.b - z4\.b}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld3b\t{z5\.b - z7\.b}, p[0-7]/z, \[x0\]\n} } } */
/* { dg-final { scan-assembler {\tld4b\t{(z[0-9]+\.b) - z[0-9]+\.b}.*\tst1b\t\1, p[0-7], \[x1\]\n} } } */
/* { dg-final { scan-assembler {\tld4b\t{z[0-9]+\.b - (z[0-9]+\.b)}.*\tst1b\t\1, p[0-7], \[x1, #3, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld1b\t(z[0-9]+\.b), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1b\t\1, p[0-7], \[x2\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#pragma GCC aarch64 "arm_sve.h"
/*
** callee1:
** ptrue p3\.b, all
** ...
** ld1h (z[0-9]+\.h), p3/z, \[x1, #3, mul vl\]
** ...
** st4h {z[0-9]+\.h - \1}, p0, \[x0\]
** st2h {z3\.h - z4\.h}, p1, \[x0\]
** st3h {z5\.h - z7\.h}, p2, \[x0\]
** ret
*/
void __attribute__((noipa))
callee1 (void *x0, svuint16x3_t z0, svuint16x2_t z3, svuint16x3_t z5,
svuint16x4_t stack1, svuint16_t stack2, svbool_t p0,
svbool_t p1, svbool_t p2)
{
svst4_u16 (p0, x0, stack1);
svst2_u16 (p1, x0, z3);
svst3_u16 (p2, x0, z5);
}
/*
** callee2:
** ptrue p3\.b, all
** ld1h (z[0-9]+\.h), p3/z, \[x2\]
** st1h \1, p0, \[x0\]
** st2h {z3\.h - z4\.h}, p1, \[x0\]
** st3h {z0\.h - z2\.h}, p2, \[x0\]
** ret
*/
void __attribute__((noipa))
callee2 (void *x0, svuint16x3_t z0, svuint16x2_t z3, svuint16x3_t z5,
svuint16x4_t stack1, svuint16_t stack2, svbool_t p0,
svbool_t p1, svbool_t p2)
{
svst1_u16 (p0, x0, stack2);
svst2_u16 (p1, x0, z3);
svst3_u16 (p2, x0, z0);
}
void __attribute__((noipa))
caller (void *x0)
{
svbool_t pg;
pg = svptrue_b8 ();
callee1 (x0,
svld3_vnum_u16 (pg, x0, -9),
svld2_vnum_u16 (pg, x0, -2),
svld3_vnum_u16 (pg, x0, 0),
svld4_vnum_u16 (pg, x0, 8),
svld1_vnum_u16 (pg, x0, 5),
svptrue_pat_b8 (SV_VL1),
svptrue_pat_b16 (SV_VL2),
svptrue_pat_b32 (SV_VL3));
}
/* { dg-final { scan-assembler {\tld3h\t{z0\.h - z2\.h}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld2h\t{z3\.h - z4\.h}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld3h\t{z5\.h - z7\.h}, p[0-7]/z, \[x0\]\n} } } */
/* { dg-final { scan-assembler {\tld4h\t{(z[0-9]+\.h) - z[0-9]+\.h}.*\tst1h\t\1, p[0-7], \[x1\]\n} } } */
/* { dg-final { scan-assembler {\tld4h\t{z[0-9]+\.h - (z[0-9]+\.h)}.*\tst1h\t\1, p[0-7], \[x1, #3, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld1h\t(z[0-9]+\.h), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1h\t\1, p[0-7], \[x2\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#pragma GCC aarch64 "arm_sve.h"
/*
** callee1:
** ptrue p3\.b, all
** ...
** ld1w (z[0-9]+\.s), p3/z, \[x1, #3, mul vl\]
** ...
** st4w {z[0-9]+\.s - \1}, p0, \[x0\]
** st2w {z3\.s - z4\.s}, p1, \[x0\]
** st3w {z5\.s - z7\.s}, p2, \[x0\]
** ret
*/
void __attribute__((noipa))
callee1 (void *x0, svuint32x3_t z0, svuint32x2_t z3, svuint32x3_t z5,
svuint32x4_t stack1, svuint32_t stack2, svbool_t p0,
svbool_t p1, svbool_t p2)
{
svst4_u32 (p0, x0, stack1);
svst2_u32 (p1, x0, z3);
svst3_u32 (p2, x0, z5);
}
/*
** callee2:
** ptrue p3\.b, all
** ld1w (z[0-9]+\.s), p3/z, \[x2\]
** st1w \1, p0, \[x0\]
** st2w {z3\.s - z4\.s}, p1, \[x0\]
** st3w {z0\.s - z2\.s}, p2, \[x0\]
** ret
*/
void __attribute__((noipa))
callee2 (void *x0, svuint32x3_t z0, svuint32x2_t z3, svuint32x3_t z5,
svuint32x4_t stack1, svuint32_t stack2, svbool_t p0,
svbool_t p1, svbool_t p2)
{
svst1_u32 (p0, x0, stack2);
svst2_u32 (p1, x0, z3);
svst3_u32 (p2, x0, z0);
}
void __attribute__((noipa))
caller (void *x0)
{
svbool_t pg;
pg = svptrue_b8 ();
callee1 (x0,
svld3_vnum_u32 (pg, x0, -9),
svld2_vnum_u32 (pg, x0, -2),
svld3_vnum_u32 (pg, x0, 0),
svld4_vnum_u32 (pg, x0, 8),
svld1_vnum_u32 (pg, x0, 5),
svptrue_pat_b8 (SV_VL1),
svptrue_pat_b16 (SV_VL2),
svptrue_pat_b32 (SV_VL3));
}
/* { dg-final { scan-assembler {\tld3w\t{z0\.s - z2\.s}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld2w\t{z3\.s - z4\.s}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld3w\t{z5\.s - z7\.s}, p[0-7]/z, \[x0\]\n} } } */
/* { dg-final { scan-assembler {\tld4w\t{(z[0-9]+\.s) - z[0-9]+\.s}.*\tst1w\t\1, p[0-7], \[x1\]\n} } } */
/* { dg-final { scan-assembler {\tld4w\t{z[0-9]+\.s - (z[0-9]+\.s)}.*\tst1w\t\1, p[0-7], \[x1, #3, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld1w\t(z[0-9]+\.s), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1w\t\1, p[0-7], \[x2\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#pragma GCC aarch64 "arm_sve.h"
/*
** callee1:
** ptrue p3\.b, all
** ...
** ld1d (z[0-9]+\.d), p3/z, \[x1, #3, mul vl\]
** ...
** st4d {z[0-9]+\.d - \1}, p0, \[x0\]
** st2d {z3\.d - z4\.d}, p1, \[x0\]
** st3d {z5\.d - z7\.d}, p2, \[x0\]
** ret
*/
void __attribute__((noipa))
callee1 (void *x0, svuint64x3_t z0, svuint64x2_t z3, svuint64x3_t z5,
svuint64x4_t stack1, svuint64_t stack2, svbool_t p0,
svbool_t p1, svbool_t p2)
{
svst4_u64 (p0, x0, stack1);
svst2_u64 (p1, x0, z3);
svst3_u64 (p2, x0, z5);
}
/*
** callee2:
** ptrue p3\.b, all
** ld1d (z[0-9]+\.d), p3/z, \[x2\]
** st1d \1, p0, \[x0\]
** st2d {z3\.d - z4\.d}, p1, \[x0\]
** st3d {z0\.d - z2\.d}, p2, \[x0\]
** ret
*/
void __attribute__((noipa))
callee2 (void *x0, svuint64x3_t z0, svuint64x2_t z3, svuint64x3_t z5,
svuint64x4_t stack1, svuint64_t stack2, svbool_t p0,
svbool_t p1, svbool_t p2)
{
svst1_u64 (p0, x0, stack2);
svst2_u64 (p1, x0, z3);
svst3_u64 (p2, x0, z0);
}
void __attribute__((noipa))
caller (void *x0)
{
svbool_t pg;
pg = svptrue_b8 ();
callee1 (x0,
svld3_vnum_u64 (pg, x0, -9),
svld2_vnum_u64 (pg, x0, -2),
svld3_vnum_u64 (pg, x0, 0),
svld4_vnum_u64 (pg, x0, 8),
svld1_vnum_u64 (pg, x0, 5),
svptrue_pat_b8 (SV_VL1),
svptrue_pat_b16 (SV_VL2),
svptrue_pat_b32 (SV_VL3));
}
/* { dg-final { scan-assembler {\tld3d\t{z0\.d - z2\.d}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld2d\t{z3\.d - z4\.d}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld3d\t{z5\.d - z7\.d}, p[0-7]/z, \[x0\]\n} } } */
/* { dg-final { scan-assembler {\tld4d\t{(z[0-9]+\.d) - z[0-9]+\.d}.*\tst1d\t\1, p[0-7], \[x1\]\n} } } */
/* { dg-final { scan-assembler {\tld4d\t{z[0-9]+\.d - (z[0-9]+\.d)}.*\tst1d\t\1, p[0-7], \[x1, #3, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld1d\t(z[0-9]+\.d), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1d\t\1, p[0-7], \[x2\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#pragma GCC aarch64 "arm_sve.h"
/*
** callee1:
** ptrue p3\.b, all
** ...
** ld1b (z[0-9]+\.b), p3/z, \[x1, #3, mul vl\]
** ...
** st4b {z[0-9]+\.b - \1}, p0, \[x0\]
** st2b {z3\.b - z4\.b}, p1, \[x0\]
** st3b {z5\.b - z7\.b}, p2, \[x0\]
** ret
*/
void __attribute__((noipa))
callee1 (void *x0, svuint8x3_t z0, svuint8x2_t z3, svuint8x3_t z5,
svuint8x4_t stack1, svuint8_t stack2, svbool_t p0,
svbool_t p1, svbool_t p2)
{
svst4_u8 (p0, x0, stack1);
svst2_u8 (p1, x0, z3);
svst3_u8 (p2, x0, z5);
}
/*
** callee2:
** ptrue p3\.b, all
** ld1b (z[0-9]+\.b), p3/z, \[x2\]
** st1b \1, p0, \[x0\]
** st2b {z3\.b - z4\.b}, p1, \[x0\]
** st3b {z0\.b - z2\.b}, p2, \[x0\]
** ret
*/
void __attribute__((noipa))
callee2 (void *x0, svuint8x3_t z0, svuint8x2_t z3, svuint8x3_t z5,
svuint8x4_t stack1, svuint8_t stack2, svbool_t p0,
svbool_t p1, svbool_t p2)
{
svst1_u8 (p0, x0, stack2);
svst2_u8 (p1, x0, z3);
svst3_u8 (p2, x0, z0);
}
void __attribute__((noipa))
caller (void *x0)
{
svbool_t pg;
pg = svptrue_b8 ();
callee1 (x0,
svld3_vnum_u8 (pg, x0, -9),
svld2_vnum_u8 (pg, x0, -2),
svld3_vnum_u8 (pg, x0, 0),
svld4_vnum_u8 (pg, x0, 8),
svld1_vnum_u8 (pg, x0, 5),
svptrue_pat_b8 (SV_VL1),
svptrue_pat_b16 (SV_VL2),
svptrue_pat_b32 (SV_VL3));
}
/* { dg-final { scan-assembler {\tld3b\t{z0\.b - z2\.b}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld2b\t{z3\.b - z4\.b}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld3b\t{z5\.b - z7\.b}, p[0-7]/z, \[x0\]\n} } } */
/* { dg-final { scan-assembler {\tld4b\t{(z[0-9]+\.b) - z[0-9]+\.b}.*\tst1b\t\1, p[0-7], \[x1\]\n} } } */
/* { dg-final { scan-assembler {\tld4b\t{z[0-9]+\.b - (z[0-9]+\.b)}.*\tst1b\t\1, p[0-7], \[x1, #3, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld1b\t(z[0-9]+\.b), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1b\t\1, p[0-7], \[x2\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
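/* Little-endian versions of the two-callee tests.  callee1 can pick up
   the stack-passed tuple with plain ldr, but callee2 still loads the
   single vector with a predicated ld1* under a ptrue-all predicate
   before storing it with the caller-supplied predicate.  */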
/* { dg-do compile } */
/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#pragma GCC aarch64 "arm_sve.h"
/*
** callee1:
** ...
** ldr (z[0-9]+), \[x1, #3, mul vl\]
** ...
** st4h {z[0-9]+\.h - \1\.h}, p0, \[x0\]
** st2h {z3\.h - z4\.h}, p1, \[x0\]
** st3h {z5\.h - z7\.h}, p2, \[x0\]
** ret
*/
void __attribute__((noipa))
callee1 (void *x0, svfloat16x3_t z0, svfloat16x2_t z3, svfloat16x3_t z5,
svfloat16x4_t stack1, svfloat16_t stack2, svbool_t p0,
svbool_t p1, svbool_t p2)
{
svst4_f16 (p0, x0, stack1);
svst2_f16 (p1, x0, z3);
svst3_f16 (p2, x0, z5);
}
/*
** callee2:
** ptrue p3\.b, all
** ld1h (z[0-9]+\.h), p3/z, \[x2\]
** st1h \1, p0, \[x0\]
** st2h {z3\.h - z4\.h}, p1, \[x0\]
** st3h {z0\.h - z2\.h}, p2, \[x0\]
** ret
*/
void __attribute__((noipa))
callee2 (void *x0, svfloat16x3_t z0, svfloat16x2_t z3, svfloat16x3_t z5,
svfloat16x4_t stack1, svfloat16_t stack2, svbool_t p0,
svbool_t p1, svbool_t p2)
{
svst1_f16 (p0, x0, stack2);
svst2_f16 (p1, x0, z3);
svst3_f16 (p2, x0, z0);
}
void __attribute__((noipa))
caller (void *x0)
{
svbool_t pg;
pg = svptrue_b8 ();
callee1 (x0,
svld3_vnum_f16 (pg, x0, -9),
svld2_vnum_f16 (pg, x0, -2),
svld3_vnum_f16 (pg, x0, 0),
svld4_vnum_f16 (pg, x0, 8),
svld1_vnum_f16 (pg, x0, 5),
svptrue_pat_b8 (SV_VL1),
svptrue_pat_b16 (SV_VL2),
svptrue_pat_b32 (SV_VL3));
}
/* { dg-final { scan-assembler {\tld3h\t{z0\.h - z2\.h}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld2h\t{z3\.h - z4\.h}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld3h\t{z5\.h - z7\.h}, p[0-7]/z, \[x0\]\n} } } */
/* { dg-final { scan-assembler {\tld4h\t{(z[0-9]+)\.h - z[0-9]+\.h}.*\tstr\t\1, \[x1\]\n} } } */
/* { dg-final { scan-assembler {\tld4h\t{z[0-9]+\.h - (z[0-9]+)\.h}.*\tstr\t\1, \[x1, #3, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld1h\t(z[0-9]+\.h), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1h\t\1, p[0-7], \[x2\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#pragma GCC aarch64 "arm_sve.h"
/*
** callee1:
** ...
** ldr (z[0-9]+), \[x1, #3, mul vl\]
** ...
** st4w {z[0-9]+\.s - \1\.s}, p0, \[x0\]
** st2w {z3\.s - z4\.s}, p1, \[x0\]
** st3w {z5\.s - z7\.s}, p2, \[x0\]
** ret
*/
void __attribute__((noipa))
callee1 (void *x0, svfloat32x3_t z0, svfloat32x2_t z3, svfloat32x3_t z5,
         svfloat32x4_t stack1, svfloat32_t stack2, svbool_t p0,
         svbool_t p1, svbool_t p2)
{
  svst4_f32 (p0, x0, stack1);
  svst2_f32 (p1, x0, z3);
  svst3_f32 (p2, x0, z5);
}
/*
** callee2:
** ptrue p3\.b, all
** ld1w (z[0-9]+\.s), p3/z, \[x2\]
** st1w \1, p0, \[x0\]
** st2w {z3\.s - z4\.s}, p1, \[x0\]
** st3w {z0\.s - z2\.s}, p2, \[x0\]
** ret
*/
void __attribute__((noipa))
callee2 (void *x0, svfloat32x3_t z0, svfloat32x2_t z3, svfloat32x3_t z5,
         svfloat32x4_t stack1, svfloat32_t stack2, svbool_t p0,
         svbool_t p1, svbool_t p2)
{
  svst1_f32 (p0, x0, stack2);
  svst2_f32 (p1, x0, z3);
  svst3_f32 (p2, x0, z0);
}

void __attribute__((noipa))
caller (void *x0)
{
  svbool_t pg;
  pg = svptrue_b8 ();
  callee1 (x0,
           svld3_vnum_f32 (pg, x0, -9),
           svld2_vnum_f32 (pg, x0, -2),
           svld3_vnum_f32 (pg, x0, 0),
           svld4_vnum_f32 (pg, x0, 8),
           svld1_vnum_f32 (pg, x0, 5),
           svptrue_pat_b8 (SV_VL1),
           svptrue_pat_b16 (SV_VL2),
           svptrue_pat_b32 (SV_VL3));
}
/* { dg-final { scan-assembler {\tld3w\t{z0\.s - z2\.s}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld2w\t{z3\.s - z4\.s}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld3w\t{z5\.s - z7\.s}, p[0-7]/z, \[x0\]\n} } } */
/* { dg-final { scan-assembler {\tld4w\t{(z[0-9]+)\.s - z[0-9]+\.s}.*\tstr\t\1, \[x1\]\n} } } */
/* { dg-final { scan-assembler {\tld4w\t{z[0-9]+\.s - (z[0-9]+)\.s}.*\tstr\t\1, \[x1, #3, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld1w\t(z[0-9]+\.s), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1w\t\1, p[0-7], \[x2\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#pragma GCC aarch64 "arm_sve.h"
/*
** callee1:
** ...
** ldr (z[0-9]+), \[x1, #3, mul vl\]
** ...
** st4d {z[0-9]+\.d - \1\.d}, p0, \[x0\]
** st2d {z3\.d - z4\.d}, p1, \[x0\]
** st3d {z5\.d - z7\.d}, p2, \[x0\]
** ret
*/
void __attribute__((noipa))
callee1 (void *x0, svfloat64x3_t z0, svfloat64x2_t z3, svfloat64x3_t z5,
         svfloat64x4_t stack1, svfloat64_t stack2, svbool_t p0,
         svbool_t p1, svbool_t p2)
{
  svst4_f64 (p0, x0, stack1);
  svst2_f64 (p1, x0, z3);
  svst3_f64 (p2, x0, z5);
}
/*
** callee2:
** ptrue p3\.b, all
** ld1d (z[0-9]+\.d), p3/z, \[x2\]
** st1d \1, p0, \[x0\]
** st2d {z3\.d - z4\.d}, p1, \[x0\]
** st3d {z0\.d - z2\.d}, p2, \[x0\]
** ret
*/
void __attribute__((noipa))
callee2 (void *x0, svfloat64x3_t z0, svfloat64x2_t z3, svfloat64x3_t z5,
         svfloat64x4_t stack1, svfloat64_t stack2, svbool_t p0,
         svbool_t p1, svbool_t p2)
{
  svst1_f64 (p0, x0, stack2);
  svst2_f64 (p1, x0, z3);
  svst3_f64 (p2, x0, z0);
}

void __attribute__((noipa))
caller (void *x0)
{
  svbool_t pg;
  pg = svptrue_b8 ();
  callee1 (x0,
           svld3_vnum_f64 (pg, x0, -9),
           svld2_vnum_f64 (pg, x0, -2),
           svld3_vnum_f64 (pg, x0, 0),
           svld4_vnum_f64 (pg, x0, 8),
           svld1_vnum_f64 (pg, x0, 5),
           svptrue_pat_b8 (SV_VL1),
           svptrue_pat_b16 (SV_VL2),
           svptrue_pat_b32 (SV_VL3));
}
/* { dg-final { scan-assembler {\tld3d\t{z0\.d - z2\.d}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld2d\t{z3\.d - z4\.d}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld3d\t{z5\.d - z7\.d}, p[0-7]/z, \[x0\]\n} } } */
/* { dg-final { scan-assembler {\tld4d\t{(z[0-9]+)\.d - z[0-9]+\.d}.*\tstr\t\1, \[x1\]\n} } } */
/* { dg-final { scan-assembler {\tld4d\t{z[0-9]+\.d - (z[0-9]+)\.d}.*\tstr\t\1, \[x1, #3, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld1d\t(z[0-9]+\.d), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1d\t\1, p[0-7], \[x2\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#pragma GCC aarch64 "arm_sve.h"
/*
** callee1:
** ...
** ldr (z[0-9]+), \[x1, #3, mul vl\]
** ...
** st4h {z[0-9]+\.h - \1\.h}, p0, \[x0\]
** st2h {z3\.h - z4\.h}, p1, \[x0\]
** st3h {z5\.h - z7\.h}, p2, \[x0\]
** ret
*/
void __attribute__((noipa))
callee1 (void *x0, svint16x3_t z0, svint16x2_t z3, svint16x3_t z5,
         svint16x4_t stack1, svint16_t stack2, svbool_t p0,
         svbool_t p1, svbool_t p2)
{
  svst4_s16 (p0, x0, stack1);
  svst2_s16 (p1, x0, z3);
  svst3_s16 (p2, x0, z5);
}
/*
** callee2:
** ptrue p3\.b, all
** ld1h (z[0-9]+\.h), p3/z, \[x2\]
** st1h \1, p0, \[x0\]
** st2h {z3\.h - z4\.h}, p1, \[x0\]
** st3h {z0\.h - z2\.h}, p2, \[x0\]
** ret
*/
void __attribute__((noipa))
callee2 (void *x0, svint16x3_t z0, svint16x2_t z3, svint16x3_t z5,
         svint16x4_t stack1, svint16_t stack2, svbool_t p0,
         svbool_t p1, svbool_t p2)
{
  svst1_s16 (p0, x0, stack2);
  svst2_s16 (p1, x0, z3);
  svst3_s16 (p2, x0, z0);
}

void __attribute__((noipa))
caller (void *x0)
{
  svbool_t pg;
  pg = svptrue_b8 ();
  callee1 (x0,
           svld3_vnum_s16 (pg, x0, -9),
           svld2_vnum_s16 (pg, x0, -2),
           svld3_vnum_s16 (pg, x0, 0),
           svld4_vnum_s16 (pg, x0, 8),
           svld1_vnum_s16 (pg, x0, 5),
           svptrue_pat_b8 (SV_VL1),
           svptrue_pat_b16 (SV_VL2),
           svptrue_pat_b32 (SV_VL3));
}
/* { dg-final { scan-assembler {\tld3h\t{z0\.h - z2\.h}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld2h\t{z3\.h - z4\.h}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld3h\t{z5\.h - z7\.h}, p[0-7]/z, \[x0\]\n} } } */
/* { dg-final { scan-assembler {\tld4h\t{(z[0-9]+)\.h - z[0-9]+\.h}.*\tstr\t\1, \[x1\]\n} } } */
/* { dg-final { scan-assembler {\tld4h\t{z[0-9]+\.h - (z[0-9]+)\.h}.*\tstr\t\1, \[x1, #3, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld1h\t(z[0-9]+\.h), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1h\t\1, p[0-7], \[x2\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#pragma GCC aarch64 "arm_sve.h"
/*
** callee1:
** ...
** ldr (z[0-9]+), \[x1, #3, mul vl\]
** ...
** st4w {z[0-9]+\.s - \1\.s}, p0, \[x0\]
** st2w {z3\.s - z4\.s}, p1, \[x0\]
** st3w {z5\.s - z7\.s}, p2, \[x0\]
** ret
*/
void __attribute__((noipa))
callee1 (void *x0, svint32x3_t z0, svint32x2_t z3, svint32x3_t z5,
         svint32x4_t stack1, svint32_t stack2, svbool_t p0,
         svbool_t p1, svbool_t p2)
{
  svst4_s32 (p0, x0, stack1);
  svst2_s32 (p1, x0, z3);
  svst3_s32 (p2, x0, z5);
}
/*
** callee2:
** ptrue p3\.b, all
** ld1w (z[0-9]+\.s), p3/z, \[x2\]
** st1w \1, p0, \[x0\]
** st2w {z3\.s - z4\.s}, p1, \[x0\]
** st3w {z0\.s - z2\.s}, p2, \[x0\]
** ret
*/
void __attribute__((noipa))
callee2 (void *x0, svint32x3_t z0, svint32x2_t z3, svint32x3_t z5,
         svint32x4_t stack1, svint32_t stack2, svbool_t p0,
         svbool_t p1, svbool_t p2)
{
  svst1_s32 (p0, x0, stack2);
  svst2_s32 (p1, x0, z3);
  svst3_s32 (p2, x0, z0);
}

void __attribute__((noipa))
caller (void *x0)
{
  svbool_t pg;
  pg = svptrue_b8 ();
  callee1 (x0,
           svld3_vnum_s32 (pg, x0, -9),
           svld2_vnum_s32 (pg, x0, -2),
           svld3_vnum_s32 (pg, x0, 0),
           svld4_vnum_s32 (pg, x0, 8),
           svld1_vnum_s32 (pg, x0, 5),
           svptrue_pat_b8 (SV_VL1),
           svptrue_pat_b16 (SV_VL2),
           svptrue_pat_b32 (SV_VL3));
}
/* { dg-final { scan-assembler {\tld3w\t{z0\.s - z2\.s}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld2w\t{z3\.s - z4\.s}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld3w\t{z5\.s - z7\.s}, p[0-7]/z, \[x0\]\n} } } */
/* { dg-final { scan-assembler {\tld4w\t{(z[0-9]+)\.s - z[0-9]+\.s}.*\tstr\t\1, \[x1\]\n} } } */
/* { dg-final { scan-assembler {\tld4w\t{z[0-9]+\.s - (z[0-9]+)\.s}.*\tstr\t\1, \[x1, #3, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld1w\t(z[0-9]+\.s), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1w\t\1, p[0-7], \[x2\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#pragma GCC aarch64 "arm_sve.h"
/*
** callee1:
** ...
** ldr (z[0-9]+), \[x1, #3, mul vl\]
** ...
** st4d {z[0-9]+\.d - \1\.d}, p0, \[x0\]
** st2d {z3\.d - z4\.d}, p1, \[x0\]
** st3d {z5\.d - z7\.d}, p2, \[x0\]
** ret
*/
void __attribute__((noipa))
callee1 (void *x0, svint64x3_t z0, svint64x2_t z3, svint64x3_t z5,
         svint64x4_t stack1, svint64_t stack2, svbool_t p0,
         svbool_t p1, svbool_t p2)
{
  svst4_s64 (p0, x0, stack1);
  svst2_s64 (p1, x0, z3);
  svst3_s64 (p2, x0, z5);
}
/*
** callee2:
** ptrue p3\.b, all
** ld1d (z[0-9]+\.d), p3/z, \[x2\]
** st1d \1, p0, \[x0\]
** st2d {z3\.d - z4\.d}, p1, \[x0\]
** st3d {z0\.d - z2\.d}, p2, \[x0\]
** ret
*/
void __attribute__((noipa))
callee2 (void *x0, svint64x3_t z0, svint64x2_t z3, svint64x3_t z5,
         svint64x4_t stack1, svint64_t stack2, svbool_t p0,
         svbool_t p1, svbool_t p2)
{
  svst1_s64 (p0, x0, stack2);
  svst2_s64 (p1, x0, z3);
  svst3_s64 (p2, x0, z0);
}

void __attribute__((noipa))
caller (void *x0)
{
  svbool_t pg;
  pg = svptrue_b8 ();
  callee1 (x0,
           svld3_vnum_s64 (pg, x0, -9),
           svld2_vnum_s64 (pg, x0, -2),
           svld3_vnum_s64 (pg, x0, 0),
           svld4_vnum_s64 (pg, x0, 8),
           svld1_vnum_s64 (pg, x0, 5),
           svptrue_pat_b8 (SV_VL1),
           svptrue_pat_b16 (SV_VL2),
           svptrue_pat_b32 (SV_VL3));
}
/* { dg-final { scan-assembler {\tld3d\t{z0\.d - z2\.d}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld2d\t{z3\.d - z4\.d}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld3d\t{z5\.d - z7\.d}, p[0-7]/z, \[x0\]\n} } } */
/* { dg-final { scan-assembler {\tld4d\t{(z[0-9]+)\.d - z[0-9]+\.d}.*\tstr\t\1, \[x1\]\n} } } */
/* { dg-final { scan-assembler {\tld4d\t{z[0-9]+\.d - (z[0-9]+)\.d}.*\tstr\t\1, \[x1, #3, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld1d\t(z[0-9]+\.d), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1d\t\1, p[0-7], \[x2\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#pragma GCC aarch64 "arm_sve.h"
/*
** callee1:
** ...
** ldr (z[0-9]+), \[x1, #3, mul vl\]
** ...
** st4b {z[0-9]+\.b - \1\.b}, p0, \[x0\]
** st2b {z3\.b - z4\.b}, p1, \[x0\]
** st3b {z5\.b - z7\.b}, p2, \[x0\]
** ret
*/
void __attribute__((noipa))
callee1 (void *x0, svint8x3_t z0, svint8x2_t z3, svint8x3_t z5,
         svint8x4_t stack1, svint8_t stack2, svbool_t p0,
         svbool_t p1, svbool_t p2)
{
  svst4_s8 (p0, x0, stack1);
  svst2_s8 (p1, x0, z3);
  svst3_s8 (p2, x0, z5);
}
/*
** callee2:
** ptrue p3\.b, all
** ld1b (z[0-9]+\.b), p3/z, \[x2\]
** st1b \1, p0, \[x0\]
** st2b {z3\.b - z4\.b}, p1, \[x0\]
** st3b {z0\.b - z2\.b}, p2, \[x0\]
** ret
*/
void __attribute__((noipa))
callee2 (void *x0, svint8x3_t z0, svint8x2_t z3, svint8x3_t z5,
         svint8x4_t stack1, svint8_t stack2, svbool_t p0,
         svbool_t p1, svbool_t p2)
{
  svst1_s8 (p0, x0, stack2);
  svst2_s8 (p1, x0, z3);
  svst3_s8 (p2, x0, z0);
}

void __attribute__((noipa))
caller (void *x0)
{
  svbool_t pg;
  pg = svptrue_b8 ();
  callee1 (x0,
           svld3_vnum_s8 (pg, x0, -9),
           svld2_vnum_s8 (pg, x0, -2),
           svld3_vnum_s8 (pg, x0, 0),
           svld4_vnum_s8 (pg, x0, 8),
           svld1_vnum_s8 (pg, x0, 5),
           svptrue_pat_b8 (SV_VL1),
           svptrue_pat_b16 (SV_VL2),
           svptrue_pat_b32 (SV_VL3));
}
/* { dg-final { scan-assembler {\tld3b\t{z0\.b - z2\.b}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld2b\t{z3\.b - z4\.b}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld3b\t{z5\.b - z7\.b}, p[0-7]/z, \[x0\]\n} } } */
/* { dg-final { scan-assembler {\tld4b\t{(z[0-9]+)\.b - z[0-9]+\.b}.*\tstr\t\1, \[x1\]\n} } } */
/* { dg-final { scan-assembler {\tld4b\t{z[0-9]+\.b - (z[0-9]+)\.b}.*\tstr\t\1, \[x1, #3, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld1b\t(z[0-9]+\.b), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1b\t\1, p[0-7], \[x2\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#pragma GCC aarch64 "arm_sve.h"
/*
** callee1:
** ...
** ldr (z[0-9]+), \[x1, #3, mul vl\]
** ...
** st4h {z[0-9]+\.h - \1\.h}, p0, \[x0\]
** st2h {z3\.h - z4\.h}, p1, \[x0\]
** st3h {z5\.h - z7\.h}, p2, \[x0\]
** ret
*/
void __attribute__((noipa))
callee1 (void *x0, svuint16x3_t z0, svuint16x2_t z3, svuint16x3_t z5,
         svuint16x4_t stack1, svuint16_t stack2, svbool_t p0,
         svbool_t p1, svbool_t p2)
{
  svst4_u16 (p0, x0, stack1);
  svst2_u16 (p1, x0, z3);
  svst3_u16 (p2, x0, z5);
}
/*
** callee2:
** ptrue p3\.b, all
** ld1h (z[0-9]+\.h), p3/z, \[x2\]
** st1h \1, p0, \[x0\]
** st2h {z3\.h - z4\.h}, p1, \[x0\]
** st3h {z0\.h - z2\.h}, p2, \[x0\]
** ret
*/
void __attribute__((noipa))
callee2 (void *x0, svuint16x3_t z0, svuint16x2_t z3, svuint16x3_t z5,
         svuint16x4_t stack1, svuint16_t stack2, svbool_t p0,
         svbool_t p1, svbool_t p2)
{
  svst1_u16 (p0, x0, stack2);
  svst2_u16 (p1, x0, z3);
  svst3_u16 (p2, x0, z0);
}

void __attribute__((noipa))
caller (void *x0)
{
  svbool_t pg;
  pg = svptrue_b8 ();
  callee1 (x0,
           svld3_vnum_u16 (pg, x0, -9),
           svld2_vnum_u16 (pg, x0, -2),
           svld3_vnum_u16 (pg, x0, 0),
           svld4_vnum_u16 (pg, x0, 8),
           svld1_vnum_u16 (pg, x0, 5),
           svptrue_pat_b8 (SV_VL1),
           svptrue_pat_b16 (SV_VL2),
           svptrue_pat_b32 (SV_VL3));
}
/* { dg-final { scan-assembler {\tld3h\t{z0\.h - z2\.h}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld2h\t{z3\.h - z4\.h}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld3h\t{z5\.h - z7\.h}, p[0-7]/z, \[x0\]\n} } } */
/* { dg-final { scan-assembler {\tld4h\t{(z[0-9]+)\.h - z[0-9]+\.h}.*\tstr\t\1, \[x1\]\n} } } */
/* { dg-final { scan-assembler {\tld4h\t{z[0-9]+\.h - (z[0-9]+)\.h}.*\tstr\t\1, \[x1, #3, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld1h\t(z[0-9]+\.h), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1h\t\1, p[0-7], \[x2\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#pragma GCC aarch64 "arm_sve.h"
/*
** callee1:
** ...
** ldr (z[0-9]+), \[x1, #3, mul vl\]
** ...
** st4w {z[0-9]+\.s - \1\.s}, p0, \[x0\]
** st2w {z3\.s - z4\.s}, p1, \[x0\]
** st3w {z5\.s - z7\.s}, p2, \[x0\]
** ret
*/
void __attribute__((noipa))
callee1 (void *x0, svuint32x3_t z0, svuint32x2_t z3, svuint32x3_t z5,
         svuint32x4_t stack1, svuint32_t stack2, svbool_t p0,
         svbool_t p1, svbool_t p2)
{
  svst4_u32 (p0, x0, stack1);
  svst2_u32 (p1, x0, z3);
  svst3_u32 (p2, x0, z5);
}
/*
** callee2:
** ptrue p3\.b, all
** ld1w (z[0-9]+\.s), p3/z, \[x2\]
** st1w \1, p0, \[x0\]
** st2w {z3\.s - z4\.s}, p1, \[x0\]
** st3w {z0\.s - z2\.s}, p2, \[x0\]
** ret
*/
void __attribute__((noipa))
callee2 (void *x0, svuint32x3_t z0, svuint32x2_t z3, svuint32x3_t z5,
         svuint32x4_t stack1, svuint32_t stack2, svbool_t p0,
         svbool_t p1, svbool_t p2)
{
  svst1_u32 (p0, x0, stack2);
  svst2_u32 (p1, x0, z3);
  svst3_u32 (p2, x0, z0);
}

void __attribute__((noipa))
caller (void *x0)
{
  svbool_t pg;
  pg = svptrue_b8 ();
  callee1 (x0,
           svld3_vnum_u32 (pg, x0, -9),
           svld2_vnum_u32 (pg, x0, -2),
           svld3_vnum_u32 (pg, x0, 0),
           svld4_vnum_u32 (pg, x0, 8),
           svld1_vnum_u32 (pg, x0, 5),
           svptrue_pat_b8 (SV_VL1),
           svptrue_pat_b16 (SV_VL2),
           svptrue_pat_b32 (SV_VL3));
}
/* { dg-final { scan-assembler {\tld3w\t{z0\.s - z2\.s}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld2w\t{z3\.s - z4\.s}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld3w\t{z5\.s - z7\.s}, p[0-7]/z, \[x0\]\n} } } */
/* { dg-final { scan-assembler {\tld4w\t{(z[0-9]+)\.s - z[0-9]+\.s}.*\tstr\t\1, \[x1\]\n} } } */
/* { dg-final { scan-assembler {\tld4w\t{z[0-9]+\.s - (z[0-9]+)\.s}.*\tstr\t\1, \[x1, #3, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld1w\t(z[0-9]+\.s), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1w\t\1, p[0-7], \[x2\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#pragma GCC aarch64 "arm_sve.h"
/*
** callee1:
** ...
** ldr (z[0-9]+), \[x1, #3, mul vl\]
** ...
** st4d {z[0-9]+\.d - \1\.d}, p0, \[x0\]
** st2d {z3\.d - z4\.d}, p1, \[x0\]
** st3d {z5\.d - z7\.d}, p2, \[x0\]
** ret
*/
void __attribute__((noipa))
callee1 (void *x0, svuint64x3_t z0, svuint64x2_t z3, svuint64x3_t z5,
         svuint64x4_t stack1, svuint64_t stack2, svbool_t p0,
         svbool_t p1, svbool_t p2)
{
  svst4_u64 (p0, x0, stack1);
  svst2_u64 (p1, x0, z3);
  svst3_u64 (p2, x0, z5);
}
/*
** callee2:
** ptrue p3\.b, all
** ld1d (z[0-9]+\.d), p3/z, \[x2\]
** st1d \1, p0, \[x0\]
** st2d {z3\.d - z4\.d}, p1, \[x0\]
** st3d {z0\.d - z2\.d}, p2, \[x0\]
** ret
*/
void __attribute__((noipa))
callee2 (void *x0, svuint64x3_t z0, svuint64x2_t z3, svuint64x3_t z5,
         svuint64x4_t stack1, svuint64_t stack2, svbool_t p0,
         svbool_t p1, svbool_t p2)
{
  svst1_u64 (p0, x0, stack2);
  svst2_u64 (p1, x0, z3);
  svst3_u64 (p2, x0, z0);
}

void __attribute__((noipa))
caller (void *x0)
{
  svbool_t pg;
  pg = svptrue_b8 ();
  callee1 (x0,
           svld3_vnum_u64 (pg, x0, -9),
           svld2_vnum_u64 (pg, x0, -2),
           svld3_vnum_u64 (pg, x0, 0),
           svld4_vnum_u64 (pg, x0, 8),
           svld1_vnum_u64 (pg, x0, 5),
           svptrue_pat_b8 (SV_VL1),
           svptrue_pat_b16 (SV_VL2),
           svptrue_pat_b32 (SV_VL3));
}
/* { dg-final { scan-assembler {\tld3d\t{z0\.d - z2\.d}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld2d\t{z3\.d - z4\.d}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld3d\t{z5\.d - z7\.d}, p[0-7]/z, \[x0\]\n} } } */
/* { dg-final { scan-assembler {\tld4d\t{(z[0-9]+)\.d - z[0-9]+\.d}.*\tstr\t\1, \[x1\]\n} } } */
/* { dg-final { scan-assembler {\tld4d\t{z[0-9]+\.d - (z[0-9]+)\.d}.*\tstr\t\1, \[x1, #3, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld1d\t(z[0-9]+\.d), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1d\t\1, p[0-7], \[x2\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#pragma GCC aarch64 "arm_sve.h"
/*
** callee1:
** ...
** ldr (z[0-9]+), \[x1, #3, mul vl\]
** ...
** st4b {z[0-9]+\.b - \1\.b}, p0, \[x0\]
** st2b {z3\.b - z4\.b}, p1, \[x0\]
** st3b {z5\.b - z7\.b}, p2, \[x0\]
** ret
*/
void __attribute__((noipa))
callee1 (void *x0, svuint8x3_t z0, svuint8x2_t z3, svuint8x3_t z5,
         svuint8x4_t stack1, svuint8_t stack2, svbool_t p0,
         svbool_t p1, svbool_t p2)
{
  svst4_u8 (p0, x0, stack1);
  svst2_u8 (p1, x0, z3);
  svst3_u8 (p2, x0, z5);
}
/*
** callee2:
** ptrue p3\.b, all
** ld1b (z[0-9]+\.b), p3/z, \[x2\]
** st1b \1, p0, \[x0\]
** st2b {z3\.b - z4\.b}, p1, \[x0\]
** st3b {z0\.b - z2\.b}, p2, \[x0\]
** ret
*/
void __attribute__((noipa))
callee2 (void *x0, svuint8x3_t z0, svuint8x2_t z3, svuint8x3_t z5,
         svuint8x4_t stack1, svuint8_t stack2, svbool_t p0,
         svbool_t p1, svbool_t p2)
{
  svst1_u8 (p0, x0, stack2);
  svst2_u8 (p1, x0, z3);
  svst3_u8 (p2, x0, z0);
}

void __attribute__((noipa))
caller (void *x0)
{
  svbool_t pg;
  pg = svptrue_b8 ();
  callee1 (x0,
           svld3_vnum_u8 (pg, x0, -9),
           svld2_vnum_u8 (pg, x0, -2),
           svld3_vnum_u8 (pg, x0, 0),
           svld4_vnum_u8 (pg, x0, 8),
           svld1_vnum_u8 (pg, x0, 5),
           svptrue_pat_b8 (SV_VL1),
           svptrue_pat_b16 (SV_VL2),
           svptrue_pat_b32 (SV_VL3));
}
/* { dg-final { scan-assembler {\tld3b\t{z0\.b - z2\.b}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld2b\t{z3\.b - z4\.b}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld3b\t{z5\.b - z7\.b}, p[0-7]/z, \[x0\]\n} } } */
/* { dg-final { scan-assembler {\tld4b\t{(z[0-9]+)\.b - z[0-9]+\.b}.*\tstr\t\1, \[x1\]\n} } } */
/* { dg-final { scan-assembler {\tld4b\t{z[0-9]+\.b - (z[0-9]+)\.b}.*\tstr\t\1, \[x1, #3, mul vl\]\n} } } */
/* { dg-final { scan-assembler {\tld1b\t(z[0-9]+\.b), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1b\t\1, p[0-7], \[x2\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#include <arm_sve.h>
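
/* x0-x7 are all taken by the integer arguments, so the pointer to the
   by-reference 'stack' argument is itself passed on the stack and the
   callee must reload it from [sp] before it can load the vector.  */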
/*
** callee:
** ...
** ldr (x[0-9]+), \[sp\]
** ...
** ld1b (z[0-9]+\.b), p[1-3]/z, \[\1\]
** st1b \2, p0, \[x0, x7\]
** ret
*/
void __attribute__((noipa))
callee (int8_t *x0, int x1, int x2, int x3,
        int x4, int x5, svbool_t p0, int x6, int64_t x7,
        svint32x4_t z0, svint32x4_t z4, svint8_t stack)
{
  svst1 (p0, x0 + x7, stack);
}

void __attribute__((noipa))
caller (int8_t *x0, svbool_t p0, svint32x4_t z0, svint32x4_t z4)
{
  callee (x0, 1, 2, 3, 4, 5, p0, 6, 7, z0, z4, svdup_s8 (42));
}
/* { dg-final { scan-assembler {\tmov\t(z[0-9]+\.b), #42\n.*\tst1b\t\1, p[0-7], \[(x[0-9]+)\]\n.*\tstr\t\2, \[sp\]\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#include <arm_sve.h>
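
/* The SVE arguments here come before x5, x6 and x7 in the argument
   list, so a GPR (x4) is still free to carry the address of the
   by-reference 'stack' argument.  */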
/*
** callee:
** ptrue (p[1-3])\.b, all
** ld1b (z[0-9]+\.b), \1/z, \[x4\]
** st1b \2, p0, \[x0, x7\]
** ret
*/
void __attribute__((noipa))
callee (int8_t *x0, int x1, int x2, int x3,
        svint32x4_t z0, svint32x4_t z4, svint8_t stack,
        int x5, svbool_t p0, int x6, int64_t x7)
{
  svst1 (p0, x0 + x7, stack);
}

void __attribute__((noipa))
caller (int8_t *x0, svbool_t p0, svint32x4_t z0, svint32x4_t z4)
{
  callee (x0, 1, 2, 3, z0, z4, svdup_s8 (42), 5, p0, 6, 7);
}
/* { dg-final { scan-assembler {\tmov\t(z[0-9]+\.b), #42\n.*\tst1b\t\1, p[0-7], \[x4\]\n} } } */
/* { dg-do compile } */
/* { dg-options "-O -g" } */
/* { dg-final { check-function-bodies "**" "" } } */
#include <arm_sve.h>
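
/* Only p0-p3 are available for passing predicate arguments, so stack9
   and stack10 are passed by reference, with the two pointers occupying
   the first two stack argument slots; callee returns stack10 by
   reloading it through the pointer at [sp, 8].  */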
/*
** callee:
** ldr (x[0-9]+), \[sp, 8\]
** ldr p0, \[\1\]
** ret
*/
svbool_t __attribute__((noipa))
callee (svint64x4_t z0, svint16x4_t z4,
        svint64_t stack1, svint32_t stack2,
        svint16_t stack3, svint8_t stack4,
        svuint64_t stack5, svuint32_t stack6,
        svuint16_t stack7, svuint8_t stack8,
        svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3,
        svbool_t stack9, svbool_t stack10)
{
  return stack10;
}

uint64_t __attribute__((noipa))
caller (int64_t *x0, int16_t *x1, svbool_t p0)
{
  svbool_t res;
  res = callee (svld4 (p0, x0),
                svld4 (p0, x1),
                svdup_s64 (1),
                svdup_s32 (2),
                svdup_s16 (3),
                svdup_s8 (4),
                svdup_u64 (5),
                svdup_u32 (6),
                svdup_u16 (7),
                svdup_u8 (8),
                svptrue_pat_b8 (SV_VL5),
                svptrue_pat_b16 (SV_VL6),
                svptrue_pat_b32 (SV_VL7),
                svptrue_pat_b64 (SV_VL8),
                svptrue_pat_b8 (SV_MUL3),
                svptrue_pat_b16 (SV_MUL3));
  return svcntp_b8 (res, res);
}
/* { dg-final { scan-assembler {\tptrue\t(p[0-9]+)\.b, mul3\n\tstr\t\1, \[(x[0-9]+)\]\n.*\tstr\t\2, \[sp\]\n} } } */
/* { dg-final { scan-assembler {\tptrue\t(p[0-9]+)\.h, mul3\n\tstr\t\1, \[(x[0-9]+)\]\n.*\tstr\t\2, \[sp, 8\]\n} } } */
/* { dg-do compile } */
/* { dg-prune-output "compilation terminated" } */
#include <arm_sve.h>
#pragma GCC target "+nosve"
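
/* A function that returns an svbool_t must use the SVE PCS, so even a
   call to return_bool has to be rejected when SVE is disabled.  */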
svbool_t return_bool ();
void
f (void)
{
  return_bool (); /* { dg-error {'return_bool' requires the SVE ISA extension} } */
}