Commit 6a70badb by Richard Sandiford

[AArch64] Set NUM_POLY_INT_COEFFS to 2

This patch switches the AArch64 port to use 2 poly_int coefficients
and updates code as necessary to keep it compiling.

One potentially significant change is to
aarch64_hard_regno_caller_save_mode.  The old implementation
was written in a pretty conservative way: it changed the default
behaviour for single-register values, but used the default handling
for multi-register values.

I don't think that's necessary, since the interesting cases for this
macro are usually the single-register ones.  Multi-register modes take
up the whole of the constituent registers and the move patterns for all
multi-register modes should be equally good.

Using the original mode for multi-register cases stops us from using
SVE modes to spill multi-register NEON values.  This was caught by
gcc.c-torture/execute/pr47538.c.
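
For reference, the new hook reduces to something like the following
(a sketch under the assumptions above, not the exact committed code):

static machine_mode
aarch64_hard_regno_caller_save_mode (unsigned, unsigned,
				     machine_mode mode)
{
  /* GET_MODE_SIZE is now a poly_int64, so use known_ge rather than
     >=: SVE mode sizes are not compile-time constants.  Modes at
     least as wide as SImode, including multi-register ones, are
     returned unchanged; narrower modes are widened to SImode.  */
  if (known_ge (GET_MODE_SIZE (mode), 4))
    return mode;
  else
    return SImode;
}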

Also, aarch64_shift_truncation_mask used GET_MODE_BITSIZE - 1.
GET_MODE_UNIT_BITSIZE - 1 is equivalent for the cases that it handles
(which are all scalars), and I think it's more obvious, since if we ever
do use this for elementwise shifts of vector modes, the mask will depend
on the number of bits in each element rather than the number of bits in
the whole vector.
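
Something like the following sketch captures the intent (simplified;
it assumes the function keeps rejecting vector modes up front):

static unsigned HOST_WIDE_INT
aarch64_shift_truncation_mask (machine_mode mode)
{
  /* Vector shifts don't truncate their count on AArch64, so the mask
     only applies to scalars, where GET_MODE_UNIT_BITSIZE and
     GET_MODE_BITSIZE agree.  */
  if (SHIFT_COUNT_TRUNCATED && !VECTOR_MODE_P (mode))
    return GET_MODE_UNIT_BITSIZE (mode) - 1;
  return 0;
}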

2018-01-11  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* config/aarch64/aarch64-modes.def (NUM_POLY_INT_COEFFS): Set to 2.
	* config/aarch64/aarch64-protos.h (aarch64_initial_elimination_offset):
	Return a poly_int64 rather than a HOST_WIDE_INT.
	(aarch64_offset_7bit_signed_scaled_p): Take the offset as a poly_int64
	rather than a HOST_WIDE_INT.
	* config/aarch64/aarch64.h (aarch64_frame): Protect with
	HAVE_POLY_INT_H rather than HOST_WIDE_INT.  Change locals_offset,
	hard_fp_offset, frame_size, initial_adjust, callee_offset and
	final_adjust from HOST_WIDE_INT to poly_int64.
	* config/aarch64/aarch64-builtins.c (aarch64_simd_expand_args): Use
	to_constant when getting the number of units in an Advanced SIMD
	mode.
	(aarch64_builtin_vectorized_function): Check for a constant number
	of units.
	* config/aarch64/aarch64-simd.md (mov<mode>): Handle polynomial
	GET_MODE_SIZE.
	(aarch64_ld<VSTRUCT:nregs>_lane<VALLDIF:mode>): Use the nunits
	attribute instead of GET_MODE_NUNITS.
	* config/aarch64/aarch64.c (aarch64_hard_regno_nregs)
	(aarch64_class_max_nregs): Use the constant_lowest_bound of the
	GET_MODE_SIZE for fixed-size registers.
	(aarch64_const_vec_all_same_in_range_p): Use const_vec_duplicate_p.
	(aarch64_hard_regno_call_part_clobbered, aarch64_classify_index)
	(aarch64_mode_valid_for_sched_fusion_p, aarch64_classify_address)
	(aarch64_legitimize_address_displacement, aarch64_secondary_reload)
	(aarch64_print_operand, aarch64_print_address_internal)
	(aarch64_address_cost, aarch64_rtx_costs, aarch64_register_move_cost)
	(aarch64_short_vector_p, aapcs_vfp_sub_candidate)
	(aarch64_simd_attr_length_rglist, aarch64_operands_ok_for_ldpstp):
	Handle polynomial GET_MODE_SIZE.
	(aarch64_hard_regno_caller_save_mode): Likewise.  Return modes
	wider than SImode without modification.
	(tls_symbolic_operand_type): Use strip_offset instead of split_const.
	(aarch64_pass_by_reference, aarch64_layout_arg, aarch64_pad_reg_upward)
	(aarch64_gimplify_va_arg_expr): Assert that we don't yet handle
	passing and returning SVE modes.
	(aarch64_function_value, aarch64_layout_arg): Use gen_int_mode
	rather than GEN_INT.
	(aarch64_emit_probe_stack_range): Take the size as a poly_int64
	rather than a HOST_WIDE_INT, but call sorry if it isn't constant.
	(aarch64_allocate_and_probe_stack_space): Likewise.
	(aarch64_layout_frame): Cope with polynomial offsets.
	(aarch64_save_callee_saves, aarch64_restore_callee_saves): Take the
	start_offset as a poly_int64 rather than a HOST_WIDE_INT.  Track
	polynomial offsets.
	(offset_9bit_signed_unscaled_p, offset_12bit_unsigned_scaled_p)
	(aarch64_offset_7bit_signed_scaled_p): Take the offset as a
	poly_int64 rather than a HOST_WIDE_INT.
	(aarch64_get_separate_components, aarch64_process_components)
	(aarch64_expand_prologue, aarch64_expand_epilogue)
	(aarch64_use_return_insn_p): Handle polynomial frame offsets.
	(aarch64_anchor_offset): New function, split out from...
	(aarch64_legitimize_address): ...here.
	(aarch64_builtin_vectorization_cost): Handle polynomial
	TYPE_VECTOR_SUBPARTS.
	(aarch64_simd_check_vect_par_cnst_half): Handle polynomial
	GET_MODE_NUNITS.
	(aarch64_simd_make_constant, aarch64_expand_vector_init): Get the
	number of elements from the PARALLEL rather than the mode.
	(aarch64_shift_truncation_mask): Use GET_MODE_UNIT_BITSIZE
	rather than GET_MODE_BITSIZE.
	(aarch64_evpc_trn, aarch64_evpc_uzp, aarch64_evpc_ext)
	(aarch64_evpc_rev, aarch64_evpc_dup, aarch64_evpc_zip)
	(aarch64_expand_vec_perm_const_1): Handle polynomial
	d->perm.length () and d->perm elements.
	(aarch64_evpc_tbl): Likewise.  Use nelt rather than GET_MODE_NUNITS.
	Apply to_constant to d->perm elements.
	(aarch64_simd_valid_immediate, aarch64_vec_fpconst_pow_of_2): Handle
	polynomial CONST_VECTOR_NUNITS.
	(aarch64_move_pointer): Take amount as a poly_int64 rather
	than an int.
	(aarch64_progress_pointer): Avoid temporary variable.
	* config/aarch64/aarch64.md (aarch64_<crc_variant>): Use
	the mode attribute instead of GET_MODE.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r256533
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -1077,9 +1077,9 @@ aarch64_simd_expand_args (rtx target, int icode, int have_retval,
 	  gcc_assert (opc > 1);
 	  if (CONST_INT_P (op[opc]))
 	    {
-	      aarch64_simd_lane_bounds (op[opc], 0,
-					GET_MODE_NUNITS (builtin_mode),
-					exp);
+	      unsigned int nunits
+		= GET_MODE_NUNITS (builtin_mode).to_constant ();
+	      aarch64_simd_lane_bounds (op[opc], 0, nunits, exp);
 	      /* Keep to GCC-vector-extension lane indices in the RTL.  */
 	      op[opc] = aarch64_endian_lane_rtx (builtin_mode,
 						 INTVAL (op[opc]));
@@ -1092,8 +1092,9 @@ aarch64_simd_expand_args (rtx target, int icode, int have_retval,
 	  if (CONST_INT_P (op[opc]))
 	    {
 	      machine_mode vmode = insn_data[icode].operand[opc - 1].mode;
-	      aarch64_simd_lane_bounds (op[opc],
-					0, GET_MODE_NUNITS (vmode), exp);
+	      unsigned int nunits
+		= GET_MODE_NUNITS (vmode).to_constant ();
+	      aarch64_simd_lane_bounds (op[opc], 0, nunits, exp);
 	      /* Keep to GCC-vector-extension lane indices in the RTL.  */
 	      op[opc] = aarch64_endian_lane_rtx (vmode, INTVAL (op[opc]));
 	    }
@@ -1412,16 +1413,17 @@ aarch64_builtin_vectorized_function (unsigned int fn, tree type_out,
 				     tree type_in)
 {
   machine_mode in_mode, out_mode;
-  int in_n, out_n;
+  unsigned HOST_WIDE_INT in_n, out_n;
 
   if (TREE_CODE (type_out) != VECTOR_TYPE
       || TREE_CODE (type_in) != VECTOR_TYPE)
     return NULL_TREE;
 
   out_mode = TYPE_MODE (TREE_TYPE (type_out));
-  out_n = TYPE_VECTOR_SUBPARTS (type_out);
   in_mode = TYPE_MODE (TREE_TYPE (type_in));
-  in_n = TYPE_VECTOR_SUBPARTS (type_in);
+  if (!TYPE_VECTOR_SUBPARTS (type_out).is_constant (&out_n)
+      || !TYPE_VECTOR_SUBPARTS (type_in).is_constant (&in_n))
+    return NULL_TREE;
 
 #undef AARCH64_CHECK_BUILTIN_MODE
 #define AARCH64_CHECK_BUILTIN_MODE(C, N) 1
--- a/gcc/config/aarch64/aarch64-modes.def
+++ b/gcc/config/aarch64/aarch64-modes.def
@@ -47,3 +47,7 @@ INT_MODE (XI, 64);
 
 /* Quad float: 128-bit floating mode for long doubles.  */
 FLOAT_MODE (TF, 16, ieee_quad_format);
+
+/* Coefficient 1 is multiplied by the number of 128-bit chunks in an
+   SVE vector (referred to as "VQ") minus one.  */
+#define NUM_POLY_INT_COEFFS 2
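
To make the encoding concrete (an illustration, not part of the
patch): a two-coefficient poly_int64 {A, B} represents the value
A + B * (VQ - 1), so the byte size of one SVE vector register is:

  /* 16 + 16 * (VQ - 1), i.e. 16 bytes per 128-bit chunk.  */
  poly_int64 sve_bytes (16, 16);

With VQ == 1 (128-bit vectors) this evaluates to 16, with VQ == 2
(256-bit vectors) to 32, and so on; fixed-size Advanced SIMD modes
simply have coefficient 1 equal to zero.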
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -333,7 +333,7 @@ enum simd_immediate_check {
 
 extern struct tune_params aarch64_tune_params;
 
-HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);
+poly_int64 aarch64_initial_elimination_offset (unsigned, unsigned);
 int aarch64_get_condition_code (rtx);
 bool aarch64_address_valid_for_prefetch_p (rtx, bool);
 bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode);
@@ -366,7 +366,7 @@ bool aarch64_zero_extend_const_eq (machine_mode, rtx, machine_mode, rtx);
 bool aarch64_move_imm (HOST_WIDE_INT, machine_mode);
 bool aarch64_mov_operand_p (rtx, machine_mode);
 rtx aarch64_reverse_mask (machine_mode, unsigned int);
-bool aarch64_offset_7bit_signed_scaled_p (machine_mode, HOST_WIDE_INT);
+bool aarch64_offset_7bit_signed_scaled_p (machine_mode, poly_int64);
 char *aarch64_output_scalar_simd_mov_immediate (rtx, scalar_int_mode);
 char *aarch64_output_simd_mov_immediate (rtx, unsigned,
 	enum simd_immediate_check w = AARCH64_CHECK_MOV);
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -31,9 +31,9 @@
      normal str, so the check need not apply.  */
   if (GET_CODE (operands[0]) == MEM
       && !(aarch64_simd_imm_zero (operands[1], <MODE>mode)
-	   && ((GET_MODE_SIZE (<MODE>mode) == 16
+	   && ((known_eq (GET_MODE_SIZE (<MODE>mode), 16)
 		&& aarch64_mem_pair_operand (operands[0], DImode))
-	       || GET_MODE_SIZE (<MODE>mode) == 8)))
+	       || known_eq (GET_MODE_SIZE (<MODE>mode), 8))))
     operands[1] = force_reg (<MODE>mode, operands[1]);
   "
 )
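
The known_eq calls above are behaviour-preserving for fixed-size
modes; as a hypothetical illustration of the comparison semantics:

  poly_int64 fixed_size = 16;    /* 16 bytes for every VQ */
  poly_int64 sve_size (16, 16);  /* 16 * VQ bytes */
  gcc_assert (known_eq (fixed_size, 16));  /* true for all VQ */
  gcc_assert (!known_eq (sve_size, 16));   /* equal only when VQ == 1 */
  gcc_assert (maybe_eq (sve_size, 16));    /* may be equal at runtime */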
@@ -5334,9 +5334,7 @@
   set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (<VALLDIF:MODE>mode))
 		     * <VSTRUCT:nregs>);
 
-  aarch64_simd_lane_bounds (operands[3], 0,
-			    GET_MODE_NUNITS (<VALLDIF:MODE>mode),
-			    NULL);
+  aarch64_simd_lane_bounds (operands[3], 0, <VALLDIF:nunits>, NULL);
   emit_insn (gen_aarch64_vec_load_lanes<VSTRUCT:mode>_lane<VALLDIF:mode> (
 	operands[0], mem, operands[2], operands[3]));
   DONE;
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -586,7 +586,7 @@ extern enum aarch64_processor aarch64_tune;
 
 #define DEFAULT_PCC_STRUCT_RETURN 0
 
-#ifdef HOST_WIDE_INT
+#ifdef HAVE_POLY_INT_H
 struct GTY (()) aarch64_frame
 {
   HOST_WIDE_INT reg_offset[FIRST_PSEUDO_REGISTER];
@@ -604,20 +604,20 @@ struct GTY (()) aarch64_frame
   /* Offset from the base of the frame (incomming SP) to the
      top of the locals area.  This value is always a multiple of
      STACK_BOUNDARY.  */
-  HOST_WIDE_INT locals_offset;
+  poly_int64 locals_offset;
 
   /* Offset from the base of the frame (incomming SP) to the
      hard_frame_pointer.  This value is always a multiple of
      STACK_BOUNDARY.  */
-  HOST_WIDE_INT hard_fp_offset;
+  poly_int64 hard_fp_offset;
 
   /* The size of the frame.  This value is the offset from base of the
-   * frame (incomming SP) to the stack_pointer.  This value is always
-   * a multiple of STACK_BOUNDARY.  */
-  HOST_WIDE_INT frame_size;
+     frame (incomming SP) to the stack_pointer.  This value is always
+     a multiple of STACK_BOUNDARY.  */
+  poly_int64 frame_size;
 
   /* The size of the initial stack adjustment before saving callee-saves.  */
-  HOST_WIDE_INT initial_adjust;
+  poly_int64 initial_adjust;
 
   /* The writeback value when pushing callee-save registers.
      It is zero when no push is used.  */
@@ -625,10 +625,10 @@ struct GTY (()) aarch64_frame
 
   /* The offset from SP to the callee-save registers after initial_adjust.
      It may be non-zero if no push is used (ie. callee_adjust == 0).  */
-  HOST_WIDE_INT callee_offset;
+  poly_int64 callee_offset;
 
   /* The size of the stack adjustment after saving callee-saves.  */
-  HOST_WIDE_INT final_adjust;
+  poly_int64 final_adjust;
 
   /* Store FP,LR and setup a frame pointer.  */
   bool emit_frame_chain;
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3328,7 +3328,7 @@
 		      CRC))]
   "TARGET_CRC32"
   {
-    if (GET_MODE_BITSIZE (GET_MODE (operands[2])) >= 64)
+    if (GET_MODE_BITSIZE (<crc_mode>mode) >= 64)
       return "<crc_variant>\\t%w0, %w1, %x2";
     else
       return "<crc_variant>\\t%w0, %w1, %w2";