Commit 799d6b90, authored and committed by Richard Sandiford

Improve spilling for variable-width slots

Once SVE is enabled, a general AArch64 spill slot offset will be

  A + B * VL

where A is a constant, VL is the SVE vector length and B is a constant
coefficient, so that B * VL is a multiple of the vector length.
The offsets in SVE load and store instructions are a multiple of VL
(and so can encode some values of B), while offsets for standard AArch64
load and store instructions aren't (and encode some values of A).
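
As a rough illustration of the offset form above, the following C++ sketch models A + B * VL as a pair of coefficients. The names `SpillOffset`, `is_constant`, `evaluate` and `add` are hypothetical and chosen for this example; this is not GCC's poly_int API, and treating VL as a length in bytes is an assumption.

```cpp
#include <cstdint>

// Hypothetical model of a spill slot offset A + B * VL.
// Illustration only; this is not GCC's poly_int class.
struct SpillOffset {
  std::int64_t a;  // constant component A, in bytes
  std::int64_t b;  // coefficient B of the runtime vector length VL
};

// An offset is known at compile time only when it has no VL component.
bool is_constant(const SpillOffset &off) { return off.b == 0; }

// Resolve the offset for one concrete vector length (assumed to be in bytes).
std::int64_t evaluate(const SpillOffset &off, std::int64_t vl_bytes) {
  return off.a + off.b * vl_bytes;
}

// Offsets add component-wise, so the A and B parts never mix.
SpillOffset add(const SpillOffset &x, const SpillOffset &y) {
  return {x.a + y.a, x.b + y.b};
}
```

For instance, under these assumptions `evaluate({16, 2}, 32)` resolves to 80 bytes when VL is 32 bytes, and to 48 bytes when VL is 16 bytes.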

We therefore get better spill code if variable-sized slots are grouped
together separately from constant-sized slots, and if variable-sized
slots are not reused for constant-sized data.  Then, spills to the
constant-sized slots can add B * VL to the offset first, creating a
common anchor point for spills with the same B component but different
A components.  Similarly, spills to variable-sized slots can add A to
the offset first, creating a common anchor point for spills with the same
A component but different B components.
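
The anchor-point idea can be sketched as a grouping step. The code below is a simplified model, not the later patch that creates the anchors: `SpillOffset`, `group_constant_spills` and `group_variable_spills` are hypothetical names, and the sketch ignores the immediate ranges that real load and store encodings impose.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical spill offset A + B * VL (illustration only).
struct SpillOffset {
  std::int64_t a;  // constant component A
  std::int64_t b;  // coefficient B of the vector length VL
};

using AnchorGroups = std::map<std::int64_t, std::vector<std::int64_t>>;

// Constant-sized slots: one anchor per distinct B (base + B * VL).
// Each member is then reached from its anchor using its constant A.
AnchorGroups group_constant_spills(const std::vector<SpillOffset> &spills) {
  AnchorGroups anchors;
  for (const SpillOffset &s : spills)
    anchors[s.b].push_back(s.a);
  return anchors;
}

// Variable-sized slots: one anchor per distinct A (base + A).
// Each member is then reached from its anchor using its B * VL part.
AnchorGroups group_variable_spills(const std::vector<SpillOffset> &spills) {
  AnchorGroups anchors;
  for (const SpillOffset &s : spills)
    anchors[s.a].push_back(s.b);
  return anchors;
}
```

In this model, three constant-sized spills at offsets 0 + 3 * VL, 8 + 3 * VL and 16 + 3 * VL fall into a single group keyed by B = 3, so one anchor serves all three.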

This patch implements the sorting and grouping side of the optimisation.
A later patch creates the anchor points.

The patch is a no-op on other targets.

2018-01-03  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* lra-spills.c (pseudo_reg_slot_compare): Sort slots by whether
	they are variable or constant sized.
	(assign_stack_slot_num_and_sort_pseudos): Don't reuse variable-sized
	slots for constant-sized data.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r256208
parent 6a3c127c
@@ -2,6 +2,15 @@
 	    Alan Hayward  <alan.hayward@arm.com>
 	    David Sherwood  <david.sherwood@arm.com>
 
+	* lra-spills.c (pseudo_reg_slot_compare): Sort slots by whether
+	they are variable or constant sized.
+	(assign_stack_slot_num_and_sort_pseudos): Don't reuse variable-sized
+	slots for constant-sized data.
+
+2018-01-03  Richard Sandiford  <richard.sandiford@linaro.org>
+	    Alan Hayward  <alan.hayward@arm.com>
+	    David Sherwood  <david.sherwood@arm.com>
+
 	* tree-vect-patterns.c (vect_recog_mask_conversion_pattern): When
 	handling COND_EXPRs with boolean comparisons, try to find a better
 	basis for the mask type than the boolean itself.
@@ -174,9 +174,17 @@ regno_freq_compare (const void *v1p, const void *v2p)
 }
 
 /* Sort pseudos according to their slots, putting the slots in the order
-   that they should be allocated.  Slots with lower numbers have the highest
-   priority and should get the smallest displacement from the stack or
-   frame pointer (whichever is being used).
+   that they should be allocated.
+
+   First prefer to group slots with variable sizes together and slots
+   with constant sizes together, since that usually makes them easier
+   to address from a common anchor point.  E.g. loads of polynomial-sized
+   registers tend to take polynomial offsets while loads of constant-sized
+   registers tend to take constant (non-polynomial) offsets.
+
+   Next, slots with lower numbers have the highest priority and should
+   get the smallest displacement from the stack or frame pointer
+   (whichever is being used).
 
    The first allocated slot is always closest to the frame pointer,
    so prefer lower slot numbers when frame_pointer_needed.  If the stack
@@ -194,6 +202,10 @@ pseudo_reg_slot_compare (const void *v1p, const void *v2p)
 
   slot_num1 = pseudo_slots[regno1].slot_num;
   slot_num2 = pseudo_slots[regno2].slot_num;
+  diff = (int (slots[slot_num1].size.is_constant ())
+          - int (slots[slot_num2].size.is_constant ()));
+  if (diff != 0)
+    return diff;
   if ((diff = slot_num1 - slot_num2) != 0)
     return (frame_pointer_needed
             || (!FRAME_GROWS_DOWNWARD) == STACK_GROWS_DOWNWARD ? diff : -diff);
@@ -356,8 +368,17 @@ assign_stack_slot_num_and_sort_pseudos (int *pseudo_regnos, int n)
             j = slots_num;
           else
             {
+              machine_mode mode
+                = wider_subreg_mode (PSEUDO_REGNO_MODE (regno),
+                                     lra_reg_info[regno].biggest_mode);
               for (j = 0; j < slots_num; j++)
                 if (slots[j].hard_regno < 0
+                    /* Although it's possible to share slots between modes
+                       with constant and non-constant widths, we usually
+                       get better spill code by keeping the constant and
+                       non-constant areas separate.  */
+                    && (GET_MODE_SIZE (mode).is_constant ()
+                        == slots[j].size.is_constant ())
                     && ! (lra_intersected_live_ranges_p
                           (slots[j].live_ranges,
                            lra_reg_info[regno].live_ranges)))
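
The ordering and reuse checks in the lra-spills.c hunks can be sketched in isolation. The following is a minimal stand-alone model, not the GCC code: `Slot`, `slot_compare`, `sort_slots` and `may_share_slot` are hypothetical names, the `constant_size` flag stands in for `size.is_constant ()`, and the frame-direction handling, live-range check and hard-register check of the real functions are deliberately left out.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for a stack slot; only the fields used here.
struct Slot {
  int slot_num;        // allocation order number
  bool constant_size;  // stands in for slots[n].size.is_constant ()
};

// Same first key as the patch: slots whose size is not constant compare
// lower than constant-sized ones.  Ties fall back to the slot number.
// (The real comparator may negate the slot-number difference depending on
// frame and stack growth direction; that is omitted in this sketch.)
int slot_compare(const Slot &s1, const Slot &s2) {
  int diff = int(s1.constant_size) - int(s2.constant_size);
  if (diff != 0)
    return diff;
  return s1.slot_num - s2.slot_num;
}

// Sort using the three-way comparator above.
void sort_slots(std::vector<Slot> &slots) {
  std::sort(slots.begin(), slots.end(),
            [](const Slot &x, const Slot &y) { return slot_compare(x, y) < 0; });
}

// A slot is a reuse candidate only for data of the same size class,
// which keeps the constant and non-constant areas separate.
bool may_share_slot(bool data_constant_size, const Slot &slot) {
  return data_constant_size == slot.constant_size;
}
```

With this ordering, a list of slots numbered 0 (constant), 1 (variable), 2 (constant) and 3 (variable) sorts to 1, 3, 0, 2, so each size class forms one contiguous group.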