Commit 71bfb77a, authored and committed by Wilco Dijkstra

This patch optimizes the prolog and epilog code to reduce the number of instructions and avoid multiple writes to SP.

This patch optimizes the prolog and epilog code to reduce the number of
instructions and avoid multiple writes to SP.  The key idea is that epilogs
are almost exact reverses of prologs, and thus all the decisions only need
to be taken once.  The frame layout is decided in aarch64_layout_frame()
and decisions recorded in the new aarch64_frame fields initial_adjust,
callee_adjust, callee_offset and final_adjust.

A generic frame setup consists of 5 basic steps:

1. sub sp, sp, initial_adjust
2. stp reg1, reg2, [sp, -callee_adjust]!      (push if callee_adjust != 0)
3. add fp, sp, callee_offset                  (if frame_pointer_needed)
4. stp reg3, reg4, [sp, callee_offset + N*16] (store remaining callee-saves)
5. sub sp, sp, final_adjust

The epilog reverses this, and may omit step 3 if alloca wasn't used.
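
The five steps above can be sketched as a small C routine.  This is only an
illustration under stated assumptions, not the code in
aarch64_expand_prologue: struct frame_sketch, sketch_prologue and
sketch_total_adjust are hypothetical names, the routine saves exactly one
register pair, zero-sized adjustments are assumed to be skipped, and operands
are printed in the same style as the list above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the new aarch64_frame fields.  */
struct frame_sketch
{
  long initial_adjust;  /* step 1: first SP decrement */
  long callee_adjust;   /* step 2: writeback of the push, 0 = no push */
  long callee_offset;   /* offset from SP to the callee-save area */
  long final_adjust;    /* step 5: last SP decrement */
  bool need_fp;         /* stands in for frame_pointer_needed */
};

/* Total amount by which the sketched prologue lowers SP.  */
long
sketch_total_adjust (const struct frame_sketch *f)
{
  return f->initial_adjust + f->callee_adjust + f->final_adjust;
}

/* Write the prologue for a frame that saves the single pair REG1/REG2
   into BUF (SIZE bytes).  Returns the number of instructions written.
   Assumption: steps whose adjustment is zero are not emitted.  */
int
sketch_prologue (const struct frame_sketch *f, const char *reg1,
                 const char *reg2, char *buf, size_t size)
{
  int count = 0;
  size_t used = 0;

  assert (size > 0);
  buf[0] = '\0';

/* Append one formatted instruction to BUF and count it.  */
#define EMIT(...)                                               \
  do {                                                          \
    int n_ = snprintf (buf + used, size - used, __VA_ARGS__);   \
    assert (n_ >= 0 && (size_t) n_ < size - used);              \
    used += (size_t) n_;                                        \
    count++;                                                    \
  } while (0)

  if (f->initial_adjust != 0)                 /* step 1 */
    EMIT ("sub sp, sp, %ld\n", f->initial_adjust);
  if (f->callee_adjust != 0)                  /* step 2: push */
    EMIT ("stp %s, %s, [sp, -%ld]!\n", reg1, reg2, f->callee_adjust);
  if (f->need_fp)                             /* step 3 */
    EMIT ("add x29, sp, %ld\n", f->callee_offset);
  if (f->callee_adjust == 0)                  /* step 4: pair not pushed */
    EMIT ("stp %s, %s, [sp, %ld]\n", reg1, reg2, f->callee_offset);
  if (f->final_adjust != 0)                   /* step 5 */
    EMIT ("sub sp, sp, %ld\n", f->final_adjust);

#undef EMIT
  return count;
}
```

With initial_adjust = 512, callee_offset = 496, everything else zero and the
pair x19/x30, the sketch writes "sub sp, sp, 512" followed by
"stp x19, x30, [sp, 496]": a single stack adjustment and no writeback, which
is the shape the updated tests look for.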

    gcc/
	* config/aarch64/aarch64.h (aarch64_frame):
	Remove padding0 and hardfp_offset.  Add locals_offset,
	initial_adjust, callee_adjust, callee_offset and final_adjust.
	* config/aarch64/aarch64.c (aarch64_layout_frame):
	Remove unused padding0 and hardfp_offset initializations.
	Choose frame layout and set frame variables accordingly.
	Use INVALID_REGNUM instead of FIRST_PSEUDO_REGISTER.
	(aarch64_push_regs): Use INVALID_REGNUM, not FIRST_PSEUDO_REGISTER.
	(aarch64_pop_regs): Likewise.
	(aarch64_expand_prologue): Remove all decision code, just emit
	prolog according to frame variables.
	(aarch64_expand_epilogue): Remove all decision code, just emit
	epilog according to frame variables.
	(aarch64_initial_elimination_offset): Use offset to local/arg area.

    testsuite/
	* gcc.target/aarch64/test_frame_10.c: Fix test to check for a
	single stack adjustment, no writeback.	
	* gcc.target/aarch64/test_frame_12.c: Likewise.
	* gcc.target/aarch64/test_frame_13.c: Likewise.
	* gcc.target/aarch64/test_frame_15.c: Likewise.
	* gcc.target/aarch64/test_frame_6.c: Likewise.
	* gcc.target/aarch64/test_frame_7.c: Likewise.
	* gcc.target/aarch64/test_frame_8.c: Likewise.
	* gcc.target/aarch64/test_frame_16.c: New test.

From-SVN: r238960
parent 0f86525a
2016-08-01 Wilco Dijkstra <wdijkstr@arm.com>
* config/aarch64/aarch64.h (aarch64_frame):
Remove padding0 and hardfp_offset. Add locals_offset,
initial_adjust, callee_adjust, callee_offset and final_adjust.
* config/aarch64/aarch64.c (aarch64_layout_frame):
Remove unused padding0 and hardfp_offset initializations.
Choose frame layout and set frame variables accordingly.
Use INVALID_REGNUM instead of FIRST_PSEUDO_REGISTER.
(aarch64_push_regs): Use INVALID_REGNUM, not FIRST_PSEUDO_REGISTER.
(aarch64_pop_regs): Likewise.
(aarch64_expand_prologue): Remove all decision code, just emit
prolog according to frame variables.
(aarch64_expand_epilogue): Remove all decision code, just emit
epilog according to frame variables.
(aarch64_initial_elimination_offset): Use offset to local/arg area.
2016-08-01 H.J. Lu <hongjiu.lu@intel.com>
PR target/72748
......
......@@ -550,11 +550,14 @@ struct GTY (()) aarch64_frame
STACK_BOUNDARY. */
HOST_WIDE_INT saved_varargs_size;
/* The size of the saved callee-save int/FP registers. */
HOST_WIDE_INT saved_regs_size;
/* Padding if needed after the all the callee save registers have
been saved. */
HOST_WIDE_INT padding0;
HOST_WIDE_INT hardfp_offset; /* HARD_FRAME_POINTER_REGNUM */
/* Offset from the base of the frame (incomming SP) to the
top of the locals area. This value is always a multiple of
STACK_BOUNDARY. */
HOST_WIDE_INT locals_offset;
/* Offset from the base of the frame (incomming SP) to the
hard_frame_pointer. This value is always a multiple of
......@@ -564,12 +567,25 @@ struct GTY (()) aarch64_frame
/* The size of the frame. This value is the offset from base of the
* frame (incomming SP) to the stack_pointer. This value is always
* a multiple of STACK_BOUNDARY. */
HOST_WIDE_INT frame_size;
/* The size of the initial stack adjustment before saving callee-saves. */
HOST_WIDE_INT initial_adjust;
/* The writeback value when pushing callee-save registers.
It is zero when no push is used. */
HOST_WIDE_INT callee_adjust;
/* The offset from SP to the callee-save registers after initial_adjust.
It may be non-zero if no push is used (ie. callee_adjust == 0). */
HOST_WIDE_INT callee_offset;
/* The size of the stack adjustment after saving callee-saves. */
HOST_WIDE_INT final_adjust;
unsigned wb_candidate1;
unsigned wb_candidate2;
HOST_WIDE_INT frame_size;
bool laid_out;
};
......
2016-08-01 Wilco Dijkstra <wdijkstr@arm.com>
* gcc.target/aarch64/test_frame_10.c: Fix test to check for a
single stack adjustment, no writeback.
* gcc.target/aarch64/test_frame_12.c: Likewise.
* gcc.target/aarch64/test_frame_13.c: Likewise.
* gcc.target/aarch64/test_frame_15.c: Likewise.
* gcc.target/aarch64/test_frame_6.c: Likewise.
* gcc.target/aarch64/test_frame_7.c: Likewise.
* gcc.target/aarch64/test_frame_8.c: Likewise.
* gcc.target/aarch64/test_frame_16.c: New test.
2016-08-01 H.J. Lu <hongjiu.lu@intel.com>
PR target/72748
......
......@@ -4,8 +4,7 @@
* total frame size > 512.
area except outgoing <= 512
* number of callee-saved reg >= 2.
* Split stack adjustment into two subtractions.
the first subtractions could be optimized into "stp !". */
* Use a single stack adjustment, no writeback. */
/* { dg-do run } */
/* { dg-options "-O2 -fomit-frame-pointer --save-temps" } */
......@@ -15,6 +14,6 @@
t_frame_pattern_outgoing (test10, 480, "x19", 24, a[8], a[9], a[10])
t_frame_run (test10)
/* { dg-final { scan-assembler-times "stp\tx19, x30, \\\[sp, -\[0-9\]+\\\]!" 1 } } */
/* { dg-final { scan-assembler-times "ldp\tx19, x30, \\\[sp\\\], \[0-9\]+" 1 } } */
/* { dg-final { scan-assembler-times "stp\tx19, x30, \\\[sp, \[0-9\]+\\\]" 1 } } */
/* { dg-final { scan-assembler-times "ldp\tx19, x30, \\\[sp, \[0-9\]+\\\]" 1 } } */
......@@ -13,6 +13,6 @@ t_frame_run (test12)
/* { dg-final { scan-assembler-times "sub\tsp, sp, #\[0-9\]+" 1 } } */
/* Check epilogue using write-back. */
/* { dg-final { scan-assembler-times "ldp\tx29, x30, \\\[sp\\\], \[0-9\]+" 3 } } */
/* Check epilogue using no write-back. */
/* { dg-final { scan-assembler-times "ldp\tx29, x30, \\\[sp, \[0-9\]+\\\]" 1 } } */
......@@ -2,8 +2,7 @@
* without outgoing.
* total frame size > 512.
* number of callee-save reg >= 2.
* split the stack adjustment into two substractions,
the second could be optimized into "stp !". */
* Use a single stack adjustment, no writeback. */
/* { dg-do run } */
/* { dg-options "-O2 --save-temps" } */
......@@ -14,4 +13,4 @@ t_frame_pattern (test13, 700, )
t_frame_run (test13)
/* { dg-final { scan-assembler-times "sub\tsp, sp, #\[0-9\]+" 1 } } */
/* { dg-final { scan-assembler-times "stp\tx29, x30, \\\[sp, -\[0-9\]+\\\]!" 2 } } */
/* { dg-final { scan-assembler-times "stp\tx29, x30, \\\[sp\\\]" 1 } } */
......@@ -3,8 +3,7 @@
* total frame size > 512.
area except outgoing <= 512
* number of callee-save reg >= 2.
* split the stack adjustment into two substractions,
the first could be optimized into "stp !". */
* Use a single stack adjustment, no writeback. */
/* { dg-do run } */
/* { dg-options "-O2 --save-temps" } */
......@@ -15,4 +14,4 @@ t_frame_pattern_outgoing (test15, 480, , 8, a[8])
t_frame_run (test15)
/* { dg-final { scan-assembler-times "sub\tsp, sp, #\[0-9\]+" 1 } } */
/* { dg-final { scan-assembler-times "stp\tx29, x30, \\\[sp, -\[0-9\]+\\\]!" 3 } } */
/* { dg-final { scan-assembler-times "stp\tx29, x30, \\\[sp, \[0-9\]+\\\]" 1 } } */
/* Verify:
* with outgoing.
* single int register push.
* varargs and callee-save size >= 256
* Use 2 stack adjustments. */
/* { dg-do compile } */
/* { dg-options "-O2 -fomit-frame-pointer --save-temps" } */
#define REP8(X) X,X,X,X,X,X,X,X
#define REP64(X) REP8(REP8(X))
void outgoing (__builtin_va_list, ...);
double vararg_outgoing (int x1, ...)
{
double a1 = x1, a2 = x1 * 2, a3 = x1 * 3, a4 = x1 * 4, a5 = x1 * 5, a6 = x1 * 6;
__builtin_va_list vl;
__builtin_va_start (vl, x1);
outgoing (vl, a1, a2, a3, a4, a5, a6, REP64 (1));
__builtin_va_end (vl);
return a1 + a2 + a3 + a4 + a5 + a6;
}
/* { dg-final { scan-assembler-times "sub\tsp, sp, #\[0-9\]+" 2 } } */
......@@ -3,8 +3,7 @@
* without outgoing.
* total frame size > 512.
* number of callee-saved reg == 1.
* split stack adjustment into two subtractions.
the second subtraction should use "str !". */
* use a single stack adjustment, no writeback. */
/* { dg-do run } */
/* { dg-options "-O2 -fomit-frame-pointer --save-temps" } */
......@@ -14,6 +13,7 @@
t_frame_pattern (test6, 700, )
t_frame_run (test6)
/* { dg-final { scan-assembler-times "str\tx30, \\\[sp, -\[0-9\]+\\\]!" 2 } } */
/* { dg-final { scan-assembler-times "ldr\tx30, \\\[sp\\\], \[0-9\]+" 2 } } */
/* { dg-final { scan-assembler-times "str\tx30, \\\[sp\\\]" 1 } } */
/* { dg-final { scan-assembler-times "ldr\tx30, \\\[sp\\\]" 2 } } */
/* { dg-final { scan-assembler-times "ldr\tx30, \\\[sp\\\]," 1 } } */
......@@ -3,8 +3,7 @@
* without outgoing.
* total frame size > 512.
* number of callee-saved reg == 2.
* split stack adjustment into two subtractions.
the second subtraction should use "stp !". */
* use a single stack adjustment, no writeback. */
/* { dg-do run } */
/* { dg-options "-O2 -fomit-frame-pointer --save-temps" } */
......@@ -14,6 +13,6 @@
t_frame_pattern (test7, 700, "x19")
t_frame_run (test7)
/* { dg-final { scan-assembler-times "stp\tx19, x30, \\\[sp, -\[0-9\]+\\\]!" 1 } } */
/* { dg-final { scan-assembler-times "ldp\tx19, x30, \\\[sp\\\], \[0-9\]+" 1 } } */
/* { dg-final { scan-assembler-times "stp\tx19, x30, \\\[sp]" 1 } } */
/* { dg-final { scan-assembler-times "ldp\tx19, x30, \\\[sp\\\]" 1 } } */
......@@ -12,6 +12,6 @@
t_frame_pattern_outgoing (test8, 700, , 8, a[8])
t_frame_run (test8)
/* { dg-final { scan-assembler-times "str\tx30, \\\[sp, -\[0-9\]+\\\]!" 3 } } */
/* { dg-final { scan-assembler-times "ldr\tx30, \\\[sp\\\], \[0-9\]+" 3 } } */
/* { dg-final { scan-assembler-times "str\tx30, \\\[sp, \[0-9\]+\\\]" 1 } } */
/* { dg-final { scan-assembler-times "ldr\tx30, \\\[sp, \[0-9\]+\\\]" 1 } } */