Commit c997869f by Segher Boessenkool Committed by Segher Boessenkool

shrink-wrap: Shrink-wrapping for separate components

This is the main substance of this patch series.

Instead of doing all of the prologue and epilogue in one spot, it often
is better to do components of it at different places, so that they are
executed less frequently.

What exactly is a component is completely up to the target; this code
treats it all abstractly, and uses hooks for the target to handle the
more concrete things.  Commonly there is one component for each callee-
saved register, for example.

Components can be executed more than once per function execution.  This
pass makes sure that a component's epilogue is not called more often
than the corresponding prologue has been, at any point in time; that the
prologue is called more often, wherever the prologue's effect is needed;
and that the epilogue is called as often as the prologue has been, when
the function exits.  It does this by first deciding which blocks need
which components active, and then placing prologue and epilogue
components to make that exactly true.

Deciding what blocks should run with a certain component active so that
the total cost of executing the prologues (and epilogues) is optimal, is
not a computationally feasible problem.  Instead, for each basic block,
we estimate the cost of putting a prologue right before the block, and
if that is cheaper than the total cost of putting prologues optimally
(according to the estimated cost) in the dominator subtrees strictly
dominated by this first block, place it at the first block instead.
This simple procedure places the components optimally for any dominator
sub tree where the root node's cost does not depend on anything outside
its subtree.

The cost is the execution frequency of all edges into the block coming
from blocks that do not have this component active.  The estimated cost
is the execution frequency of the block, minus the execution frequency
of any backedges (which by definition are coming from subtrees, so if
the "head" block gets a prologue, the source block of any backedge has
that component active as well).

Currently, the epilogues are placed as late as possible, given the
constraints.  This does not matter for execution cost, but we could
save a little bit of code size by placing the epilogues in a smarter
way.  This is a possible future optimisation.

Now all that is left is inserting prologues and epilogues on all edges
that jump into resp. out of the "active" set of blocks.  Often we need
to insert some components' prologues (or epilogues) on all edges into
(or out of) a block.  In theory cross-jumping can unify all such, but
in practice that often fails; besides, that is a lot of work.  So in
this case we insert the prologue and epilogue components at the "head"
or "tail" of a block, instead.

As a final optimisation, if a block needs a prologue and its immediate
dominator has the block as a post-dominator, that immediate dominator
gets the prologue as well.


	* function.c (thread_prologue_and_epilogue_insns): Call
	try_shrink_wrapping_separate.  Compute the prologue_seq afterwards,
	if it has possibly changed.  Compute the split_prologue_seq and
	epilogue_seq later, too.
	* shrink-wrap.c: #include cfgbuild.h and insn-config.h.
	(dump_components): New function.
	(struct sw): New struct.
	(SW): New function.
	(init_separate_shrink_wrap): New function.
	(fini_separate_shrink_wrap): New function.
	(place_prologue_for_one_component): New function.
	(spread_components): New function.
	(disqualify_problematic_components): New function.
	(emit_common_heads_for_components): New function.
	(emit_common_tails_for_components): New function.
	(insert_prologue_epilogue_for_components): New function.
	(try_shrink_wrapping_separate): New function.
	* shrink-wrap.h: Declare try_shrink_wrapping_separate.

From-SVN: r241063
parent e7722f11
2016-10-12 Segher Boessenkool <segher@kernel.crashing.org>
* function.c (thread_prologue_and_epilogue_insns): Call
try_shrink_wrapping_separate. Compute the prologue_seq afterwards,
if it has possibly changed. Compute the split_prologue_seq and
epilogue_seq later, too.
* shrink-wrap.c: #include cfgbuild.h and insn-config.h.
(dump_components): New function.
(struct sw): New struct.
(SW): New function.
(init_separate_shrink_wrap): New function.
(fini_separate_shrink_wrap): New function.
(place_prologue_for_one_component): New function.
(spread_components): New function.
(disqualify_problematic_components): New function.
(emit_common_heads_for_components): New function.
(emit_common_tails_for_components): New function.
(insert_prologue_epilogue_for_components): New function.
(try_shrink_wrapping_separate): New function.
* shrink-wrap.h: Declare try_shrink_wrapping_separate.
2016-10-12 Segher Boessenkool <segher@kernel.crashing.org>
* regrename.c (build_def_use): Invalidate chains that have a
REG_CFA_RESTORE on some instruction.
......
......@@ -5919,16 +5919,25 @@ thread_prologue_and_epilogue_insns (void)
edge entry_edge = single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun));
edge orig_entry_edge = entry_edge;
rtx_insn *split_prologue_seq = make_split_prologue_seq ();
rtx_insn *prologue_seq = make_prologue_seq ();
rtx_insn *epilogue_seq = make_epilogue_seq ();
/* Try to perform a kind of shrink-wrapping, making sure the
prologue/epilogue is emitted only around those parts of the
function that require it. */
try_shrink_wrapping (&entry_edge, prologue_seq);
/* If the target can handle splitting the prologue/epilogue into separate
components, try to shrink-wrap these components separately. */
try_shrink_wrapping_separate (entry_edge->dest);
/* If that did anything for any component we now need the generate the
"main" prologue again. If that does not work for some target then
that target should not enable separate shrink-wrapping. */
if (crtl->shrink_wrapped_separate)
prologue_seq = make_prologue_seq ();
rtx_insn *split_prologue_seq = make_split_prologue_seq ();
rtx_insn *epilogue_seq = make_epilogue_seq ();
rtl_profile_for_bb (EXIT_BLOCK_PTR_FOR_FN (cfun));
......
......@@ -25,6 +25,7 @@ along with GCC; see the file COPYING3. If not see
/* In shrink-wrap.c. */
extern bool requires_stack_frame_p (rtx_insn *, HARD_REG_SET, HARD_REG_SET);
extern void try_shrink_wrapping (edge *entry_edge, rtx_insn *prologue_seq);
extern void try_shrink_wrapping_separate (basic_block first_bb);
#define SHRINK_WRAPPING_ENABLED \
(flag_shrink_wrap && targetm.have_simple_return ())
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment