Commit ebfd146a by Ira Rosen

tree-vect-loop-manip.c: New file.

	* tree-vect-loop-manip.c: New file.
	* tree-vectorizer.c: Update documentation and included files.
	(vect_loop_location): Make extern.
	(rename_use_op): Move to tree-vect-loop-manip.c.
	(rename_variables_in_bb, rename_variables_in_loop, 
	slpeel_update_phis_for_duplicate_loop, 
	slpeel_update_phi_nodes_for_guard1,
	slpeel_update_phi_nodes_for_guard2, slpeel_make_loop_iterate_ntimes,
	slpeel_tree_duplicate_loop_to_edge_cfg, slpeel_add_loop_guard,
	slpeel_can_duplicate_loop_p, slpeel_verify_cfg_after_peeling,
	set_prologue_iterations, slpeel_tree_peel_loop_to_edge, 
	find_loop_location): Likewise.
	(new_stmt_vec_info): Move to tree-vect-stmts.c.
	(init_stmt_vec_info_vec, free_stmt_vec_info_vec, free_stmt_vec_info,
	get_vectype_for_scalar_type, vect_is_simple_use,
	supportable_widening_operation, supportable_narrowing_operation):
	Likewise.
	(bb_in_loop_p): Move to tree-vect-loop.c.
	(new_loop_vec_info, destroy_loop_vec_info, 
	reduction_code_for_scalar_code, report_vect_op, 
	vect_is_simple_reduction, vect_is_simple_iv_evolution): Likewise.
	(vect_can_force_dr_alignment_p): Move to tree-vect-data-refs.c.
	(vect_supportable_dr_alignment): Likewise.
	* tree-vectorizer.h (tree-data-ref.h): Include.
	(vect_loop_location): Declare.
	Reorganize function declarations according to the new file structure.
	* tree-vect-loop.c: New file.
	* tree-vect-analyze.c: Remove. Move functions to tree-vect-data-refs.c, 
	tree-vect-stmts.c, tree-vect-slp.c, tree-vect-loop.c.
	* tree-vect-data-refs.c: New file.
	* tree-vect-patterns.c (timevar.h): Don't include.
	* tree-vect-stmts.c: New file.
	* tree-vect-transform.c: Remove. Move functions to tree-vect-stmts.c, 
	tree-vect-slp.c, tree-vect-loop.c.
	* Makefile.in (OBJS-common): Remove tree-vect-analyze.o and 
	tree-vect-transform.o. Add tree-vect-data-refs.o, tree-vect-stmts.o, 
	tree-vect-loop.o, tree-vect-loop-manip.o, tree-vect-slp.o.
	(tree-vect-analyze.o): Remove.
	(tree-vect-transform.o): Likewise.
	(tree-vect-data-refs.o): Add rule.
	(tree-vect-stmts.o, tree-vect-loop.o, tree-vect-loop-manip.o, 
	tree-vect-slp.o): Likewise.
	(tree-vect-patterns.o): Remove redundant dependencies.
	(tree-vectorizer.o): Likewise.
	* tree-vect-slp.c: New file.

From-SVN: r145280
Makefile.in:

@@ -1259,10 +1259,13 @@ OBJS-common = \
 	tree-ssanames.o \
 	tree-stdarg.o \
 	tree-tailcall.o \
-	tree-vect-analyze.o \
 	tree-vect-generic.o \
 	tree-vect-patterns.o \
-	tree-vect-transform.o \
+	tree-vect-data-refs.o \
+	tree-vect-stmts.o \
+	tree-vect-loop.o \
+	tree-vect-loop-manip.o \
+	tree-vect-slp.o \
 	tree-vectorizer.o \
 	tree-vrp.o \
 	tree.o \
@@ -2349,26 +2352,33 @@ graphite.o: graphite.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
 	$(TREE_FLOW_H) $(TREE_DUMP_H) $(TIMEVAR_H) $(CFGLOOP_H) $(GIMPLE_H) domwalk.h \
 	$(TREE_DATA_REF_H) $(SCEV_H) tree-pass.h tree-chrec.h graphite.h pointer-set.h \
 	value-prof.h
-tree-vect-analyze.o: tree-vect-analyze.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
-	$(TM_H) $(GGC_H) $(OPTABS_H) $(TREE_H) $(RECOG_H) $(BASIC_BLOCK_H) \
-	$(DIAGNOSTIC_H) $(TREE_FLOW_H) $(TREE_DUMP_H) $(TIMEVAR_H) $(CFGLOOP_H) \
-	tree-vectorizer.h $(TREE_DATA_REF_H) $(SCEV_H) $(EXPR_H) tree-chrec.h \
-	$(TOPLEV_H) $(RECOG_H)
-tree-vect-patterns.o: tree-vect-patterns.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
-	$(TM_H) $(GGC_H) $(TREE_H) $(TARGET_H) $(BASIC_BLOCK_H) $(DIAGNOSTIC_H) \
-	$(TREE_FLOW_H) $(TREE_DUMP_H) $(TIMEVAR_H) $(CFGLOOP_H) $(EXPR_H) \
-	$(OPTABS_H) $(PARAMS_H) $(TREE_DATA_REF_H) tree-vectorizer.h $(RECOG_H) $(TOPLEV_H)
-tree-vect-transform.o: tree-vect-transform.c $(CONFIG_H) $(SYSTEM_H) \
-	coretypes.h $(TM_H) $(GGC_H) $(OPTABS_H) $(RECOG_H) $(TREE_H) $(RTL_H) \
-	$(BASIC_BLOCK_H) $(DIAGNOSTIC_H) $(TREE_FLOW_H) $(TREE_DUMP_H) \
-	$(TIMEVAR_H) $(CFGLOOP_H) $(TARGET_H) tree-pass.h $(EXPR_H) \
-	tree-vectorizer.h $(TREE_DATA_REF_H) $(SCEV_H) langhooks.h $(TOPLEV_H) \
-	tree-chrec.h
-tree-vectorizer.o: tree-vectorizer.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
-	$(TM_H) $(GGC_H) $(OPTABS_H) $(TREE_H) $(RTL_H) $(BASIC_BLOCK_H) \
-	$(DIAGNOSTIC_H) $(TREE_FLOW_H) $(TREE_DUMP_H) $(TIMEVAR_H) $(CFGLOOP_H) \
-	tree-pass.h $(EXPR_H) $(RECOG_H) tree-vectorizer.h $(TREE_DATA_REF_H) $(SCEV_H) \
-	$(INPUT_H) $(TARGET_H) $(CFGLAYOUT_H) $(TOPLEV_H) tree-chrec.h langhooks.h
+tree-vect-loop.o: tree-vect-loop.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
+	$(TM_H) $(GGC_H) $(TREE_H) $(BASIC_BLOCK_H) $(DIAGNOSTIC_H) $(TREE_FLOW_H) \
+	$(TREE_DUMP_H) $(CFGLOOP_H) $(EXPR_H) $(RECOG_H) $(OPTABS_H) $(TOPLEV_H) \
+	tree-chrec.h $(SCEV_H) tree-vectorizer.h
+tree-vect-loop-manip.o: tree-vect-loop-manip.c $(CONFIG_H) $(SYSTEM_H) \
+	coretypes.h $(TM_H) $(GGC_H) $(TREE_H) $(BASIC_BLOCK_H) $(DIAGNOSTIC_H) \
+	$(TREE_FLOW_H) $(TREE_DUMP_H) $(CFGLOOP_H) $(EXPR_H) $(TOPLEV_H) $(SCEV_H) \
+	tree-vectorizer.h langhooks.h
+tree-vect-patterns.o: tree-vect-patterns.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
+	$(TM_H) $(GGC_H) $(TREE_H) $(TARGET_H) $(BASIC_BLOCK_H) $(DIAGNOSTIC_H) \
+	$(TREE_FLOW_H) $(TREE_DUMP_H) $(CFGLOOP_H) $(EXPR_H) $(OPTABS_H) $(PARAMS_H) \
+	$(TREE_DATA_REF_H) tree-vectorizer.h $(RECOG_H) $(TOPLEV_H)
+tree-vect-slp.o: tree-vect-slp.c $(CONFIG_H) $(SYSTEM_H) \
+	coretypes.h $(TM_H) $(GGC_H) $(TREE_H) $(TARGET_H) $(BASIC_BLOCK_H) \
+	$(DIAGNOSTIC_H) $(TREE_FLOW_H) $(TREE_DUMP_H) $(CFGLOOP_H) \
+	$(EXPR_H) $(RECOG_H) $(OPTABS_H) tree-vectorizer.h
+tree-vect-stmts.o: tree-vect-stmts.c $(CONFIG_H) $(SYSTEM_H) \
+	coretypes.h $(TM_H) $(GGC_H) $(TREE_H) $(TARGET_H) $(BASIC_BLOCK_H) \
+	$(DIAGNOSTIC_H) $(TREE_FLOW_H) $(TREE_DUMP_H) $(CFGLOOP_H) \
+	$(EXPR_H) $(RECOG_H) $(OPTABS_H) tree-vectorizer.h langhooks.h
+tree-vect-data-refs.o: tree-vect-data-refs.c $(CONFIG_H) $(SYSTEM_H) \
+	coretypes.h $(TM_H) $(GGC_H) $(TREE_H) $(TARGET_H) $(BASIC_BLOCK_H) \
+	$(DIAGNOSTIC_H) $(TREE_FLOW_H) $(TREE_DUMP_H) $(CFGLOOP_H) \
+	$(EXPR_H) $(OPTABS_H) tree-chrec.h $(SCEV_H) tree-vectorizer.h $(TOPLEV_H)
+tree-vectorizer.o: tree-vectorizer.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
+	$(TM_H) $(GGC_H) $(TREE_H) $(DIAGNOSTIC_H) $(TREE_FLOW_H) $(TREE_DUMP_H) \
+	$(CFGLOOP_H) tree-pass.h tree-vectorizer.h $(TIMEVAR_H)
 tree-loop-linear.o: tree-loop-linear.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
 	$(TM_H) $(GGC_H) $(OPTABS_H) $(TREE_H) $(RTL_H) $(BASIC_BLOCK_H) \
 	$(DIAGNOSTIC_H) $(TREE_FLOW_H) $(TREE_DUMP_H) $(TIMEVAR_H) $(CFGLOOP_H) \
tree-vect-loop-manip.c (new file):
/* Vectorizer Specific Loop Manipulations
Copyright (C) 2003, 2004, 2005, 2006, 2007, 2008, 2009 Free Software
Foundation, Inc.
Contributed by Dorit Naishlos <dorit@il.ibm.com>
and Ira Rosen <irar@il.ibm.com>
This file is part of GCC.
GCC is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free
Software Foundation; either version 3, or (at your option) any later
version.
GCC is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
for more details.
You should have received a copy of the GNU General Public License
along with GCC; see the file COPYING3. If not see
<http://www.gnu.org/licenses/>. */
#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "tm.h"
#include "ggc.h"
#include "tree.h"
#include "basic-block.h"
#include "diagnostic.h"
#include "tree-flow.h"
#include "tree-dump.h"
#include "cfgloop.h"
#include "cfglayout.h"
#include "expr.h"
#include "toplev.h"
#include "tree-scalar-evolution.h"
#include "tree-vectorizer.h"
#include "langhooks.h"
/*************************************************************************
Simple Loop Peeling Utilities
Utilities to support loop peeling for vectorization purposes.
*************************************************************************/
/* Renames the use *OP_P. */
static void
rename_use_op (use_operand_p op_p)
{
tree new_name;
if (TREE_CODE (USE_FROM_PTR (op_p)) != SSA_NAME)
return;
new_name = get_current_def (USE_FROM_PTR (op_p));
/* Something defined outside of the loop. */
if (!new_name)
return;
/* An ordinary ssa name defined in the loop. */
SET_USE (op_p, new_name);
}
/* Renames the variables in basic block BB. */
void
rename_variables_in_bb (basic_block bb)
{
gimple_stmt_iterator gsi;
gimple stmt;
use_operand_p use_p;
ssa_op_iter iter;
edge e;
edge_iterator ei;
struct loop *loop = bb->loop_father;
for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
{
stmt = gsi_stmt (gsi);
FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_ALL_USES)
rename_use_op (use_p);
}
FOR_EACH_EDGE (e, ei, bb->succs)
{
if (!flow_bb_inside_loop_p (loop, e->dest))
continue;
for (gsi = gsi_start_phis (e->dest); !gsi_end_p (gsi); gsi_next (&gsi))
rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi_stmt (gsi), e));
}
}
/* Renames the variables in the newly generated LOOP. */
void
rename_variables_in_loop (struct loop *loop)
{
unsigned i;
basic_block *bbs;
bbs = get_loop_body (loop);
for (i = 0; i < loop->num_nodes; i++)
rename_variables_in_bb (bbs[i]);
free (bbs);
}
/* Update the PHI nodes of NEW_LOOP.
NEW_LOOP is a duplicate of ORIG_LOOP.
AFTER indicates whether NEW_LOOP executes before or after ORIG_LOOP:
AFTER is true if NEW_LOOP executes after ORIG_LOOP, and false if it
executes before it. */
static void
slpeel_update_phis_for_duplicate_loop (struct loop *orig_loop,
struct loop *new_loop, bool after)
{
tree new_ssa_name;
gimple phi_new, phi_orig;
tree def;
edge orig_loop_latch = loop_latch_edge (orig_loop);
edge orig_entry_e = loop_preheader_edge (orig_loop);
edge new_loop_exit_e = single_exit (new_loop);
edge new_loop_entry_e = loop_preheader_edge (new_loop);
edge entry_arg_e = (after ? orig_loop_latch : orig_entry_e);
gimple_stmt_iterator gsi_new, gsi_orig;
/*
step 1. For each loop-header-phi:
Add the first phi argument for the phi in NEW_LOOP
(the one associated with the entry of NEW_LOOP)
step 2. For each loop-header-phi:
Add the second phi argument for the phi in NEW_LOOP
(the one associated with the latch of NEW_LOOP)
step 3. Update the phis in the successor block of NEW_LOOP.
case 1: NEW_LOOP was placed before ORIG_LOOP:
The successor block of NEW_LOOP is the header of ORIG_LOOP.
Updating the phis in the successor block can therefore be done
along with the scanning of the loop header phis, because the
header blocks of ORIG_LOOP and NEW_LOOP have exactly the same
phi nodes, organized in the same order.
case 2: NEW_LOOP was placed after ORIG_LOOP:
The successor block of NEW_LOOP is the original exit block of
ORIG_LOOP - the phis to be updated are the loop-closed-ssa phis.
We postpone updating these phis to a later stage (when
loop guards are added).
*/
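/* Editorial illustration (not part of the original sources): a header phi
       x_1 = PHI <x_0 (preheader), x_2 (latch)>
   in ORIG_LOOP receives, in NEW_LOOP, the arguments
       PHI <entry_arg (new preheader), get_current_def (x_2) (new latch)>
   where entry_arg is x_2 (the latch value) when NEW_LOOP runs after
   ORIG_LOOP, and x_0 (the preheader value) when it runs before it.  */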
/* Scan the phis in the headers of the old and new loops
(they are organized in exactly the same order). */
for (gsi_new = gsi_start_phis (new_loop->header),
gsi_orig = gsi_start_phis (orig_loop->header);
!gsi_end_p (gsi_new) && !gsi_end_p (gsi_orig);
gsi_next (&gsi_new), gsi_next (&gsi_orig))
{
phi_new = gsi_stmt (gsi_new);
phi_orig = gsi_stmt (gsi_orig);
/* step 1. */
def = PHI_ARG_DEF_FROM_EDGE (phi_orig, entry_arg_e);
add_phi_arg (phi_new, def, new_loop_entry_e);
/* step 2. */
def = PHI_ARG_DEF_FROM_EDGE (phi_orig, orig_loop_latch);
if (TREE_CODE (def) != SSA_NAME)
continue;
new_ssa_name = get_current_def (def);
if (!new_ssa_name)
{
/* This only happens if there are no definitions
inside the loop. Use the phi_result in this case. */
new_ssa_name = PHI_RESULT (phi_new);
}
/* An ordinary ssa name defined in the loop. */
add_phi_arg (phi_new, new_ssa_name, loop_latch_edge (new_loop));
/* step 3 (case 1). */
if (!after)
{
gcc_assert (new_loop_exit_e == orig_entry_e);
SET_PHI_ARG_DEF (phi_orig,
new_loop_exit_e->dest_idx,
new_ssa_name);
}
}
}
/* Update PHI nodes for a guard of the LOOP.
Input:
- LOOP, GUARD_EDGE: LOOP is a loop for which we added guard code that
controls whether LOOP is to be executed. GUARD_EDGE is the edge that
originates from the guard-bb, skips LOOP and reaches the (unique) exit
bb of LOOP. This loop-exit-bb is an empty bb with one successor.
We denote this bb NEW_MERGE_BB because before the guard code was added
it had a single predecessor (the LOOP header), and now it became a merge
point of two paths - the path that ends with the LOOP exit-edge, and
the path that ends with GUARD_EDGE.
- NEW_EXIT_BB: New basic block that is added by this function between LOOP
and NEW_MERGE_BB. It is used to place loop-closed-ssa-form exit-phis.
===> The CFG before the guard-code was added:
LOOP_header_bb:
loop_body
if (exit_loop) goto update_bb
else goto LOOP_header_bb
update_bb:
==> The CFG after the guard-code was added:
guard_bb:
if (LOOP_guard_condition) goto new_merge_bb
else goto LOOP_header_bb
LOOP_header_bb:
loop_body
if (exit_loop_condition) goto new_merge_bb
else goto LOOP_header_bb
new_merge_bb:
goto update_bb
update_bb:
==> The CFG after this function:
guard_bb:
if (LOOP_guard_condition) goto new_merge_bb
else goto LOOP_header_bb
LOOP_header_bb:
loop_body
if (exit_loop_condition) goto new_exit_bb
else goto LOOP_header_bb
new_exit_bb:
new_merge_bb:
goto update_bb
update_bb:
This function:
1. creates and updates the relevant phi nodes to account for the new
incoming edge (GUARD_EDGE) into NEW_MERGE_BB. This involves:
1.1. Create phi nodes at NEW_MERGE_BB.
1.2. Update the phi nodes at the successor of NEW_MERGE_BB (denoted
UPDATE_BB). UPDATE_BB was the exit-bb of LOOP before NEW_MERGE_BB
2. preserves loop-closed-ssa-form by creating the required phi nodes
at the exit of LOOP (i.e, in NEW_EXIT_BB).
There are two flavors to this function:
slpeel_update_phi_nodes_for_guard1:
Here the guard controls whether we enter or skip LOOP, where LOOP is a
prolog_loop (loop1 below), and the new phis created in NEW_MERGE_BB are
for variables that have phis in the loop header.
slpeel_update_phi_nodes_for_guard2:
Here the guard controls whether we enter or skip LOOP, where LOOP is an
epilog_loop (loop2 below), and the new phis created in NEW_MERGE_BB are
for variables that have phis in the loop exit.
I.E., the overall structure is:
loop1_preheader_bb:
guard1 (goto loop1/merge1_bb)
loop1
loop1_exit_bb:
guard2 (goto merge1_bb/merge2_bb)
merge1_bb
loop2
loop2_exit_bb
merge2_bb
next_bb
slpeel_update_phi_nodes_for_guard1 takes care of creating phis in
loop1_exit_bb and merge1_bb. These are entry phis (phis for the vars
that have phis in loop1->header).
slpeel_update_phi_nodes_for_guard2 takes care of creating phis in
loop2_exit_bb and merge2_bb. These are exit phis (phis for the vars
that have phis in next_bb). It also adds some of these phis to
loop1_exit_bb.
slpeel_update_phi_nodes_for_guard1 is always called before
slpeel_update_phi_nodes_for_guard2. They are both needed in order
to create correct data-flow and loop-closed-ssa-form.
Generally slpeel_update_phi_nodes_for_guard1 creates phis for variables
that change between iterations of a loop (and therefore have a phi-node
at the loop entry), whereas slpeel_update_phi_nodes_for_guard2 creates
phis for variables that are used out of the loop (and therefore have
loop-closed exit phis). Some variables may be both updated between
iterations and used after the loop. This is why in loop1_exit_bb we
may need both entry_phis (created by slpeel_update_phi_nodes_for_guard1)
and exit phis (created by slpeel_update_phi_nodes_for_guard2).
- IS_NEW_LOOP: if IS_NEW_LOOP is true, then LOOP is a newly created copy of
an original loop. i.e., we have:
orig_loop
guard_bb (goto LOOP/new_merge)
new_loop <-- LOOP
new_exit
new_merge
next_bb
If IS_NEW_LOOP is false, then LOOP is an original loop, in which case we
have:
new_loop
guard_bb (goto LOOP/new_merge)
orig_loop <-- LOOP
new_exit
new_merge
next_bb
The SSA names defined in the original loop have a current
reaching definition that records the corresponding new
ssa-name used in the new duplicated loop copy.
*/
/* Function slpeel_update_phi_nodes_for_guard1
Input:
- GUARD_EDGE, LOOP, IS_NEW_LOOP, NEW_EXIT_BB - as explained above.
- DEFS - a bitmap of ssa names to mark new names for which we recorded
information.
In the context of the overall structure, we have:
loop1_preheader_bb:
guard1 (goto loop1/merge1_bb)
LOOP-> loop1
loop1_exit_bb:
guard2 (goto merge1_bb/merge2_bb)
merge1_bb
loop2
loop2_exit_bb
merge2_bb
next_bb
For each name updated between loop iterations (i.e., for each name that has
an entry (loop-header) phi in LOOP) we create a new phi in:
1. merge1_bb (to account for the edge from guard1)
2. loop1_exit_bb (an exit-phi to keep LOOP in loop-closed form)
*/
static void
slpeel_update_phi_nodes_for_guard1 (edge guard_edge, struct loop *loop,
bool is_new_loop, basic_block *new_exit_bb,
bitmap *defs)
{
gimple orig_phi, new_phi;
gimple update_phi, update_phi2;
tree guard_arg, loop_arg;
basic_block new_merge_bb = guard_edge->dest;
edge e = EDGE_SUCC (new_merge_bb, 0);
basic_block update_bb = e->dest;
basic_block orig_bb = loop->header;
edge new_exit_e;
tree current_new_name;
tree name;
gimple_stmt_iterator gsi_orig, gsi_update;
/* Create new bb between loop and new_merge_bb. */
*new_exit_bb = split_edge (single_exit (loop));
new_exit_e = EDGE_SUCC (*new_exit_bb, 0);
for (gsi_orig = gsi_start_phis (orig_bb),
gsi_update = gsi_start_phis (update_bb);
!gsi_end_p (gsi_orig) && !gsi_end_p (gsi_update);
gsi_next (&gsi_orig), gsi_next (&gsi_update))
{
orig_phi = gsi_stmt (gsi_orig);
update_phi = gsi_stmt (gsi_update);
/* Virtual phi; Mark it for renaming. We actually want to call
mark_sym_for_renaming, but since all SSA renaming data structures
are going to be freed before we get to call update_ssa, we just
record this name for now in a bitmap, and will mark it for
renaming later. */
name = PHI_RESULT (orig_phi);
if (!is_gimple_reg (SSA_NAME_VAR (name)))
bitmap_set_bit (vect_memsyms_to_rename, DECL_UID (SSA_NAME_VAR (name)));
/** 1. Handle new-merge-point phis **/
/* 1.1. Generate new phi node in NEW_MERGE_BB: */
new_phi = create_phi_node (SSA_NAME_VAR (PHI_RESULT (orig_phi)),
new_merge_bb);
/* 1.2. NEW_MERGE_BB has two incoming edges: GUARD_EDGE and the exit-edge
of LOOP. Set the two phi args in NEW_PHI for these edges: */
loop_arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, EDGE_SUCC (loop->latch, 0));
guard_arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, loop_preheader_edge (loop));
add_phi_arg (new_phi, loop_arg, new_exit_e);
add_phi_arg (new_phi, guard_arg, guard_edge);
/* 1.3. Update phi in successor block. */
gcc_assert (PHI_ARG_DEF_FROM_EDGE (update_phi, e) == loop_arg
|| PHI_ARG_DEF_FROM_EDGE (update_phi, e) == guard_arg);
SET_PHI_ARG_DEF (update_phi, e->dest_idx, PHI_RESULT (new_phi));
update_phi2 = new_phi;
/** 2. Handle loop-closed-ssa-form phis **/
if (!is_gimple_reg (PHI_RESULT (orig_phi)))
continue;
/* 2.1. Generate new phi node in NEW_EXIT_BB: */
new_phi = create_phi_node (SSA_NAME_VAR (PHI_RESULT (orig_phi)),
*new_exit_bb);
/* 2.2. NEW_EXIT_BB has one incoming edge: the exit-edge of the loop. */
add_phi_arg (new_phi, loop_arg, single_exit (loop));
/* 2.3. Update phi in successor of NEW_EXIT_BB: */
gcc_assert (PHI_ARG_DEF_FROM_EDGE (update_phi2, new_exit_e) == loop_arg);
SET_PHI_ARG_DEF (update_phi2, new_exit_e->dest_idx, PHI_RESULT (new_phi));
/* 2.4. Record the newly created name with set_current_def.
We want to find a name such that
name = get_current_def (orig_loop_name)
and to set its current definition as follows:
set_current_def (name, new_phi_name)
If LOOP is a new loop then loop_arg is already the name we're
looking for. If LOOP is the original loop, then loop_arg is
the orig_loop_name and the relevant name is recorded in its
current reaching definition. */
if (is_new_loop)
current_new_name = loop_arg;
else
{
current_new_name = get_current_def (loop_arg);
/* current_def is not available only if the variable does not
change inside the loop, in which case we also don't care
about recording a current_def for it because we won't be
trying to create loop-exit-phis for it. */
if (!current_new_name)
continue;
}
gcc_assert (get_current_def (current_new_name) == NULL_TREE);
set_current_def (current_new_name, PHI_RESULT (new_phi));
bitmap_set_bit (*defs, SSA_NAME_VERSION (current_new_name));
}
}
/* Function slpeel_update_phi_nodes_for_guard2
Input:
- GUARD_EDGE, LOOP, IS_NEW_LOOP, NEW_EXIT_BB - as explained above.
In the context of the overall structure, we have:
loop1_preheader_bb:
guard1 (goto loop1/merge1_bb)
loop1
loop1_exit_bb:
guard2 (goto merge1_bb/merge2_bb)
merge1_bb
LOOP-> loop2
loop2_exit_bb
merge2_bb
next_bb
For each name used outside the loop (i.e., for each name that has an exit
phi in next_bb) we create a new phi in:
1. merge2_bb (to account for the edge from guard_bb)
2. loop2_exit_bb (an exit-phi to keep LOOP in loop-closed form)
3. guard2 bb (an exit phi to keep the preceding loop in loop-closed form),
if needed (if it wasn't handled by slpeel_update_phi_nodes_for_guard1).
*/
static void
slpeel_update_phi_nodes_for_guard2 (edge guard_edge, struct loop *loop,
bool is_new_loop, basic_block *new_exit_bb)
{
gimple orig_phi, new_phi;
gimple update_phi, update_phi2;
tree guard_arg, loop_arg;
basic_block new_merge_bb = guard_edge->dest;
edge e = EDGE_SUCC (new_merge_bb, 0);
basic_block update_bb = e->dest;
edge new_exit_e;
tree orig_def, orig_def_new_name;
tree new_name, new_name2;
tree arg;
gimple_stmt_iterator gsi;
/* Create new bb between loop and new_merge_bb. */
*new_exit_bb = split_edge (single_exit (loop));
new_exit_e = EDGE_SUCC (*new_exit_bb, 0);
for (gsi = gsi_start_phis (update_bb); !gsi_end_p (gsi); gsi_next (&gsi))
{
update_phi = gsi_stmt (gsi);
orig_phi = update_phi;
orig_def = PHI_ARG_DEF_FROM_EDGE (orig_phi, e);
/* This loop-closed-phi actually doesn't represent a use
out of the loop - the phi arg is a constant. */
if (TREE_CODE (orig_def) != SSA_NAME)
continue;
orig_def_new_name = get_current_def (orig_def);
arg = NULL_TREE;
/** 1. Handle new-merge-point phis **/
/* 1.1. Generate new phi node in NEW_MERGE_BB: */
new_phi = create_phi_node (SSA_NAME_VAR (PHI_RESULT (orig_phi)),
new_merge_bb);
/* 1.2. NEW_MERGE_BB has two incoming edges: GUARD_EDGE and the exit-edge
of LOOP. Set the two PHI args in NEW_PHI for these edges: */
new_name = orig_def;
new_name2 = NULL_TREE;
if (orig_def_new_name)
{
new_name = orig_def_new_name;
/* Some variables have both loop-entry-phis and loop-exit-phis.
Such variables were given yet newer names by phis placed in
guard_bb by slpeel_update_phi_nodes_for_guard1. I.e:
new_name2 = get_current_def (get_current_def (orig_name)). */
new_name2 = get_current_def (new_name);
}
if (is_new_loop)
{
guard_arg = orig_def;
loop_arg = new_name;
}
else
{
guard_arg = new_name;
loop_arg = orig_def;
}
if (new_name2)
guard_arg = new_name2;
add_phi_arg (new_phi, loop_arg, new_exit_e);
add_phi_arg (new_phi, guard_arg, guard_edge);
/* 1.3. Update phi in successor block. */
gcc_assert (PHI_ARG_DEF_FROM_EDGE (update_phi, e) == orig_def);
SET_PHI_ARG_DEF (update_phi, e->dest_idx, PHI_RESULT (new_phi));
update_phi2 = new_phi;
/** 2. Handle loop-closed-ssa-form phis **/
/* 2.1. Generate new phi node in NEW_EXIT_BB: */
new_phi = create_phi_node (SSA_NAME_VAR (PHI_RESULT (orig_phi)),
*new_exit_bb);
/* 2.2. NEW_EXIT_BB has one incoming edge: the exit-edge of the loop. */
add_phi_arg (new_phi, loop_arg, single_exit (loop));
/* 2.3. Update phi in successor of NEW_EXIT_BB: */
gcc_assert (PHI_ARG_DEF_FROM_EDGE (update_phi2, new_exit_e) == loop_arg);
SET_PHI_ARG_DEF (update_phi2, new_exit_e->dest_idx, PHI_RESULT (new_phi));
/** 3. Handle loop-closed-ssa-form phis for first loop **/
/* 3.1. Find the relevant names that need an exit-phi in
GUARD_BB, i.e. names for which
slpeel_update_phi_nodes_for_guard1 had not already created a
phi node. This is the case for names that are used outside
the loop (and therefore need an exit phi) but are not updated
across loop iterations (and therefore don't have a
loop-header-phi).
slpeel_update_phi_nodes_for_guard1 is responsible for
creating loop-exit phis in GUARD_BB for names that have a
loop-header-phi. When such a phi is created we also record
the new name in its current definition. If this new name
exists, then guard_arg was set to this new name (see 1.2
above). Therefore, if guard_arg is not this new name, this
is an indication that an exit-phi in GUARD_BB was not yet
created, so we take care of it here. */
if (guard_arg == new_name2)
continue;
arg = guard_arg;
/* 3.2. Generate new phi node in GUARD_BB: */
new_phi = create_phi_node (SSA_NAME_VAR (PHI_RESULT (orig_phi)),
guard_edge->src);
/* 3.3. GUARD_BB has one incoming edge: */
gcc_assert (EDGE_COUNT (guard_edge->src->preds) == 1);
add_phi_arg (new_phi, arg, EDGE_PRED (guard_edge->src, 0));
/* 3.4. Update phi in successor of GUARD_BB: */
gcc_assert (PHI_ARG_DEF_FROM_EDGE (update_phi2, guard_edge)
== guard_arg);
SET_PHI_ARG_DEF (update_phi2, guard_edge->dest_idx, PHI_RESULT (new_phi));
}
}
/* Make the LOOP iterate NITERS times. This is done by adding a new IV
that starts at zero, increases by one and its limit is NITERS.
Assumption: the exit-condition of LOOP is the last stmt in the loop. */
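/* Editorial illustration (not part of the original sources): for NITERS = n
   the loop is rewritten so that its exit test becomes a test on the new IV:

       indx = 0;
     loop:
       <loop body>
       indx = indx + 1;
       if (indx >= n) goto exit;   <-- replaces the old exit condition
       goto loop;

   GE_EXPR vs. LT_EXPR is chosen below according to whether the exit edge
   is the true or the false edge of the condition.  */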
void
slpeel_make_loop_iterate_ntimes (struct loop *loop, tree niters)
{
tree indx_before_incr, indx_after_incr;
gimple cond_stmt;
gimple orig_cond;
edge exit_edge = single_exit (loop);
gimple_stmt_iterator loop_cond_gsi;
gimple_stmt_iterator incr_gsi;
bool insert_after;
tree init = build_int_cst (TREE_TYPE (niters), 0);
tree step = build_int_cst (TREE_TYPE (niters), 1);
LOC loop_loc;
enum tree_code code;
orig_cond = get_loop_exit_condition (loop);
gcc_assert (orig_cond);
loop_cond_gsi = gsi_for_stmt (orig_cond);
standard_iv_increment_position (loop, &incr_gsi, &insert_after);
create_iv (init, step, NULL_TREE, loop,
&incr_gsi, insert_after, &indx_before_incr, &indx_after_incr);
indx_after_incr = force_gimple_operand_gsi (&loop_cond_gsi, indx_after_incr,
true, NULL_TREE, true,
GSI_SAME_STMT);
niters = force_gimple_operand_gsi (&loop_cond_gsi, niters, true, NULL_TREE,
true, GSI_SAME_STMT);
code = (exit_edge->flags & EDGE_TRUE_VALUE) ? GE_EXPR : LT_EXPR;
cond_stmt = gimple_build_cond (code, indx_after_incr, niters, NULL_TREE,
NULL_TREE);
gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT);
/* Remove old loop exit test: */
gsi_remove (&loop_cond_gsi, true);
loop_loc = find_loop_location (loop);
if (dump_file && (dump_flags & TDF_DETAILS))
{
if (loop_loc != UNKNOWN_LOC)
fprintf (dump_file, "\nloop at %s:%d: ",
LOC_FILE (loop_loc), LOC_LINE (loop_loc));
print_gimple_stmt (dump_file, cond_stmt, 0, TDF_SLIM);
}
loop->nb_iterations = niters;
}
/* Given LOOP, this function generates a new copy of it and puts the copy
on edge E, which is either the entry or the exit of LOOP. */
struct loop *
slpeel_tree_duplicate_loop_to_edge_cfg (struct loop *loop, edge e)
{
struct loop *new_loop;
basic_block *new_bbs, *bbs;
bool at_exit;
bool was_imm_dom;
basic_block exit_dest;
gimple phi;
tree phi_arg;
edge exit, new_exit;
gimple_stmt_iterator gsi;
at_exit = (e == single_exit (loop));
if (!at_exit && e != loop_preheader_edge (loop))
return NULL;
bbs = get_loop_body (loop);
/* Check whether duplication is possible. */
if (!can_copy_bbs_p (bbs, loop->num_nodes))
{
free (bbs);
return NULL;
}
/* Generate new loop structure. */
new_loop = duplicate_loop (loop, loop_outer (loop));
if (!new_loop)
{
free (bbs);
return NULL;
}
exit_dest = single_exit (loop)->dest;
was_imm_dom = (get_immediate_dominator (CDI_DOMINATORS,
exit_dest) == loop->header ?
true : false);
new_bbs = XNEWVEC (basic_block, loop->num_nodes);
exit = single_exit (loop);
copy_bbs (bbs, loop->num_nodes, new_bbs,
&exit, 1, &new_exit, NULL,
e->src);
/* Duplicating phi args at exit bbs as coming
also from exit of duplicated loop. */
for (gsi = gsi_start_phis (exit_dest); !gsi_end_p (gsi); gsi_next (&gsi))
{
phi = gsi_stmt (gsi);
phi_arg = PHI_ARG_DEF_FROM_EDGE (phi, single_exit (loop));
if (phi_arg)
{
edge new_loop_exit_edge;
if (EDGE_SUCC (new_loop->header, 0)->dest == new_loop->latch)
new_loop_exit_edge = EDGE_SUCC (new_loop->header, 1);
else
new_loop_exit_edge = EDGE_SUCC (new_loop->header, 0);
add_phi_arg (phi, phi_arg, new_loop_exit_edge);
}
}
if (at_exit) /* Add the loop copy at exit. */
{
redirect_edge_and_branch_force (e, new_loop->header);
PENDING_STMT (e) = NULL;
set_immediate_dominator (CDI_DOMINATORS, new_loop->header, e->src);
if (was_imm_dom)
set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_loop->header);
}
else /* Add the copy at entry. */
{
edge new_exit_e;
edge entry_e = loop_preheader_edge (loop);
basic_block preheader = entry_e->src;
if (!flow_bb_inside_loop_p (new_loop,
EDGE_SUCC (new_loop->header, 0)->dest))
new_exit_e = EDGE_SUCC (new_loop->header, 0);
else
new_exit_e = EDGE_SUCC (new_loop->header, 1);
redirect_edge_and_branch_force (new_exit_e, loop->header);
PENDING_STMT (new_exit_e) = NULL;
set_immediate_dominator (CDI_DOMINATORS, loop->header,
new_exit_e->src);
/* We have to add phi args to the loop->header here as coming
from new_exit_e edge. */
for (gsi = gsi_start_phis (loop->header);
!gsi_end_p (gsi);
gsi_next (&gsi))
{
phi = gsi_stmt (gsi);
phi_arg = PHI_ARG_DEF_FROM_EDGE (phi, entry_e);
if (phi_arg)
add_phi_arg (phi, phi_arg, new_exit_e);
}
redirect_edge_and_branch_force (entry_e, new_loop->header);
PENDING_STMT (entry_e) = NULL;
set_immediate_dominator (CDI_DOMINATORS, new_loop->header, preheader);
}
free (new_bbs);
free (bbs);
return new_loop;
}
/* Given the condition statement COND, insert it as the last statement of
GUARD_BB; EXIT_BB is the basic block to jump to when skipping the loop,
and DOM_BB becomes the immediate dominator of EXIT_BB. Assumes that
this is the single exit of the guarded loop.
Returns the skip edge. */
static edge
slpeel_add_loop_guard (basic_block guard_bb, tree cond, basic_block exit_bb,
basic_block dom_bb)
{
gimple_stmt_iterator gsi;
edge new_e, enter_e;
gimple cond_stmt;
gimple_seq gimplify_stmt_list = NULL;
enter_e = EDGE_SUCC (guard_bb, 0);
enter_e->flags &= ~EDGE_FALLTHRU;
enter_e->flags |= EDGE_FALSE_VALUE;
gsi = gsi_last_bb (guard_bb);
cond = force_gimple_operand (cond, &gimplify_stmt_list, true, NULL_TREE);
cond_stmt = gimple_build_cond (NE_EXPR,
cond, build_int_cst (TREE_TYPE (cond), 0),
NULL_TREE, NULL_TREE);
if (gimplify_stmt_list)
gsi_insert_seq_after (&gsi, gimplify_stmt_list, GSI_NEW_STMT);
gsi = gsi_last_bb (guard_bb);
gsi_insert_after (&gsi, cond_stmt, GSI_NEW_STMT);
/* Add new edge to connect guard block to the merge/loop-exit block. */
new_e = make_edge (guard_bb, exit_bb, EDGE_TRUE_VALUE);
set_immediate_dominator (CDI_DOMINATORS, exit_bb, dom_bb);
return new_e;
}
/* This function verifies that LOOP satisfies the following restrictions:
(1) it is innermost
(2) it consists of exactly 2 basic blocks - header, and an empty latch.
(3) it is single entry, single exit
(4) its exit condition is the last stmt in the header
(5) E is the entry/exit edge of LOOP.
*/
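/* Editorial illustration (not part of the original sources): a canonical
   counted loop of the form

       header:  i_1 = PHI <0 (preheader), i_2 (latch)>
                <body>
                i_2 = i_1 + 1;
                if (i_2 < n) goto latch; else goto exit;
       latch:   goto header;

   satisfies restrictions (1)-(4): it is innermost, consists of a header
   plus an empty latch, has a single entry and exit, and its exit
   condition is the last statement of the header.  */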
bool
slpeel_can_duplicate_loop_p (const struct loop *loop, const_edge e)
{
edge exit_e = single_exit (loop);
edge entry_e = loop_preheader_edge (loop);
gimple orig_cond = get_loop_exit_condition (loop);
gimple_stmt_iterator loop_exit_gsi = gsi_last_bb (exit_e->src);
if (need_ssa_update_p ())
return false;
if (loop->inner
/* All loops have an outer scope; the only case in which loop_outer (loop)
is NULL is for the pseudo-loop representing the function itself. */
|| !loop_outer (loop)
|| loop->num_nodes != 2
|| !empty_block_p (loop->latch)
|| !single_exit (loop)
/* Verify that new loop exit condition can be trivially modified. */
|| (!orig_cond || orig_cond != gsi_stmt (loop_exit_gsi))
|| (e != exit_e && e != entry_e))
return false;
return true;
}
#ifdef ENABLE_CHECKING
static void
slpeel_verify_cfg_after_peeling (struct loop *first_loop,
struct loop *second_loop)
{
basic_block loop1_exit_bb = single_exit (first_loop)->dest;
basic_block loop2_entry_bb = loop_preheader_edge (second_loop)->src;
basic_block loop1_entry_bb = loop_preheader_edge (first_loop)->src;
/* A guard that controls whether the second_loop is to be executed or skipped
is placed in first_loop->exit. first_loop->exit therefore has two
successors - one is the preheader of second_loop, and the other is a bb
after second_loop.
*/
gcc_assert (EDGE_COUNT (loop1_exit_bb->succs) == 2);
/* 1. Verify that one of the successors of first_loop->exit is the preheader
of second_loop. */
/* The preheader of new_loop is expected to have two predecessors:
first_loop->exit and the block that precedes first_loop. */
gcc_assert (EDGE_COUNT (loop2_entry_bb->preds) == 2
&& ((EDGE_PRED (loop2_entry_bb, 0)->src == loop1_exit_bb
&& EDGE_PRED (loop2_entry_bb, 1)->src == loop1_entry_bb)
|| (EDGE_PRED (loop2_entry_bb, 1)->src == loop1_exit_bb
&& EDGE_PRED (loop2_entry_bb, 0)->src == loop1_entry_bb)));
/* Verify that the other successor of first_loop->exit is after the
second_loop. */
/* TODO */
}
#endif
/* If the run-time cost-model check determines that vectorization is not
profitable, and hence a scalar loop should be generated instead, set
FIRST_NITERS to the full scalar iteration count. This allows all the
iterations to be executed in the prologue-peeled scalar loop. */
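/* Editorial sketch of the construct generated below (block and variable
   names are illustrative):

       cond_bb:
         if (scalar_loop_iters <= th) goto then_bb;
         else goto bb_before_first_loop;
       then_bb:
         prologue_after_cost_adjust = scalar_loop_iters;
       bb_before_first_loop:
         first_niters = PHI <prologue_after_cost_adjust (then_bb),
                             first_niters (cond_bb)>  */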
static void
set_prologue_iterations (basic_block bb_before_first_loop,
tree first_niters,
struct loop *loop,
unsigned int th)
{
edge e;
basic_block cond_bb, then_bb;
tree var, prologue_after_cost_adjust_name;
gimple_stmt_iterator gsi;
gimple newphi;
edge e_true, e_false, e_fallthru;
gimple cond_stmt;
gimple_seq gimplify_stmt_list = NULL, stmts = NULL;
tree cost_pre_condition = NULL_TREE;
tree scalar_loop_iters =
unshare_expr (LOOP_VINFO_NITERS_UNCHANGED (loop_vec_info_for_loop (loop)));
e = single_pred_edge (bb_before_first_loop);
cond_bb = split_edge (e);
e = single_pred_edge (bb_before_first_loop);
then_bb = split_edge (e);
set_immediate_dominator (CDI_DOMINATORS, then_bb, cond_bb);
e_false = make_single_succ_edge (cond_bb, bb_before_first_loop,
EDGE_FALSE_VALUE);
set_immediate_dominator (CDI_DOMINATORS, bb_before_first_loop, cond_bb);
e_true = EDGE_PRED (then_bb, 0);
e_true->flags &= ~EDGE_FALLTHRU;
e_true->flags |= EDGE_TRUE_VALUE;
e_fallthru = EDGE_SUCC (then_bb, 0);
cost_pre_condition =
fold_build2 (LE_EXPR, boolean_type_node, scalar_loop_iters,
build_int_cst (TREE_TYPE (scalar_loop_iters), th));
cost_pre_condition =
force_gimple_operand (cost_pre_condition, &gimplify_stmt_list,
true, NULL_TREE);
cond_stmt = gimple_build_cond (NE_EXPR, cost_pre_condition,
build_int_cst (TREE_TYPE (cost_pre_condition),
0), NULL_TREE, NULL_TREE);
gsi = gsi_last_bb (cond_bb);
if (gimplify_stmt_list)
gsi_insert_seq_after (&gsi, gimplify_stmt_list, GSI_NEW_STMT);
gsi = gsi_last_bb (cond_bb);
gsi_insert_after (&gsi, cond_stmt, GSI_NEW_STMT);
var = create_tmp_var (TREE_TYPE (scalar_loop_iters),
"prologue_after_cost_adjust");
add_referenced_var (var);
prologue_after_cost_adjust_name =
force_gimple_operand (scalar_loop_iters, &stmts, false, var);
gsi = gsi_last_bb (then_bb);
if (stmts)
gsi_insert_seq_after (&gsi, stmts, GSI_NEW_STMT);
newphi = create_phi_node (var, bb_before_first_loop);
add_phi_arg (newphi, prologue_after_cost_adjust_name, e_fallthru);
add_phi_arg (newphi, first_niters, e_false);
first_niters = PHI_RESULT (newphi);
}
/* Function slpeel_tree_peel_loop_to_edge.
Peel the first (last) iterations of LOOP into a new prolog (epilog) loop
that is placed on the entry (exit) edge E of LOOP. After this transformation
we have two loops, one after the other - first-loop iterates FIRST_NITERS
times, and second-loop iterates the remainder NITERS - FIRST_NITERS times.
If the cost model indicates that it is profitable to emit a scalar
loop instead of the vector one, then the prolog (epilog) loop will iterate
for the entire unchanged scalar iterations of the loop.
Input:
- LOOP: the loop to be peeled.
- E: the exit or entry edge of LOOP.
If it is the entry edge, we peel the first iterations of LOOP. In this
case first-loop is LOOP, and second-loop is the newly created loop.
If it is the exit edge, we peel the last iterations of LOOP. In this
case, first-loop is the newly created loop, and second-loop is LOOP.
- NITERS: the number of iterations that LOOP iterates.
- FIRST_NITERS: the number of iterations that the first-loop should iterate.
- UPDATE_FIRST_LOOP_COUNT: specifies whether this function is responsible
for updating the loop bound of the first-loop to FIRST_NITERS. If it
is false, the caller of this function may want to take care of this
(this can be useful if we don't want new stmts added to first-loop).
- TH: cost model profitability threshold of iterations for vectorization.
- CHECK_PROFITABILITY: specifies whether the cost-model check has not
occurred during versioning and hence needs to occur during
prologue generation, or has not occurred during prologue
generation and hence needs to occur during epilogue generation.
Output:
The function returns a pointer to the new loop-copy, or NULL if it failed
to perform the transformation.
The function generates two if-then-else guards: one before the first loop,
and the other before the second loop:
The first guard is:
if (FIRST_NITERS == 0) then skip the first loop,
and go directly to the second loop.
The second guard is:
if (FIRST_NITERS == NITERS) then skip the second loop.
FORNOW only simple loops are supported (see slpeel_can_duplicate_loop_p).
FORNOW the resulting code will not be in loop-closed-ssa form.
*/
static struct loop*
slpeel_tree_peel_loop_to_edge (struct loop *loop,
edge e, tree first_niters,
tree niters, bool update_first_loop_count,
unsigned int th, bool check_profitability)
{
struct loop *new_loop = NULL, *first_loop, *second_loop;
edge skip_e;
tree pre_condition = NULL_TREE;
bitmap definitions;
basic_block bb_before_second_loop, bb_after_second_loop;
basic_block bb_before_first_loop;
basic_block bb_between_loops;
basic_block new_exit_bb;
edge exit_e = single_exit (loop);
LOC loop_loc;
tree cost_pre_condition = NULL_TREE;
if (!slpeel_can_duplicate_loop_p (loop, e))
return NULL;
/* We have to initialize cfg_hooks. Then, when calling
cfg_hooks->split_edge, the function tree_split_edge
is actually called and, when calling cfg_hooks->duplicate_block,
the function tree_duplicate_bb is called. */
gimple_register_cfg_hooks ();
/* 1. Generate a copy of LOOP and put it on E (E is the entry/exit of LOOP).
Resulting CFG would be:
first_loop:
do {
} while ...
second_loop:
do {
} while ...
orig_exit_bb:
*/
if (!(new_loop = slpeel_tree_duplicate_loop_to_edge_cfg (loop, e)))
{
loop_loc = find_loop_location (loop);
if (dump_file && (dump_flags & TDF_DETAILS))
{
if (loop_loc != UNKNOWN_LOC)
fprintf (dump_file, "\n%s:%d: note: ",
LOC_FILE (loop_loc), LOC_LINE (loop_loc));
fprintf (dump_file, "tree_duplicate_loop_to_edge_cfg failed.\n");
}
return NULL;
}
if (e == exit_e)
{
/* NEW_LOOP was placed after LOOP. */
first_loop = loop;
second_loop = new_loop;
}
else
{
/* NEW_LOOP was placed before LOOP. */
first_loop = new_loop;
second_loop = loop;
}
definitions = ssa_names_to_replace ();
slpeel_update_phis_for_duplicate_loop (loop, new_loop, e == exit_e);
rename_variables_in_loop (new_loop);
/* 2. Add the guard code in one of the following ways:
2.a Add the guard that controls whether the first loop is executed.
This occurs when this function is invoked for prologue or epilogue
generation and when the cost model check can be done at compile time.
Resulting CFG would be:
bb_before_first_loop:
if (FIRST_NITERS == 0) GOTO bb_before_second_loop
GOTO first-loop
first_loop:
do {
} while ...
bb_before_second_loop:
second_loop:
do {
} while ...
orig_exit_bb:
2.b Add the cost model check that allows the prologue
to iterate for the entire unchanged scalar
iterations of the loop in the event that the cost
model indicates that the scalar loop is more
profitable than the vector one. This occurs when
this function is invoked for prologue generation
and the cost model check needs to be done at run
time.
Resulting CFG after prologue peeling would be:
if (scalar_loop_iterations <= th)
FIRST_NITERS = scalar_loop_iterations
bb_before_first_loop:
if (FIRST_NITERS == 0) GOTO bb_before_second_loop
GOTO first-loop
first_loop:
do {
} while ...
bb_before_second_loop:
second_loop:
do {
} while ...
orig_exit_bb:
2.c Add the cost model check that allows the epilogue
to iterate for the entire unchanged scalar
iterations of the loop in the event that the cost
model indicates that the scalar loop is more
profitable than the vector one. This occurs when
this function is invoked for epilogue generation
and the cost model check needs to be done at run
time.
Resulting CFG after prologue peeling would be:
bb_before_first_loop:
if ((scalar_loop_iterations <= th)
||
FIRST_NITERS == 0) GOTO bb_before_second_loop
GOTO first-loop
first_loop:
do {
} while ...
bb_before_second_loop:
second_loop:
do {
} while ...
orig_exit_bb:
*/
bb_before_first_loop = split_edge (loop_preheader_edge (first_loop));
bb_before_second_loop = split_edge (single_exit (first_loop));
/* Epilogue peeling. */
if (!update_first_loop_count)
{
pre_condition =
fold_build2 (LE_EXPR, boolean_type_node, first_niters,
build_int_cst (TREE_TYPE (first_niters), 0));
if (check_profitability)
{
tree scalar_loop_iters
= unshare_expr (LOOP_VINFO_NITERS_UNCHANGED
(loop_vec_info_for_loop (loop)));
cost_pre_condition =
fold_build2 (LE_EXPR, boolean_type_node, scalar_loop_iters,
build_int_cst (TREE_TYPE (scalar_loop_iters), th));
pre_condition = fold_build2 (TRUTH_OR_EXPR, boolean_type_node,
cost_pre_condition, pre_condition);
}
}
/* Prologue peeling. */
else
{
if (check_profitability)
set_prologue_iterations (bb_before_first_loop, first_niters,
loop, th);
pre_condition =
fold_build2 (LE_EXPR, boolean_type_node, first_niters,
build_int_cst (TREE_TYPE (first_niters), 0));
}
skip_e = slpeel_add_loop_guard (bb_before_first_loop, pre_condition,
bb_before_second_loop, bb_before_first_loop);
slpeel_update_phi_nodes_for_guard1 (skip_e, first_loop,
first_loop == new_loop,
&new_exit_bb, &definitions);
/* 3. Add the guard that controls whether the second loop is executed.
Resulting CFG would be:
bb_before_first_loop:
if (FIRST_NITERS == 0) GOTO bb_before_second_loop (skip first loop)
GOTO first-loop
first_loop:
do {
} while ...
bb_between_loops:
if (FIRST_NITERS == NITERS) GOTO bb_after_second_loop (skip second loop)
GOTO bb_before_second_loop
bb_before_second_loop:
second_loop:
do {
} while ...
bb_after_second_loop:
orig_exit_bb:
*/
bb_between_loops = new_exit_bb;
bb_after_second_loop = split_edge (single_exit (second_loop));
pre_condition =
fold_build2 (EQ_EXPR, boolean_type_node, first_niters, niters);
skip_e = slpeel_add_loop_guard (bb_between_loops, pre_condition,
bb_after_second_loop, bb_before_first_loop);
slpeel_update_phi_nodes_for_guard2 (skip_e, second_loop,
second_loop == new_loop, &new_exit_bb);
/* 4. Make first-loop iterate FIRST_NITERS times, if requested.
*/
if (update_first_loop_count)
slpeel_make_loop_iterate_ntimes (first_loop, first_niters);
BITMAP_FREE (definitions);
delete_update_ssa ();
return new_loop;
}
/* Function find_loop_location.
Extract the location of the loop in the source code.
If the loop is not well formed for vectorization, an estimated
location is calculated.
Return the loop location if found, and UNKNOWN_LOC otherwise. */
LOC
find_loop_location (struct loop *loop)
{
gimple stmt = NULL;
basic_block bb;
gimple_stmt_iterator si;
if (!loop)
return UNKNOWN_LOC;
stmt = get_loop_exit_condition (loop);
if (stmt && gimple_location (stmt) != UNKNOWN_LOC)
return gimple_location (stmt);
/* If we got here the loop is probably not "well formed",
try to estimate the loop location. */
if (!loop->header)
return UNKNOWN_LOC;
bb = loop->header;
for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
{
stmt = gsi_stmt (si);
if (gimple_location (stmt) != UNKNOWN_LOC)
return gimple_location (stmt);
}
return UNKNOWN_LOC;
}
/* This function builds, on the loop preheader, ni_name = the number of
iterations the loop executes. */
static tree
vect_build_loop_niters (loop_vec_info loop_vinfo)
{
tree ni_name, var;
gimple_seq stmts = NULL;
edge pe;
struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
tree ni = unshare_expr (LOOP_VINFO_NITERS (loop_vinfo));
var = create_tmp_var (TREE_TYPE (ni), "niters");
add_referenced_var (var);
ni_name = force_gimple_operand (ni, &stmts, false, var);
pe = loop_preheader_edge (loop);
if (stmts)
{
basic_block new_bb = gsi_insert_seq_on_edge_immediate (pe, stmts);
gcc_assert (!new_bb);
}
return ni_name;
}
/* This function generates the following statements:
ni_name = number of iterations loop executes
ratio = ni_name / vf
ratio_mult_vf_name = ratio * vf
and places them at the loop preheader edge. */
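/* Editorial example (not part of the original sources): with VF = 4 and
   ni_name = n, the statements emitted below amount to

       ratio         = n >> 2;       since exact_log2 (4) == 2
       ratio_mult_vf = ratio << 2;   n rounded down to a multiple of 4

   VF is always a power of two, so the division and multiplication by VF
   can be implemented as shifts.  */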
static void
vect_generate_tmps_on_preheader (loop_vec_info loop_vinfo,
tree *ni_name_ptr,
tree *ratio_mult_vf_name_ptr,
tree *ratio_name_ptr)
{
edge pe;
basic_block new_bb;
gimple_seq stmts;
tree ni_name;
tree var;
tree ratio_name;
tree ratio_mult_vf_name;
struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
tree ni = LOOP_VINFO_NITERS (loop_vinfo);
int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
tree log_vf;
pe = loop_preheader_edge (loop);
/* Generate temporary variable that contains
number of iterations loop executes. */
ni_name = vect_build_loop_niters (loop_vinfo);
log_vf = build_int_cst (TREE_TYPE (ni), exact_log2 (vf));
/* Create: ratio = ni >> log2(vf) */
ratio_name = fold_build2 (RSHIFT_EXPR, TREE_TYPE (ni_name), ni_name, log_vf);
if (!is_gimple_val (ratio_name))
{
var = create_tmp_var (TREE_TYPE (ni), "bnd");
add_referenced_var (var);
stmts = NULL;
ratio_name = force_gimple_operand (ratio_name, &stmts, true, var);
pe = loop_preheader_edge (loop);
new_bb = gsi_insert_seq_on_edge_immediate (pe, stmts);
gcc_assert (!new_bb);
}
/* Create: ratio_mult_vf = ratio << log2 (vf). */
ratio_mult_vf_name = fold_build2 (LSHIFT_EXPR, TREE_TYPE (ratio_name),
ratio_name, log_vf);
if (!is_gimple_val (ratio_mult_vf_name))
{
var = create_tmp_var (TREE_TYPE (ni), "ratio_mult_vf");
add_referenced_var (var);
stmts = NULL;
ratio_mult_vf_name = force_gimple_operand (ratio_mult_vf_name, &stmts,
true, var);
pe = loop_preheader_edge (loop);
new_bb = gsi_insert_seq_on_edge_immediate (pe, stmts);
gcc_assert (!new_bb);
}
*ni_name_ptr = ni_name;
*ratio_mult_vf_name_ptr = ratio_mult_vf_name;
*ratio_name_ptr = ratio_name;
return;
}
/* Function vect_can_advance_ivs_p
In case the number of iterations that LOOP iterates is unknown at compile
time, an epilog loop will be generated, and the loop induction variables
(IVs) will be "advanced" to the value they are supposed to take just before
the epilog loop. Here we check that the access function of the loop IVs
and the expression that represents the loop bound are simple enough.
These restrictions will be relaxed in the future. */
bool
vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
{
struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
basic_block bb = loop->header;
gimple phi;
gimple_stmt_iterator gsi;
/* Analyze phi functions of the loop header. */
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "vect_can_advance_ivs_p:");
for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
{
tree access_fn = NULL;
tree evolution_part;
phi = gsi_stmt (gsi);
if (vect_print_dump_info (REPORT_DETAILS))
{
fprintf (vect_dump, "Analyze phi: ");
print_gimple_stmt (vect_dump, phi, 0, TDF_SLIM);
}
/* Skip virtual phi's. The data dependences that are associated with
virtual defs/uses (i.e., memory accesses) are analyzed elsewhere. */
if (!is_gimple_reg (SSA_NAME_VAR (PHI_RESULT (phi))))
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "virtual phi. skip.");
continue;
}
/* Skip reduction phis. */
if (STMT_VINFO_DEF_TYPE (vinfo_for_stmt (phi)) == vect_reduction_def)
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "reduc phi. skip.");
continue;
}
/* Analyze the evolution function. */
access_fn = instantiate_parameters
(loop, analyze_scalar_evolution (loop, PHI_RESULT (phi)));
if (!access_fn)
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "No Access function.");
return false;
}
if (vect_print_dump_info (REPORT_DETAILS))
{
fprintf (vect_dump, "Access function of PHI: ");
print_generic_expr (vect_dump, access_fn, TDF_SLIM);
}
evolution_part = evolution_part_in_loop_num (access_fn, loop->num);
if (evolution_part == NULL_TREE)
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "No evolution.");
return false;
}
/* FORNOW: We do not transform initial conditions of IVs
whose evolution function is a polynomial of degree >= 2. */
if (tree_is_chrec (evolution_part))
return false;
}
return true;
}
/* Function vect_update_ivs_after_vectorizer.
"Advance" the induction variables of LOOP to the value they should take
after the execution of LOOP. This is currently necessary because the
vectorizer does not handle induction variables that are used after the
loop. Such a situation occurs when the last iterations of LOOP are
peeled, because:
1. We introduced new uses after LOOP for IVs that were not originally used
after LOOP: the IVs of LOOP are now used by an epilog loop.
2. LOOP is going to be vectorized; this means that it will iterate N/VF
times, whereas the loop IVs should be bumped N times.
Input:
- LOOP - a loop that is going to be vectorized. The last few iterations
of LOOP were peeled.
- NITERS - the number of iterations that LOOP executes (before it is
vectorized), i.e., the number of times the IVs should be bumped.
- UPDATE_E - a successor edge of LOOP->exit that is on the (only) path
coming out from LOOP on which there are uses of the LOOP ivs
(this is the path from LOOP->exit to epilog_loop->preheader).
The new definitions of the ivs are placed in LOOP->exit.
The phi args associated with the edge UPDATE_E in the bb
UPDATE_E->dest are updated accordingly.
Assumption 1: Like the rest of the vectorizer, this function assumes
a single loop exit that has a single predecessor.
Assumption 2: The phi nodes in the LOOP header and in update_bb are
organized in the same order.
Assumption 3: The access function of the ivs is simple enough (see
vect_can_advance_ivs_p). This assumption will be relaxed in the future.
Assumption 4: Exactly one of the successors of LOOP exit-bb is on a path
coming out of LOOP on which the ivs of LOOP are used (this is the path
that leads to the epilog loop; other paths skip the epilog loop). This
path starts with the edge UPDATE_E, and its destination (denoted update_bb)
needs to have its phis updated.
*/
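/* Editorial note (not part of the original sources): for an IV with initial
   value init and step step, the value after NITERS iterations is

       ni = init + NITERS * step

   which is the expression constructed below (using POINTER_PLUS_EXPR
   instead of PLUS_EXPR when the IV is a pointer).  */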
static void
vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo, tree niters,
edge update_e)
{
struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
basic_block exit_bb = single_exit (loop)->dest;
gimple phi, phi1;
gimple_stmt_iterator gsi, gsi1;
basic_block update_bb = update_e->dest;
/* gcc_assert (vect_can_advance_ivs_p (loop_vinfo)); */
/* Make sure there exists a single-predecessor exit bb: */
gcc_assert (single_pred_p (exit_bb));
for (gsi = gsi_start_phis (loop->header), gsi1 = gsi_start_phis (update_bb);
!gsi_end_p (gsi) && !gsi_end_p (gsi1);
gsi_next (&gsi), gsi_next (&gsi1))
{
tree access_fn = NULL;
tree evolution_part;
tree init_expr;
tree step_expr;
tree var, ni, ni_name;
gimple_stmt_iterator last_gsi;
phi = gsi_stmt (gsi);
phi1 = gsi_stmt (gsi1);
if (vect_print_dump_info (REPORT_DETAILS))
{
fprintf (vect_dump, "vect_update_ivs_after_vectorizer: phi: ");
print_gimple_stmt (vect_dump, phi, 0, TDF_SLIM);
}
/* Skip virtual phi's. */
if (!is_gimple_reg (SSA_NAME_VAR (PHI_RESULT (phi))))
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "virtual phi. skip.");
continue;
}
/* Skip reduction phis. */
if (STMT_VINFO_DEF_TYPE (vinfo_for_stmt (phi)) == vect_reduction_def)
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "reduc phi. skip.");
continue;
}
access_fn = analyze_scalar_evolution (loop, PHI_RESULT (phi));
gcc_assert (access_fn);
STRIP_NOPS (access_fn);
evolution_part =
unshare_expr (evolution_part_in_loop_num (access_fn, loop->num));
gcc_assert (evolution_part != NULL_TREE);
/* FORNOW: We do not support IVs whose evolution function is a polynomial
of degree >= 2 or exponential. */
gcc_assert (!tree_is_chrec (evolution_part));
step_expr = evolution_part;
init_expr = unshare_expr (initial_condition_in_loop_num (access_fn,
loop->num));
if (POINTER_TYPE_P (TREE_TYPE (init_expr)))
ni = fold_build2 (POINTER_PLUS_EXPR, TREE_TYPE (init_expr),
init_expr,
fold_convert (sizetype,
fold_build2 (MULT_EXPR, TREE_TYPE (niters),
niters, step_expr)));
else
ni = fold_build2 (PLUS_EXPR, TREE_TYPE (init_expr),
fold_build2 (MULT_EXPR, TREE_TYPE (init_expr),
fold_convert (TREE_TYPE (init_expr),
niters),
step_expr),
init_expr);
var = create_tmp_var (TREE_TYPE (init_expr), "tmp");
add_referenced_var (var);
last_gsi = gsi_last_bb (exit_bb);
ni_name = force_gimple_operand_gsi (&last_gsi, ni, false, var,
true, GSI_SAME_STMT);
/* Fix phi expressions in the successor bb. */
SET_PHI_ARG_DEF (phi1, update_e->dest_idx, ni_name);
}
}
/* Return the more conservative threshold between the
min_profitable_iters returned by the cost model and the user-specified
threshold, if provided. */
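/* Editorial example (not part of the original sources): with
   PARAM_MIN_VECT_LOOP_BOUND = 2 and VF = 4 the user-derived bound is
   2 * 4 - 1 = 7; if the cost model returns min_profitable_iters = 10,
   the larger (more conservative) value 10 is used as the threshold.  */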
static unsigned int
conservative_cost_threshold (loop_vec_info loop_vinfo,
int min_profitable_iters)
{
unsigned int th;
int min_scalar_loop_bound;
min_scalar_loop_bound = ((PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND)
* LOOP_VINFO_VECT_FACTOR (loop_vinfo)) - 1);
/* Use the cost model only if it is more conservative than user specified
threshold. */
th = (unsigned) min_scalar_loop_bound;
if (min_profitable_iters
&& (!min_scalar_loop_bound
|| min_profitable_iters > min_scalar_loop_bound))
th = (unsigned) min_profitable_iters;
if (th && vect_print_dump_info (REPORT_COST))
fprintf (vect_dump, "Vectorization may not be profitable.");
return th;
}
/* Function vect_do_peeling_for_loop_bound
Peel the last iterations of the loop represented by LOOP_VINFO.
The peeled iterations form a new epilog loop. Given that the loop now
iterates NITERS times, the new epilog loop iterates
NITERS % VECTORIZATION_FACTOR times.
The original loop will later be made to iterate
NITERS / VECTORIZATION_FACTOR times (this value is placed into RATIO). */
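/* A small worked example (values chosen purely for illustration): if
   NITERS == 10 and the vectorization factor is 4, then RATIO == 2,
   the vectorized loop executes 2 iterations (covering 8 scalar
   iterations), and the epilog loop executes the remaining
   10 % 4 == 2 iterations.  */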
void
vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo, tree *ratio)
{
tree ni_name, ratio_mult_vf_name;
struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
struct loop *new_loop;
edge update_e;
basic_block preheader;
int loop_num;
bool check_profitability = false;
unsigned int th = 0;
int min_profitable_iters;
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "=== vect_do_peeling_for_loop_bound ===");
initialize_original_copy_tables ();
/* Generate the following variables on the preheader of the original loop:
ni_name = number of iterations the original loop executes
ratio = ni_name / vf
ratio_mult_vf_name = ratio * vf */
vect_generate_tmps_on_preheader (loop_vinfo, &ni_name,
&ratio_mult_vf_name, ratio);
loop_num = loop->num;
/* If the cost model check was not already done during versioning or
during peeling for alignment. */
if (!VEC_length (gimple, LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo))
&& !VEC_length (ddr_p, LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo))
&& !LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo))
{
check_profitability = true;
/* Get profitability threshold for vectorized loop. */
min_profitable_iters = LOOP_VINFO_COST_MODEL_MIN_ITERS (loop_vinfo);
th = conservative_cost_threshold (loop_vinfo,
min_profitable_iters);
}
new_loop = slpeel_tree_peel_loop_to_edge (loop, single_exit (loop),
ratio_mult_vf_name, ni_name, false,
th, check_profitability);
gcc_assert (new_loop);
gcc_assert (loop_num == loop->num);
#ifdef ENABLE_CHECKING
slpeel_verify_cfg_after_peeling (loop, new_loop);
#endif
/* A guard that controls whether the new_loop is to be executed or skipped
is placed in LOOP->exit. LOOP->exit therefore has two successors - one
is the preheader of NEW_LOOP, where the IVs from LOOP are used. The other
is a bb after NEW_LOOP, where these IVs are not used. Find the edge that
is on the path where the LOOP IVs are used and need to be updated. */
preheader = loop_preheader_edge (new_loop)->src;
if (EDGE_PRED (preheader, 0)->src == single_exit (loop)->dest)
update_e = EDGE_PRED (preheader, 0);
else
update_e = EDGE_PRED (preheader, 1);
/* Update IVs of original loop as if they were advanced
by ratio_mult_vf_name steps. */
vect_update_ivs_after_vectorizer (loop_vinfo, ratio_mult_vf_name, update_e);
/* After peeling we have to reset scalar evolution analyzer. */
scev_reset ();
free_original_copy_tables ();
}
/* Function vect_gen_niters_for_prolog_loop
Set the number of iterations for the loop represented by LOOP_VINFO
to the minimum between LOOP_NITERS (the original iteration count of the loop)
and the misalignment of DR - the data reference recorded in
LOOP_VINFO_UNALIGNED_DR (LOOP_VINFO). As a result, after the execution of
this loop, the data reference DR will refer to an aligned location.
The following computation is generated:
If the misalignment of DR is known at compile time:
addr_mis = DR_MISALIGNMENT (dr);
Else, compute address misalignment in bytes:
addr_mis = addr & (vectype_size - 1)
prolog_niters = min (LOOP_NITERS, ((VF - addr_mis/elem_size)&(VF-1))/step)
(elem_size = element type size; an element is the scalar element whose type
is the inner type of the vectype)
When the step of the data-ref in the loop is not 1 (as in interleaved data
and SLP), the number of iterations of the prolog must be divided by the step
(which is equal to the size of the interleaved group).
The above formulas assume that VF == number of elements in the vector. This
may not hold when there are multiple-types in the loop.
In this case, for some data-references in the loop the VF does not represent
the number of elements that fit in the vector. Therefore, instead of VF we
use TYPE_VECTOR_SUBPARTS. */
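/* A worked instance of the formula above (illustrative values): with
   VF == 4, a 4-byte element type, step == 1 and a known byte
   misalignment of 8, elem_misalign == 8 / 4 == 2, so
   prolog_niters == ((4 - 2) & (4 - 1)) / 1 == 2; after two peeled
   iterations the access reaches the 16-byte vector boundary.  */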
static tree
vect_gen_niters_for_prolog_loop (loop_vec_info loop_vinfo, tree loop_niters)
{
struct data_reference *dr = LOOP_VINFO_UNALIGNED_DR (loop_vinfo);
struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
tree var;
gimple_seq stmts;
tree iters, iters_name;
edge pe;
basic_block new_bb;
gimple dr_stmt = DR_STMT (dr);
stmt_vec_info stmt_info = vinfo_for_stmt (dr_stmt);
tree vectype = STMT_VINFO_VECTYPE (stmt_info);
int vectype_align = TYPE_ALIGN (vectype) / BITS_PER_UNIT;
tree niters_type = TREE_TYPE (loop_niters);
int step = 1;
int element_size = GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (DR_REF (dr))));
int nelements = TYPE_VECTOR_SUBPARTS (vectype);
if (STMT_VINFO_STRIDED_ACCESS (stmt_info))
step = DR_GROUP_SIZE (vinfo_for_stmt (DR_GROUP_FIRST_DR (stmt_info)));
pe = loop_preheader_edge (loop);
if (LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo) > 0)
{
int byte_misalign = LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo);
int elem_misalign = byte_misalign / element_size;
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "known alignment = %d.", byte_misalign);
iters = build_int_cst (niters_type,
(((nelements - elem_misalign) & (nelements - 1)) / step));
}
else
{
gimple_seq new_stmts = NULL;
tree start_addr = vect_create_addr_base_for_vector_ref (dr_stmt,
&new_stmts, NULL_TREE, loop);
tree ptr_type = TREE_TYPE (start_addr);
tree size = TYPE_SIZE (ptr_type);
tree type = lang_hooks.types.type_for_size (tree_low_cst (size, 1), 1);
tree vectype_size_minus_1 = build_int_cst (type, vectype_align - 1);
tree elem_size_log =
build_int_cst (type, exact_log2 (vectype_align/nelements));
tree nelements_minus_1 = build_int_cst (type, nelements - 1);
tree nelements_tree = build_int_cst (type, nelements);
tree byte_misalign;
tree elem_misalign;
new_bb = gsi_insert_seq_on_edge_immediate (pe, new_stmts);
gcc_assert (!new_bb);
/* Create: byte_misalign = addr & (vectype_size - 1) */
byte_misalign =
fold_build2 (BIT_AND_EXPR, type, fold_convert (type, start_addr), vectype_size_minus_1);
/* Create: elem_misalign = byte_misalign / element_size */
elem_misalign =
fold_build2 (RSHIFT_EXPR, type, byte_misalign, elem_size_log);
/* Create: (niters_type) (nelements - elem_misalign)&(nelements - 1) */
iters = fold_build2 (MINUS_EXPR, type, nelements_tree, elem_misalign);
iters = fold_build2 (BIT_AND_EXPR, type, iters, nelements_minus_1);
iters = fold_convert (niters_type, iters);
}
/* Create: prolog_loop_niters = min (iters, loop_niters) */
/* If the loop bound is known at compile time we already verified that it is
greater than vf; since the misalignment ('iters') is at most vf, there's
no need to generate the MIN_EXPR in this case. */
if (TREE_CODE (loop_niters) != INTEGER_CST)
iters = fold_build2 (MIN_EXPR, niters_type, iters, loop_niters);
if (vect_print_dump_info (REPORT_DETAILS))
{
fprintf (vect_dump, "niters for prolog loop: ");
print_generic_expr (vect_dump, iters, TDF_SLIM);
}
var = create_tmp_var (niters_type, "prolog_loop_niters");
add_referenced_var (var);
stmts = NULL;
iters_name = force_gimple_operand (iters, &stmts, false, var);
/* Insert stmt on loop preheader edge. */
if (stmts)
{
basic_block new_bb = gsi_insert_seq_on_edge_immediate (pe, stmts);
gcc_assert (!new_bb);
}
return iters_name;
}
/* Function vect_update_init_of_dr
NITERS iterations were peeled from LOOP. DR represents a data reference
in LOOP. This function updates the information recorded in DR to
account for the fact that the first NITERS iterations had already been
executed. Specifically, it updates the OFFSET field of DR. */
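/* For example (illustrative values only): if DR has DR_STEP == 4 (a
   4-byte element accessed with unit stride) and NITERS == 3 iterations
   were peeled, the offset recorded in DR grows by 3 * 4 == 12 bytes.  */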
static void
vect_update_init_of_dr (struct data_reference *dr, tree niters)
{
tree offset = DR_OFFSET (dr);
niters = fold_build2 (MULT_EXPR, sizetype,
fold_convert (sizetype, niters),
fold_convert (sizetype, DR_STEP (dr)));
offset = fold_build2 (PLUS_EXPR, sizetype, offset, niters);
DR_OFFSET (dr) = offset;
}
/* Function vect_update_inits_of_drs
NITERS iterations were peeled from the loop represented by LOOP_VINFO.
This function updates the information recorded for the data references in
the loop to account for the fact that the first NITERS iterations had
already been executed. Specifically, it updates the initial_condition of
the access_function of all the data_references in the loop. */
static void
vect_update_inits_of_drs (loop_vec_info loop_vinfo, tree niters)
{
unsigned int i;
VEC (data_reference_p, heap) *datarefs = LOOP_VINFO_DATAREFS (loop_vinfo);
struct data_reference *dr;
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "=== vect_update_inits_of_drs ===");
for (i = 0; VEC_iterate (data_reference_p, datarefs, i, dr); i++)
vect_update_init_of_dr (dr, niters);
}
/* Function vect_do_peeling_for_alignment
Peel the first 'niters' iterations of the loop represented by LOOP_VINFO.
'niters' is set to the misalignment of one of the data references in the
loop, thereby forcing it to refer to an aligned location at the beginning
of the execution of this loop. The data reference for which we are
peeling is recorded in LOOP_VINFO_UNALIGNED_DR. */
void
vect_do_peeling_for_alignment (loop_vec_info loop_vinfo)
{
struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
tree niters_of_prolog_loop, ni_name;
tree n_iters;
struct loop *new_loop;
bool check_profitability = false;
unsigned int th = 0;
int min_profitable_iters;
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "=== vect_do_peeling_for_alignment ===");
initialize_original_copy_tables ();
ni_name = vect_build_loop_niters (loop_vinfo);
niters_of_prolog_loop = vect_gen_niters_for_prolog_loop (loop_vinfo, ni_name);
/* If cost model check not done during versioning. */
if (!VEC_length (gimple, LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo))
&& !VEC_length (ddr_p, LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo)))
{
check_profitability = true;
/* Get profitability threshold for vectorized loop. */
min_profitable_iters = LOOP_VINFO_COST_MODEL_MIN_ITERS (loop_vinfo);
th = conservative_cost_threshold (loop_vinfo,
min_profitable_iters);
}
/* Peel the prolog loop and iterate it niters_of_prolog_loop times. */
new_loop =
slpeel_tree_peel_loop_to_edge (loop, loop_preheader_edge (loop),
niters_of_prolog_loop, ni_name, true,
th, check_profitability);
gcc_assert (new_loop);
#ifdef ENABLE_CHECKING
slpeel_verify_cfg_after_peeling (new_loop, loop);
#endif
/* Update number of times loop executes. */
n_iters = LOOP_VINFO_NITERS (loop_vinfo);
LOOP_VINFO_NITERS (loop_vinfo) = fold_build2 (MINUS_EXPR,
TREE_TYPE (n_iters), n_iters, niters_of_prolog_loop);
/* Update the init conditions of the access functions of all data refs. */
vect_update_inits_of_drs (loop_vinfo, niters_of_prolog_loop);
/* After peeling we have to reset scalar evolution analyzer. */
scev_reset ();
free_original_copy_tables ();
}
/* Function vect_create_cond_for_align_checks.
Create a conditional expression that represents the alignment checks for
all of the data references (array element references) whose alignment must be
checked at runtime.
Input:
COND_EXPR - input conditional expression. New conditions will be chained
with logical AND operation.
LOOP_VINFO - two fields of the loop information are used.
LOOP_VINFO_PTR_MASK is the mask used to check the alignment.
LOOP_VINFO_MAY_MISALIGN_STMTS contains the refs to be checked.
Output:
COND_EXPR_STMT_LIST - statements needed to construct the conditional
expression.
The returned value is the conditional expression to be used in the if
statement that controls which version of the loop gets executed at runtime.
The algorithm makes two assumptions:
1) The number of bytes "n" in a vector is a power of 2.
2) An address "a" is aligned if a%n is zero, and this
test can be done as a&(n-1) == 0. For example, for 16
byte vectors the test is a&0xf == 0. */
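/* Schematically, for three data references whose first-vector
   addresses are a1, a2 and a3, with 16-byte vectors (an illustrative
   sketch, not output for any particular target), the generated test
   is the single combined check
     ((a1 | a2 | a3) & 0xf) == 0
   i.e. the addresses are ORed together and masked once instead of
   being tested one by one.  */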
static void
vect_create_cond_for_align_checks (loop_vec_info loop_vinfo,
tree *cond_expr,
gimple_seq *cond_expr_stmt_list)
{
struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
VEC(gimple,heap) *may_misalign_stmts
= LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo);
gimple ref_stmt;
int mask = LOOP_VINFO_PTR_MASK (loop_vinfo);
tree mask_cst;
unsigned int i;
tree psize;
tree int_ptrsize_type;
char tmp_name[20];
tree or_tmp_name = NULL_TREE;
tree and_tmp, and_tmp_name;
gimple and_stmt;
tree ptrsize_zero;
tree part_cond_expr;
/* Check that mask is one less than a power of 2, i.e., mask is
all zeros followed by all ones. */
gcc_assert ((mask != 0) && ((mask & (mask+1)) == 0));
/* CHECKME: what is the best integer or unsigned type to use to hold a
cast from a pointer value? */
psize = TYPE_SIZE (ptr_type_node);
int_ptrsize_type
= lang_hooks.types.type_for_size (tree_low_cst (psize, 1), 0);
/* Create expression (mask & (dr_1 | ... | dr_n)) where dr_i is the address
of the first vector of the i'th data reference. */
for (i = 0; VEC_iterate (gimple, may_misalign_stmts, i, ref_stmt); i++)
{
gimple_seq new_stmt_list = NULL;
tree addr_base;
tree addr_tmp, addr_tmp_name;
tree or_tmp, new_or_tmp_name;
gimple addr_stmt, or_stmt;
/* create: addr_tmp = (int)(address_of_first_vector) */
addr_base =
vect_create_addr_base_for_vector_ref (ref_stmt, &new_stmt_list,
NULL_TREE, loop);
if (new_stmt_list != NULL)
gimple_seq_add_seq (cond_expr_stmt_list, new_stmt_list);
sprintf (tmp_name, "%s%d", "addr2int", i);
addr_tmp = create_tmp_var (int_ptrsize_type, tmp_name);
add_referenced_var (addr_tmp);
addr_tmp_name = make_ssa_name (addr_tmp, NULL);
addr_stmt = gimple_build_assign_with_ops (NOP_EXPR, addr_tmp_name,
addr_base, NULL_TREE);
SSA_NAME_DEF_STMT (addr_tmp_name) = addr_stmt;
gimple_seq_add_stmt (cond_expr_stmt_list, addr_stmt);
/* The addresses are ORed together. */
if (or_tmp_name != NULL_TREE)
{
/* create: or_tmp = or_tmp | addr_tmp */
sprintf (tmp_name, "%s%d", "orptrs", i);
or_tmp = create_tmp_var (int_ptrsize_type, tmp_name);
add_referenced_var (or_tmp);
new_or_tmp_name = make_ssa_name (or_tmp, NULL);
or_stmt = gimple_build_assign_with_ops (BIT_IOR_EXPR,
new_or_tmp_name,
or_tmp_name, addr_tmp_name);
SSA_NAME_DEF_STMT (new_or_tmp_name) = or_stmt;
gimple_seq_add_stmt (cond_expr_stmt_list, or_stmt);
or_tmp_name = new_or_tmp_name;
}
else
or_tmp_name = addr_tmp_name;
} /* end for i */
mask_cst = build_int_cst (int_ptrsize_type, mask);
/* create: and_tmp = or_tmp & mask */
and_tmp = create_tmp_var (int_ptrsize_type, "andmask" );
add_referenced_var (and_tmp);
and_tmp_name = make_ssa_name (and_tmp, NULL);
and_stmt = gimple_build_assign_with_ops (BIT_AND_EXPR, and_tmp_name,
or_tmp_name, mask_cst);
SSA_NAME_DEF_STMT (and_tmp_name) = and_stmt;
gimple_seq_add_stmt (cond_expr_stmt_list, and_stmt);
/* Make and_tmp the left operand of the conditional test against zero.
If and_tmp has a nonzero bit then some address is unaligned. */
ptrsize_zero = build_int_cst (int_ptrsize_type, 0);
part_cond_expr = fold_build2 (EQ_EXPR, boolean_type_node,
and_tmp_name, ptrsize_zero);
if (*cond_expr)
*cond_expr = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
*cond_expr, part_cond_expr);
else
*cond_expr = part_cond_expr;
}
/* Function vect_vfa_segment_size.
Create an expression that computes the size of the segment
that will be accessed for a data reference. The function takes into
account that realignment loads may access one more vector.
Input:
DR: The data reference.
VECT_FACTOR: vectorization factor.
Return an expression whose value is the size of segment which will be
accessed by DR. */
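/* For example (illustrative values): a data reference with
   DR_STEP == 4 bytes and VECT_FACTOR == 4 covers a segment of
   4 * 4 == 16 bytes per vector iteration; if the access uses the
   optimized realignment scheme, one extra vector size (e.g. 16 more
   bytes) is added to the segment length.  */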
static tree
vect_vfa_segment_size (struct data_reference *dr, tree vect_factor)
{
tree segment_length = fold_build2 (MULT_EXPR, integer_type_node,
DR_STEP (dr), vect_factor);
if (vect_supportable_dr_alignment (dr) == dr_explicit_realign_optimized)
{
tree vector_size = TYPE_SIZE_UNIT
(STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr))));
segment_length = fold_build2 (PLUS_EXPR, integer_type_node,
segment_length, vector_size);
}
return fold_convert (sizetype, segment_length);
}
/* Function vect_create_cond_for_alias_checks.
Create a conditional expression that represents the run-time checks for
overlapping of address ranges represented by a list of data reference
relations passed as input.
Input:
COND_EXPR - input conditional expression. New conditions will be chained
with logical AND operation.
LOOP_VINFO - field LOOP_VINFO_MAY_ALIAS_DDRS contains the list of ddrs
to be checked.
Output:
COND_EXPR - conditional expression.
COND_EXPR_STMT_LIST - statements needed to construct the conditional
expression.
The returned value is the conditional expression to be used in the if
statement that controls which version of the loop gets executed at runtime.
*/
static void
vect_create_cond_for_alias_checks (loop_vec_info loop_vinfo,
tree * cond_expr,
gimple_seq * cond_expr_stmt_list)
{
struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
VEC (ddr_p, heap) * may_alias_ddrs =
LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo);
tree vect_factor =
build_int_cst (integer_type_node, LOOP_VINFO_VECT_FACTOR (loop_vinfo));
ddr_p ddr;
unsigned int i;
tree part_cond_expr;
/* Create expression
(((store_ptr_0 + store_segment_length_0) < load_ptr_0)
|| ((load_ptr_0 + load_segment_length_0) < store_ptr_0))
&&
...
&&
(((store_ptr_n + store_segment_length_n) < load_ptr_n)
|| ((load_ptr_n + load_segment_length_n) < store_ptr_n)) */
if (VEC_empty (ddr_p, may_alias_ddrs))
return;
for (i = 0; VEC_iterate (ddr_p, may_alias_ddrs, i, ddr); i++)
{
struct data_reference *dr_a, *dr_b;
gimple dr_group_first_a, dr_group_first_b;
tree addr_base_a, addr_base_b;
tree segment_length_a, segment_length_b;
gimple stmt_a, stmt_b;
dr_a = DDR_A (ddr);
stmt_a = DR_STMT (DDR_A (ddr));
dr_group_first_a = DR_GROUP_FIRST_DR (vinfo_for_stmt (stmt_a));
if (dr_group_first_a)
{
stmt_a = dr_group_first_a;
dr_a = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_a));
}
dr_b = DDR_B (ddr);
stmt_b = DR_STMT (DDR_B (ddr));
dr_group_first_b = DR_GROUP_FIRST_DR (vinfo_for_stmt (stmt_b));
if (dr_group_first_b)
{
stmt_b = dr_group_first_b;
dr_b = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_b));
}
addr_base_a =
vect_create_addr_base_for_vector_ref (stmt_a, cond_expr_stmt_list,
NULL_TREE, loop);
addr_base_b =
vect_create_addr_base_for_vector_ref (stmt_b, cond_expr_stmt_list,
NULL_TREE, loop);
segment_length_a = vect_vfa_segment_size (dr_a, vect_factor);
segment_length_b = vect_vfa_segment_size (dr_b, vect_factor);
if (vect_print_dump_info (REPORT_DR_DETAILS))
{
fprintf (vect_dump,
"create runtime check for data references ");
print_generic_expr (vect_dump, DR_REF (dr_a), TDF_SLIM);
fprintf (vect_dump, " and ");
print_generic_expr (vect_dump, DR_REF (dr_b), TDF_SLIM);
}
part_cond_expr =
fold_build2 (TRUTH_OR_EXPR, boolean_type_node,
fold_build2 (LT_EXPR, boolean_type_node,
fold_build2 (POINTER_PLUS_EXPR, TREE_TYPE (addr_base_a),
addr_base_a,
segment_length_a),
addr_base_b),
fold_build2 (LT_EXPR, boolean_type_node,
fold_build2 (POINTER_PLUS_EXPR, TREE_TYPE (addr_base_b),
addr_base_b,
segment_length_b),
addr_base_a));
if (*cond_expr)
*cond_expr = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
*cond_expr, part_cond_expr);
else
*cond_expr = part_cond_expr;
}
if (vect_print_dump_info (REPORT_VECTORIZED_LOOPS))
fprintf (vect_dump, "created %u versioning for alias checks.\n",
VEC_length (ddr_p, may_alias_ddrs));
}
/* Function vect_loop_versioning.
If the loop has data references that may or may not be aligned and/or
has data reference relations whose independence was not proven, then
two versions of the loop need to be generated, one which is vectorized
and one which isn't. A test is then generated to control which of the
loops is executed. The test checks for the alignment of all of the
data references that may or may not be aligned. An additional
sequence of runtime tests is generated for each pair of DDRs whose
independence was not proven. The vectorized version of the loop is
executed only if both the alias and alignment tests pass.
The test generated to check which version of the loop is executed
is also made to check for profitability, as indicated by the
cost model. */
void
vect_loop_versioning (loop_vec_info loop_vinfo)
{
struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
struct loop *nloop;
tree cond_expr = NULL_TREE;
gimple_seq cond_expr_stmt_list = NULL;
basic_block condition_bb;
gimple_stmt_iterator gsi, cond_exp_gsi;
basic_block merge_bb;
basic_block new_exit_bb;
edge new_exit_e, e;
gimple orig_phi, new_phi;
tree arg;
unsigned prob = 4 * REG_BR_PROB_BASE / 5;
gimple_seq gimplify_stmt_list = NULL;
tree scalar_loop_iters = LOOP_VINFO_NITERS (loop_vinfo);
int min_profitable_iters = 0;
unsigned int th;
/* Get profitability threshold for vectorized loop. */
min_profitable_iters = LOOP_VINFO_COST_MODEL_MIN_ITERS (loop_vinfo);
th = conservative_cost_threshold (loop_vinfo,
min_profitable_iters);
cond_expr =
fold_build2 (GT_EXPR, boolean_type_node, scalar_loop_iters,
build_int_cst (TREE_TYPE (scalar_loop_iters), th));
cond_expr = force_gimple_operand (cond_expr, &cond_expr_stmt_list,
false, NULL_TREE);
if (VEC_length (gimple, LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo)))
vect_create_cond_for_align_checks (loop_vinfo, &cond_expr,
&cond_expr_stmt_list);
if (VEC_length (ddr_p, LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo)))
vect_create_cond_for_alias_checks (loop_vinfo, &cond_expr,
&cond_expr_stmt_list);
cond_expr =
fold_build2 (NE_EXPR, boolean_type_node, cond_expr, integer_zero_node);
cond_expr =
force_gimple_operand (cond_expr, &gimplify_stmt_list, true, NULL_TREE);
gimple_seq_add_seq (&cond_expr_stmt_list, gimplify_stmt_list);
initialize_original_copy_tables ();
nloop = loop_version (loop, cond_expr, &condition_bb,
prob, prob, REG_BR_PROB_BASE - prob, true);
free_original_copy_tables();
/* Loop versioning violates an assumption we try to maintain during
vectorization - that the loop exit block has a single predecessor.
After versioning, the exit block of both loop versions is the same
basic block (i.e. it has two predecessors). Just in order to simplify
following transformations in the vectorizer, we fix this situation
here by adding a new (empty) block on the exit-edge of the loop,
with the proper loop-exit phis to maintain loop-closed-form. */
merge_bb = single_exit (loop)->dest;
gcc_assert (EDGE_COUNT (merge_bb->preds) == 2);
new_exit_bb = split_edge (single_exit (loop));
new_exit_e = single_exit (loop);
e = EDGE_SUCC (new_exit_bb, 0);
for (gsi = gsi_start_phis (merge_bb); !gsi_end_p (gsi); gsi_next (&gsi))
{
orig_phi = gsi_stmt (gsi);
new_phi = create_phi_node (SSA_NAME_VAR (PHI_RESULT (orig_phi)),
new_exit_bb);
arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, e);
add_phi_arg (new_phi, arg, new_exit_e);
SET_PHI_ARG_DEF (orig_phi, e->dest_idx, PHI_RESULT (new_phi));
}
/* End loop-exit-fixes after versioning. */
update_ssa (TODO_update_ssa);
if (cond_expr_stmt_list)
{
cond_exp_gsi = gsi_last_bb (condition_bb);
gsi_insert_seq_before (&cond_exp_gsi, cond_expr_stmt_list, GSI_SAME_STMT);
}
}
/* Analysis Utilities for Loop Vectorization.
Copyright (C) 2006, 2007, 2008, 2009 Free Software Foundation, Inc.
Contributed by Dorit Nuzman <dorit@il.ibm.com>
This file is part of GCC.
...
@@ -24,13 +24,11 @@ along with GCC; see the file COPYING3. If not see
#include "tm.h"
#include "ggc.h"
#include "tree.h"
#include "target.h"
#include "basic-block.h"
#include "diagnostic.h"
#include "tree-flow.h"
#include "tree-dump.h"
-#include "timevar.h"
#include "cfgloop.h"
#include "expr.h"
#include "optabs.h"
...
/* SLP - Basic Block Vectorization
Copyright (C) 2007, 2008, 2009 Free Software Foundation, Inc.
Contributed by Dorit Naishlos <dorit@il.ibm.com>
and Ira Rosen <irar@il.ibm.com>
This file is part of GCC.
GCC is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free
Software Foundation; either version 3, or (at your option) any later
version.
GCC is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
for more details.
You should have received a copy of the GNU General Public License
along with GCC; see the file COPYING3. If not see
<http://www.gnu.org/licenses/>. */
#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "tm.h"
#include "ggc.h"
#include "tree.h"
#include "target.h"
#include "basic-block.h"
#include "diagnostic.h"
#include "tree-flow.h"
#include "tree-dump.h"
#include "cfgloop.h"
#include "cfglayout.h"
#include "expr.h"
#include "recog.h"
#include "optabs.h"
#include "tree-vectorizer.h"
/* Recursively free the memory allocated for the SLP tree rooted at NODE. */
static void
vect_free_slp_tree (slp_tree node)
{
if (!node)
return;
if (SLP_TREE_LEFT (node))
vect_free_slp_tree (SLP_TREE_LEFT (node));
if (SLP_TREE_RIGHT (node))
vect_free_slp_tree (SLP_TREE_RIGHT (node));
VEC_free (gimple, heap, SLP_TREE_SCALAR_STMTS (node));
if (SLP_TREE_VEC_STMTS (node))
VEC_free (gimple, heap, SLP_TREE_VEC_STMTS (node));
free (node);
}
/* Free the memory allocated for the SLP instance. */
void
vect_free_slp_instance (slp_instance instance)
{
vect_free_slp_tree (SLP_INSTANCE_TREE (instance));
VEC_free (int, heap, SLP_INSTANCE_LOAD_PERMUTATION (instance));
VEC_free (slp_tree, heap, SLP_INSTANCE_LOADS (instance));
}
/* Get the defs for the rhs of STMT (collect them in DEF_STMTS0/1), check that
they are of a legal type and that they match the defs of the first stmt of
the SLP group (stored in FIRST_STMT_...). */
static bool
vect_get_and_check_slp_defs (loop_vec_info loop_vinfo, slp_tree slp_node,
gimple stmt, VEC (gimple, heap) **def_stmts0,
VEC (gimple, heap) **def_stmts1,
enum vect_def_type *first_stmt_dt0,
enum vect_def_type *first_stmt_dt1,
tree *first_stmt_def0_type,
tree *first_stmt_def1_type,
tree *first_stmt_const_oprnd,
int ncopies_for_cost,
bool *pattern0, bool *pattern1)
{
tree oprnd;
unsigned int i, number_of_oprnds;
tree def;
gimple def_stmt;
enum vect_def_type dt[2] = {vect_unknown_def_type, vect_unknown_def_type};
stmt_vec_info stmt_info =
vinfo_for_stmt (VEC_index (gimple, SLP_TREE_SCALAR_STMTS (slp_node), 0));
enum gimple_rhs_class rhs_class;
struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
rhs_class = get_gimple_rhs_class (gimple_assign_rhs_code (stmt));
number_of_oprnds = gimple_num_ops (stmt) - 1; /* RHS only */
for (i = 0; i < number_of_oprnds; i++)
{
oprnd = gimple_op (stmt, i + 1);
if (!vect_is_simple_use (oprnd, loop_vinfo, &def_stmt, &def, &dt[i])
|| (!def_stmt && dt[i] != vect_constant_def))
{
if (vect_print_dump_info (REPORT_SLP))
{
fprintf (vect_dump, "Build SLP failed: can't find def for ");
print_generic_expr (vect_dump, oprnd, TDF_SLIM);
}
return false;
}
/* Check if DEF_STMT is a part of a pattern and get the def stmt from
the pattern. Check that all the stmts of the node are in the
pattern. */
if (def_stmt && gimple_bb (def_stmt)
&& flow_bb_inside_loop_p (loop, gimple_bb (def_stmt))
&& vinfo_for_stmt (def_stmt)
&& STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (def_stmt)))
{
if (!*first_stmt_dt0)
*pattern0 = true;
else
{
if (i == 1 && !*first_stmt_dt1)
*pattern1 = true;
else if ((i == 0 && !*pattern0) || (i == 1 && !*pattern1))
{
if (vect_print_dump_info (REPORT_DETAILS))
{
fprintf (vect_dump, "Build SLP failed: some of the stmts"
" are in a pattern, and others are not ");
print_generic_expr (vect_dump, oprnd, TDF_SLIM);
}
return false;
}
}
def_stmt = STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt));
dt[i] = STMT_VINFO_DEF_TYPE (vinfo_for_stmt (def_stmt));
if (*dt == vect_unknown_def_type)
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "Unsupported pattern.");
return false;
}
switch (gimple_code (def_stmt))
{
case GIMPLE_PHI:
def = gimple_phi_result (def_stmt);
break;
case GIMPLE_ASSIGN:
def = gimple_assign_lhs (def_stmt);
break;
default:
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "unsupported defining stmt: ");
return false;
}
}
if (!*first_stmt_dt0)
{
/* op0 of the first stmt of the group - store its info. */
*first_stmt_dt0 = dt[i];
if (def)
*first_stmt_def0_type = TREE_TYPE (def);
else
*first_stmt_const_oprnd = oprnd;
/* Analyze costs (for the first stmt of the group only). */
if (rhs_class != GIMPLE_SINGLE_RHS)
/* Not a memory operation (we don't call this function for loads). */
vect_model_simple_cost (stmt_info, ncopies_for_cost, dt, slp_node);
else
/* Store. */
vect_model_store_cost (stmt_info, ncopies_for_cost, dt[0], slp_node);
}
else
{
if (!*first_stmt_dt1 && i == 1)
{
/* op1 of the first stmt of the group - store its info. */
*first_stmt_dt1 = dt[i];
if (def)
*first_stmt_def1_type = TREE_TYPE (def);
else
{
/* We assume that the stmt contains only one constant
operand. We fail otherwise, to be on the safe side. */
if (*first_stmt_const_oprnd)
{
if (vect_print_dump_info (REPORT_SLP))
fprintf (vect_dump, "Build SLP failed: two constant "
"oprnds in stmt");
return false;
}
*first_stmt_const_oprnd = oprnd;
}
}
else
{
/* Not the first stmt of the group, check that the def-stmt/s match
the def-stmt/s of the first stmt. */
if ((i == 0
&& (*first_stmt_dt0 != dt[i]
|| (*first_stmt_def0_type && def
&& *first_stmt_def0_type != TREE_TYPE (def))))
|| (i == 1
&& (*first_stmt_dt1 != dt[i]
|| (*first_stmt_def1_type && def
&& *first_stmt_def1_type != TREE_TYPE (def))))
|| (!def
&& TREE_TYPE (*first_stmt_const_oprnd)
!= TREE_TYPE (oprnd)))
{
if (vect_print_dump_info (REPORT_SLP))
fprintf (vect_dump, "Build SLP failed: different types ");
return false;
}
}
}
/* Check the types of the definitions. */
switch (dt[i])
{
case vect_constant_def:
case vect_invariant_def:
break;
case vect_loop_def:
if (i == 0)
VEC_safe_push (gimple, heap, *def_stmts0, def_stmt);
else
VEC_safe_push (gimple, heap, *def_stmts1, def_stmt);
break;
default:
/* FORNOW: Not supported. */
if (vect_print_dump_info (REPORT_SLP))
{
fprintf (vect_dump, "Build SLP failed: illegal type of def ");
print_generic_expr (vect_dump, def, TDF_SLIM);
}
return false;
}
}
return true;
}
/* Recursively build an SLP tree starting from NODE.
Fail (and return FALSE) if def-stmts are not isomorphic, require data
permutation, or are of unsupported types of operation. Otherwise, return
TRUE. */
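/* For instance (an illustrative sketch), given the group of two strided
   stores
     a[2*i]   = t0;   where t0 = b[2*i]   + c[2*i];
     a[2*i+1] = t1;   where t1 = b[2*i+1] + c[2*i+1];
   the root node holds the two stores, its left child holds the two
   additions, and that child's own left and right children hold the two
   groups of strided loads from b[] and c[], at which point the
   recursion stops.  */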
static bool
vect_build_slp_tree (loop_vec_info loop_vinfo, slp_tree *node,
unsigned int group_size,
int *inside_cost, int *outside_cost,
int ncopies_for_cost, unsigned int *max_nunits,
VEC (int, heap) **load_permutation,
VEC (slp_tree, heap) **loads)
{
VEC (gimple, heap) *def_stmts0 = VEC_alloc (gimple, heap, group_size);
VEC (gimple, heap) *def_stmts1 = VEC_alloc (gimple, heap, group_size);
unsigned int i;
VEC (gimple, heap) *stmts = SLP_TREE_SCALAR_STMTS (*node);
gimple stmt = VEC_index (gimple, stmts, 0);
enum vect_def_type first_stmt_dt0 = 0, first_stmt_dt1 = 0;
enum tree_code first_stmt_code = 0, rhs_code;
tree first_stmt_def1_type = NULL_TREE, first_stmt_def0_type = NULL_TREE;
tree lhs;
bool stop_recursion = false, need_same_oprnds = false;
tree vectype, scalar_type, first_op1 = NULL_TREE;
unsigned int vectorization_factor = 0, ncopies;
optab optab;
int icode;
enum machine_mode optab_op2_mode;
enum machine_mode vec_mode;
tree first_stmt_const_oprnd = NULL_TREE;
struct data_reference *first_dr;
bool pattern0 = false, pattern1 = false;
HOST_WIDE_INT dummy;
bool permutation = false;
unsigned int load_place;
gimple first_load;
/* For every stmt in NODE find its def stmt/s. */
for (i = 0; VEC_iterate (gimple, stmts, i, stmt); i++)
{
if (vect_print_dump_info (REPORT_SLP))
{
fprintf (vect_dump, "Build SLP for ");
print_gimple_stmt (vect_dump, stmt, 0, TDF_SLIM);
}
lhs = gimple_get_lhs (stmt);
if (lhs == NULL_TREE)
{
if (vect_print_dump_info (REPORT_SLP))
{
fprintf (vect_dump,
"Build SLP failed: not GIMPLE_ASSIGN nor GIMPLE_CALL");
print_gimple_stmt (vect_dump, stmt, 0, TDF_SLIM);
}
return false;
}
scalar_type = vect_get_smallest_scalar_type (stmt, &dummy, &dummy);
vectype = get_vectype_for_scalar_type (scalar_type);
if (!vectype)
{
if (vect_print_dump_info (REPORT_SLP))
{
fprintf (vect_dump, "Build SLP failed: unsupported data-type ");
print_generic_expr (vect_dump, scalar_type, TDF_SLIM);
}
return false;
}
gcc_assert (LOOP_VINFO_VECT_FACTOR (loop_vinfo));
vectorization_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
ncopies = vectorization_factor / TYPE_VECTOR_SUBPARTS (vectype);
if (ncopies > 1 && vect_print_dump_info (REPORT_SLP))
fprintf (vect_dump, "SLP with multiple types ");
/* In case of multiple types we need to detect the smallest type. */
if (*max_nunits < TYPE_VECTOR_SUBPARTS (vectype))
*max_nunits = TYPE_VECTOR_SUBPARTS (vectype);
if (is_gimple_call (stmt))
rhs_code = CALL_EXPR;
else
rhs_code = gimple_assign_rhs_code (stmt);
/* Check the operation. */
if (i == 0)
{
first_stmt_code = rhs_code;
/* Shift arguments should be equal in all the packed stmts for a
vector shift with a scalar shift operand. */
if (rhs_code == LSHIFT_EXPR || rhs_code == RSHIFT_EXPR
|| rhs_code == LROTATE_EXPR
|| rhs_code == RROTATE_EXPR)
{
vec_mode = TYPE_MODE (vectype);
/* First see if we have a vector/vector shift. */
optab = optab_for_tree_code (rhs_code, vectype,
optab_vector);
if (!optab
|| (optab->handlers[(int) vec_mode].insn_code
== CODE_FOR_nothing))
{
/* No vector/vector shift, try for a vector/scalar shift. */
optab = optab_for_tree_code (rhs_code, vectype,
optab_scalar);
if (!optab)
{
if (vect_print_dump_info (REPORT_SLP))
fprintf (vect_dump, "Build SLP failed: no optab.");
return false;
}
icode = (int) optab->handlers[(int) vec_mode].insn_code;
if (icode == CODE_FOR_nothing)
{
if (vect_print_dump_info (REPORT_SLP))
fprintf (vect_dump, "Build SLP failed: "
"op not supported by target.");
return false;
}
optab_op2_mode = insn_data[icode].operand[2].mode;
if (!VECTOR_MODE_P (optab_op2_mode))
{
need_same_oprnds = true;
first_op1 = gimple_assign_rhs2 (stmt);
}
}
}
}
else
{
if (first_stmt_code != rhs_code
&& (first_stmt_code != IMAGPART_EXPR
|| rhs_code != REALPART_EXPR)
&& (first_stmt_code != REALPART_EXPR
|| rhs_code != IMAGPART_EXPR))
{
if (vect_print_dump_info (REPORT_SLP))
{
fprintf (vect_dump,
"Build SLP failed: different operation in stmt ");
print_gimple_stmt (vect_dump, stmt, 0, TDF_SLIM);
}
return false;
}
if (need_same_oprnds
&& !operand_equal_p (first_op1, gimple_assign_rhs2 (stmt), 0))
{
if (vect_print_dump_info (REPORT_SLP))
{
fprintf (vect_dump,
"Build SLP failed: different shift arguments in ");
print_gimple_stmt (vect_dump, stmt, 0, TDF_SLIM);
}
return false;
}
}
/* Strided store or load. */
if (STMT_VINFO_STRIDED_ACCESS (vinfo_for_stmt (stmt)))
{
if (REFERENCE_CLASS_P (lhs))
{
/* Store. */
if (!vect_get_and_check_slp_defs (loop_vinfo, *node, stmt,
&def_stmts0, &def_stmts1,
&first_stmt_dt0,
&first_stmt_dt1,
&first_stmt_def0_type,
&first_stmt_def1_type,
&first_stmt_const_oprnd,
ncopies_for_cost,
&pattern0, &pattern1))
return false;
}
else
{
/* Load. */
/* FORNOW: Check that there is no gap between the loads. */
if ((DR_GROUP_FIRST_DR (vinfo_for_stmt (stmt)) == stmt
&& DR_GROUP_GAP (vinfo_for_stmt (stmt)) != 0)
|| (DR_GROUP_FIRST_DR (vinfo_for_stmt (stmt)) != stmt
&& DR_GROUP_GAP (vinfo_for_stmt (stmt)) != 1))
{
if (vect_print_dump_info (REPORT_SLP))
{
fprintf (vect_dump, "Build SLP failed: strided "
"loads have gaps ");
print_gimple_stmt (vect_dump, stmt, 0, TDF_SLIM);
}
return false;
}
first_load = DR_GROUP_FIRST_DR (vinfo_for_stmt (stmt));
if (first_load == stmt)
{
first_dr = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt));
if (vect_supportable_dr_alignment (first_dr)
== dr_unaligned_unsupported)
{
if (vect_print_dump_info (REPORT_SLP))
{
fprintf (vect_dump, "Build SLP failed: unsupported "
"unaligned load ");
print_gimple_stmt (vect_dump, stmt, 0, TDF_SLIM);
}
return false;
}
/* Analyze costs (for the first stmt in the group). */
vect_model_load_cost (vinfo_for_stmt (stmt),
ncopies_for_cost, *node);
}
/* Store the place of this load in the interleaving chain. In
case that permutation is needed we later decide if a specific
permutation is supported. */
load_place = vect_get_place_in_interleaving_chain (stmt,
first_load);
if (load_place != i)
permutation = true;
VEC_safe_push (int, heap, *load_permutation, load_place);
/* We stop the tree when we reach a group of loads. */
stop_recursion = true;
continue;
}
} /* Strided access. */
else
{
if (TREE_CODE_CLASS (rhs_code) == tcc_reference)
{
/* Not a strided load. */
if (vect_print_dump_info (REPORT_SLP))
{
fprintf (vect_dump, "Build SLP failed: not strided load ");
print_gimple_stmt (vect_dump, stmt, 0, TDF_SLIM);
}
/* FORNOW: non-strided loads are not supported. */
return false;
}
/* Not a memory operation. */
if (TREE_CODE_CLASS (rhs_code) != tcc_binary
&& TREE_CODE_CLASS (rhs_code) != tcc_unary)
{
if (vect_print_dump_info (REPORT_SLP))
{
fprintf (vect_dump, "Build SLP failed: operation");
fprintf (vect_dump, " unsupported ");
print_gimple_stmt (vect_dump, stmt, 0, TDF_SLIM);
}
return false;
}
/* Find the def-stmts. */
if (!vect_get_and_check_slp_defs (loop_vinfo, *node, stmt,
&def_stmts0, &def_stmts1,
&first_stmt_dt0, &first_stmt_dt1,
&first_stmt_def0_type,
&first_stmt_def1_type,
&first_stmt_const_oprnd,
ncopies_for_cost,
&pattern0, &pattern1))
return false;
}
}
/* Add the costs of the node to the overall instance costs. */
*inside_cost += SLP_TREE_INSIDE_OF_LOOP_COST (*node);
*outside_cost += SLP_TREE_OUTSIDE_OF_LOOP_COST (*node);
/* Strided loads were reached - stop the recursion. */
if (stop_recursion)
{
if (permutation)
{
VEC_safe_push (slp_tree, heap, *loads, *node);
*inside_cost += TARG_VEC_PERMUTE_COST * group_size;
}
return true;
}
/* Create SLP_TREE nodes for the definition node/s. */
if (first_stmt_dt0 == vect_loop_def)
{
slp_tree left_node = XNEW (struct _slp_tree);
SLP_TREE_SCALAR_STMTS (left_node) = def_stmts0;
SLP_TREE_VEC_STMTS (left_node) = NULL;
SLP_TREE_LEFT (left_node) = NULL;
SLP_TREE_RIGHT (left_node) = NULL;
SLP_TREE_OUTSIDE_OF_LOOP_COST (left_node) = 0;
SLP_TREE_INSIDE_OF_LOOP_COST (left_node) = 0;
if (!vect_build_slp_tree (loop_vinfo, &left_node, group_size,
inside_cost, outside_cost, ncopies_for_cost,
max_nunits, load_permutation, loads))
return false;
SLP_TREE_LEFT (*node) = left_node;
}
if (first_stmt_dt1 == vect_loop_def)
{
slp_tree right_node = XNEW (struct _slp_tree);
SLP_TREE_SCALAR_STMTS (right_node) = def_stmts1;
SLP_TREE_VEC_STMTS (right_node) = NULL;
SLP_TREE_LEFT (right_node) = NULL;
SLP_TREE_RIGHT (right_node) = NULL;
SLP_TREE_OUTSIDE_OF_LOOP_COST (right_node) = 0;
SLP_TREE_INSIDE_OF_LOOP_COST (right_node) = 0;
if (!vect_build_slp_tree (loop_vinfo, &right_node, group_size,
inside_cost, outside_cost, ncopies_for_cost,
max_nunits, load_permutation, loads))
return false;
SLP_TREE_RIGHT (*node) = right_node;
}
return true;
}
static void
vect_print_slp_tree (slp_tree node)
{
int i;
gimple stmt;
if (!node)
return;
fprintf (vect_dump, "node ");
for (i = 0; VEC_iterate (gimple, SLP_TREE_SCALAR_STMTS (node), i, stmt); i++)
{
fprintf (vect_dump, "\n\tstmt %d ", i);
print_gimple_stmt (vect_dump, stmt, 0, TDF_SLIM);
}
fprintf (vect_dump, "\n");
vect_print_slp_tree (SLP_TREE_LEFT (node));
vect_print_slp_tree (SLP_TREE_RIGHT (node));
}
/* Mark the tree rooted at NODE with MARK (PURE_SLP or HYBRID).
If MARK is HYBRID, it refers to a specific stmt in NODE (the stmt at index
J). Otherwise, MARK is PURE_SLP and J is -1, which indicates that all the
stmts in NODE are to be marked. */
static void
vect_mark_slp_stmts (slp_tree node, enum slp_vect_type mark, int j)
{
int i;
gimple stmt;
if (!node)
return;
for (i = 0; VEC_iterate (gimple, SLP_TREE_SCALAR_STMTS (node), i, stmt); i++)
if (j < 0 || i == j)
STMT_SLP_TYPE (vinfo_for_stmt (stmt)) = mark;
vect_mark_slp_stmts (SLP_TREE_LEFT (node), mark, j);
vect_mark_slp_stmts (SLP_TREE_RIGHT (node), mark, j);
}
/* Check if the permutation required by the SLP INSTANCE is supported.
Reorganize the SLP nodes stored in SLP_INSTANCE_LOADS if needed. */
static bool
vect_supported_slp_permutation_p (slp_instance instance)
{
slp_tree node = VEC_index (slp_tree, SLP_INSTANCE_LOADS (instance), 0);
gimple stmt = VEC_index (gimple, SLP_TREE_SCALAR_STMTS (node), 0);
gimple first_load = DR_GROUP_FIRST_DR (vinfo_for_stmt (stmt));
VEC (slp_tree, heap) *sorted_loads = NULL;
int index;
slp_tree *tmp_loads = NULL;
int group_size = SLP_INSTANCE_GROUP_SIZE (instance), i, j;
slp_tree load;
/* FORNOW: The only supported load permutation is the one in which all the
loads of a node access the same location, when the data-refs in the
nodes of LOADS constitute an interleaving chain.
Sort the nodes according to the order of accesses in the chain. */
tmp_loads = (slp_tree *) xmalloc (sizeof (slp_tree) * group_size);
for (i = 0, j = 0;
VEC_iterate (int, SLP_INSTANCE_LOAD_PERMUTATION (instance), i, index)
&& VEC_iterate (slp_tree, SLP_INSTANCE_LOADS (instance), j, load);
i += group_size, j++)
{
gimple scalar_stmt = VEC_index (gimple, SLP_TREE_SCALAR_STMTS (load), 0);
/* Check that the loads are all in the same interleaving chain. */
if (DR_GROUP_FIRST_DR (vinfo_for_stmt (scalar_stmt)) != first_load)
{
if (vect_print_dump_info (REPORT_DETAILS))
{
fprintf (vect_dump, "Build SLP failed: unsupported data "
"permutation ");
print_gimple_stmt (vect_dump, scalar_stmt, 0, TDF_SLIM);
}
free (tmp_loads);
return false;
}
tmp_loads[index] = load;
}
sorted_loads = VEC_alloc (slp_tree, heap, group_size);
for (i = 0; i < group_size; i++)
VEC_safe_push (slp_tree, heap, sorted_loads, tmp_loads[i]);
VEC_free (slp_tree, heap, SLP_INSTANCE_LOADS (instance));
SLP_INSTANCE_LOADS (instance) = sorted_loads;
free (tmp_loads);
if (!vect_transform_slp_perm_load (stmt, NULL, NULL,
SLP_INSTANCE_UNROLLING_FACTOR (instance),
instance, true))
return false;
return true;
}
/* Check if the required load permutation is supported.
LOAD_PERMUTATION contains a list of indices of the loads.
In SLP this permutation is relative to the order of strided stores that are
the base of the SLP instance. */
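/* For example (illustrative): for GROUP_SIZE == 2 the permutation
   {0, 0, 1, 1} is accepted (a vector of length GROUP_SIZE * GROUP_SIZE
   in which each index repeats GROUP_SIZE times), whereas {0, 1, 1, 0}
   is rejected.  */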
static bool
vect_supported_load_permutation_p (slp_instance slp_instn, int group_size,
VEC (int, heap) *load_permutation)
{
int i = 0, j, prev = -1, next, k;
bool supported;
/* FORNOW: permutations are only supported for loop-aware SLP. */
if (!slp_instn)
return false;
if (vect_print_dump_info (REPORT_SLP))
{
fprintf (vect_dump, "Load permutation ");
for (i = 0; VEC_iterate (int, load_permutation, i, next); i++)
fprintf (vect_dump, "%d ", next);
}
/* FORNOW: the only supported permutation is 0..01..1.. of length equal to
GROUP_SIZE * GROUP_SIZE, where each sequence of identical indices has
GROUP_SIZE length as well. */
if (VEC_length (int, load_permutation)
!= (unsigned int) (group_size * group_size))
return false;
supported = true;
for (j = 0; j < group_size; j++)
{
for (i = j * group_size, k = 0;
VEC_iterate (int, load_permutation, i, next) && k < group_size;
i++, k++)
{
if (i != j * group_size && next != prev)
{
supported = false;
break;
}
prev = next;
}
}
if (supported && i == group_size * group_size
&& vect_supported_slp_permutation_p (slp_instn))
return true;
return false;
}
/* Find the first load in the loop that belongs to INSTANCE.
When loads are in several SLP nodes, there can be a case in which the first
load does not appear in the first SLP node to be transformed, causing
incorrect order of statements. Since we generate all the loads together,
they must be inserted before the first load of the SLP instance and not
before the first load of the first node of the instance. */
static gimple
vect_find_first_load_in_slp_instance (slp_instance instance)
{
int i, j;
slp_tree load_node;
gimple first_load = NULL, load;
for (i = 0;
VEC_iterate (slp_tree, SLP_INSTANCE_LOADS (instance), i, load_node);
i++)
for (j = 0;
VEC_iterate (gimple, SLP_TREE_SCALAR_STMTS (load_node), j, load);
j++)
first_load = get_earlier_stmt (load, first_load);
return first_load;
}
/* Analyze an SLP instance starting from a group of strided stores. Call
vect_build_slp_tree to build a tree of packed stmts if possible.
Return FALSE if it's impossible to SLP any stmt in the loop. */
static bool
vect_analyze_slp_instance (loop_vec_info loop_vinfo, gimple stmt)
{
slp_instance new_instance;
slp_tree node = XNEW (struct _slp_tree);
unsigned int group_size = DR_GROUP_SIZE (vinfo_for_stmt (stmt));
unsigned int unrolling_factor = 1, nunits;
tree vectype, scalar_type;
gimple next;
unsigned int vectorization_factor = 0, ncopies;
bool slp_impossible = false;
int inside_cost = 0, outside_cost = 0, ncopies_for_cost;
unsigned int max_nunits = 0;
VEC (int, heap) *load_permutation;
VEC (slp_tree, heap) *loads;
scalar_type = TREE_TYPE (DR_REF (STMT_VINFO_DATA_REF (
vinfo_for_stmt (stmt))));
vectype = get_vectype_for_scalar_type (scalar_type);
if (!vectype)
{
if (vect_print_dump_info (REPORT_SLP))
{
fprintf (vect_dump, "Build SLP failed: unsupported data-type ");
print_generic_expr (vect_dump, scalar_type, TDF_SLIM);
}
return false;
}
nunits = TYPE_VECTOR_SUBPARTS (vectype);
vectorization_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
ncopies = vectorization_factor / nunits;
/* Create a node (a root of the SLP tree) for the packed strided stores. */
SLP_TREE_SCALAR_STMTS (node) = VEC_alloc (gimple, heap, group_size);
next = stmt;
/* Collect the stores and store them in SLP_TREE_SCALAR_STMTS. */
while (next)
{
VEC_safe_push (gimple, heap, SLP_TREE_SCALAR_STMTS (node), next);
next = DR_GROUP_NEXT_DR (vinfo_for_stmt (next));
}
SLP_TREE_VEC_STMTS (node) = NULL;
SLP_TREE_NUMBER_OF_VEC_STMTS (node) = 0;
SLP_TREE_LEFT (node) = NULL;
SLP_TREE_RIGHT (node) = NULL;
SLP_TREE_OUTSIDE_OF_LOOP_COST (node) = 0;
SLP_TREE_INSIDE_OF_LOOP_COST (node) = 0;
/* Calculate the unrolling factor. */
unrolling_factor = least_common_multiple (nunits, group_size) / group_size;
/* Calculate the number of vector stmts to create based on the unrolling
factor (number of vectors is 1 if NUNITS >= GROUP_SIZE, and is
GROUP_SIZE / NUNITS otherwise). */
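/* E.g. (illustrative values): for NUNITS == 4 and GROUP_SIZE == 8 the
   unrolling factor is 1 and ncopies_for_cost is 1 * 8 / 4 == 2.  */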
ncopies_for_cost = unrolling_factor * group_size / nunits;
load_permutation = VEC_alloc (int, heap, group_size * group_size);
loads = VEC_alloc (slp_tree, heap, group_size);
/* Build the tree for the SLP instance. */
if (vect_build_slp_tree (loop_vinfo, &node, group_size, &inside_cost,
&outside_cost, ncopies_for_cost, &max_nunits,
&load_permutation, &loads))
{
/* Create a new SLP instance. */
new_instance = XNEW (struct _slp_instance);
SLP_INSTANCE_TREE (new_instance) = node;
SLP_INSTANCE_GROUP_SIZE (new_instance) = group_size;
/* Calculate the unrolling factor based on the smallest type in the
loop. */
if (max_nunits > nunits)
unrolling_factor = least_common_multiple (max_nunits, group_size)
/ group_size;
SLP_INSTANCE_UNROLLING_FACTOR (new_instance) = unrolling_factor;
SLP_INSTANCE_OUTSIDE_OF_LOOP_COST (new_instance) = outside_cost;
SLP_INSTANCE_INSIDE_OF_LOOP_COST (new_instance) = inside_cost;
SLP_INSTANCE_LOADS (new_instance) = loads;
SLP_INSTANCE_FIRST_LOAD_STMT (new_instance) = NULL;
SLP_INSTANCE_LOAD_PERMUTATION (new_instance) = load_permutation;
if (VEC_length (slp_tree, loads))
{
if (!vect_supported_load_permutation_p (new_instance, group_size,
load_permutation))
{
if (vect_print_dump_info (REPORT_SLP))
{
fprintf (vect_dump, "Build SLP failed: unsupported load "
"permutation ");
print_gimple_stmt (vect_dump, stmt, 0, TDF_SLIM);
}
vect_free_slp_instance (new_instance);
return false;
}
SLP_INSTANCE_FIRST_LOAD_STMT (new_instance)
= vect_find_first_load_in_slp_instance (new_instance);
}
else
VEC_free (int, heap, SLP_INSTANCE_LOAD_PERMUTATION (new_instance));
VEC_safe_push (slp_instance, heap, LOOP_VINFO_SLP_INSTANCES (loop_vinfo),
new_instance);
if (vect_print_dump_info (REPORT_SLP))
vect_print_slp_tree (node);
return true;
}
/* Failed to SLP. */
/* Free the allocated memory. */
vect_free_slp_tree (node);
VEC_free (int, heap, load_permutation);
VEC_free (slp_tree, heap, loads);
if (slp_impossible)
return false;
/* SLP failed for this instance, but it is still possible to SLP other stmts
in the loop. */
return true;
}
/* Check if there are stmts in the loop that can be vectorized using SLP.
Build SLP trees of packed scalar stmts if SLP is possible. */
bool
vect_analyze_slp (loop_vec_info loop_vinfo)
{
unsigned int i;
VEC (gimple, heap) *strided_stores = LOOP_VINFO_STRIDED_STORES (loop_vinfo);
gimple store;
if (vect_print_dump_info (REPORT_SLP))
fprintf (vect_dump, "=== vect_analyze_slp ===");
for (i = 0; VEC_iterate (gimple, strided_stores, i, store); i++)
if (!vect_analyze_slp_instance (loop_vinfo, store))
{
/* SLP failed. No instance can be SLPed in the loop. */
if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
fprintf (vect_dump, "SLP failed.");
return false;
}
return true;
}
/* For each possible SLP instance decide whether to SLP it and calculate the
overall unrolling factor needed to SLP the loop. */
void
vect_make_slp_decision (loop_vec_info loop_vinfo)
{
unsigned int i, unrolling_factor = 1;
VEC (slp_instance, heap) *slp_instances = LOOP_VINFO_SLP_INSTANCES (loop_vinfo);
slp_instance instance;
int decided_to_slp = 0;
if (vect_print_dump_info (REPORT_SLP))
fprintf (vect_dump, "=== vect_make_slp_decision ===");
for (i = 0; VEC_iterate (slp_instance, slp_instances, i, instance); i++)
{
/* FORNOW: SLP if you can. */
if (unrolling_factor < SLP_INSTANCE_UNROLLING_FACTOR (instance))
unrolling_factor = SLP_INSTANCE_UNROLLING_FACTOR (instance);
/* Mark all the stmts that belong to INSTANCE as PURE_SLP stmts. Later we
call vect_detect_hybrid_slp () to find stmts that need hybrid SLP and
loop-based vectorization. Such stmts will be marked as HYBRID. */
vect_mark_slp_stmts (SLP_INSTANCE_TREE (instance), pure_slp, -1);
decided_to_slp++;
}
LOOP_VINFO_SLP_UNROLLING_FACTOR (loop_vinfo) = unrolling_factor;
if (decided_to_slp && vect_print_dump_info (REPORT_SLP))
fprintf (vect_dump, "Decided to SLP %d instances. Unrolling factor %d",
decided_to_slp, unrolling_factor);
}
/* Find stmts that must be both vectorized and SLPed (since they feed stmts that
can't be SLPed) in the tree rooted at NODE. Mark such stmts as HYBRID. */
static void
vect_detect_hybrid_slp_stmts (slp_tree node)
{
int i;
gimple stmt;
imm_use_iterator imm_iter;
gimple use_stmt;
if (!node)
return;
for (i = 0; VEC_iterate (gimple, SLP_TREE_SCALAR_STMTS (node), i, stmt); i++)
if (PURE_SLP_STMT (vinfo_for_stmt (stmt))
&& TREE_CODE (gimple_op (stmt, 0)) == SSA_NAME)
FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, gimple_op (stmt, 0))
if (vinfo_for_stmt (use_stmt)
&& !STMT_SLP_TYPE (vinfo_for_stmt (use_stmt))
&& STMT_VINFO_RELEVANT (vinfo_for_stmt (use_stmt)))
vect_mark_slp_stmts (node, hybrid, i);
vect_detect_hybrid_slp_stmts (SLP_TREE_LEFT (node));
vect_detect_hybrid_slp_stmts (SLP_TREE_RIGHT (node));
}
/* Find stmts that must be both vectorized and SLPed. */
void
vect_detect_hybrid_slp (loop_vec_info loop_vinfo)
{
unsigned int i;
VEC (slp_instance, heap) *slp_instances = LOOP_VINFO_SLP_INSTANCES (loop_vinfo);
slp_instance instance;
if (vect_print_dump_info (REPORT_SLP))
fprintf (vect_dump, "=== vect_detect_hybrid_slp ===");
for (i = 0; VEC_iterate (slp_instance, slp_instances, i, instance); i++)
vect_detect_hybrid_slp_stmts (SLP_INSTANCE_TREE (instance));
}
/* SLP costs are calculated according to SLP instance unrolling factor (i.e.,
the number of created vector stmts depends on the unrolling factor). However,
the actual number of vector stmts for every SLP node depends on VF, which is
set later in vect_analyze_operations(). Hence, SLP costs should be updated.
In this function we assume that the inside costs calculated in
vect_model_xxx_cost are linear in ncopies. */
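/* For example (illustrative numbers): an instance whose unrolling factor
   is 2 and whose inside cost was computed as 6, in a loop whose VF turns
   out to be 8, gets its inside cost scaled by 8 / 2 == 4, giving 24.  */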
void
vect_update_slp_costs_according_to_vf (loop_vec_info loop_vinfo)
{
unsigned int i, vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
VEC (slp_instance, heap) *slp_instances = LOOP_VINFO_SLP_INSTANCES (loop_vinfo);
slp_instance instance;
if (vect_print_dump_info (REPORT_SLP))
fprintf (vect_dump, "=== vect_update_slp_costs_according_to_vf ===");
for (i = 0; VEC_iterate (slp_instance, slp_instances, i, instance); i++)
/* We assume that costs are linear in ncopies. */
SLP_INSTANCE_INSIDE_OF_LOOP_COST (instance) *= vf
/ SLP_INSTANCE_UNROLLING_FACTOR (instance);
}
/* For constant and loop invariant defs of SLP_NODE this function returns
(vector) defs (VEC_OPRNDS) that will be used in the vectorized stmts.
OP_NUM determines if we gather defs for operand 0 or operand 1 of the scalar
stmts. NUMBER_OF_VECTORS is the number of vector defs to create. */
static void
vect_get_constant_vectors (slp_tree slp_node, VEC(tree,heap) **vec_oprnds,
unsigned int op_num, unsigned int number_of_vectors)
{
VEC (gimple, heap) *stmts = SLP_TREE_SCALAR_STMTS (slp_node);
gimple stmt = VEC_index (gimple, stmts, 0);
stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt);
tree vectype = STMT_VINFO_VECTYPE (stmt_vinfo);
int nunits;
tree vec_cst;
tree t = NULL_TREE;
int j, number_of_places_left_in_vector;
tree vector_type;
tree op, vop;
int group_size = VEC_length (gimple, stmts);
unsigned int vec_num, i;
int number_of_copies = 1;
VEC (tree, heap) *voprnds = VEC_alloc (tree, heap, number_of_vectors);
bool constant_p, is_store;
if (STMT_VINFO_DATA_REF (stmt_vinfo))
{
is_store = true;
op = gimple_assign_rhs1 (stmt);
}
else
{
is_store = false;
op = gimple_op (stmt, op_num + 1);
}
if (CONSTANT_CLASS_P (op))
{
vector_type = vectype;
constant_p = true;
}
else
{
vector_type = get_vectype_for_scalar_type (TREE_TYPE (op));
gcc_assert (vector_type);
constant_p = false;
}
nunits = TYPE_VECTOR_SUBPARTS (vector_type);
/* NUMBER_OF_COPIES is the number of times we need to use the same values in
created vectors. It is greater than 1 if unrolling is performed.
For example, we have two scalar operands, s1 and s2 (e.g., group of
strided accesses of size two), while NUNITS is four (i.e., four scalars
of this type can be packed in a vector). The output vector will contain
two copies of each scalar operand: {s1, s2, s1, s2}. (NUMBER_OF_COPIES
will be 2).
If GROUP_SIZE > NUNITS, the scalars will be split into several vectors
containing the operands.
For example, NUNITS is four as before, and the group size is 8
(s1, s2, ..., s8). We will create two vectors {s1, s2, s3, s4} and
{s5, s6, s7, s8}. */
number_of_copies = least_common_multiple (nunits, group_size) / group_size;
number_of_places_left_in_vector = nunits;
for (j = 0; j < number_of_copies; j++)
{
for (i = group_size - 1; VEC_iterate (gimple, stmts, i, stmt); i--)
{
if (is_store)
op = gimple_assign_rhs1 (stmt);
else
op = gimple_op (stmt, op_num + 1);
/* Create 'vect_ = {op0,op1,...,opn}'. */
t = tree_cons (NULL_TREE, op, t);
number_of_places_left_in_vector--;
if (number_of_places_left_in_vector == 0)
{
number_of_places_left_in_vector = nunits;
if (constant_p)
vec_cst = build_vector (vector_type, t);
else
vec_cst = build_constructor_from_list (vector_type, t);
VEC_quick_push (tree, voprnds,
vect_init_vector (stmt, vec_cst, vector_type, NULL));
t = NULL_TREE;
}
}
}
/* Since the vectors are created in the reverse order, we should reverse
them. */
vec_num = VEC_length (tree, voprnds);
for (j = vec_num - 1; j >= 0; j--)
{
vop = VEC_index (tree, voprnds, j);
VEC_quick_push (tree, *vec_oprnds, vop);
}
VEC_free (tree, heap, voprnds);
/* In case that VF is greater than the unrolling factor needed for the SLP
group of stmts, NUMBER_OF_VECTORS to be created is greater than
NUMBER_OF_SCALARS/NUNITS or NUNITS/NUMBER_OF_SCALARS, and hence we have
to replicate the vectors. */
while (number_of_vectors > VEC_length (tree, *vec_oprnds))
{
for (i = 0; VEC_iterate (tree, *vec_oprnds, i, vop) && i < vec_num; i++)
VEC_quick_push (tree, *vec_oprnds, vop);
}
}
/* Get vectorized definitions from SLP_NODE that contains corresponding
vectorized def-stmts. */
static void
vect_get_slp_vect_defs (slp_tree slp_node, VEC (tree,heap) **vec_oprnds)
{
tree vec_oprnd;
gimple vec_def_stmt;
unsigned int i;
gcc_assert (SLP_TREE_VEC_STMTS (slp_node));
for (i = 0;
VEC_iterate (gimple, SLP_TREE_VEC_STMTS (slp_node), i, vec_def_stmt);
i++)
{
gcc_assert (vec_def_stmt);
vec_oprnd = gimple_get_lhs (vec_def_stmt);
VEC_quick_push (tree, *vec_oprnds, vec_oprnd);
}
}
/* Get vectorized definitions for SLP_NODE.
If the scalar definitions are loop invariants or constants, collect them and
call vect_get_constant_vectors() to create vector stmts.
Otherwise, the def-stmts must be already vectorized and the vectorized stmts
must be stored in the LEFT/RIGHT node of SLP_NODE, and we call
vect_get_slp_vect_defs() to retrieve them.
If VEC_OPRNDS1 is NULL, don't get vector defs for the second operand (from
the right node). This is used when the second operand must remain scalar. */
void
vect_get_slp_defs (slp_tree slp_node, VEC (tree,heap) **vec_oprnds0,
VEC (tree,heap) **vec_oprnds1)
{
gimple first_stmt;
enum tree_code code;
int number_of_vects;
HOST_WIDE_INT lhs_size_unit, rhs_size_unit;
first_stmt = VEC_index (gimple, SLP_TREE_SCALAR_STMTS (slp_node), 0);
/* The number of vector defs is determined by the number of vector statements
in the node from which we get those statements. */
if (SLP_TREE_LEFT (slp_node))
number_of_vects = SLP_TREE_NUMBER_OF_VEC_STMTS (SLP_TREE_LEFT (slp_node));
else
{
number_of_vects = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
/* Number of vector stmts was calculated according to LHS in
vect_schedule_slp_instance(), fix it by replacing LHS with RHS, if
necessary. See vect_get_smallest_scalar_type() for details. */
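/* E.g. (illustrative types): if the scalar RHS type is int (4 bytes)
and the scalar LHS type is char (1 byte), four times as many vector
defs are needed for the RHS, so NUMBER_OF_VECTS is scaled by 4/1
below. */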
vect_get_smallest_scalar_type (first_stmt, &lhs_size_unit,
&rhs_size_unit);
if (rhs_size_unit != lhs_size_unit)
{
number_of_vects *= rhs_size_unit;
number_of_vects /= lhs_size_unit;
}
}
/* Allocate memory for vectorized defs. */
*vec_oprnds0 = VEC_alloc (tree, heap, number_of_vects);
/* SLP_NODE corresponds either to a group of stores or to a group of
unary/binary operations. We don't call this function for loads. */
if (SLP_TREE_LEFT (slp_node))
/* The defs are already vectorized. */
vect_get_slp_vect_defs (SLP_TREE_LEFT (slp_node), vec_oprnds0);
else
/* Build vectors from scalar defs. */
vect_get_constant_vectors (slp_node, vec_oprnds0, 0, number_of_vects);
if (STMT_VINFO_DATA_REF (vinfo_for_stmt (first_stmt)))
/* Since we don't call this function with loads, this is a group of
stores. */
return;
code = gimple_assign_rhs_code (first_stmt);
if (get_gimple_rhs_class (code) != GIMPLE_BINARY_RHS || !vec_oprnds1)
return;
/* The number of vector defs is determined by the number of vector statements
in the node from which we get those statements. */
if (SLP_TREE_RIGHT (slp_node))
number_of_vects = SLP_TREE_NUMBER_OF_VEC_STMTS (SLP_TREE_RIGHT (slp_node));
else
number_of_vects = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
*vec_oprnds1 = VEC_alloc (tree, heap, number_of_vects);
if (SLP_TREE_RIGHT (slp_node))
/* The defs are already vectorized. */
vect_get_slp_vect_defs (SLP_TREE_RIGHT (slp_node), vec_oprnds1);
else
/* Build vectors from scalar defs. */
vect_get_constant_vectors (slp_node, vec_oprnds1, 1, number_of_vects);
}
/* Create NCOPIES permutation statements using the mask MASK_ARRAY (by
building a vector of type MASK_TYPE from it) and two input vectors placed in
DR_CHAIN at FIRST_VEC_INDX and SECOND_VEC_INDX for the first copy and
shifting by STRIDE elements of DR_CHAIN for every copy.
(STRIDE is the number of vectorized stmts for NODE divided by the number of
copies).
VECT_STMTS_COUNTER specifies the index in the vectorized stmts of NODE, where
the created stmts must be inserted. */
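/* Each generated permute statement has the following illustrative
shape, where BUILTIN_DECL stands for the target's vector-permute
builtin (its actual name and operand types are target specific):
perm_dest = BUILTIN_DECL (first_vec, second_vec, mask); */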
static inline void
vect_create_mask_and_perm (gimple stmt, gimple next_scalar_stmt,
int *mask_array, int mask_nunits,
tree mask_element_type, tree mask_type,
int first_vec_indx, int second_vec_indx,
gimple_stmt_iterator *gsi, slp_tree node,
tree builtin_decl, tree vectype,
VEC(tree,heap) *dr_chain,
int ncopies, int vect_stmts_counter)
{
tree t = NULL_TREE, mask_vec, mask, perm_dest;
gimple perm_stmt = NULL;
stmt_vec_info next_stmt_info;
int i, group_size, stride, dr_chain_size;
tree first_vec, second_vec, data_ref;
tree sym;
ssa_op_iter iter;
VEC (tree, heap) *params = NULL;
/* Create a vector mask. */
for (i = mask_nunits - 1; i >= 0; --i)
t = tree_cons (NULL_TREE, build_int_cst (mask_element_type, mask_array[i]),
t);
mask_vec = build_vector (mask_type, t);
mask = vect_init_vector (stmt, mask_vec, mask_type, NULL);
group_size = VEC_length (gimple, SLP_TREE_SCALAR_STMTS (node));
stride = SLP_TREE_NUMBER_OF_VEC_STMTS (node) / ncopies;
dr_chain_size = VEC_length (tree, dr_chain);
/* Initialize the vect stmts of NODE to properly insert the generated
stmts later. */
for (i = VEC_length (gimple, SLP_TREE_VEC_STMTS (node));
i < (int) SLP_TREE_NUMBER_OF_VEC_STMTS (node); i++)
VEC_quick_push (gimple, SLP_TREE_VEC_STMTS (node), NULL);
perm_dest = vect_create_destination_var (gimple_assign_lhs (stmt), vectype);
for (i = 0; i < ncopies; i++)
{
first_vec = VEC_index (tree, dr_chain, first_vec_indx);
second_vec = VEC_index (tree, dr_chain, second_vec_indx);
/* Build argument list for the vectorized call. */
VEC_free (tree, heap, params);
params = VEC_alloc (tree, heap, 3);
VEC_quick_push (tree, params, first_vec);
VEC_quick_push (tree, params, second_vec);
VEC_quick_push (tree, params, mask);
/* Generate the permute statement. */
perm_stmt = gimple_build_call_vec (builtin_decl, params);
data_ref = make_ssa_name (perm_dest, perm_stmt);
gimple_call_set_lhs (perm_stmt, data_ref);
vect_finish_stmt_generation (stmt, perm_stmt, gsi);
FOR_EACH_SSA_TREE_OPERAND (sym, perm_stmt, iter, SSA_OP_ALL_VIRTUALS)
{
if (TREE_CODE (sym) == SSA_NAME)
sym = SSA_NAME_VAR (sym);
mark_sym_for_renaming (sym);
}
/* Store the vector statement in NODE. */
VEC_replace (gimple, SLP_TREE_VEC_STMTS (node),
stride * i + vect_stmts_counter, perm_stmt);
first_vec_indx += stride;
second_vec_indx += stride;
}
/* Mark the scalar stmt as vectorized. */
next_stmt_info = vinfo_for_stmt (next_scalar_stmt);
STMT_VINFO_VEC_STMT (next_stmt_info) = perm_stmt;
}
/* Given FIRST_MASK_ELEMENT - the mask element in element representation,
return in CURRENT_MASK_ELEMENT its equivalent in target specific
representation. Check that the mask is valid and return FALSE if not.
Return TRUE in NEED_NEXT_VECTOR if the permutation requires moving to
the next vector, i.e., the current first vector is not needed. */
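/* For instance, in the {6,9,9,9} mask from the example in
vect_transform_slp_perm_load below: when element 9 is reached
(9 >= 2 * MASK_NUNITS with MASK_NUNITS == 4), the first vector is
dropped, MASK_NUNITS is subtracted from the elements collected so
far, and the mask ends up as {2,5,5,5}. */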
static bool
vect_get_mask_element (gimple stmt, int first_mask_element, int m,
int mask_nunits, bool only_one_vec, int index,
int *mask, int *current_mask_element,
bool *need_next_vector)
{
int i;
static int number_of_mask_fixes = 1;
static bool mask_fixed = false;
static bool needs_first_vector = false;
/* Convert to target specific representation. */
*current_mask_element = first_mask_element + m;
/* Adjust the value in case it's a mask for second and third vectors. */
*current_mask_element -= mask_nunits * (number_of_mask_fixes - 1);
if (*current_mask_element < mask_nunits)
needs_first_vector = true;
/* We have only one input vector to permute but the mask accesses values in
the next vector as well. */
if (only_one_vec && *current_mask_element >= mask_nunits)
{
if (vect_print_dump_info (REPORT_DETAILS))
{
fprintf (vect_dump, "permutation requires at least two vectors ");
print_gimple_stmt (vect_dump, stmt, 0, TDF_SLIM);
}
return false;
}
/* The mask requires the next vector. */
if (*current_mask_element >= mask_nunits * 2)
{
if (needs_first_vector || mask_fixed)
{
/* We either need the first vector too or have already moved to the
next vector. In both cases, this permutation needs three
vectors. */
if (vect_print_dump_info (REPORT_DETAILS))
{
fprintf (vect_dump, "permutation requires at "
"least three vectors ");
print_gimple_stmt (vect_dump, stmt, 0, TDF_SLIM);
}
return false;
}
/* We move to the next vector, dropping the first one and working with
the second and the third - we need to adjust the values of the mask
accordingly. */
*current_mask_element -= mask_nunits * number_of_mask_fixes;
for (i = 0; i < index; i++)
mask[i] -= mask_nunits * number_of_mask_fixes;
(number_of_mask_fixes)++;
mask_fixed = true;
}
*need_next_vector = mask_fixed;
/* This was the last element of this mask. Start a new one. */
if (index == mask_nunits - 1)
{
number_of_mask_fixes = 1;
mask_fixed = false;
needs_first_vector = false;
}
return true;
}
/* Generate vector permute statements from a list of loads in DR_CHAIN.
If ANALYZE_ONLY is TRUE, only check that it is possible to create valid
permute statements for SLP_NODE_INSTANCE. */
bool
vect_transform_slp_perm_load (gimple stmt, VEC (tree, heap) *dr_chain,
gimple_stmt_iterator *gsi, int vf,
slp_instance slp_node_instance, bool analyze_only)
{
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
tree mask_element_type = NULL_TREE, mask_type;
int i, j, k, m, scale, mask_nunits, nunits, vec_index = 0, scalar_index;
slp_tree node;
tree vectype = STMT_VINFO_VECTYPE (stmt_info), builtin_decl;
gimple next_scalar_stmt;
int group_size = SLP_INSTANCE_GROUP_SIZE (slp_node_instance);
int first_mask_element;
int index, unroll_factor, *mask, current_mask_element, ncopies;
bool only_one_vec = false, need_next_vector = false;
int first_vec_index, second_vec_index, orig_vec_stmts_num, vect_stmts_counter;
if (!targetm.vectorize.builtin_vec_perm)
{
if (vect_print_dump_info (REPORT_DETAILS))
{
fprintf (vect_dump, "no builtin for vect permute for ");
print_gimple_stmt (vect_dump, stmt, 0, TDF_SLIM);
}
return false;
}
builtin_decl = targetm.vectorize.builtin_vec_perm (vectype,
&mask_element_type);
if (!builtin_decl || !mask_element_type)
{
if (vect_print_dump_info (REPORT_DETAILS))
{
fprintf (vect_dump, "no builtin for vect permute for ");
print_gimple_stmt (vect_dump, stmt, 0, TDF_SLIM);
}
return false;
}
mask_type = get_vectype_for_scalar_type (mask_element_type);
mask_nunits = TYPE_VECTOR_SUBPARTS (mask_type);
mask = (int *) xmalloc (sizeof (int) * mask_nunits);
nunits = TYPE_VECTOR_SUBPARTS (vectype);
scale = mask_nunits / nunits;
unroll_factor = SLP_INSTANCE_UNROLLING_FACTOR (slp_node_instance);
/* The number of vector stmts to generate based only on SLP_NODE_INSTANCE
unrolling factor. */
orig_vec_stmts_num = group_size *
SLP_INSTANCE_UNROLLING_FACTOR (slp_node_instance) / nunits;
if (orig_vec_stmts_num == 1)
only_one_vec = true;
/* Number of copies is determined by the final vectorization factor
relative to the SLP_NODE_INSTANCE unrolling factor. */
ncopies = vf / SLP_INSTANCE_UNROLLING_FACTOR (slp_node_instance);
/* Generate permutation masks for every NODE. Number of masks for each NODE
is equal to GROUP_SIZE.
E.g., we have a group of three nodes with three loads from the same
location in each node, and the vector size is 4. I.e., we have an
a0b0c0a1b1c1... sequence and we need to create the following vectors:
for a's: a0a0a0a1 a1a1a2a2 a2a3a3a3
for b's: b0b0b0b1 b1b1b2b2 b2b3b3b3
...
The masks for a's should be: {0,0,0,3} {3,3,6,6} {6,9,9,9} (in target
specific type, e.g., in bytes for Altivec).
The last mask is illegal since we assume two operands for permute
operation, and the mask element values can't be outside that range. Hence,
the last mask must be converted into {2,5,5,5}.
For the first two permutations we need the first and the second input
vectors: {a0,b0,c0,a1} and {b1,c1,a2,b2}, and for the last permutation
we need the second and the third vectors: {b1,c1,a2,b2} and
{c2,a3,b3,c3}. */
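/* Continuing the a's from the example above (i == 0, GROUP_SIZE == 3,
unrolling factor 4, and assuming SCALE == 1, i.e., the mask element
type is as wide as the vector element type): the loops below compute
FIRST_MASK_ELEMENT == 0, 3, 6, 9 for j == 0..3, pushing each value
GROUP_SIZE times, giving 0,0,0,3,3,3,6,6,6,9,9,9. Chunked into
vectors of MASK_NUNITS == 4 this yields exactly the masks shown
above, with vect_get_mask_element turning the last one into
{2,5,5,5}. */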
for (i = 0;
VEC_iterate (slp_tree, SLP_INSTANCE_LOADS (slp_node_instance),
i, node);
i++)
{
scalar_index = 0;
index = 0;
vect_stmts_counter = 0;
vec_index = 0;
first_vec_index = vec_index++;
if (only_one_vec)
second_vec_index = first_vec_index;
else
second_vec_index = vec_index++;
for (j = 0; j < unroll_factor; j++)
{
for (k = 0; k < group_size; k++)
{
first_mask_element = (i + j * group_size) * scale;
for (m = 0; m < scale; m++)
{
if (!vect_get_mask_element (stmt, first_mask_element, m,
mask_nunits, only_one_vec, index, mask,
&current_mask_element, &need_next_vector))
return false;
mask[index++] = current_mask_element;
}
if (index == mask_nunits)
{
index = 0;
if (!analyze_only)
{
if (need_next_vector)
{
first_vec_index = second_vec_index;
second_vec_index = vec_index;
}
next_scalar_stmt = VEC_index (gimple,
SLP_TREE_SCALAR_STMTS (node), scalar_index++);
vect_create_mask_and_perm (stmt, next_scalar_stmt,
mask, mask_nunits, mask_element_type, mask_type,
first_vec_index, second_vec_index, gsi, node,
builtin_decl, vectype, dr_chain, ncopies,
vect_stmts_counter++);
}
}
}
}
}
free (mask);
return true;
}
/* Vectorize SLP instance tree in postorder. */
static bool
vect_schedule_slp_instance (slp_tree node, slp_instance instance,
unsigned int vectorization_factor)
{
gimple stmt;
bool strided_store, is_store;
gimple_stmt_iterator si;
stmt_vec_info stmt_info;
unsigned int vec_stmts_size, nunits, group_size;
tree vectype;
int i;
slp_tree loads_node;
if (!node)
return false;
vect_schedule_slp_instance (SLP_TREE_LEFT (node), instance,
vectorization_factor);
vect_schedule_slp_instance (SLP_TREE_RIGHT (node), instance,
vectorization_factor);
stmt = VEC_index (gimple, SLP_TREE_SCALAR_STMTS (node), 0);
stmt_info = vinfo_for_stmt (stmt);
/* VECTYPE is the type of the destination. */
vectype = get_vectype_for_scalar_type (TREE_TYPE (gimple_assign_lhs (stmt)));
nunits = (unsigned int) TYPE_VECTOR_SUBPARTS (vectype);
group_size = SLP_INSTANCE_GROUP_SIZE (instance);
/* For each SLP instance calculate number of vector stmts to be created
for the scalar stmts in each node of the SLP tree. Number of vector
elements in one vector iteration is the number of scalar elements in
one scalar iteration (GROUP_SIZE) multiplied by VF divided by vector
size. */
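/* E.g. (illustrative numbers): GROUP_SIZE == 2, VECTORIZATION_FACTOR == 4
and NUNITS == 4 give (4 * 2) / 4 == 2 vector stmts per node. */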
vec_stmts_size = (vectorization_factor * group_size) / nunits;
/* In case of load permutation we have to allocate vectorized statements for
all the nodes that participate in that permutation. */
if (SLP_INSTANCE_LOAD_PERMUTATION (instance))
{
for (i = 0;
VEC_iterate (slp_tree, SLP_INSTANCE_LOADS (instance), i, loads_node);
i++)
{
if (!SLP_TREE_VEC_STMTS (loads_node))
{
SLP_TREE_VEC_STMTS (loads_node) = VEC_alloc (gimple, heap,
vec_stmts_size);
SLP_TREE_NUMBER_OF_VEC_STMTS (loads_node) = vec_stmts_size;
}
}
}
if (!SLP_TREE_VEC_STMTS (node))
{
SLP_TREE_VEC_STMTS (node) = VEC_alloc (gimple, heap, vec_stmts_size);
SLP_TREE_NUMBER_OF_VEC_STMTS (node) = vec_stmts_size;
}
if (vect_print_dump_info (REPORT_DETAILS))
{
fprintf (vect_dump, "------>vectorizing SLP node starting from: ");
print_gimple_stmt (vect_dump, stmt, 0, TDF_SLIM);
}
/* Loads should be inserted before the first load. */
if (SLP_INSTANCE_FIRST_LOAD_STMT (instance)
&& STMT_VINFO_STRIDED_ACCESS (stmt_info)
&& !REFERENCE_CLASS_P (gimple_get_lhs (stmt)))
si = gsi_for_stmt (SLP_INSTANCE_FIRST_LOAD_STMT (instance));
else
si = gsi_for_stmt (stmt);
is_store = vect_transform_stmt (stmt, &si, &strided_store, node, instance);
if (is_store)
{
if (DR_GROUP_FIRST_DR (stmt_info))
/* If IS_STORE is TRUE, the vectorization of the
interleaving chain was completed - free all the stores in
the chain. */
vect_remove_stores (DR_GROUP_FIRST_DR (stmt_info));
else
/* FORNOW: SLP originates only from strided stores. */
gcc_unreachable ();
return true;
}
/* FORNOW: SLP originates only from strided stores. */
return false;
}
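/* Generate vector code for all SLP instances in LOOP_VINFO, scheduling
each instance tree in postorder. Returns the IS_STORE result of the
last instance scheduled. */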
bool
vect_schedule_slp (loop_vec_info loop_vinfo)
{
VEC (slp_instance, heap) *slp_instances =
LOOP_VINFO_SLP_INSTANCES (loop_vinfo);
slp_instance instance;
unsigned int i;
bool is_store = false;
for (i = 0; VEC_iterate (slp_instance, slp_instances, i, instance); i++)
{
/* Schedule the tree of INSTANCE. */
is_store = vect_schedule_slp_instance (SLP_INSTANCE_TREE (instance),
instance, LOOP_VINFO_VECT_FACTOR (loop_vinfo));
if (vect_print_dump_info (REPORT_VECTORIZED_LOOPS)
|| vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
fprintf (vect_dump, "vectorizing stmts using SLP.");
}
return is_store;
}
/* Loop Vectorization
Copyright (C) 2003, 2004, 2005, 2006, 2007, 2008 Free Software
Foundation, Inc.
Contributed by Dorit Naishlos <dorit@il.ibm.com>
This file is part of GCC.
...
You should have received a copy of the GNU General Public License
along with GCC; see the file COPYING3. If not see
<http://www.gnu.org/licenses/>. */
/* Loop Vectorization Pass.
This pass tries to vectorize loops. This first implementation focuses on
simple inner-most loops, with no conditional control flow, and a set of
simple operations which vector form can be expressed using existing
tree codes (PLUS, MULT etc).
For example, the vectorizer transforms the following simple loop:
short a[N]; short b[N]; short c[N]; int i;
for (i=0; i<N; i++){
a[i] = b[i] + c[i];
}
as if it was manually vectorized by rewriting the source code into:
typedef int __attribute__((mode(V8HI))) v8hi;
short a[N]; short b[N]; short c[N]; int i;
v8hi *pa = (v8hi*)a, *pb = (v8hi*)b, *pc = (v8hi*)c;
v8hi va, vb, vc;
for (i=0; i<N/8; i++){
vb = pb[i];
vc = pc[i];
va = vb + vc;
pa[i] = va;
}
The main entry to this pass is vectorize_loops(), in which
the vectorizer applies a set of analyses on a given set of loops,
followed by the actual vectorization transformation for the loops that
had successfully passed the analysis phase.
Throughout this pass we make a distinction between two types of
data: scalars (which are represented by SSA_NAMES), and memory references
("data-refs"). These two types of data require different handling both
during analysis and transformation. The types of data-refs that the
vectorizer currently supports are ARRAY_REFS which base is an array DECL
(not a pointer), and INDIRECT_REFS through pointers; both array and pointer
accesses are required to have a simple (consecutive) access pattern.
Analysis phase:
===============
The driver for the analysis phase is vect_analyze_loop_nest().
It applies a set of analyses, some of which rely on the scalar evolution
analyzer (scev) developed by Sebastian Pop.
During the analysis phase the vectorizer records some information
per stmt in a "stmt_vec_info" struct which is attached to each stmt in the
loop, as well as general information about the loop as a whole, which is
recorded in a "loop_vec_info" struct attached to each loop.
Transformation phase:
=====================
The loop transformation phase scans all the stmts in the loop, and
creates a vector stmt (or a sequence of stmts) for each scalar stmt S in
the loop that needs to be vectorized. It insert the vector code sequence
just before the scalar stmt S, and records a pointer to the vector code
in STMT_VINFO_VEC_STMT (stmt_info) (stmt_info is the stmt_vec_info struct
attached to S). This pointer will be used for the vectorization of following
stmts which use the def of stmt S. Stmt S is removed if it writes to memory;
otherwise, we rely on dead code elimination for removing it.
For example, say stmt S1 was vectorized into stmt VS1:
VS1: vb = px[i];
S1: b = x[i]; STMT_VINFO_VEC_STMT (stmt_info (S1)) = VS1
S2: a = b;
To vectorize stmt S2, the vectorizer first finds the stmt that defines
the operand 'b' (S1), and gets the relevant vector def 'vb' from the
vector stmt VS1 pointed to by STMT_VINFO_VEC_STMT (stmt_info (S1)). The
resulting sequence would be:
VS1: vb = px[i];
S1: b = x[i]; STMT_VINFO_VEC_STMT (stmt_info (S1)) = VS1
VS2: va = vb;
S2: a = b; STMT_VINFO_VEC_STMT (stmt_info (S2)) = VS2
Operands that are not SSA_NAMEs, are data-refs that appear in
load/store operations (like 'x[i]' in S1), and are handled differently.
Target modeling:
=================
Currently the only target specific information that is used is the
size of the vector (in bytes) - "UNITS_PER_SIMD_WORD". Targets that can
support different sizes of vectors, for now will need to specify one value
for "UNITS_PER_SIMD_WORD". More flexibility will be added in the future.
Since we only vectorize operations which vector form can be
expressed using existing tree codes, to verify that an operation is
supported, the vectorizer checks the relevant optab at the relevant
machine_mode (e.g, optab_handler (add_optab, V8HImode)->insn_code). If
the value found is CODE_FOR_nothing, then there's no target support, and
we can't vectorize the stmt.
For additional information on this project see:
http://gcc.gnu.org/projects/tree-ssa/vectorization.html
*/
#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "tm.h"
#include "ggc.h"
#include "tree.h"
#include "target.h"
#include "rtl.h"
#include "basic-block.h"
#include "diagnostic.h"
#include "tree-flow.h"
#include "tree-dump.h"
#include "timevar.h"
#include "cfgloop.h"
#include "cfglayout.h"
#include "expr.h"
#include "recog.h"
#include "optabs.h"
#include "params.h"
#include "toplev.h"
#include "tree-chrec.h"
#include "tree-data-ref.h"
#include "tree-scalar-evolution.h"
#include "input.h"
#include "hashtab.h"
#include "tree-vectorizer.h"
#include "tree-pass.h"
#include "langhooks.h"
/*************************************************************************
General Vectorization Utilities
*************************************************************************/
/* vect_dump will be set to stderr, or to dump_file if it exists. */
FILE *vect_dump;
/* vect_verbosity_level is set to an invalid value
to mark that it's uninitialized. */
enum verbosity_levels vect_verbosity_level = MAX_VERBOSITY_LEVEL;
/* Loop location. */
static LOC vect_loop_location;
/* Bitmap of virtual variables to be renamed. */
bitmap vect_memsyms_to_rename;
/* Vector mapping GIMPLE stmt to stmt_vec_info. */
VEC(vec_void_p,heap) *stmt_vec_info_vec;
/*************************************************************************
Simple Loop Peeling Utilities
Utilities to support loop peeling for vectorization purposes.
*************************************************************************/
/* Renames the use *OP_P. */
static void
rename_use_op (use_operand_p op_p)
{
tree new_name;
if (TREE_CODE (USE_FROM_PTR (op_p)) != SSA_NAME)
return;
new_name = get_current_def (USE_FROM_PTR (op_p));
/* Something defined outside of the loop. */
if (!new_name)
return;
/* An ordinary ssa name defined in the loop. */
SET_USE (op_p, new_name);
}
/* Renames the variables in basic block BB. */
void
rename_variables_in_bb (basic_block bb)
{
gimple_stmt_iterator gsi;
gimple stmt;
use_operand_p use_p;
ssa_op_iter iter;
edge e;
edge_iterator ei;
struct loop *loop = bb->loop_father;
for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
{
stmt = gsi_stmt (gsi);
FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_ALL_USES)
rename_use_op (use_p);
}
FOR_EACH_EDGE (e, ei, bb->succs)
{
if (!flow_bb_inside_loop_p (loop, e->dest))
continue;
for (gsi = gsi_start_phis (e->dest); !gsi_end_p (gsi); gsi_next (&gsi))
rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi_stmt (gsi), e));
}
}
/* Renames variables in the newly generated LOOP. */
void
rename_variables_in_loop (struct loop *loop)
{
unsigned i;
basic_block *bbs;
bbs = get_loop_body (loop);
for (i = 0; i < loop->num_nodes; i++)
rename_variables_in_bb (bbs[i]);
free (bbs);
}
/* Update the PHI nodes of NEW_LOOP.
NEW_LOOP is a duplicate of ORIG_LOOP.
AFTER indicates whether NEW_LOOP executes before or after ORIG_LOOP:
AFTER is true if NEW_LOOP executes after ORIG_LOOP, and false if it
executes before it. */
static void
slpeel_update_phis_for_duplicate_loop (struct loop *orig_loop,
struct loop *new_loop, bool after)
{
tree new_ssa_name;
gimple phi_new, phi_orig;
tree def;
edge orig_loop_latch = loop_latch_edge (orig_loop);
edge orig_entry_e = loop_preheader_edge (orig_loop);
edge new_loop_exit_e = single_exit (new_loop);
edge new_loop_entry_e = loop_preheader_edge (new_loop);
edge entry_arg_e = (after ? orig_loop_latch : orig_entry_e);
gimple_stmt_iterator gsi_new, gsi_orig;
/*
step 1. For each loop-header-phi:
Add the first phi argument for the phi in NEW_LOOP
(the one associated with the entry of NEW_LOOP)
step 2. For each loop-header-phi:
Add the second phi argument for the phi in NEW_LOOP
(the one associated with the latch of NEW_LOOP)
step 3. Update the phis in the successor block of NEW_LOOP.
case 1: NEW_LOOP was placed before ORIG_LOOP:
The successor block of NEW_LOOP is the header of ORIG_LOOP.
Updating the phis in the successor block can therefore be done
along with the scanning of the loop header phis, because the
header blocks of ORIG_LOOP and NEW_LOOP have exactly the same
phi nodes, organized in the same order.
case 2: NEW_LOOP was placed after ORIG_LOOP:
The successor block of NEW_LOOP is the original exit block of
ORIG_LOOP - the phis to be updated are the loop-closed-ssa phis.
We postpone updating these phis to a later stage (when
loop guards are added).
*/
/* Scan the phis in the headers of the old and new loops
(they are organized in exactly the same order). */
for (gsi_new = gsi_start_phis (new_loop->header),
gsi_orig = gsi_start_phis (orig_loop->header);
!gsi_end_p (gsi_new) && !gsi_end_p (gsi_orig);
gsi_next (&gsi_new), gsi_next (&gsi_orig))
{
phi_new = gsi_stmt (gsi_new);
phi_orig = gsi_stmt (gsi_orig);
/* step 1. */
def = PHI_ARG_DEF_FROM_EDGE (phi_orig, entry_arg_e);
add_phi_arg (phi_new, def, new_loop_entry_e);
/* step 2. */
def = PHI_ARG_DEF_FROM_EDGE (phi_orig, orig_loop_latch);
if (TREE_CODE (def) != SSA_NAME)
continue;
new_ssa_name = get_current_def (def);
if (!new_ssa_name)
{
/* This only happens if there are no definitions
inside the loop. Use the phi_result in this case. */
new_ssa_name = PHI_RESULT (phi_new);
}
/* An ordinary ssa name defined in the loop. */
add_phi_arg (phi_new, new_ssa_name, loop_latch_edge (new_loop));
/* step 3 (case 1). */
if (!after)
{
gcc_assert (new_loop_exit_e == orig_entry_e);
SET_PHI_ARG_DEF (phi_orig,
new_loop_exit_e->dest_idx,
new_ssa_name);
}
}
}
/* Update PHI nodes for a guard of the LOOP.
Input:
- LOOP, GUARD_EDGE: LOOP is a loop for which we added guard code that
controls whether LOOP is to be executed. GUARD_EDGE is the edge that
originates from the guard-bb, skips LOOP and reaches the (unique) exit
bb of LOOP. This loop-exit-bb is an empty bb with one successor.
We denote this bb NEW_MERGE_BB because before the guard code was added
it had a single predecessor (the LOOP header), and now it became a merge
point of two paths - the path that ends with the LOOP exit-edge, and
the path that ends with GUARD_EDGE.
- NEW_EXIT_BB: New basic block that is added by this function between LOOP
and NEW_MERGE_BB. It is used to place loop-closed-ssa-form exit-phis.
===> The CFG before the guard-code was added:
LOOP_header_bb:
loop_body
if (exit_loop) goto update_bb
else goto LOOP_header_bb
update_bb:
==> The CFG after the guard-code was added:
guard_bb:
if (LOOP_guard_condition) goto new_merge_bb
else goto LOOP_header_bb
LOOP_header_bb:
loop_body
if (exit_loop_condition) goto new_merge_bb
else goto LOOP_header_bb
new_merge_bb:
goto update_bb
update_bb:
==> The CFG after this function:
guard_bb:
if (LOOP_guard_condition) goto new_merge_bb
else goto LOOP_header_bb
LOOP_header_bb:
loop_body
if (exit_loop_condition) goto new_exit_bb
else goto LOOP_header_bb
new_exit_bb:
new_merge_bb:
goto update_bb
update_bb:
This function:
1. creates and updates the relevant phi nodes to account for the new
incoming edge (GUARD_EDGE) into NEW_MERGE_BB. This involves:
1.1. Create phi nodes at NEW_MERGE_BB.
1.2. Update the phi nodes at the successor of NEW_MERGE_BB (denoted
UPDATE_BB). UPDATE_BB was the exit-bb of LOOP before NEW_MERGE_BB
2. preserves loop-closed-ssa-form by creating the required phi nodes
at the exit of LOOP (i.e, in NEW_EXIT_BB).
There are two flavors to this function:
slpeel_update_phi_nodes_for_guard1:
Here the guard controls whether we enter or skip LOOP, where LOOP is a
prolog_loop (loop1 below), and the new phis created in NEW_MERGE_BB are
for variables that have phis in the loop header.
slpeel_update_phi_nodes_for_guard2:
Here the guard controls whether we enter or skip LOOP, where LOOP is an
epilog_loop (loop2 below), and the new phis created in NEW_MERGE_BB are
for variables that have phis in the loop exit.
I.E., the overall structure is:
loop1_preheader_bb:
guard1 (goto loop1/merge1_bb)
loop1
loop1_exit_bb:
guard2 (goto merge1_bb/merge2_bb)
merge1_bb
loop2
loop2_exit_bb
merge2_bb
next_bb
slpeel_update_phi_nodes_for_guard1 takes care of creating phis in
loop1_exit_bb and merge1_bb. These are entry phis (phis for the vars
that have phis in loop1->header).
slpeel_update_phi_nodes_for_guard2 takes care of creating phis in
loop2_exit_bb and merge2_bb. These are exit phis (phis for the vars
that have phis in next_bb). It also adds some of these phis to
loop1_exit_bb.
slpeel_update_phi_nodes_for_guard1 is always called before
slpeel_update_phi_nodes_for_guard2. They are both needed in order
to create correct data-flow and loop-closed-ssa-form.
Generally slpeel_update_phi_nodes_for_guard1 creates phis for variables
that change between iterations of a loop (and therefore have a phi-node
at the loop entry), whereas slpeel_update_phi_nodes_for_guard2 creates
phis for variables that are used out of the loop (and therefore have
loop-closed exit phis). Some variables may be both updated between
iterations and used after the loop. This is why in loop1_exit_bb we
may need both entry_phis (created by slpeel_update_phi_nodes_for_guard1)
and exit phis (created by slpeel_update_phi_nodes_for_guard2).
- IS_NEW_LOOP: if IS_NEW_LOOP is true, then LOOP is a newly created copy of
an original loop. i.e., we have:
orig_loop
guard_bb (goto LOOP/new_merge)
new_loop <-- LOOP
new_exit
new_merge
next_bb
If IS_NEW_LOOP is false, then LOOP is an original loop, in which case we
have:
new_loop
guard_bb (goto LOOP/new_merge)
orig_loop <-- LOOP
new_exit
new_merge
next_bb
The SSA names defined in the original loop have a current
reaching definition that records the corresponding new
ssa-name used in the new duplicated loop copy.
*/
/* Function slpeel_update_phi_nodes_for_guard1
Input:
- GUARD_EDGE, LOOP, IS_NEW_LOOP, NEW_EXIT_BB - as explained above.
- DEFS - a bitmap of ssa names to mark new names for which we recorded
information.
In the context of the overall structure, we have:
loop1_preheader_bb:
guard1 (goto loop1/merge1_bb)
LOOP-> loop1
loop1_exit_bb:
guard2 (goto merge1_bb/merge2_bb)
merge1_bb
loop2
loop2_exit_bb
merge2_bb
next_bb
For each name updated between loop iterations (i.e., for each name that has
an entry (loop-header) phi in LOOP) we create a new phi in:
1. merge1_bb (to account for the edge from guard1)
2. loop1_exit_bb (an exit-phi to keep LOOP in loop-closed form)
*/
static void
slpeel_update_phi_nodes_for_guard1 (edge guard_edge, struct loop *loop,
bool is_new_loop, basic_block *new_exit_bb,
bitmap *defs)
{
gimple orig_phi, new_phi;
gimple update_phi, update_phi2;
tree guard_arg, loop_arg;
basic_block new_merge_bb = guard_edge->dest;
edge e = EDGE_SUCC (new_merge_bb, 0);
basic_block update_bb = e->dest;
basic_block orig_bb = loop->header;
edge new_exit_e;
tree current_new_name;
tree name;
gimple_stmt_iterator gsi_orig, gsi_update;
/* Create new bb between loop and new_merge_bb. */
*new_exit_bb = split_edge (single_exit (loop));
new_exit_e = EDGE_SUCC (*new_exit_bb, 0);
for (gsi_orig = gsi_start_phis (orig_bb),
gsi_update = gsi_start_phis (update_bb);
!gsi_end_p (gsi_orig) && !gsi_end_p (gsi_update);
gsi_next (&gsi_orig), gsi_next (&gsi_update))
{
orig_phi = gsi_stmt (gsi_orig);
update_phi = gsi_stmt (gsi_update);
/* Virtual phi; Mark it for renaming. We actually want to call
mark_sym_for_renaming, but since all ssa renaming data structures
are going to be freed before we get to call ssa_update, we just
record this name for now in a bitmap, and will mark it for
renaming later. */
name = PHI_RESULT (orig_phi);
if (!is_gimple_reg (SSA_NAME_VAR (name)))
bitmap_set_bit (vect_memsyms_to_rename, DECL_UID (SSA_NAME_VAR (name)));
/** 1. Handle new-merge-point phis **/
/* 1.1. Generate new phi node in NEW_MERGE_BB: */
new_phi = create_phi_node (SSA_NAME_VAR (PHI_RESULT (orig_phi)),
new_merge_bb);
/* 1.2. NEW_MERGE_BB has two incoming edges: GUARD_EDGE and the exit-edge
of LOOP. Set the two phi args in NEW_PHI for these edges: */
loop_arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, EDGE_SUCC (loop->latch, 0));
guard_arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, loop_preheader_edge (loop));
add_phi_arg (new_phi, loop_arg, new_exit_e);
add_phi_arg (new_phi, guard_arg, guard_edge);
/* 1.3. Update phi in successor block. */
gcc_assert (PHI_ARG_DEF_FROM_EDGE (update_phi, e) == loop_arg
|| PHI_ARG_DEF_FROM_EDGE (update_phi, e) == guard_arg);
SET_PHI_ARG_DEF (update_phi, e->dest_idx, PHI_RESULT (new_phi));
update_phi2 = new_phi;
/** 2. Handle loop-closed-ssa-form phis **/
if (!is_gimple_reg (PHI_RESULT (orig_phi)))
continue;
/* 2.1. Generate new phi node in NEW_EXIT_BB: */
new_phi = create_phi_node (SSA_NAME_VAR (PHI_RESULT (orig_phi)),
*new_exit_bb);
/* 2.2. NEW_EXIT_BB has one incoming edge: the exit-edge of the loop. */
add_phi_arg (new_phi, loop_arg, single_exit (loop));
/* 2.3. Update phi in successor of NEW_EXIT_BB: */
gcc_assert (PHI_ARG_DEF_FROM_EDGE (update_phi2, new_exit_e) == loop_arg);
SET_PHI_ARG_DEF (update_phi2, new_exit_e->dest_idx, PHI_RESULT (new_phi));
/* 2.4. Record the newly created name with set_current_def.
We want to find a name such that
name = get_current_def (orig_loop_name)
and to set its current definition as follows:
set_current_def (name, new_phi_name)
If LOOP is a new loop then loop_arg is already the name we're
looking for. If LOOP is the original loop, then loop_arg is
the orig_loop_name and the relevant name is recorded in its
current reaching definition. */
if (is_new_loop)
current_new_name = loop_arg;
else
{
current_new_name = get_current_def (loop_arg);
/* current_def is not available only if the variable does not
change inside the loop, in which case we also don't care
about recording a current_def for it because we won't be
trying to create loop-exit-phis for it. */
if (!current_new_name)
continue;
}
gcc_assert (get_current_def (current_new_name) == NULL_TREE);
set_current_def (current_new_name, PHI_RESULT (new_phi));
bitmap_set_bit (*defs, SSA_NAME_VERSION (current_new_name));
}
}
/* Function slpeel_update_phi_nodes_for_guard2
Input:
- GUARD_EDGE, LOOP, IS_NEW_LOOP, NEW_EXIT_BB - as explained above.
In the context of the overall structure, we have:
loop1_preheader_bb:
guard1 (goto loop1/merge1_bb)
loop1
loop1_exit_bb:
guard2 (goto merge1_bb/merge2_bb)
merge1_bb
LOOP-> loop2
loop2_exit_bb
merge2_bb
next_bb
For each name used outside the loop (i.e., for each name that has an exit
phi in next_bb) we create a new phi in:
1. merge2_bb (to account for the edge from guard_bb)
2. loop2_exit_bb (an exit-phi to keep LOOP in loop-closed form)
3. guard2 bb (an exit phi to keep the preceding loop in loop-closed form),
if needed (if it wasn't handled by slpeel_update_phis_nodes_for_phi1).
*/
static void
slpeel_update_phi_nodes_for_guard2 (edge guard_edge, struct loop *loop,
bool is_new_loop, basic_block *new_exit_bb)
{
gimple orig_phi, new_phi;
gimple update_phi, update_phi2;
tree guard_arg, loop_arg;
basic_block new_merge_bb = guard_edge->dest;
edge e = EDGE_SUCC (new_merge_bb, 0);
basic_block update_bb = e->dest;
edge new_exit_e;
tree orig_def, orig_def_new_name;
tree new_name, new_name2;
tree arg;
gimple_stmt_iterator gsi;
/* Create new bb between loop and new_merge_bb. */
*new_exit_bb = split_edge (single_exit (loop));
new_exit_e = EDGE_SUCC (*new_exit_bb, 0);
for (gsi = gsi_start_phis (update_bb); !gsi_end_p (gsi); gsi_next (&gsi))
{
update_phi = gsi_stmt (gsi);
orig_phi = update_phi;
orig_def = PHI_ARG_DEF_FROM_EDGE (orig_phi, e);
/* This loop-closed-phi actually doesn't represent a use
out of the loop - the phi arg is a constant. */
if (TREE_CODE (orig_def) != SSA_NAME)
continue;
orig_def_new_name = get_current_def (orig_def);
arg = NULL_TREE;
/** 1. Handle new-merge-point phis **/
/* 1.1. Generate new phi node in NEW_MERGE_BB: */
new_phi = create_phi_node (SSA_NAME_VAR (PHI_RESULT (orig_phi)),
new_merge_bb);
/* 1.2. NEW_MERGE_BB has two incoming edges: GUARD_EDGE and the exit-edge
of LOOP. Set the two PHI args in NEW_PHI for these edges: */
new_name = orig_def;
new_name2 = NULL_TREE;
if (orig_def_new_name)
{
new_name = orig_def_new_name;
/* Some variables have both loop-entry-phis and loop-exit-phis.
Such variables were given yet newer names by phis placed in
guard_bb by slpeel_update_phi_nodes_for_guard1. I.e:
new_name2 = get_current_def (get_current_def (orig_name)). */
new_name2 = get_current_def (new_name);
}
if (is_new_loop)
{
guard_arg = orig_def;
loop_arg = new_name;
}
else
{
guard_arg = new_name;
loop_arg = orig_def;
}
if (new_name2)
guard_arg = new_name2;
add_phi_arg (new_phi, loop_arg, new_exit_e);
add_phi_arg (new_phi, guard_arg, guard_edge);
/* 1.3. Update phi in successor block. */
gcc_assert (PHI_ARG_DEF_FROM_EDGE (update_phi, e) == orig_def);
SET_PHI_ARG_DEF (update_phi, e->dest_idx, PHI_RESULT (new_phi));
update_phi2 = new_phi;
/** 2. Handle loop-closed-ssa-form phis **/
/* 2.1. Generate new phi node in NEW_EXIT_BB: */
new_phi = create_phi_node (SSA_NAME_VAR (PHI_RESULT (orig_phi)),
*new_exit_bb);
/* 2.2. NEW_EXIT_BB has one incoming edge: the exit-edge of the loop. */
add_phi_arg (new_phi, loop_arg, single_exit (loop));
/* 2.3. Update phi in successor of NEW_EXIT_BB: */
gcc_assert (PHI_ARG_DEF_FROM_EDGE (update_phi2, new_exit_e) == loop_arg);
SET_PHI_ARG_DEF (update_phi2, new_exit_e->dest_idx, PHI_RESULT (new_phi));
/** 3. Handle loop-closed-ssa-form phis for first loop **/
/* 3.1. Find the relevant names that need an exit-phi in
GUARD_BB, i.e. names for which
slpeel_update_phi_nodes_for_guard1 had not already created a
phi node. This is the case for names that are used outside
the loop (and therefore need an exit phi) but are not updated
across loop iterations (and therefore don't have a
loop-header-phi).
slpeel_update_phi_nodes_for_guard1 is responsible for
creating loop-exit phis in GUARD_BB for names that have a
loop-header-phi. When such a phi is created we also record
the new name in its current definition. If this new name
exists, then guard_arg was set to this new name (see 1.2
above). Therefore, if guard_arg is not this new name, this
is an indication that an exit-phi in GUARD_BB was not yet
created, so we take care of it here. */
if (guard_arg == new_name2)
continue;
arg = guard_arg;
/* 3.2. Generate new phi node in GUARD_BB: */
new_phi = create_phi_node (SSA_NAME_VAR (PHI_RESULT (orig_phi)),
guard_edge->src);
/* 3.3. GUARD_BB has one incoming edge: */
gcc_assert (EDGE_COUNT (guard_edge->src->preds) == 1);
add_phi_arg (new_phi, arg, EDGE_PRED (guard_edge->src, 0));
/* 3.4. Update phi in successor of GUARD_BB: */
gcc_assert (PHI_ARG_DEF_FROM_EDGE (update_phi2, guard_edge)
== guard_arg);
SET_PHI_ARG_DEF (update_phi2, guard_edge->dest_idx, PHI_RESULT (new_phi));
}
}
/* Make the LOOP iterate NITERS times. This is done by adding a new IV
that starts at zero, is incremented by one in each iteration, and whose limit is NITERS.
Assumption: the exit-condition of LOOP is the last stmt in the loop. */
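/* Illustrative shape of the generated exit test (assuming the loop
exits when its original exit condition is true):
indx_after_incr = indx_before_incr + 1;
if (indx_after_incr >= NITERS) goto exit_bb; */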
void
slpeel_make_loop_iterate_ntimes (struct loop *loop, tree niters)
{
tree indx_before_incr, indx_after_incr;
gimple cond_stmt;
gimple orig_cond;
edge exit_edge = single_exit (loop);
gimple_stmt_iterator loop_cond_gsi;
gimple_stmt_iterator incr_gsi;
bool insert_after;
tree init = build_int_cst (TREE_TYPE (niters), 0);
tree step = build_int_cst (TREE_TYPE (niters), 1);
LOC loop_loc;
enum tree_code code;
orig_cond = get_loop_exit_condition (loop);
gcc_assert (orig_cond);
loop_cond_gsi = gsi_for_stmt (orig_cond);
standard_iv_increment_position (loop, &incr_gsi, &insert_after);
create_iv (init, step, NULL_TREE, loop,
&incr_gsi, insert_after, &indx_before_incr, &indx_after_incr);
indx_after_incr = force_gimple_operand_gsi (&loop_cond_gsi, indx_after_incr,
true, NULL_TREE, true,
GSI_SAME_STMT);
niters = force_gimple_operand_gsi (&loop_cond_gsi, niters, true, NULL_TREE,
true, GSI_SAME_STMT);
code = (exit_edge->flags & EDGE_TRUE_VALUE) ? GE_EXPR : LT_EXPR;
cond_stmt = gimple_build_cond (code, indx_after_incr, niters, NULL_TREE,
NULL_TREE);
gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT);
/* Remove old loop exit test: */
gsi_remove (&loop_cond_gsi, true);
loop_loc = find_loop_location (loop);
if (dump_file && (dump_flags & TDF_DETAILS))
{
if (loop_loc != UNKNOWN_LOC)
fprintf (dump_file, "\nloop at %s:%d: ",
LOC_FILE (loop_loc), LOC_LINE (loop_loc));
print_gimple_stmt (dump_file, cond_stmt, 0, TDF_SLIM);
}
loop->nb_iterations = niters;
}
/* Given LOOP this function generates a new copy of it and puts it
on E which is either the entry or exit of LOOP. */
struct loop *
slpeel_tree_duplicate_loop_to_edge_cfg (struct loop *loop, edge e)
{
struct loop *new_loop;
basic_block *new_bbs, *bbs;
bool at_exit;
bool was_imm_dom;
basic_block exit_dest;
gimple phi;
tree phi_arg;
edge exit, new_exit;
gimple_stmt_iterator gsi;
at_exit = (e == single_exit (loop));
if (!at_exit && e != loop_preheader_edge (loop))
return NULL;
bbs = get_loop_body (loop);
/* Check whether duplication is possible. */
if (!can_copy_bbs_p (bbs, loop->num_nodes))
{
free (bbs);
return NULL;
}
/* Generate new loop structure. */
new_loop = duplicate_loop (loop, loop_outer (loop));
if (!new_loop)
{
free (bbs);
return NULL;
}
exit_dest = single_exit (loop)->dest;
was_imm_dom = (get_immediate_dominator (CDI_DOMINATORS,
exit_dest) == loop->header ?
true : false);
new_bbs = XNEWVEC (basic_block, loop->num_nodes);
exit = single_exit (loop);
copy_bbs (bbs, loop->num_nodes, new_bbs,
&exit, 1, &new_exit, NULL,
e->src);
/* Duplicate the phi args at the exit bb so that they are also seen as
coming from the exit of the duplicated loop. */
for (gsi = gsi_start_phis (exit_dest); !gsi_end_p (gsi); gsi_next (&gsi))
{
phi = gsi_stmt (gsi);
phi_arg = PHI_ARG_DEF_FROM_EDGE (phi, single_exit (loop));
if (phi_arg)
{
edge new_loop_exit_edge;
if (EDGE_SUCC (new_loop->header, 0)->dest == new_loop->latch)
new_loop_exit_edge = EDGE_SUCC (new_loop->header, 1);
else
new_loop_exit_edge = EDGE_SUCC (new_loop->header, 0);
add_phi_arg (phi, phi_arg, new_loop_exit_edge);
}
}
if (at_exit) /* Add the loop copy at exit. */
{
redirect_edge_and_branch_force (e, new_loop->header);
PENDING_STMT (e) = NULL;
set_immediate_dominator (CDI_DOMINATORS, new_loop->header, e->src);
if (was_imm_dom)
set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_loop->header);
}
else /* Add the copy at entry. */
{
edge new_exit_e;
edge entry_e = loop_preheader_edge (loop);
basic_block preheader = entry_e->src;
if (!flow_bb_inside_loop_p (new_loop,
EDGE_SUCC (new_loop->header, 0)->dest))
new_exit_e = EDGE_SUCC (new_loop->header, 0);
else
new_exit_e = EDGE_SUCC (new_loop->header, 1);
redirect_edge_and_branch_force (new_exit_e, loop->header);
PENDING_STMT (new_exit_e) = NULL;
set_immediate_dominator (CDI_DOMINATORS, loop->header,
new_exit_e->src);
/* We have to add phi args to the loop->header here as coming
from new_exit_e edge. */
for (gsi = gsi_start_phis (loop->header);
!gsi_end_p (gsi);
gsi_next (&gsi))
{
phi = gsi_stmt (gsi);
phi_arg = PHI_ARG_DEF_FROM_EDGE (phi, entry_e);
if (phi_arg)
add_phi_arg (phi, phi_arg, new_exit_e);
}
redirect_edge_and_branch_force (entry_e, new_loop->header);
PENDING_STMT (entry_e) = NULL;
set_immediate_dominator (CDI_DOMINATORS, new_loop->header, preheader);
}
free (new_bbs);
free (bbs);
return new_loop;
}
/* Given the condition statement COND, put it as the last statement
of GUARD_BB; EXIT_BB is the basic block reached when the loop is skipped;
Assumes that this is the single exit of the guarded loop.
Returns the skip edge. */
static edge
slpeel_add_loop_guard (basic_block guard_bb, tree cond, basic_block exit_bb,
basic_block dom_bb)
{
gimple_stmt_iterator gsi;
edge new_e, enter_e;
gimple cond_stmt;
gimple_seq gimplify_stmt_list = NULL;
enter_e = EDGE_SUCC (guard_bb, 0);
enter_e->flags &= ~EDGE_FALLTHRU;
enter_e->flags |= EDGE_FALSE_VALUE;
gsi = gsi_last_bb (guard_bb);
cond = force_gimple_operand (cond, &gimplify_stmt_list, true, NULL_TREE);
cond_stmt = gimple_build_cond (NE_EXPR,
cond, build_int_cst (TREE_TYPE (cond), 0),
NULL_TREE, NULL_TREE);
if (gimplify_stmt_list)
gsi_insert_seq_after (&gsi, gimplify_stmt_list, GSI_NEW_STMT);
gsi = gsi_last_bb (guard_bb);
gsi_insert_after (&gsi, cond_stmt, GSI_NEW_STMT);
/* Add new edge to connect guard block to the merge/loop-exit block. */
new_e = make_edge (guard_bb, exit_bb, EDGE_TRUE_VALUE);
set_immediate_dominator (CDI_DOMINATORS, exit_bb, dom_bb);
return new_e;
}
/* This function verifies that the following restrictions apply to LOOP:
(1) it is innermost
(2) it consists of exactly 2 basic blocks - header, and an empty latch.
(3) it is single entry, single exit
(4) its exit condition is the last stmt in the header
(5) E is the entry/exit edge of LOOP.
*/
bool
slpeel_can_duplicate_loop_p (const struct loop *loop, const_edge e)
{
edge exit_e = single_exit (loop);
edge entry_e = loop_preheader_edge (loop);
gimple orig_cond = get_loop_exit_condition (loop);
gimple_stmt_iterator loop_exit_gsi = gsi_last_bb (exit_e->src);
if (need_ssa_update_p ())
return false;
if (loop->inner
/* All loops have an outer scope; the only case in which loop->outer
is NULL is for the function itself. */
|| !loop_outer (loop)
|| loop->num_nodes != 2
|| !empty_block_p (loop->latch)
|| !single_exit (loop)
/* Verify that new loop exit condition can be trivially modified. */
|| (!orig_cond || orig_cond != gsi_stmt (loop_exit_gsi))
|| (e != exit_e && e != entry_e))
return false;
return true;
}
#ifdef ENABLE_CHECKING
void
slpeel_verify_cfg_after_peeling (struct loop *first_loop,
struct loop *second_loop)
{
basic_block loop1_exit_bb = single_exit (first_loop)->dest;
basic_block loop2_entry_bb = loop_preheader_edge (second_loop)->src;
basic_block loop1_entry_bb = loop_preheader_edge (first_loop)->src;
/* A guard that controls whether the second_loop is to be executed or skipped
is placed in first_loop->exit. first_loop->exit therefore has two
successors - one is the preheader of second_loop, and the other is a bb
after second_loop.
*/
gcc_assert (EDGE_COUNT (loop1_exit_bb->succs) == 2);
/* 1. Verify that one of the successors of first_loop->exit is the preheader
of second_loop. */
/* The preheader of new_loop is expected to have two predecessors:
first_loop->exit and the block that precedes first_loop. */
gcc_assert (EDGE_COUNT (loop2_entry_bb->preds) == 2
&& ((EDGE_PRED (loop2_entry_bb, 0)->src == loop1_exit_bb
&& EDGE_PRED (loop2_entry_bb, 1)->src == loop1_entry_bb)
|| (EDGE_PRED (loop2_entry_bb, 1)->src == loop1_exit_bb
&& EDGE_PRED (loop2_entry_bb, 0)->src == loop1_entry_bb)));
/* Verify that the other successor of first_loop->exit is after the
second_loop. */
/* TODO */
}
#endif
/* If the run time cost model check determines that vectorization is
not profitable and hence a scalar loop should be generated, then set
FIRST_NITERS to the number of prologue-peeled iterations. This will allow
all the iterations to be executed in the prologue-peeled scalar loop. */
void
set_prologue_iterations (basic_block bb_before_first_loop,
tree first_niters,
struct loop *loop,
unsigned int th)
{
edge e;
basic_block cond_bb, then_bb;
tree var, prologue_after_cost_adjust_name;
gimple_stmt_iterator gsi;
gimple newphi;
edge e_true, e_false, e_fallthru;
gimple cond_stmt;
gimple_seq gimplify_stmt_list = NULL, stmts = NULL;
tree cost_pre_condition = NULL_TREE;
tree scalar_loop_iters =
unshare_expr (LOOP_VINFO_NITERS_UNCHANGED (loop_vec_info_for_loop (loop)));
e = single_pred_edge (bb_before_first_loop);
cond_bb = split_edge(e);
e = single_pred_edge (bb_before_first_loop);
then_bb = split_edge(e);
set_immediate_dominator (CDI_DOMINATORS, then_bb, cond_bb);
e_false = make_single_succ_edge (cond_bb, bb_before_first_loop,
EDGE_FALSE_VALUE);
set_immediate_dominator (CDI_DOMINATORS, bb_before_first_loop, cond_bb);
e_true = EDGE_PRED (then_bb, 0);
e_true->flags &= ~EDGE_FALLTHRU;
e_true->flags |= EDGE_TRUE_VALUE;
e_fallthru = EDGE_SUCC (then_bb, 0);
cost_pre_condition =
fold_build2 (LE_EXPR, boolean_type_node, scalar_loop_iters,
build_int_cst (TREE_TYPE (scalar_loop_iters), th));
cost_pre_condition =
force_gimple_operand (cost_pre_condition, &gimplify_stmt_list,
true, NULL_TREE);
cond_stmt = gimple_build_cond (NE_EXPR, cost_pre_condition,
build_int_cst (TREE_TYPE (cost_pre_condition),
0), NULL_TREE, NULL_TREE);
gsi = gsi_last_bb (cond_bb);
if (gimplify_stmt_list)
gsi_insert_seq_after (&gsi, gimplify_stmt_list, GSI_NEW_STMT);
gsi = gsi_last_bb (cond_bb);
gsi_insert_after (&gsi, cond_stmt, GSI_NEW_STMT);
var = create_tmp_var (TREE_TYPE (scalar_loop_iters),
"prologue_after_cost_adjust");
add_referenced_var (var);
prologue_after_cost_adjust_name =
force_gimple_operand (scalar_loop_iters, &stmts, false, var);
gsi = gsi_last_bb (then_bb);
if (stmts)
gsi_insert_seq_after (&gsi, stmts, GSI_NEW_STMT);
newphi = create_phi_node (var, bb_before_first_loop);
add_phi_arg (newphi, prologue_after_cost_adjust_name, e_fallthru);
add_phi_arg (newphi, first_niters, e_false);
first_niters = PHI_RESULT (newphi);
}
/* Function slpeel_tree_peel_loop_to_edge.
Peel the first (last) iterations of LOOP into a new prolog (epilog) loop
that is placed on the entry (exit) edge E of LOOP. After this transformation
we have two loops one after the other - first-loop iterates FIRST_NITERS
times, and second-loop iterates the remainder NITERS - FIRST_NITERS times.
If the cost model indicates that it is profitable to emit a scalar
loop instead of the vector one, then the prolog (epilog) loop will iterate
for the entire unchanged scalar iterations of the loop.
Input:
- LOOP: the loop to be peeled.
- E: the exit or entry edge of LOOP.
If it is the entry edge, we peel the first iterations of LOOP. In this
case first-loop is LOOP, and second-loop is the newly created loop.
If it is the exit edge, we peel the last iterations of LOOP. In this
case, first-loop is the newly created loop, and second-loop is LOOP.
- NITERS: the number of iterations that LOOP iterates.
- FIRST_NITERS: the number of iterations that the first-loop should iterate.
- UPDATE_FIRST_LOOP_COUNT: specifies whether this function is responsible
for updating the loop bound of the first-loop to FIRST_NITERS. If it
is false, the caller of this function may want to take care of this
(this can be useful if we don't want new stmts added to first-loop).
- TH: cost model profitability threshold of iterations for vectorization.
- CHECK_PROFITABILITY: specifies whether the cost model check has not
occurred during versioning and hence needs to occur during
prologue generation, or whether it has not occurred during
prologue generation and hence needs to occur during
epilogue generation.
Output:
The function returns a pointer to the new loop-copy, or NULL if it failed
to perform the transformation.
The function generates two if-then-else guards: one before the first loop,
and the other before the second loop:
The first guard is:
if (FIRST_NITERS == 0) then skip the first loop,
and go directly to the second loop.
The second guard is:
if (FIRST_NITERS == NITERS) then skip the second loop.
FORNOW only simple loops are supported (see slpeel_can_duplicate_loop_p).
FORNOW the resulting code will not be in loop-closed-ssa form.
*/
struct loop*
slpeel_tree_peel_loop_to_edge (struct loop *loop,
edge e, tree first_niters,
tree niters, bool update_first_loop_count,
unsigned int th, bool check_profitability)
{
struct loop *new_loop = NULL, *first_loop, *second_loop;
edge skip_e;
tree pre_condition = NULL_TREE;
bitmap definitions;
basic_block bb_before_second_loop, bb_after_second_loop;
basic_block bb_before_first_loop;
basic_block bb_between_loops;
basic_block new_exit_bb;
edge exit_e = single_exit (loop);
LOC loop_loc;
tree cost_pre_condition = NULL_TREE;
if (!slpeel_can_duplicate_loop_p (loop, e))
return NULL;
/* We have to initialize cfg_hooks. Then, when calling
cfg_hooks->split_edge, the function tree_split_edge
is actually called and, when calling cfg_hooks->duplicate_block,
the function tree_duplicate_bb is called. */
gimple_register_cfg_hooks ();
/* 1. Generate a copy of LOOP and put it on E (E is the entry/exit of LOOP).
Resulting CFG would be:
first_loop:
do {
} while ...
second_loop:
do {
} while ...
orig_exit_bb:
*/
if (!(new_loop = slpeel_tree_duplicate_loop_to_edge_cfg (loop, e)))
{
loop_loc = find_loop_location (loop);
if (dump_file && (dump_flags & TDF_DETAILS))
{
if (loop_loc != UNKNOWN_LOC)
fprintf (dump_file, "\n%s:%d: note: ",
LOC_FILE (loop_loc), LOC_LINE (loop_loc));
fprintf (dump_file, "tree_duplicate_loop_to_edge_cfg failed.\n");
}
return NULL;
}
if (e == exit_e)
{
/* NEW_LOOP was placed after LOOP. */
first_loop = loop;
second_loop = new_loop;
}
else
{
/* NEW_LOOP was placed before LOOP. */
first_loop = new_loop;
second_loop = loop;
}
definitions = ssa_names_to_replace ();
slpeel_update_phis_for_duplicate_loop (loop, new_loop, e == exit_e);
rename_variables_in_loop (new_loop);
/* 2. Add the guard code in one of the following ways:
2.a Add the guard that controls whether the first loop is executed.
This occurs when this function is invoked for prologue or epilogue
generation and when the cost model check can be done at compile time.
Resulting CFG would be:
bb_before_first_loop:
if (FIRST_NITERS == 0) GOTO bb_before_second_loop
GOTO first-loop
first_loop:
do {
} while ...
bb_before_second_loop:
second_loop:
do {
} while ...
orig_exit_bb:
2.b Add the cost model check that allows the prologue
to iterate for the entire unchanged scalar
iterations of the loop in the event that the cost
model indicates that the scalar loop is more
profitable than the vector one. This occurs when
this function is invoked for prologue generation
and the cost model check needs to be done at run
time.
Resulting CFG after prologue peeling would be:
if (scalar_loop_iterations <= th)
FIRST_NITERS = scalar_loop_iterations
bb_before_first_loop:
if (FIRST_NITERS == 0) GOTO bb_before_second_loop
GOTO first-loop
first_loop:
do {
} while ...
bb_before_second_loop:
second_loop:
do {
} while ...
orig_exit_bb:
2.c Add the cost model check that allows the epilogue
to iterate for the entire unchanged scalar
iterations of the loop in the event that the cost
model indicates that the scalar loop is more
profitable than the vector one. This occurs when
this function is invoked for epilogue generation
and the cost model check needs to be done at run
time.
Resulting CFG after prologue peeling would be:
bb_before_first_loop:
if ((scalar_loop_iterations <= th)
||
FIRST_NITERS == 0) GOTO bb_before_second_loop
GOTO first-loop
first_loop:
do {
} while ...
bb_before_second_loop:
second_loop:
do {
} while ...
orig_exit_bb:
*/
bb_before_first_loop = split_edge (loop_preheader_edge (first_loop));
bb_before_second_loop = split_edge (single_exit (first_loop));
/* Epilogue peeling. */
if (!update_first_loop_count)
{
pre_condition =
fold_build2 (LE_EXPR, boolean_type_node, first_niters,
build_int_cst (TREE_TYPE (first_niters), 0));
if (check_profitability)
{
tree scalar_loop_iters
= unshare_expr (LOOP_VINFO_NITERS_UNCHANGED
(loop_vec_info_for_loop (loop)));
cost_pre_condition =
fold_build2 (LE_EXPR, boolean_type_node, scalar_loop_iters,
build_int_cst (TREE_TYPE (scalar_loop_iters), th));
pre_condition = fold_build2 (TRUTH_OR_EXPR, boolean_type_node,
cost_pre_condition, pre_condition);
}
}
/* Prologue peeling. */
else
{
if (check_profitability)
set_prologue_iterations (bb_before_first_loop, first_niters,
loop, th);
pre_condition =
fold_build2 (LE_EXPR, boolean_type_node, first_niters,
build_int_cst (TREE_TYPE (first_niters), 0));
}
skip_e = slpeel_add_loop_guard (bb_before_first_loop, pre_condition,
bb_before_second_loop, bb_before_first_loop);
slpeel_update_phi_nodes_for_guard1 (skip_e, first_loop,
first_loop == new_loop,
&new_exit_bb, &definitions);
/* 3. Add the guard that controls whether the second loop is executed.
Resulting CFG would be:
bb_before_first_loop:
if (FIRST_NITERS == 0) GOTO bb_before_second_loop (skip first loop)
GOTO first-loop
first_loop:
do {
} while ...
bb_between_loops:
if (FIRST_NITERS == NITERS) GOTO bb_after_second_loop (skip second loop)
GOTO bb_before_second_loop
bb_before_second_loop:
second_loop:
do {
} while ...
bb_after_second_loop:
orig_exit_bb:
*/
bb_between_loops = new_exit_bb;
bb_after_second_loop = split_edge (single_exit (second_loop));
pre_condition =
fold_build2 (EQ_EXPR, boolean_type_node, first_niters, niters);
skip_e = slpeel_add_loop_guard (bb_between_loops, pre_condition,
bb_after_second_loop, bb_before_first_loop);
slpeel_update_phi_nodes_for_guard2 (skip_e, second_loop,
second_loop == new_loop, &new_exit_bb);
/* 4. Make first-loop iterate FIRST_NITERS times, if requested.
*/
if (update_first_loop_count)
slpeel_make_loop_iterate_ntimes (first_loop, first_niters);
BITMAP_FREE (definitions);
delete_update_ssa ();
return new_loop;
}
/* Function find_loop_location.
Extract the location of the loop in the source code.
If the loop is not well formed for vectorization, an estimated
location is calculated.
Return the loop location if found, and UNKNOWN_LOC otherwise. */
LOC
find_loop_location (struct loop *loop)
{
gimple stmt = NULL;
basic_block bb;
gimple_stmt_iterator si;
if (!loop)
return UNKNOWN_LOC;
stmt = get_loop_exit_condition (loop);
if (stmt && gimple_location (stmt) != UNKNOWN_LOC)
return gimple_location (stmt);
/* If we got here the loop is probably not "well formed",
try to estimate the loop location. */
if (!loop->header)
return UNKNOWN_LOC;
bb = loop->header;
for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
{
stmt = gsi_stmt (si);
if (gimple_location (stmt) != UNKNOWN_LOC)
return gimple_location (stmt);
}
return UNKNOWN_LOC;
}
/*************************************************************************
Vectorization Debug Information.
*************************************************************************/
/* Function vect_set_verbosity_level.
Called from toplev.c upon detection of the
-ftree-vectorizer-verbose=N option. */
void
vect_set_verbosity_level (const char *val)
{
unsigned int vl;
vl = atoi (val);
if (vl < MAX_VERBOSITY_LEVEL)
vect_verbosity_level = vl;
else
vect_verbosity_level = MAX_VERBOSITY_LEVEL - 1;
}
/* Function vect_set_dump_settings.
Fix the verbosity level of the vectorizer if the
requested level was not set explicitly using the flag
-ftree-vectorizer-verbose=N.
Decide where to print the debugging information (dump_file/stderr).
If the user defined the verbosity level, but there is no dump file,
print to stderr, otherwise print to the dump file. */
static void
vect_set_dump_settings (void)
{
vect_dump = dump_file;
/* Check if the verbosity level was defined by the user: */
if (vect_verbosity_level != MAX_VERBOSITY_LEVEL)
{
/* If there is no dump file, print to stderr. */
if (!dump_file)
vect_dump = stderr;
return;
}
/* User didn't specify verbosity level: */
if (dump_file && (dump_flags & TDF_DETAILS))
vect_verbosity_level = REPORT_DETAILS;
else if (dump_file && (dump_flags & TDF_STATS))
vect_verbosity_level = REPORT_UNVECTORIZED_LOOPS;
else
vect_verbosity_level = REPORT_NONE;
gcc_assert (dump_file || vect_verbosity_level == REPORT_NONE);
}
/* Function vect_print_dump_info.
   For vectorization debug dumps. */
bool
vect_print_dump_info (enum verbosity_levels vl)
{
if (vl > vect_verbosity_level)
return false;
if (!current_function_decl || !vect_dump)
return false;
if (vect_loop_location == UNKNOWN_LOC)
fprintf (vect_dump, "\n%s:%d: note: ",
DECL_SOURCE_FILE (current_function_decl),
DECL_SOURCE_LINE (current_function_decl));
else
fprintf (vect_dump, "\n%s:%d: note: ",
LOC_FILE (vect_loop_location), LOC_LINE (vect_loop_location));
return true;
}
/*************************************************************************
Vectorization Utilities.
*************************************************************************/
/* Function new_stmt_vec_info.
Create and initialize a new stmt_vec_info struct for STMT. */
stmt_vec_info
new_stmt_vec_info (gimple stmt, loop_vec_info loop_vinfo)
{
stmt_vec_info res;
res = (stmt_vec_info) xcalloc (1, sizeof (struct _stmt_vec_info));
STMT_VINFO_TYPE (res) = undef_vec_info_type;
STMT_VINFO_STMT (res) = stmt;
STMT_VINFO_LOOP_VINFO (res) = loop_vinfo;
STMT_VINFO_RELEVANT (res) = 0;
STMT_VINFO_LIVE_P (res) = false;
STMT_VINFO_VECTYPE (res) = NULL;
STMT_VINFO_VEC_STMT (res) = NULL;
STMT_VINFO_IN_PATTERN_P (res) = false;
STMT_VINFO_RELATED_STMT (res) = NULL;
STMT_VINFO_DATA_REF (res) = NULL;
STMT_VINFO_DR_BASE_ADDRESS (res) = NULL;
STMT_VINFO_DR_OFFSET (res) = NULL;
STMT_VINFO_DR_INIT (res) = NULL;
STMT_VINFO_DR_STEP (res) = NULL;
STMT_VINFO_DR_ALIGNED_TO (res) = NULL;
if (gimple_code (stmt) == GIMPLE_PHI
&& is_loop_header_bb_p (gimple_bb (stmt)))
STMT_VINFO_DEF_TYPE (res) = vect_unknown_def_type;
else
STMT_VINFO_DEF_TYPE (res) = vect_loop_def;
STMT_VINFO_SAME_ALIGN_REFS (res) = VEC_alloc (dr_p, heap, 5);
STMT_VINFO_INSIDE_OF_LOOP_COST (res) = 0;
STMT_VINFO_OUTSIDE_OF_LOOP_COST (res) = 0;
STMT_SLP_TYPE (res) = 0;
DR_GROUP_FIRST_DR (res) = NULL;
DR_GROUP_NEXT_DR (res) = NULL;
DR_GROUP_SIZE (res) = 0;
DR_GROUP_STORE_COUNT (res) = 0;
DR_GROUP_GAP (res) = 0;
DR_GROUP_SAME_DR_STMT (res) = NULL;
DR_GROUP_READ_WRITE_DEPENDENCE (res) = false;
return res;
}
/* Create a vector for stmt_vec_info structs. */
void
init_stmt_vec_info_vec (void)
{
gcc_assert (!stmt_vec_info_vec);
stmt_vec_info_vec = VEC_alloc (vec_void_p, heap, 50);
}
/* Free the stmt_vec_info vector. */
void
free_stmt_vec_info_vec (void)
{
gcc_assert (stmt_vec_info_vec);
VEC_free (vec_void_p, heap, stmt_vec_info_vec);
}
/* Free stmt vectorization related info. */
void
free_stmt_vec_info (gimple stmt)
{
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
if (!stmt_info)
return;
VEC_free (dr_p, heap, STMT_VINFO_SAME_ALIGN_REFS (stmt_info));
set_vinfo_for_stmt (stmt, NULL);
free (stmt_info);
}
/* Function bb_in_loop_p
Used as predicate for dfs order traversal of the loop bbs. */
static bool
bb_in_loop_p (const_basic_block bb, const void *data)
{
const struct loop *const loop = (const struct loop *)data;
if (flow_bb_inside_loop_p (loop, bb))
return true;
return false;
}
/* Function new_loop_vec_info.
Create and initialize a new loop_vec_info struct for LOOP, as well as
stmt_vec_info structs for all the stmts in LOOP. */
loop_vec_info
new_loop_vec_info (struct loop *loop)
{
loop_vec_info res;
basic_block *bbs;
gimple_stmt_iterator si;
unsigned int i, nbbs;
res = (loop_vec_info) xcalloc (1, sizeof (struct _loop_vec_info));
LOOP_VINFO_LOOP (res) = loop;
bbs = get_loop_body (loop);
/* Create/Update stmt_info for all stmts in the loop. */
for (i = 0; i < loop->num_nodes; i++)
{
basic_block bb = bbs[i];
/* BBs in a nested inner-loop will have been already processed (because
we will have called vect_analyze_loop_form for any nested inner-loop).
Therefore, for stmts in an inner-loop we just want to update the
STMT_VINFO_LOOP_VINFO field of their stmt_info to point to the new
loop_info of the outer-loop we are currently considering to vectorize
(instead of the loop_info of the inner-loop).
For stmts in other BBs we need to create a stmt_info from scratch. */
if (bb->loop_father != loop)
{
/* Inner-loop bb. */
gcc_assert (loop->inner && bb->loop_father == loop->inner);
for (si = gsi_start_phis (bb); !gsi_end_p (si); gsi_next (&si))
{
gimple phi = gsi_stmt (si);
stmt_vec_info stmt_info = vinfo_for_stmt (phi);
loop_vec_info inner_loop_vinfo =
STMT_VINFO_LOOP_VINFO (stmt_info);
gcc_assert (loop->inner == LOOP_VINFO_LOOP (inner_loop_vinfo));
STMT_VINFO_LOOP_VINFO (stmt_info) = res;
}
for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
{
gimple stmt = gsi_stmt (si);
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
loop_vec_info inner_loop_vinfo =
STMT_VINFO_LOOP_VINFO (stmt_info);
gcc_assert (loop->inner == LOOP_VINFO_LOOP (inner_loop_vinfo));
STMT_VINFO_LOOP_VINFO (stmt_info) = res;
}
}
else
{
/* bb in current nest. */
for (si = gsi_start_phis (bb); !gsi_end_p (si); gsi_next (&si))
{
gimple phi = gsi_stmt (si);
gimple_set_uid (phi, 0);
set_vinfo_for_stmt (phi, new_stmt_vec_info (phi, res));
}
for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
{
gimple stmt = gsi_stmt (si);
gimple_set_uid (stmt, 0);
set_vinfo_for_stmt (stmt, new_stmt_vec_info (stmt, res));
}
}
}
/* CHECKME: We want to visit all BBs before their successors (except for
   latch blocks, for which this assertion wouldn't hold). In the simple
   case of the loop forms we allow, a dfs order of the BBs would be the
   same as reversed postorder traversal, so we are safe. */
free (bbs);
bbs = XCNEWVEC (basic_block, loop->num_nodes);
nbbs = dfs_enumerate_from (loop->header, 0, bb_in_loop_p,
bbs, loop->num_nodes, loop);
gcc_assert (nbbs == loop->num_nodes);
LOOP_VINFO_BBS (res) = bbs;
LOOP_VINFO_NITERS (res) = NULL;
LOOP_VINFO_NITERS_UNCHANGED (res) = NULL;
LOOP_VINFO_COST_MODEL_MIN_ITERS (res) = 0;
LOOP_VINFO_VECTORIZABLE_P (res) = 0;
LOOP_PEELING_FOR_ALIGNMENT (res) = 0;
LOOP_VINFO_VECT_FACTOR (res) = 0;
LOOP_VINFO_DATAREFS (res) = VEC_alloc (data_reference_p, heap, 10);
LOOP_VINFO_DDRS (res) = VEC_alloc (ddr_p, heap, 10 * 10);
LOOP_VINFO_UNALIGNED_DR (res) = NULL;
LOOP_VINFO_MAY_MISALIGN_STMTS (res) =
VEC_alloc (gimple, heap,
PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIGNMENT_CHECKS));
LOOP_VINFO_MAY_ALIAS_DDRS (res) =
VEC_alloc (ddr_p, heap,
PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS));
LOOP_VINFO_STRIDED_STORES (res) = VEC_alloc (gimple, heap, 10);
LOOP_VINFO_SLP_INSTANCES (res) = VEC_alloc (slp_instance, heap, 10);
LOOP_VINFO_SLP_UNROLLING_FACTOR (res) = 1;
return res;
}
/* Function destroy_loop_vec_info.
Free LOOP_VINFO struct, as well as all the stmt_vec_info structs of all the
stmts in the loop. */
void
destroy_loop_vec_info (loop_vec_info loop_vinfo, bool clean_stmts)
{
struct loop *loop;
basic_block *bbs;
int nbbs;
gimple_stmt_iterator si;
int j;
VEC (slp_instance, heap) *slp_instances;
slp_instance instance;
if (!loop_vinfo)
return;
loop = LOOP_VINFO_LOOP (loop_vinfo);
bbs = LOOP_VINFO_BBS (loop_vinfo);
nbbs = loop->num_nodes;
if (!clean_stmts)
{
free (LOOP_VINFO_BBS (loop_vinfo));
free_data_refs (LOOP_VINFO_DATAREFS (loop_vinfo));
free_dependence_relations (LOOP_VINFO_DDRS (loop_vinfo));
VEC_free (gimple, heap, LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo));
free (loop_vinfo);
loop->aux = NULL;
return;
}
for (j = 0; j < nbbs; j++)
{
basic_block bb = bbs[j];
for (si = gsi_start_phis (bb); !gsi_end_p (si); gsi_next (&si))
free_stmt_vec_info (gsi_stmt (si));
for (si = gsi_start_bb (bb); !gsi_end_p (si); )
{
gimple stmt = gsi_stmt (si);
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
if (stmt_info)
{
/* Check if this is a "pattern stmt" (introduced by the
vectorizer during the pattern recognition pass). */
bool remove_stmt_p = false;
gimple orig_stmt = STMT_VINFO_RELATED_STMT (stmt_info);
if (orig_stmt)
{
stmt_vec_info orig_stmt_info = vinfo_for_stmt (orig_stmt);
if (orig_stmt_info
&& STMT_VINFO_IN_PATTERN_P (orig_stmt_info))
remove_stmt_p = true;
}
/* Free stmt_vec_info. */
free_stmt_vec_info (stmt);
/* Remove dead "pattern stmts". */
if (remove_stmt_p)
gsi_remove (&si, true);
}
gsi_next (&si);
}
}
free (LOOP_VINFO_BBS (loop_vinfo));
free_data_refs (LOOP_VINFO_DATAREFS (loop_vinfo));
free_dependence_relations (LOOP_VINFO_DDRS (loop_vinfo));
VEC_free (gimple, heap, LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo));
VEC_free (ddr_p, heap, LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo));
slp_instances = LOOP_VINFO_SLP_INSTANCES (loop_vinfo);
for (j = 0; VEC_iterate (slp_instance, slp_instances, j, instance); j++)
vect_free_slp_instance (instance);
VEC_free (slp_instance, heap, LOOP_VINFO_SLP_INSTANCES (loop_vinfo));
VEC_free (gimple, heap, LOOP_VINFO_STRIDED_STORES (loop_vinfo));
free (loop_vinfo);
loop->aux = NULL;
}
/* Function vect_can_force_dr_alignment_p.
   Returns whether the alignment of a DECL can be forced to be aligned
   on ALIGNMENT bit boundary. */
bool
vect_can_force_dr_alignment_p (const_tree decl, unsigned int alignment)
{
if (TREE_CODE (decl) != VAR_DECL)
return false;
if (DECL_EXTERNAL (decl))
return false;
if (TREE_ASM_WRITTEN (decl))
return false;
if (TREE_STATIC (decl))
return (alignment <= MAX_OFILE_ALIGNMENT);
else
return (alignment <= MAX_STACK_ALIGNMENT);
}
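
For illustration (assumed examples, not from the patch), the kinds of declarations this predicate accepts and rejects:

/* Illustrative only.  */
static float local_data[256];   /* Accepted while alignment <=
                                   MAX_OFILE_ALIGNMENT: static storage,
                                   not external, not yet emitted.  */
extern float extern_data[256];  /* Rejected: DECL_EXTERNAL - the
                                   definition lives in another unit.  */
void
stack_case (void)
{
  float stack_data[16];         /* Accepted while alignment <=
                                   MAX_STACK_ALIGNMENT.  */
  (void) stack_data;
}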
/* Function get_vectype_for_scalar_type.
Returns the vector type corresponding to SCALAR_TYPE as supported
by the target. */
tree
get_vectype_for_scalar_type (tree scalar_type)
{
enum machine_mode inner_mode = TYPE_MODE (scalar_type);
int nbytes = GET_MODE_SIZE (inner_mode);
int nunits;
tree vectype;
if (nbytes == 0 || nbytes >= UNITS_PER_SIMD_WORD (inner_mode))
return NULL_TREE;
/* FORNOW: Only a single vector size per mode (UNITS_PER_SIMD_WORD)
is expected. */
nunits = UNITS_PER_SIMD_WORD (inner_mode) / nbytes;
vectype = build_vector_type (scalar_type, nunits);
if (vect_print_dump_info (REPORT_DETAILS))
{
fprintf (vect_dump, "get vectype with %d units of type ", nunits);
print_generic_expr (vect_dump, scalar_type, TDF_SLIM);
}
if (!vectype)
return NULL_TREE;
if (vect_print_dump_info (REPORT_DETAILS))
{
fprintf (vect_dump, "vectype: ");
print_generic_expr (vect_dump, vectype, TDF_SLIM);
}
if (!VECTOR_MODE_P (TYPE_MODE (vectype))
&& !INTEGRAL_MODE_P (TYPE_MODE (vectype)))
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "mode not supported by target.");
return NULL_TREE;
}
return vectype;
}
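
A hypothetical worked example of the computation above, assuming UNITS_PER_SIMD_WORD is 16 bytes:

/* Assuming UNITS_PER_SIMD_WORD == 16:
     int    -> nbytes = 4 -> nunits = 16/4 = 4 -> V4SI ("vector of 4 int")
     short  -> nbytes = 2 -> nunits = 16/2 = 8 -> V8HI
     double -> nbytes = 8 -> nunits = 16/8 = 2 -> V2DF
   nbytes == 0 or nbytes >= 16 yields NULL_TREE (no suitable vector).  */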
/* Function vect_supportable_dr_alignment
Return whether the data reference DR is supported with respect to its
alignment. */
enum dr_alignment_support
vect_supportable_dr_alignment (struct data_reference *dr)
{
gimple stmt = DR_STMT (dr);
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
tree vectype = STMT_VINFO_VECTYPE (stmt_info);
enum machine_mode mode = (int) TYPE_MODE (vectype);
struct loop *vect_loop = LOOP_VINFO_LOOP (STMT_VINFO_LOOP_VINFO (stmt_info));
bool nested_in_vect_loop = nested_in_vect_loop_p (vect_loop, stmt);
bool invariant_in_outerloop = false;
if (aligned_access_p (dr))
return dr_aligned;
if (nested_in_vect_loop)
{
tree outerloop_step = STMT_VINFO_DR_STEP (stmt_info);
invariant_in_outerloop =
(tree_int_cst_compare (outerloop_step, size_zero_node) == 0);
}
/* Possibly unaligned access. */
/* We can choose between using the implicit realignment scheme (generating
a misaligned_move stmt) and the explicit realignment scheme (generating
aligned loads with a REALIGN_LOAD). There are two variants to the explicit
realignment scheme: optimized, and unoptimized.
We can optimize the realignment only if the step between consecutive
vector loads is equal to the vector size. Since the vector memory
accesses advance in steps of VS (Vector Size) in the vectorized loop, it
is guaranteed that the misalignment amount remains the same throughout the
execution of the vectorized loop. Therefore, we can create the
"realignment token" (the permutation mask that is passed to REALIGN_LOAD)
at the loop preheader.
However, in the case of outer-loop vectorization, when vectorizing a
memory access in the inner-loop nested within the LOOP that is now being
vectorized, while it is guaranteed that the misalignment of the
vectorized memory access will remain the same in different outer-loop
iterations, it is *not* guaranteed that it will remain the same throughout
the execution of the inner-loop. This is because the inner-loop advances
with the original scalar step (and not in steps of VS). If the inner-loop
step happens to be a multiple of VS, then the misalignment remains fixed
and we can use the optimized realignment scheme. For example:
for (i=0; i<N; i++)
for (j=0; j<M; j++)
s += a[i+j];
When vectorizing the i-loop in the above example, the step between
consecutive vector loads is 1, and so the misalignment does not remain
fixed across the execution of the inner-loop, and the realignment cannot
be optimized (as illustrated in the following pseudo vectorized loop):
for (i=0; i<N; i+=4)
for (j=0; j<M; j++){
vs += vp[i+j]; // misalignment of &vp[i+j] is {0,1,2,3,0,1,2,3,...}
// when j is {0,1,2,3,4,5,6,7,...} respectively.
// (assuming that we start from an aligned address).
}
We therefore have to use the unoptimized realignment scheme:
for (i=0; i<N; i+=4)
for (j=k; j<M; j+=4)
vs += vp[i+j]; // misalignment of &vp[i+j] is always k (assuming
// that the misalignment of the initial address is
// 0).
The loop can then be vectorized as follows:
for (k=0; k<4; k++){
rt = get_realignment_token (&vp[k]);
for (i=0; i<N; i+=4){
v1 = vp[i+k];
for (j=k; j<M; j+=4){
v2 = vp[i+j+VS-1];
va = REALIGN_LOAD <v1,v2,rt>;
vs += va;
v1 = v2;
}
}
} */
if (DR_IS_READ (dr))
{
if (optab_handler (vec_realign_load_optab, mode)->insn_code !=
CODE_FOR_nothing
&& (!targetm.vectorize.builtin_mask_for_load
|| targetm.vectorize.builtin_mask_for_load ()))
{
tree vectype = STMT_VINFO_VECTYPE (stmt_info);
if (nested_in_vect_loop
&& (TREE_INT_CST_LOW (DR_STEP (dr))
!= GET_MODE_SIZE (TYPE_MODE (vectype))))
return dr_explicit_realign;
else
return dr_explicit_realign_optimized;
}
if (optab_handler (movmisalign_optab, mode)->insn_code !=
CODE_FOR_nothing)
/* Can't software pipeline the loads, but can at least do them. */
return dr_unaligned_supported;
}
/* Unsupported. */
return dr_unaligned_unsupported;
}
/* This file contains drivers for the three vectorizers:
   (1) loop vectorizer (inter-iteration parallelism),
   (2) loop-aware SLP (intra-iteration parallelism) (invoked by the loop
       vectorizer)
   (3) BB vectorizer (out-of-loops), aka SLP

   The rest of the vectorizer's code is organized as follows:
   - tree-vect-loop.c - loop specific parts such as reductions, etc. These are
     used by drivers (1) and (2).
   - tree-vect-loop-manip.c - vectorizer's loop control-flow utilities, used by
     drivers (1) and (2).
   - tree-vect-slp.c - BB vectorization specific analysis and transformation,
     used by drivers (2) and (3).
   - tree-vect-stmts.c - statements analysis and transformation (used by all).
   - tree-vect-data-refs.c - vectorizer specific data-refs analysis and
     manipulations (used by all).
   - tree-vect-patterns.c - vectorizable code patterns detector (used by all)

   Here's a poor attempt at illustrating that:

      tree-vectorizer.c:
      loop_vect()  loop_aware_slp()  slp_vect()
           |     /           \          /
           |    /             \        /
           tree-vect-loop.c  tree-vect-slp.c
                 | \      \  /      /   |
                 |  \      \/      /    |
                 |   \     /\     /     |
                 |    \   /  \   /      |
          tree-vect-stmts.c  tree-vect-data-refs.c
                    \      /
                 tree-vect-patterns.c
*/

/* Function vect_is_simple_use.
   Input:
   LOOP - the loop that is being vectorized.
   OPERAND - operand of a stmt in LOOP.
   DEF - the defining stmt in case OPERAND is an SSA_NAME.
   Returns whether a stmt with OPERAND can be vectorized.
   Supportable operands are constants, loop invariants, and operands that are
   defined by the current iteration of the loop. Unsupportable operands are
   those that are defined by a previous iteration of the loop (as is the case
   in reduction/induction computations). */
bool
vect_is_simple_use (tree operand, loop_vec_info loop_vinfo, gimple *def_stmt,
                    tree *def, enum vect_def_type *dt)
{
  basic_block bb;
  stmt_vec_info stmt_vinfo;
  struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
  *def_stmt = NULL;
  *def = NULL_TREE;
  if (vect_print_dump_info (REPORT_DETAILS))
    {
      fprintf (vect_dump, "vect_is_simple_use: operand ");
      print_generic_expr (vect_dump, operand, TDF_SLIM);
    }
  if (TREE_CODE (operand) == INTEGER_CST || TREE_CODE (operand) == REAL_CST)
    {
      *dt = vect_constant_def;
      return true;
    }
  if (is_gimple_min_invariant (operand))
    {
      *def = operand;
      *dt = vect_invariant_def;
      return true;
    }
  if (TREE_CODE (operand) == PAREN_EXPR)
    {
      if (vect_print_dump_info (REPORT_DETAILS))
        fprintf (vect_dump, "non-associatable copy.");
      operand = TREE_OPERAND (operand, 0);
    }
  if (TREE_CODE (operand) != SSA_NAME)
    {
      if (vect_print_dump_info (REPORT_DETAILS))
        fprintf (vect_dump, "not ssa-name.");
      return false;
    }
*def_stmt = SSA_NAME_DEF_STMT (operand);
if (*def_stmt == NULL)
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "no def_stmt.");
return false;
}
if (vect_print_dump_info (REPORT_DETAILS))
{
fprintf (vect_dump, "def_stmt: ");
print_gimple_stmt (vect_dump, *def_stmt, 0, TDF_SLIM);
}
/* empty stmt is expected only in case of a function argument.
(Otherwise - we expect a phi_node or a GIMPLE_ASSIGN). */
if (gimple_nop_p (*def_stmt))
{
*def = operand;
*dt = vect_invariant_def;
return true;
}
bb = gimple_bb (*def_stmt);
if (!flow_bb_inside_loop_p (loop, bb))
*dt = vect_invariant_def;
else
{
stmt_vinfo = vinfo_for_stmt (*def_stmt);
*dt = STMT_VINFO_DEF_TYPE (stmt_vinfo);
}
if (*dt == vect_unknown_def_type)
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "Unsupported pattern.");
return false;
}
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "type of def: %d.",*dt);
switch (gimple_code (*def_stmt))
{
case GIMPLE_PHI:
*def = gimple_phi_result (*def_stmt);
break;
case GIMPLE_ASSIGN:
*def = gimple_assign_lhs (*def_stmt);
break;
case GIMPLE_CALL:
*def = gimple_call_lhs (*def_stmt);
if (*def != NULL)
break;
/* FALLTHRU */
default:
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "unsupported defining stmt: ");
return false;
}
return true;
}
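
A hedged illustration of the classification this function performs, on a hypothetical loop:

/* For this hypothetical loop, vect_is_simple_use would report:
     5 -> vect_constant_def   (INTEGER_CST)
     c -> vect_invariant_def  (defined outside the loop)
     t -> vect_loop_def       (defined by the current iteration)
   A value carried in from a previous iteration (e.g. a reduction
   accumulator) gets the def type recorded on its loop-header PHI.  */
void
classify_example (int *a, int n, int c)
{
  for (int i = 0; i < n; i++)
    {
      int t = a[i] + c;
      a[i] = t * 5;
    }
}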
/* Function supportable_widening_operation
Check whether an operation represented by the code CODE is a
widening operation that is supported by the target platform in
vector form (i.e., when operating on arguments of type VECTYPE).
Widening operations we currently support are NOP (CONVERT), FLOAT
and WIDEN_MULT. This function checks if these operations are supported
by the target platform either directly (via vector tree-codes), or via
target builtins.
Output:
- CODE1 and CODE2 are codes of vector operations to be used when
vectorizing the operation, if available.
- DECL1 and DECL2 are decls of target builtin functions to be used
when vectorizing the operation, if available. In this case,
CODE1 and CODE2 are CALL_EXPR.
- MULTI_STEP_CVT determines the number of required intermediate steps in
case of multi-step conversion (like char->short->int - in that case
MULTI_STEP_CVT will be 1).
- INTERM_TYPES contains the intermediate type required to perform the
widening operation (short in the above example). */
bool
supportable_widening_operation (enum tree_code code, gimple stmt, tree vectype,
tree *decl1, tree *decl2,
enum tree_code *code1, enum tree_code *code2,
int *multi_step_cvt,
VEC (tree, heap) **interm_types)
{
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
loop_vec_info loop_info = STMT_VINFO_LOOP_VINFO (stmt_info);
struct loop *vect_loop = LOOP_VINFO_LOOP (loop_info);
bool ordered_p;
enum machine_mode vec_mode;
enum insn_code icode1 = 0, icode2 = 0;
optab optab1, optab2;
tree type = gimple_expr_type (stmt);
tree wide_vectype = get_vectype_for_scalar_type (type);
enum tree_code c1, c2;
/* The result of a vectorized widening operation usually requires two vectors
(because the widened results do not fit in one vector). The generated
vector results would normally be expected to be generated in the same
order as in the original scalar computation, i.e. if 8 results are
generated in each vector iteration, they are to be organized as follows:
vect1: [res1,res2,res3,res4], vect2: [res5,res6,res7,res8].
However, in the special case that the result of the widening operation is
used in a reduction computation only, the order doesn't matter (because
when vectorizing a reduction we change the order of the computation).
Some targets can take advantage of this and generate more efficient code.
For example, targets like Altivec, that support widen_mult using a sequence
of {mult_even,mult_odd} generate the following vectors:
vect1: [res1,res3,res5,res7], vect2: [res2,res4,res6,res8].
When vectorizing outer-loops, we execute the inner-loop sequentially
(each vectorized inner-loop iteration contributes to VF outer-loop
iterations in parallel). We therefore don't allow to change the order
of the computation in the inner-loop during outer-loop vectorization. */
if (STMT_VINFO_RELEVANT (stmt_info) == vect_used_by_reduction
&& !nested_in_vect_loop_p (vect_loop, stmt))
ordered_p = false;
else
ordered_p = true;
if (!ordered_p
&& code == WIDEN_MULT_EXPR
&& targetm.vectorize.builtin_mul_widen_even
&& targetm.vectorize.builtin_mul_widen_even (vectype)
&& targetm.vectorize.builtin_mul_widen_odd
&& targetm.vectorize.builtin_mul_widen_odd (vectype))
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "Unordered widening operation detected.");
*code1 = *code2 = CALL_EXPR;
*decl1 = targetm.vectorize.builtin_mul_widen_even (vectype);
*decl2 = targetm.vectorize.builtin_mul_widen_odd (vectype);
return true;
}
switch (code)
{
case WIDEN_MULT_EXPR:
if (BYTES_BIG_ENDIAN)
{
c1 = VEC_WIDEN_MULT_HI_EXPR;
c2 = VEC_WIDEN_MULT_LO_EXPR;
}
else
{
c2 = VEC_WIDEN_MULT_HI_EXPR;
c1 = VEC_WIDEN_MULT_LO_EXPR;
}
break;
CASE_CONVERT:
if (BYTES_BIG_ENDIAN)
{
c1 = VEC_UNPACK_HI_EXPR;
c2 = VEC_UNPACK_LO_EXPR;
}
else
{
c2 = VEC_UNPACK_HI_EXPR;
c1 = VEC_UNPACK_LO_EXPR;
}
break;
case FLOAT_EXPR:
if (BYTES_BIG_ENDIAN)
{
c1 = VEC_UNPACK_FLOAT_HI_EXPR;
c2 = VEC_UNPACK_FLOAT_LO_EXPR;
}
else
{
c2 = VEC_UNPACK_FLOAT_HI_EXPR;
c1 = VEC_UNPACK_FLOAT_LO_EXPR;
}
break;
case FIX_TRUNC_EXPR:
/* ??? Not yet implemented due to missing VEC_UNPACK_FIX_TRUNC_HI_EXPR/
VEC_UNPACK_FIX_TRUNC_LO_EXPR tree codes and optabs used for
computing the operation. */
return false;
default:
gcc_unreachable ();
}
if (code == FIX_TRUNC_EXPR)
{
/* The signedness is determined from output operand. */
optab1 = optab_for_tree_code (c1, type, optab_default);
optab2 = optab_for_tree_code (c2, type, optab_default);
}
else
{
optab1 = optab_for_tree_code (c1, vectype, optab_default);
optab2 = optab_for_tree_code (c2, vectype, optab_default);
}
  if (!optab1 || !optab2)
    return false;

  vec_mode = TYPE_MODE (vectype);
  if ((icode1 = optab_handler (optab1, vec_mode)->insn_code) == CODE_FOR_nothing
      || (icode2 = optab_handler (optab2, vec_mode)->insn_code)
         == CODE_FOR_nothing)
    return false;

  /* Check if it's a multi-step conversion that can be done using intermediate
     types. */
  if (insn_data[icode1].operand[0].mode != TYPE_MODE (wide_vectype)
      || insn_data[icode2].operand[0].mode != TYPE_MODE (wide_vectype))
{
int i;
tree prev_type = vectype, intermediate_type;
enum machine_mode intermediate_mode, prev_mode = vec_mode;
optab optab3, optab4;
if (!CONVERT_EXPR_CODE_P (code))
return false;
*code1 = c1;
*code2 = c2;
/* We assume here that there will not be more than MAX_INTERM_CVT_STEPS
intermediate steps in promotion sequence. We try MAX_INTERM_CVT_STEPS
to get to WIDE_VECTYPE, and fail if we do not. */
*interm_types = VEC_alloc (tree, heap, MAX_INTERM_CVT_STEPS);
for (i = 0; i < 3; i++)
{
intermediate_mode = insn_data[icode1].operand[0].mode;
intermediate_type = lang_hooks.types.type_for_mode (intermediate_mode,
TYPE_UNSIGNED (prev_type));
optab3 = optab_for_tree_code (c1, intermediate_type, optab_default);
optab4 = optab_for_tree_code (c2, intermediate_type, optab_default);
if (!optab3 || !optab4
|| (icode1 = optab1->handlers[(int) prev_mode].insn_code)
== CODE_FOR_nothing
|| insn_data[icode1].operand[0].mode != intermediate_mode
|| (icode2 = optab2->handlers[(int) prev_mode].insn_code)
== CODE_FOR_nothing
|| insn_data[icode2].operand[0].mode != intermediate_mode
|| (icode1 = optab3->handlers[(int) intermediate_mode].insn_code)
== CODE_FOR_nothing
|| (icode2 = optab4->handlers[(int) intermediate_mode].insn_code)
== CODE_FOR_nothing)
return false;
VEC_quick_push (tree, *interm_types, intermediate_type);
(*multi_step_cvt)++;
if (insn_data[icode1].operand[0].mode == TYPE_MODE (wide_vectype)
&& insn_data[icode2].operand[0].mode == TYPE_MODE (wide_vectype))
return true;
prev_type = intermediate_type;
prev_mode = intermediate_mode;
}
      return false;
    }

  *code1 = c1;
  *code2 = c2;
  return true;
}

#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "tm.h"
#include "ggc.h"
#include "tree.h"
#include "diagnostic.h"
#include "tree-flow.h"
#include "tree-dump.h"
#include "cfgloop.h"
#include "cfglayout.h"
#include "tree-vectorizer.h"
#include "tree-pass.h"

/* vect_dump will be set to stderr or dump_file if exist. */
FILE *vect_dump;

/* vect_verbosity_level set to an invalid value
   to mark that it's uninitialized. */
enum verbosity_levels vect_verbosity_level = MAX_VERBOSITY_LEVEL;

/* Loop location. */
LOC vect_loop_location;

/* Bitmap of virtual variables to be renamed. */
bitmap vect_memsyms_to_rename;

/* Vector mapping GIMPLE stmt to stmt_vec_info. */
VEC(vec_void_p,heap) *stmt_vec_info_vec;
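
To make the multi-step promotion path of supportable_widening_operation concrete, a sketch matching the char->short->int example from its comment (illustrative, not from the patch):

/* With 16-byte vectors, the conversion below is vectorized in two
   unpack steps: V16QI -> 2 x V8HI -> 4 x V4SI, so MULTI_STEP_CVT is 1
   and INTERM_TYPES holds the intermediate short vector type.  */
void
widen_chars (signed char *src, int *dst, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = src[i];
}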
/* Function supportable_narrowing_operation
   Check whether an operation represented by the code CODE is a
   narrowing operation that is supported by the target platform in
   vector form (i.e., when operating on arguments of type VECTYPE).
   Narrowing operations we currently support are NOP (CONVERT) and
   FIX_TRUNC. This function checks if these operations are supported by
   the target platform directly via vector tree-codes.
   Output:
   - CODE1 is the code of a vector operation to be used when
     vectorizing the operation, if available.
   - MULTI_STEP_CVT determines the number of required intermediate steps in
     case of multi-step conversion (like int->short->char - in that case
     MULTI_STEP_CVT will be 1).
   - INTERM_TYPES contains the intermediate type required to perform the
     narrowing operation (short in the above example). */
bool
supportable_narrowing_operation (enum tree_code code,
                                 const_gimple stmt, tree vectype,
                                 enum tree_code *code1, int *multi_step_cvt,
                                 VEC (tree, heap) **interm_types)
{
  enum machine_mode vec_mode;
enum insn_code icode1;
optab optab1, interm_optab;
tree type = gimple_expr_type (stmt);
tree narrow_vectype = get_vectype_for_scalar_type (type);
enum tree_code c1;
tree intermediate_type, prev_type;
int i;
switch (code)
{
CASE_CONVERT:
c1 = VEC_PACK_TRUNC_EXPR;
break;
case FIX_TRUNC_EXPR:
c1 = VEC_PACK_FIX_TRUNC_EXPR;
break;
case FLOAT_EXPR:
/* ??? Not yet implemented due to missing VEC_PACK_FLOAT_EXPR
tree code and optabs used for computing the operation. */
return false;
default:
gcc_unreachable ();
}
if (code == FIX_TRUNC_EXPR)
/* The signedness is determined from output operand. */
optab1 = optab_for_tree_code (c1, type, optab_default);
else
optab1 = optab_for_tree_code (c1, vectype, optab_default);
if (!optab1)
return false;
vec_mode = TYPE_MODE (vectype);
if ((icode1 = optab_handler (optab1, vec_mode)->insn_code)
== CODE_FOR_nothing)
return false;
/* Check if it's a multi-step conversion that can be done using intermediate
types. */
if (insn_data[icode1].operand[0].mode != TYPE_MODE (narrow_vectype))
{
enum machine_mode intermediate_mode, prev_mode = vec_mode;
*code1 = c1;
prev_type = vectype;
/* We assume here that there will not be more than MAX_INTERM_CVT_STEPS
intermediate steps in the demotion sequence. We try MAX_INTERM_CVT_STEPS
to get to NARROW_VECTYPE, and fail if we do not. */
*interm_types = VEC_alloc (tree, heap, MAX_INTERM_CVT_STEPS);
for (i = 0; i < 3; i++)
{
intermediate_mode = insn_data[icode1].operand[0].mode;
intermediate_type = lang_hooks.types.type_for_mode (intermediate_mode,
TYPE_UNSIGNED (prev_type));
interm_optab = optab_for_tree_code (c1, intermediate_type,
optab_default);
if (!interm_optab
|| (icode1 = optab1->handlers[(int) prev_mode].insn_code)
== CODE_FOR_nothing
|| insn_data[icode1].operand[0].mode != intermediate_mode
|| (icode1
= interm_optab->handlers[(int) intermediate_mode].insn_code)
== CODE_FOR_nothing)
return false;
VEC_quick_push (tree, *interm_types, intermediate_type);
(*multi_step_cvt)++;
if (insn_data[icode1].operand[0].mode == TYPE_MODE (narrow_vectype))
return true;
prev_type = intermediate_type;
prev_mode = intermediate_mode;
}
return false;
}
  *code1 = c1;
  return true;
}
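
Correspondingly, a sketch of the int->short->char example from the comment above (illustrative):

/* With 16-byte vectors, the conversion below is vectorized with two
   VEC_PACK_TRUNC steps: 4 x V4SI -> 2 x V8HI -> V16QI, giving
   MULTI_STEP_CVT == 1 with the short vector type in INTERM_TYPES.  */
void
narrow_ints (int *src, signed char *dst, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = (signed char) src[i];
}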
/* Function reduction_code_for_scalar_code
Input:
CODE - tree_code of a reduction operations.
Output:
REDUC_CODE - the corresponding tree-code to be used to reduce the
vector of partial results into a single scalar result (which
will also reside in a vector).
Return TRUE if a corresponding REDUC_CODE was found, FALSE otherwise. */
bool
reduction_code_for_scalar_code (enum tree_code code,
enum tree_code *reduc_code)
{
switch (code)
{
case MAX_EXPR:
*reduc_code = REDUC_MAX_EXPR;
return true;
case MIN_EXPR:
*reduc_code = REDUC_MIN_EXPR;
return true;
case PLUS_EXPR:
*reduc_code = REDUC_PLUS_EXPR;
return true;
default:
return false;
}
}
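
For example, a sum reduction maps PLUS_EXPR to REDUC_PLUS_EXPR. A scalar model of what the epilogue reduction computes (illustrative only; the real tree code leaves the result in one lane of a vector):

/* Scalar model of REDUC_PLUS_EXPR over a vector of 4 partial sums.  */
static int
reduc_plus_model (const int partial[4])
{
  return partial[0] + partial[1] + partial[2] + partial[3];
}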
/* Error reporting helper for vect_is_simple_reduction below. GIMPLE statement
   STMT is printed with a message MSG. */
static void
report_vect_op (gimple stmt, const char *msg)
{
fprintf (vect_dump, "%s", msg);
print_gimple_stmt (vect_dump, stmt, 0, TDF_SLIM);
}
/* Function vect_is_simple_reduction
Detect a cross-iteration def-use cycle that represents a simple
reduction computation. We look for the following pattern:
loop_header:
a1 = phi < a0, a2 >
a3 = ...
a2 = operation (a3, a1)
such that:
1. operation is commutative and associative and it is safe to
change the order of the computation.
2. no uses for a2 in the loop (a2 is used out of the loop)
3. no uses of a1 in the loop besides the reduction operation.
Condition 1 is tested here.
Conditions 2,3 are tested in vect_mark_stmts_to_be_vectorized. */
gimple
vect_is_simple_reduction (loop_vec_info loop_info, gimple phi)
{
  struct loop *loop = (gimple_bb (phi))->loop_father;
struct loop *vect_loop = LOOP_VINFO_LOOP (loop_info);
edge latch_e = loop_latch_edge (loop);
tree loop_arg = PHI_ARG_DEF_FROM_EDGE (phi, latch_e);
gimple def_stmt, def1, def2;
enum tree_code code;
tree op1, op2;
tree type;
int nloop_uses;
tree name;
imm_use_iterator imm_iter;
use_operand_p use_p;
gcc_assert (loop == vect_loop || flow_loop_nested_p (vect_loop, loop));
name = PHI_RESULT (phi);
nloop_uses = 0;
FOR_EACH_IMM_USE_FAST (use_p, imm_iter, name)
{
gimple use_stmt = USE_STMT (use_p);
if (flow_bb_inside_loop_p (loop, gimple_bb (use_stmt))
&& vinfo_for_stmt (use_stmt)
&& !is_pattern_stmt_p (vinfo_for_stmt (use_stmt)))
nloop_uses++;
if (nloop_uses > 1)
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "reduction used in loop.");
return NULL;
}
}
if (TREE_CODE (loop_arg) != SSA_NAME)
{
if (vect_print_dump_info (REPORT_DETAILS))
{
fprintf (vect_dump, "reduction: not ssa_name: ");
print_generic_expr (vect_dump, loop_arg, TDF_SLIM);
}
return NULL;
}
def_stmt = SSA_NAME_DEF_STMT (loop_arg);
if (!def_stmt)
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "reduction: no def_stmt.");
return NULL;
}
if (!is_gimple_assign (def_stmt))
{
if (vect_print_dump_info (REPORT_DETAILS))
print_gimple_stmt (vect_dump, def_stmt, 0, TDF_SLIM);
return NULL;
}
name = gimple_assign_lhs (def_stmt);
nloop_uses = 0;
FOR_EACH_IMM_USE_FAST (use_p, imm_iter, name)
{
gimple use_stmt = USE_STMT (use_p);
if (flow_bb_inside_loop_p (loop, gimple_bb (use_stmt))
&& vinfo_for_stmt (use_stmt)
&& !is_pattern_stmt_p (vinfo_for_stmt (use_stmt)))
nloop_uses++;
if (nloop_uses > 1)
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "reduction used in loop.");
return NULL;
}
}
code = gimple_assign_rhs_code (def_stmt);
if (!commutative_tree_code (code) || !associative_tree_code (code))
{
if (vect_print_dump_info (REPORT_DETAILS))
report_vect_op (def_stmt, "reduction: not commutative/associative: ");
return NULL;
}
if (get_gimple_rhs_class (code) != GIMPLE_BINARY_RHS)
{
if (vect_print_dump_info (REPORT_DETAILS))
report_vect_op (def_stmt, "reduction: not binary operation: ");
return NULL;
}
op1 = gimple_assign_rhs1 (def_stmt);
op2 = gimple_assign_rhs2 (def_stmt);
if (TREE_CODE (op1) != SSA_NAME || TREE_CODE (op2) != SSA_NAME)
{
if (vect_print_dump_info (REPORT_DETAILS))
report_vect_op (def_stmt, "reduction: uses not ssa_names: ");
return NULL;
}
/* Check that it's ok to change the order of the computation. */
type = TREE_TYPE (gimple_assign_lhs (def_stmt));
if (TYPE_MAIN_VARIANT (type) != TYPE_MAIN_VARIANT (TREE_TYPE (op1))
|| TYPE_MAIN_VARIANT (type) != TYPE_MAIN_VARIANT (TREE_TYPE (op2)))
{
if (vect_print_dump_info (REPORT_DETAILS))
{
fprintf (vect_dump, "reduction: multiple types: operation type: ");
print_generic_expr (vect_dump, type, TDF_SLIM);
fprintf (vect_dump, ", operands types: ");
print_generic_expr (vect_dump, TREE_TYPE (op1), TDF_SLIM);
fprintf (vect_dump, ",");
print_generic_expr (vect_dump, TREE_TYPE (op2), TDF_SLIM);
}
return NULL;
}
/* Generally, when vectorizing a reduction we change the order of the
computation. This may change the behavior of the program in some
cases, so we need to check that this is ok. One exception is when
vectorizing an outer-loop: the inner-loop is executed sequentially,
and therefore vectorizing reductions in the inner-loop during
outer-loop vectorization is safe. */
/* CHECKME: check for !flag_finite_math_only too? */
if (SCALAR_FLOAT_TYPE_P (type) && !flag_associative_math
&& !nested_in_vect_loop_p (vect_loop, def_stmt))
{
/* Changing the order of operations changes the semantics. */
if (vect_print_dump_info (REPORT_DETAILS))
report_vect_op (def_stmt, "reduction: unsafe fp math optimization: ");
return NULL;
}
else if (INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_TRAPS (type)
&& !nested_in_vect_loop_p (vect_loop, def_stmt))
{
/* Changing the order of operations changes the semantics. */
if (vect_print_dump_info (REPORT_DETAILS))
report_vect_op (def_stmt, "reduction: unsafe int math optimization: ");
return NULL;
}
else if (SAT_FIXED_POINT_TYPE_P (type))
{
/* Changing the order of operations changes the semantics. */
if (vect_print_dump_info (REPORT_DETAILS))
report_vect_op (def_stmt,
"reduction: unsafe fixed-point math optimization: ");
return NULL;
}
  /* reduction is safe. we're dealing with one of the following:
     1) integer arithmetic and no trapv
     2) floating point arithmetic, and special flags permit this optimization.
   */
  def1 = SSA_NAME_DEF_STMT (op1);
  def2 = SSA_NAME_DEF_STMT (op2);
  if (!def1 || !def2 || gimple_nop_p (def1) || gimple_nop_p (def2))
    {
      if (vect_print_dump_info (REPORT_DETAILS))
        report_vect_op (def_stmt, "reduction: no defs for operands: ");
      return NULL;
    }

  /* Check that one def is the reduction def, defined by PHI,
     the other def is either defined in the loop ("vect_loop_def"),
     or it's an induction (defined by a loop-header phi-node). */
if (def2 == phi
&& flow_bb_inside_loop_p (loop, gimple_bb (def1))
&& (is_gimple_assign (def1)
|| STMT_VINFO_DEF_TYPE (vinfo_for_stmt (def1)) == vect_induction_def
|| (gimple_code (def1) == GIMPLE_PHI
&& STMT_VINFO_DEF_TYPE (vinfo_for_stmt (def1)) == vect_loop_def
&& !is_loop_header_bb_p (gimple_bb (def1)))))
{
if (vect_print_dump_info (REPORT_DETAILS))
report_vect_op (def_stmt, "detected reduction:");
return def_stmt;
}
else if (def1 == phi
&& flow_bb_inside_loop_p (loop, gimple_bb (def2))
&& (is_gimple_assign (def2)
|| STMT_VINFO_DEF_TYPE (vinfo_for_stmt (def2)) == vect_induction_def
|| (gimple_code (def2) == GIMPLE_PHI
&& STMT_VINFO_DEF_TYPE (vinfo_for_stmt (def2)) == vect_loop_def
&& !is_loop_header_bb_p (gimple_bb (def2)))))
{
/* Swap operands (just for simplicity - so that the rest of the code
can assume that the reduction variable is always the last (second)
argument). */
if (vect_print_dump_info (REPORT_DETAILS))
report_vect_op (def_stmt ,
"detected reduction: need to swap operands:");
swap_tree_operands (def_stmt, gimple_assign_rhs1_ptr (def_stmt),
gimple_assign_rhs2_ptr (def_stmt));
return def_stmt;
}
  else
    {
if (vect_print_dump_info (REPORT_DETAILS))
report_vect_op (def_stmt, "reduction: unknown pattern."); gcc_assert (dump_file || vect_verbosity_level == REPORT_NONE);
return NULL;
}
}
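
Hedged examples of loops this detection accepts and rejects:

/* Accepted: s is used only by the reduction statement and out of the
   loop, and integer addition can be reordered (no -ftrapv).  */
int
isum (int *a, int n)
{
  int s = 0;
  for (int i = 0; i < n; i++)
    s += a[i];
  return s;
}

/* Rejected unless -fassociative-math is given: reordering floating
   point additions may change the result, so the SCALAR_FLOAT_TYPE_P
   check above refuses it.  */
float
fsum (float *a, int n)
{
  float s = 0.0f;
  for (int i = 0; i < n; i++)
    s += a[i];
  return s;
}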
/* Function vect_is_simple_iv_evolution.
   FORNOW: A simple evolution of an induction variable in the loop is
   considered a polynomial evolution with constant step. */
bool
vect_is_simple_iv_evolution (unsigned loop_nb, tree access_fn, tree * init,
                             tree * step)
{
  tree init_expr;
  tree step_expr;
  tree evolution_part = evolution_part_in_loop_num (access_fn, loop_nb);

  /* When there is no evolution in this loop, the evolution function
     is not "simple". */
  if (evolution_part == NULL_TREE)
    return false;

  /* When the evolution is a polynomial of degree >= 2
     the evolution function is not "simple". */
  if (tree_is_chrec (evolution_part))
    return false;

  step_expr = evolution_part;
  init_expr = unshare_expr (initial_condition_in_loop_num (access_fn, loop_nb));

  if (vect_print_dump_info (REPORT_DETAILS))
    {
      fprintf (vect_dump, "step: ");
      print_generic_expr (vect_dump, step_expr, TDF_SLIM);
      fprintf (vect_dump, ", init: ");
      print_generic_expr (vect_dump, init_expr, TDF_SLIM);
    }

  *init = init_expr;
  *step = step_expr;

  if (TREE_CODE (step_expr) != INTEGER_CST)
    {
      if (vect_print_dump_info (REPORT_DETAILS))
        fprintf (vect_dump, "step unknown.");
      return false;
    }

  return true;
}
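
For example (illustrative), in scalar-evolution notation {init, +, step}:

/* i has access function {0, +, 1}_loop - constant step 1, so the
   evolution is simple.  j doubles each iteration, which is not a
   degree-1 polynomial evolution, so it would be rejected.  */
void
iv_example (int *a, int n)
{
  int j = 1;
  for (int i = 0; i < n; i++)
    {
      a[i] = j;
      j *= 2;
    }
}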
@@ -2849,6 +242,7 @@ vectorize_loops (void)
  return num_vectorized_loops > 0 ? TODO_cleanup_cfg : 0;
}

/* Increase alignment of global arrays to improve vectorization potential.
   TODO:
@@ -2871,49 +265,53 @@ increase_alignment (void)
      unsigned int alignment;
      if (TREE_CODE (TREE_TYPE (decl)) != ARRAY_TYPE)
        continue;
      vectype = get_vectype_for_scalar_type (TREE_TYPE (TREE_TYPE (decl)));
      if (!vectype)
        continue;
      alignment = TYPE_ALIGN (vectype);
      if (DECL_ALIGN (decl) >= alignment)
        continue;
      if (vect_can_force_dr_alignment_p (decl, alignment))
        {
          DECL_ALIGN (decl) = TYPE_ALIGN (vectype);
          DECL_USER_ALIGN (decl) = 1;
          if (dump_file)
            {
              fprintf (dump_file, "Increasing alignment of decl: ");
              print_generic_expr (dump_file, decl, TDF_SLIM);
            }
        }
    }
  return 0;
}

static bool
gate_increase_alignment (void)
{
  return flag_section_anchors && flag_tree_vectorize;
}

struct simple_ipa_opt_pass pass_ipa_increase_alignment =
{
 {
  SIMPLE_IPA_PASS,
  "increase_alignment",         /* name */
  gate_increase_alignment,      /* gate */
  increase_alignment,           /* execute */
  NULL,                         /* sub */
  NULL,                         /* next */
  0,                            /* static_pass_number */
  0,                            /* tv_id */
  0,                            /* properties_required */
  0,                            /* properties_provided */
  0,                            /* properties_destroyed */
  0,                            /* todo_flags_start */
  0                             /* todo_flags_finish */
 }
};
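
An assumed example of this pass's effect, for a target with 16-byte vectors:

/* With -fsection-anchors -ftree-vectorize, increase_alignment raises
   DECL_ALIGN of 'data' from the type's default to TYPE_ALIGN (V4SF),
   i.e. 128 bits, enabling aligned vector accesses.  Illustrative.  */
static float data[1024];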
/* Vectorizer
   Copyright (C) 2003, 2004, 2005, 2006, 2007, 2008, 2009 Free
   Software Foundation, Inc.
   Contributed by Dorit Naishlos <dorit@il.ibm.com>
   This file is part of GCC.
@@ -21,6 +22,8 @@ along with GCC; see the file COPYING3. If not see
#ifndef GCC_TREE_VECTORIZER_H
#define GCC_TREE_VECTORIZER_H

#include "tree-data-ref.h"

typedef source_location LOC;
#define UNKNOWN_LOC UNKNOWN_LOCATION
#define EXPR_LOC(e) EXPR_LOCATION(e)
@@ -687,72 +690,124 @@ known_alignment_for_access_p (struct data_reference *data_ref_info)
/* vect_dump will be set to stderr or dump_file if exist. */
extern FILE *vect_dump;
extern LOC vect_loop_location;
extern enum verbosity_levels vect_verbosity_level;

/* Bitmap of virtual variables to be renamed. */
extern bitmap vect_memsyms_to_rename;

/*-----------------------------------------------------------------*/
/* Function prototypes.                                            */
/*-----------------------------------------------------------------*/
/*************************************************************************
  Simple Loop Peeling Utilities - in tree-vectorizer.c
 *************************************************************************/

/* Entry point for peeling of simple loops.
   Peel the first/last iterations of a loop.
   It can be used outside of the vectorizer for loops that are simple enough
   (see function documentation). In the vectorizer it is used to peel the
   last few iterations when the loop bound is unknown or does not evenly
   divide by the vectorization factor, and to peel the first few iterations
   to force the alignment of data references in the loop. */
extern struct loop *slpeel_tree_peel_loop_to_edge
  (struct loop *, edge, tree, tree, bool, unsigned int, bool);
extern void set_prologue_iterations (basic_block, tree,
                                     struct loop *, unsigned int);
struct loop *tree_duplicate_loop_on_edge (struct loop *, edge);
extern void slpeel_make_loop_iterate_ntimes (struct loop *, tree);
extern bool slpeel_can_duplicate_loop_p (const struct loop *, const_edge);
#ifdef ENABLE_CHECKING
extern void slpeel_verify_cfg_after_peeling (struct loop *, struct loop *);
#endif

/*************************************************************************
  General Vectorization Utilities
 *************************************************************************/
/** In tree-vectorizer.c **/
extern tree get_vectype_for_scalar_type (tree);
extern bool vect_is_simple_use (tree, loop_vec_info, gimple *, tree *,
                                enum vect_def_type *);
extern bool vect_is_simple_iv_evolution (unsigned, tree, tree *, tree *);
extern gimple vect_is_simple_reduction (loop_vec_info, gimple);
extern bool vect_can_force_dr_alignment_p (const_tree, unsigned int);
extern enum dr_alignment_support vect_supportable_dr_alignment
  (struct data_reference *);
extern bool reduction_code_for_scalar_code (enum tree_code, enum tree_code *);
extern bool supportable_widening_operation (enum tree_code, gimple, tree,
                                            tree *, tree *, enum tree_code *,
                                            enum tree_code *,
                                            int *, VEC (tree, heap) **);
extern bool supportable_narrowing_operation (enum tree_code, const_gimple,
                                             tree, enum tree_code *, int *,
                                             VEC (tree, heap) **);

/* Creation and deletion of loop and stmt info structs. */
extern loop_vec_info new_loop_vec_info (struct loop *loop);
extern void destroy_loop_vec_info (loop_vec_info, bool);
extern stmt_vec_info new_stmt_vec_info (gimple stmt, loop_vec_info);
extern void free_stmt_vec_info (gimple stmt);

/** In tree-vect-analyze.c **/
/* Driver for analysis stage. */
extern loop_vec_info vect_analyze_loop (struct loop *);
extern void vect_free_slp_instance (slp_instance);
extern loop_vec_info vect_analyze_loop_form (struct loop *);
extern tree vect_get_smallest_scalar_type (gimple, HOST_WIDE_INT *,
                                           HOST_WIDE_INT *);

/** In tree-vect-patterns.c **/
/* Pattern recognition functions.
   Additional pattern recognition functions can (and will) be added
   in the future. */
@@ -760,46 +815,8 @@ typedef gimple (* vect_recog_func_ptr) (gimple, tree *, tree *);
#define NUM_PATTERNS 4
void vect_pattern_recog (loop_vec_info);

/** In tree-vect-transform.c **/
extern bool vectorizable_load (gimple, gimple_stmt_iterator *, gimple *,
                               slp_tree, slp_instance);
extern bool vectorizable_store (gimple, gimple_stmt_iterator *, gimple *,
                                slp_tree);
extern bool vectorizable_operation (gimple, gimple_stmt_iterator *, gimple *,
                                    slp_tree);
extern bool vectorizable_type_promotion (gimple, gimple_stmt_iterator *,
                                         gimple *, slp_tree);
extern bool vectorizable_type_demotion (gimple, gimple_stmt_iterator *,
                                        gimple *, slp_tree);
extern bool vectorizable_conversion (gimple, gimple_stmt_iterator *, gimple *,
                                     slp_tree);
extern bool vectorizable_assignment (gimple, gimple_stmt_iterator *, gimple *,
                                     slp_tree);
extern tree vectorizable_function (gimple, tree, tree);
extern bool vectorizable_call (gimple, gimple_stmt_iterator *, gimple *);
extern bool vectorizable_condition (gimple, gimple_stmt_iterator *, gimple *);
extern bool vectorizable_live_operation (gimple, gimple_stmt_iterator *,
                                         gimple *);
extern bool vectorizable_reduction (gimple, gimple_stmt_iterator *, gimple *);
extern bool vectorizable_induction (gimple, gimple_stmt_iterator *, gimple *);
extern int vect_estimate_min_profitable_iters (loop_vec_info);
extern void vect_model_simple_cost (stmt_vec_info, int, enum vect_def_type *,
                                    slp_tree);
extern void vect_model_store_cost (stmt_vec_info, int, enum vect_def_type,
                                   slp_tree);
extern void vect_model_load_cost (stmt_vec_info, int, slp_tree);
extern bool vect_transform_slp_perm_load (gimple, VEC (tree, heap) *,
                                          gimple_stmt_iterator *, int, slp_instance, bool);
/* Driver for transformation stage. */
extern void vect_transform_loop (loop_vec_info);

/*************************************************************************
  Vectorization Debug Information - in tree-vectorizer.c
 *************************************************************************/
extern bool vect_print_dump_info (enum verbosity_levels);
extern void vect_set_verbosity_level (const char *);
extern LOC find_loop_location (struct loop *);

/* Simple loop peeling and versioning utilities for vectorizer's purposes -
   in tree-vect-loop-manip.c. */
extern void slpeel_make_loop_iterate_ntimes (struct loop *, tree);
extern bool slpeel_can_duplicate_loop_p (const struct loop *, const_edge);
extern void vect_loop_versioning (loop_vec_info);
extern void vect_do_peeling_for_loop_bound (loop_vec_info, tree *);
extern void vect_do_peeling_for_alignment (loop_vec_info);
extern LOC find_loop_location (struct loop *);
extern bool vect_can_advance_ivs_p (loop_vec_info);

/* In tree-vect-stmts.c. */
extern tree get_vectype_for_scalar_type (tree);
extern bool vect_is_simple_use (tree, loop_vec_info, gimple *, tree *,
                                enum vect_def_type *);
extern bool supportable_widening_operation (enum tree_code, gimple, tree,
                                            tree *, tree *, enum tree_code *,
                                            enum tree_code *, int *,
                                            VEC (tree, heap) **);
extern bool supportable_narrowing_operation (enum tree_code, const_gimple,
                                             tree, enum tree_code *, int *,
                                             VEC (tree, heap) **);
extern stmt_vec_info new_stmt_vec_info (gimple stmt, loop_vec_info);
extern void free_stmt_vec_info (gimple stmt);
extern tree vectorizable_function (gimple, tree, tree);
extern void vect_model_simple_cost (stmt_vec_info, int, enum vect_def_type *,
                                    slp_tree);
extern void vect_model_store_cost (stmt_vec_info, int, enum vect_def_type,
                                   slp_tree);
extern void vect_model_load_cost (stmt_vec_info, int, slp_tree);
extern void vect_finish_stmt_generation (gimple, gimple,
                                         gimple_stmt_iterator *);
extern bool vect_mark_stmts_to_be_vectorized (loop_vec_info);
extern int cost_for_stmt (gimple);
extern tree vect_get_vec_def_for_operand (tree, gimple, tree *);
extern tree vect_init_vector (gimple, tree, tree,
                              gimple_stmt_iterator *);
extern tree vect_get_vec_def_for_stmt_copy (enum vect_def_type, tree);
extern bool vect_transform_stmt (gimple, gimple_stmt_iterator *,
                                 bool *, slp_tree, slp_instance);
extern void vect_remove_stores (gimple);
extern bool vect_analyze_operations (loop_vec_info);

/* In tree-vect-data-refs.c. */
extern bool vect_can_force_dr_alignment_p (const_tree, unsigned int);
extern enum dr_alignment_support vect_supportable_dr_alignment
  (struct data_reference *);
extern tree vect_get_smallest_scalar_type (gimple, HOST_WIDE_INT *,
                                           HOST_WIDE_INT *);
extern bool vect_analyze_data_ref_dependences (loop_vec_info);
extern bool vect_enhance_data_refs_alignment (loop_vec_info);
extern bool vect_analyze_data_refs_alignment (loop_vec_info);
extern bool vect_analyze_data_ref_accesses (loop_vec_info);
extern bool vect_prune_runtime_alias_test_list (loop_vec_info);
extern bool vect_analyze_data_refs (loop_vec_info);
extern tree vect_create_data_ref_ptr (gimple, struct loop *, tree, tree *,
                                      gimple *, bool, bool *, tree);
extern tree bump_vector_ptr (tree, gimple, gimple_stmt_iterator *, gimple, tree);
extern tree vect_create_destination_var (tree, tree);
extern bool vect_strided_store_supported (tree);
extern bool vect_strided_load_supported (tree);
extern bool vect_permute_store_chain (VEC(tree,heap) *,unsigned int, gimple,
                                      gimple_stmt_iterator *, VEC(tree,heap) **);
extern tree vect_setup_realignment (gimple, gimple_stmt_iterator *, tree *,
                                    enum dr_alignment_support, tree,
                                    struct loop **);
extern bool vect_permute_load_chain (VEC(tree,heap) *,unsigned int, gimple,
                                     gimple_stmt_iterator *, VEC(tree,heap) **);
extern bool vect_transform_strided_load (gimple, VEC(tree,heap) *, int,
                                         gimple_stmt_iterator *);
extern int vect_get_place_in_interleaving_chain (gimple, gimple);
extern tree vect_get_new_vect_var (tree, enum vect_var_kind, const char *);
extern tree vect_create_addr_base_for_vector_ref (gimple, gimple_seq *,
                                                  tree, struct loop *);

/* In tree-vect-loop.c. */
/* FORNOW: Used in tree-parloops.c. */
extern void destroy_loop_vec_info (loop_vec_info, bool);
extern gimple vect_is_simple_reduction (loop_vec_info, gimple);
/* Driver for loop analysis stage. */
extern loop_vec_info vect_analyze_loop (struct loop *);
/* Driver for loop transformation stage. */
extern void vect_transform_loop (loop_vec_info);
extern loop_vec_info vect_analyze_loop_form (struct loop *);
extern bool vectorizable_live_operation (gimple, gimple_stmt_iterator *,
                                         gimple *);
extern bool vectorizable_reduction (gimple, gimple_stmt_iterator *, gimple *);
extern bool vectorizable_induction (gimple, gimple_stmt_iterator *, gimple *);
extern int vect_estimate_min_profitable_iters (loop_vec_info);
extern tree get_initial_def_for_reduction (gimple, tree, tree *);
extern int vect_min_worthwhile_factor (enum tree_code);

/* In tree-vect-slp.c. */
extern void vect_free_slp_instance (slp_instance);
extern bool vect_transform_slp_perm_load (gimple, VEC (tree, heap) *,
                                          gimple_stmt_iterator *, int,
                                          slp_instance, bool);
extern bool vect_schedule_slp (loop_vec_info);
extern void vect_update_slp_costs_according_to_vf (loop_vec_info);
extern bool vect_analyze_slp (loop_vec_info);
extern void vect_make_slp_decision (loop_vec_info);
extern void vect_detect_hybrid_slp (loop_vec_info);
extern void vect_get_slp_defs (slp_tree, VEC (tree,heap) **,
                               VEC (tree,heap) **);

/* In tree-vect-patterns.c. */
/* Pattern recognition functions.
   Additional pattern recognition functions can (and will) be added
   in the future. */
#define NUM_PATTERNS 4
void vect_pattern_recog (loop_vec_info);

/* Vectorization debug information - in tree-vectorizer.c. */
extern bool vect_print_dump_info (enum verbosity_levels);
extern void vect_set_verbosity_level (const char *);

#endif  /* GCC_TREE_VECTORIZER_H */