Commit 61d3cdbb by Dorit Nuzman

tree.def (REDUC_MAX_EXPR, [...]): New tree-codes.

        * tree.def (REDUC_MAX_EXPR, REDUC_MIN_EXPR, REDUC_PLUS_EXPR): New
        tree-codes.
        * optabs.h (OTI_reduc_smax, OTI_reduc_umax, OTI_reduc_smin,
        OTI_reduc_umin, OTI_reduc_plus): New optabs for reduction.
        (reduc_smax_optab, reduc_umax_optab, reduc_smin_optab, reduc_umin_optab,
        reduc_plus_optab): New optabs for reduction.
        * expr.c (expand_expr_real_1): Handle new tree-codes.
        * tree-inline.c (estimate_num_insns_1): Handle new tree-codes.
        * tree-pretty-print.c (dump_generic_node, op_prio, op_symbol): Handle
        new tree-codes.
        * optabs.c (optab_for_tree_code): Handle new tree-codes.
        (init_optabs): Initialize new optabs.
        * genopinit.c (optabs): Define handlers for new optabs.

        * tree-vect-analyze.c (vect_analyze_operations): Fail vectorization in
        case of a phi that is marked as relevant. Call vectorizable_reduction.
        (vect_mark_relevant): Phis may be marked as relevant.
        (vect_mark_stmts_to_be_vectorized): The use corresponding to the
        reduction variable in a reduction stmt does not mark its defining phi
        as relevant. Update documentation accordingly.
        (vect_can_advance_ivs_p): Skip reduction phis.
        * tree-vect-transform.c (vect_get_vec_def_for_operand): Takes
        additional argument. Handle reduction.
        (vect_create_destination_var): Update call to vect_get_new_vect_var.
        Handle non-vector argument.
        (get_initial_def_for_reduction): New function.
        (vect_create_epilog_for_reduction): New function.
        (vectorizable_reduction): New function.
        (vect_get_new_vect_var): Handle new vect_var_kind.
        (vectorizable_assignment, vectorizable_operation, vectorizable_store,
        vectorizable_condition): Update call to vect_get_new_vect_var.
        (vect_transform_stmt): Call vectorizable_reduction.
        (vect_update_ivs_after_vectorizer): Skip reduction phis.
        (vect_transform_loop): Skip if stmt is both not relevant and not live.
        * tree-vectorizer.c (reduction_code_for_scalar_code): New function.
        (vect_is_simple_reduction): Was empty - added implementation.
        * tree-vectorizer.h (vect_scalar_var): New enum vect_var_kind value.
        (reduc_vec_info_type): New enum stmt_vec_info_type value.
        * config/rs6000/altivec.md (reduc_smax_v4si, reduc_smax_v4sf,
        reduc_umax_v4si, reduc_smin_v4si, reduc_umin_v4si, reduc_smin_v4sf,
        reduc_plus_v4si, reduc_plus_v4sf): New define_expands.

        * tree-vect-analyze.c (vect_determine_vectorization_factor): Remove
        ENABLE_CHECKING around gcc_assert.
        * tree-vect-transform.c (vect_do_peeling_for_loop_bound,
        vect_do_peeling_for_alignment, vect_transform_loop,
        vect_get_vec_def_for_operand): Likewise.

From-SVN: r101155
parent 6d409ca8
2005-06-19 Dorit Nuzman <dorit@il.ibm.com>
* tree.def (REDUC_MAX_EXPR, REDUC_MIN_EXPR, REDUC_PLUS_EXPR): New
tree-codes.
* optabs.h (OTI_reduc_smax, OTI_reduc_umax, OTI_reduc_smin,
OTI_reduc_umin, OTI_reduc_plus): New optabs for reduction.
(reduc_smax_optab, reduc_umax_optab, reduc_smin_optab, reduc_umin_optab,
reduc_plus_optab): New optabs for reduction.
* expr.c (expand_expr_real_1): Handle new tree-codes.
* tree-inline.c (estimate_num_insns_1): Handle new tree-codes.
* tree-pretty-print.c (dump_generic_node, op_prio, op_symbol): Handle
new tree-codes.
* optabs.c (optab_for_tree_code): Handle new tree-codes.
(init_optabs): Initialize new optabs.
* genopinit.c (optabs): Define handlers for new optabs.
* tree-vect-analyze.c (vect_analyze_operations): Fail vectorization in
case of a phi that is marked as relevant. Call vectorizable_reduction.
(vect_mark_relevant): Phis may be marked as relevant.
(vect_mark_stmts_to_be_vectorized): The use corresponding to the
reduction variable in a reduction stmt does not mark its defining phi
as relevant. Update documentation accordingly.
(vect_can_advance_ivs_p): Skip reduction phis.
* tree-vect-transform.c (vect_get_vec_def_for_operand): Takes
additional argument. Handle reduction.
(vect_create_destination_var): Update call to vect_get_new_vect_var.
Handle non-vector argument.
(get_initial_def_for_reduction): New function.
(vect_create_epilog_for_reduction): New function.
(vectorizable_reduction): New function.
(vect_get_new_vect_var): Handle new vect_var_kind.
(vectorizable_assignment, vectorizable_operation, vectorizable_store,
vectorizable_condition): Update call to vect_get_new_vect_var.
(vect_transform_stmt): Call vectorizable_reduction.
(vect_update_ivs_after_vectorizer): Skip reduction phis.
(vect_transform_loop): Skip if stmt is both not relevant and not live.
* tree-vectorizer.c (reduction_code_for_scalar_code): New function.
(vect_is_simple_reduction): Was empty - added implementation.
* tree-vectorizer.h (vect_scalar_var): New enum vect_var_kind value.
(reduc_vec_info_type): New enum stmt_vec_info_type value.
* config/rs6000/altivec.md (reduc_smax_v4si, reduc_smax_v4sf,
reduc_umax_v4si, reduc_smin_v4si, reduc_umin_v4si, reduc_smin_v4sf,
reduc_plus_v4si, reduc_plus_v4sf): New define_expands.
* tree-vect-analyze.c (vect_determine_vectorization_factor): Remove
ENABLE_CHECKING around gcc_assert.
* tree-vect-transform.c (vect_do_peeling_for_loop_bound,
vect_do_peeling_for_alignment, vect_transform_loop,
vect_get_vec_def_for_operand): Likewise.
2005-06-18 Joseph S. Myers <joseph@codesourcery.com>
* config/ia64/ia64.c (ia64_function_arg): Set up a PARALLEL for a
......
......@@ -1825,6 +1825,160 @@
operands[3] = gen_reg_rtx (GET_MODE (operands[0]));
})
;; Reduction
(define_expand "reduc_smax_v4si"
[(set (match_operand:V4SI 0 "register_operand" "=v")
(unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")] 217))]
"TARGET_ALTIVEC"
"
{
rtx vtmp1 = gen_reg_rtx (V4SImode);
rtx vtmp2 = gen_reg_rtx (V4SImode);
rtx vtmp3 = gen_reg_rtx (V4SImode);
emit_insn (gen_altivec_vsldoi_v4si (vtmp1, operands[1], operands[1],
gen_rtx_CONST_INT (SImode, 8)));
emit_insn (gen_smaxv4si3 (vtmp2, operands[1], vtmp1));
emit_insn (gen_altivec_vsldoi_v4si (vtmp3, vtmp2, vtmp2,
gen_rtx_CONST_INT (SImode, 4)));
emit_insn (gen_smaxv4si3 (operands[0], vtmp2, vtmp3));
DONE;
}")
(define_expand "reduc_smax_v4sf"
[(set (match_operand:V4SF 0 "register_operand" "=v")
(unspec:V4SF [(match_operand:V4SF 1 "register_operand" "v")] 217))]
"TARGET_ALTIVEC"
"
{
rtx vtmp1 = gen_reg_rtx (V4SFmode);
rtx vtmp2 = gen_reg_rtx (V4SFmode);
rtx vtmp3 = gen_reg_rtx (V4SFmode);
emit_insn (gen_altivec_vsldoi_v4sf (vtmp1, operands[1], operands[1],
gen_rtx_CONST_INT (SImode, 8)));
emit_insn (gen_smaxv4sf3 (vtmp2, operands[1], vtmp1));
emit_insn (gen_altivec_vsldoi_v4sf (vtmp3, vtmp2, vtmp2,
gen_rtx_CONST_INT (SImode, 4)));
emit_insn (gen_smaxv4sf3 (operands[0], vtmp2, vtmp3));
DONE;
}")
(define_expand "reduc_umax_v4si"
[(set (match_operand:V4SI 0 "register_operand" "=v")
(unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")] 217))]
"TARGET_ALTIVEC"
"
{
rtx vtmp1 = gen_reg_rtx (V4SImode);
rtx vtmp2 = gen_reg_rtx (V4SImode);
rtx vtmp3 = gen_reg_rtx (V4SImode);
emit_insn (gen_altivec_vsldoi_v4si (vtmp1, operands[1], operands[1],
gen_rtx_CONST_INT (SImode, 8)));
emit_insn (gen_umaxv4si3 (vtmp2, operands[1], vtmp1));
emit_insn (gen_altivec_vsldoi_v4si (vtmp3, vtmp2, vtmp2,
gen_rtx_CONST_INT (SImode, 4)));
emit_insn (gen_umaxv4si3 (operands[0], vtmp2, vtmp3));
DONE;
}")
(define_expand "reduc_smin_v4si"
[(set (match_operand:V4SI 0 "register_operand" "=v")
(unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")] 217))]
"TARGET_ALTIVEC"
"
{
rtx vtmp1 = gen_reg_rtx (V4SImode);
rtx vtmp2 = gen_reg_rtx (V4SImode);
rtx vtmp3 = gen_reg_rtx (V4SImode);
emit_insn (gen_altivec_vsldoi_v4si (vtmp1, operands[1], operands[1],
gen_rtx_CONST_INT (SImode, 8)));
emit_insn (gen_sminv4si3 (vtmp2, operands[1], vtmp1));
emit_insn (gen_altivec_vsldoi_v4si (vtmp3, vtmp2, vtmp2,
gen_rtx_CONST_INT (SImode, 4)));
emit_insn (gen_sminv4si3 (operands[0], vtmp2, vtmp3));
DONE;
}")
(define_expand "reduc_smin_v4sf"
[(set (match_operand:V4SF 0 "register_operand" "=v")
(unspec:V4SF [(match_operand:V4SF 1 "register_operand" "v")] 217))]
"TARGET_ALTIVEC"
"
{
rtx vtmp1 = gen_reg_rtx (V4SFmode);
rtx vtmp2 = gen_reg_rtx (V4SFmode);
rtx vtmp3 = gen_reg_rtx (V4SFmode);
emit_insn (gen_altivec_vsldoi_v4sf (vtmp1, operands[1], operands[1],
gen_rtx_CONST_INT (SImode, 8)));
emit_insn (gen_sminv4sf3 (vtmp2, operands[1], vtmp1));
emit_insn (gen_altivec_vsldoi_v4sf (vtmp3, vtmp2, vtmp2,
gen_rtx_CONST_INT (SImode, 4)));
emit_insn (gen_sminv4sf3 (operands[0], vtmp2, vtmp3));
DONE;
}")
(define_expand "reduc_umin_v4si"
[(set (match_operand:V4SI 0 "register_operand" "=v")
(unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")] 217))]
"TARGET_ALTIVEC"
"
{
rtx vtmp1 = gen_reg_rtx (V4SImode);
rtx vtmp2 = gen_reg_rtx (V4SImode);
rtx vtmp3 = gen_reg_rtx (V4SImode);
emit_insn (gen_altivec_vsldoi_v4si (vtmp1, operands[1], operands[1],
gen_rtx_CONST_INT (SImode, 8)));
emit_insn (gen_uminv4si3 (vtmp2, operands[1], vtmp1));
emit_insn (gen_altivec_vsldoi_v4si (vtmp3, vtmp2, vtmp2,
gen_rtx_CONST_INT (SImode, 4)));
emit_insn (gen_uminv4si3 (operands[0], vtmp2, vtmp3));
DONE;
}")
(define_expand "reduc_plus_v4si"
[(set (match_operand:V4SI 0 "register_operand" "=v")
(unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")] 217))]
"TARGET_ALTIVEC"
"
{
rtx vtmp1 = gen_reg_rtx (V4SImode);
rtx vtmp2 = gen_reg_rtx (V4SImode);
rtx vtmp3 = gen_reg_rtx (V4SImode);
emit_insn (gen_altivec_vsldoi_v4si (vtmp1, operands[1], operands[1],
gen_rtx_CONST_INT (SImode, 8)));
emit_insn (gen_addv4si3 (vtmp2, operands[1], vtmp1));
emit_insn (gen_altivec_vsldoi_v4si (vtmp3, vtmp2, vtmp2,
gen_rtx_CONST_INT (SImode, 4)));
emit_insn (gen_addv4si3 (operands[0], vtmp2, vtmp3));
DONE;
}")
(define_expand "reduc_plus_v4sf"
[(set (match_operand:V4SF 0 "register_operand" "=v")
(unspec:V4SF [(match_operand:V4SF 1 "register_operand" "v")] 217))]
"TARGET_ALTIVEC"
"
{
rtx vtmp1 = gen_reg_rtx (V4SFmode);
rtx vtmp2 = gen_reg_rtx (V4SFmode);
rtx vtmp3 = gen_reg_rtx (V4SFmode);
emit_insn (gen_altivec_vsldoi_v4sf (vtmp1, operands[1], operands[1],
gen_rtx_CONST_INT (SImode, 8)));
emit_insn (gen_addv4sf3 (vtmp2, operands[1], vtmp1));
emit_insn (gen_altivec_vsldoi_v4sf (vtmp3, vtmp2, vtmp2,
gen_rtx_CONST_INT (SImode, 4)));
emit_insn (gen_addv4sf3 (operands[0], vtmp2, vtmp3));
DONE;
}")
(define_insn "vec_realign_load_v4sf"
[(set (match_operand:V4SF 0 "register_operand" "=v")
(unspec:V4SF [(match_operand:V4SF 1 "register_operand" "v")
......
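The eight expanders above share one shape: rotate the vector by two lanes (vsldoi by 8 bytes), combine, rotate by one lane (vsldoi by 4 bytes), and combine again, after which every lane holds the reduced value. The following is a minimal scalar sketch of that sequence, not the AltiVec code itself. The names rotate_lanes, model_reduc, add_lane and smax_lane are hypothetical, and treating vsldoi (a, a, n) as a lane rotation assumes n is a multiple of the 4-byte element size, as it is in the expanders.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4

/* Hypothetical per-lane operation, standing in for addv4si3, smaxv4si3, ...  */
typedef int32_t (*lane_op) (int32_t, int32_t);

/* Wrapping add, mirroring modulo vector addition without signed overflow.  */
static int32_t
add_lane (int32_t a, int32_t b)
{
  return (int32_t) ((uint32_t) a + (uint32_t) b);
}

static int32_t
smax_lane (int32_t a, int32_t b)
{
  return a > b ? a : b;
}

/* Model of vsldoi (in, in, bytes): concatenating a vector with itself and
   shifting left by whole lanes amounts to a lane rotation.  */
static void
rotate_lanes (const int32_t in[LANES], int32_t out[LANES], int bytes)
{
  int shift = bytes / 4;
  assert (bytes % 4 == 0 && shift >= 0 && shift < LANES);
  for (int i = 0; i < LANES; i++)
    out[i] = in[(i + shift) % LANES];
}

/* Two rotate-and-combine steps, following the shape of the expanders.  */
static void
model_reduc (const int32_t in[LANES], int32_t out[LANES], lane_op op)
{
  int32_t vtmp1[LANES], vtmp2[LANES], vtmp3[LANES];

  rotate_lanes (in, vtmp1, 8);          /* {x2, x3, x0, x1} */
  for (int i = 0; i < LANES; i++)
    vtmp2[i] = op (in[i], vtmp1[i]);    /* pairwise partial results */
  rotate_lanes (vtmp2, vtmp3, 4);       /* rotate by one lane */
  for (int i = 0; i < LANES; i++)
    out[i] = op (vtmp2[i], vtmp3[i]);   /* every lane now holds the result */
}
```

Calling model_reduc with add_lane models the reduc_plus shape and with smax_lane the reduc_smax shape; log2(4) = 2 combine steps are enough because each step halves the number of distinct partial results.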
......@@ -8356,6 +8356,16 @@ expand_expr_real_1 (tree exp, rtx target, enum machine_mode tmode,
return temp;
}
case REDUC_MAX_EXPR:
case REDUC_MIN_EXPR:
case REDUC_PLUS_EXPR:
{
op0 = expand_expr (TREE_OPERAND (exp, 0), NULL_RTX, VOIDmode, 0);
this_optab = optab_for_tree_code (code, type);
temp = expand_unop (mode, this_optab, op0, target, unsignedp);
gcc_assert (temp);
return temp;
}
default:
return lang_hooks.expand_expr (exp, original_target, tmode,
......
......@@ -198,7 +198,12 @@ static const char * const optabs[] =
"vec_init_optab->handlers[$A].insn_code = CODE_FOR_$(vec_init$a$)",
"vec_realign_load_optab->handlers[$A].insn_code = CODE_FOR_$(vec_realign_load_$a$)",
"vcond_gen_code[$A] = CODE_FOR_$(vcond$a$)",
"vcondu_gen_code[$A] = CODE_FOR_$(vcondu$a$)"
"vcondu_gen_code[$A] = CODE_FOR_$(vcondu$a$)",
"reduc_smax_optab->handlers[$A].insn_code = CODE_FOR_$(reduc_smax_$a$)",
"reduc_umax_optab->handlers[$A].insn_code = CODE_FOR_$(reduc_umax_$a$)",
"reduc_smin_optab->handlers[$A].insn_code = CODE_FOR_$(reduc_smin_$a$)",
"reduc_umin_optab->handlers[$A].insn_code = CODE_FOR_$(reduc_umin_$a$)",
"reduc_plus_optab->handlers[$A].insn_code = CODE_FOR_$(reduc_plus_$a$)"
};
static void gen_insn (rtx);
......
......@@ -294,6 +294,15 @@ optab_for_tree_code (enum tree_code code, tree type)
case REALIGN_LOAD_EXPR:
return vec_realign_load_optab;
case REDUC_MAX_EXPR:
return TYPE_UNSIGNED (type) ? reduc_umax_optab : reduc_smax_optab;
case REDUC_MIN_EXPR:
return TYPE_UNSIGNED (type) ? reduc_umin_optab : reduc_smin_optab;
case REDUC_PLUS_EXPR:
return reduc_plus_optab;
default:
break;
}
......@@ -5061,6 +5070,12 @@ init_optabs (void)
cstore_optab = init_optab (UNKNOWN);
push_optab = init_optab (UNKNOWN);
reduc_smax_optab = init_optab (UNKNOWN);
reduc_umax_optab = init_optab (UNKNOWN);
reduc_smin_optab = init_optab (UNKNOWN);
reduc_umin_optab = init_optab (UNKNOWN);
reduc_plus_optab = init_optab (UNKNOWN);
vec_extract_optab = init_optab (UNKNOWN);
vec_set_optab = init_optab (UNKNOWN);
vec_init_optab = init_optab (UNKNOWN);
......
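The optab_for_tree_code hunk above picks a signed or unsigned optab for max and min, while plus needs no distinction because modular addition is the same for both. As a rough standalone sketch of that dispatch (the enum names and the function reduc_optab_for are hypothetical stand-ins, not GCC declarations):

```c
#include <assert.h>

/* Hypothetical stand-ins for the REDUC_*_EXPR tree codes.  */
enum reduc_code { REDUC_MAX, REDUC_MIN, REDUC_PLUS };

/* Hypothetical stand-ins for the five new optabs.  */
enum reduc_optab
{
  OPTAB_REDUC_SMAX,
  OPTAB_REDUC_UMAX,
  OPTAB_REDUC_SMIN,
  OPTAB_REDUC_UMIN,
  OPTAB_REDUC_PLUS
};

/* Choose the optab from the reduction code and the signedness of the
   element type, following the switch in optab_for_tree_code.  */
static enum reduc_optab
reduc_optab_for (enum reduc_code code, int is_unsigned)
{
  switch (code)
    {
    case REDUC_MAX:
      return is_unsigned ? OPTAB_REDUC_UMAX : OPTAB_REDUC_SMAX;
    case REDUC_MIN:
      return is_unsigned ? OPTAB_REDUC_UMIN : OPTAB_REDUC_SMIN;
    case REDUC_PLUS:
      return OPTAB_REDUC_PLUS;   /* signedness is irrelevant for plus */
    }
  assert (!"unknown reduction code");
  return OPTAB_REDUC_PLUS;
}
```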
......@@ -231,6 +231,13 @@ enum optab_index
/* Conditional add instruction. */
OTI_addcc,
/* Reduction operations on a vector operand. */
OTI_reduc_smax,
OTI_reduc_umax,
OTI_reduc_smin,
OTI_reduc_umin,
OTI_reduc_plus,
/* Set specified field of vector operand. */
OTI_vec_set,
/* Extract specified field of vector operand. */
......@@ -347,6 +354,12 @@ extern GTY(()) optab optab_table[OTI_MAX];
#define push_optab (optab_table[OTI_push])
#define addcc_optab (optab_table[OTI_addcc])
#define reduc_smax_optab (optab_table[OTI_reduc_smax])
#define reduc_umax_optab (optab_table[OTI_reduc_umax])
#define reduc_smin_optab (optab_table[OTI_reduc_smin])
#define reduc_umin_optab (optab_table[OTI_reduc_umin])
#define reduc_plus_optab (optab_table[OTI_reduc_plus])
#define vec_set_optab (optab_table[OTI_vec_set])
#define vec_extract_optab (optab_table[OTI_vec_extract])
#define vec_init_optab (optab_table[OTI_vec_init])
......
2005-06-19 Dorit Nuzman <dorit@il.ibm.com>
* lib/target-supports.exp (check_effective_target_vect_reduction): New.
* gcc.dg/vect/vect-reduc-1.c: Now vectorizable for vect_reduction
targets.
* gcc.dg/vect/vect-reduc-2.c: Likewise.
* gcc.dg/vect/vect-reduc-3.c: Likewise.
2005-06-18 Joseph S. Myers <joseph@codesourcery.com>
* gcc.target/ia64/float80-varargs-1.c: New test.
......
......@@ -7,7 +7,6 @@
#define DIFF 242
/* Test vectorization of reduction of unsigned-int. */
/* Not supported yet. */
int main1 (unsigned int x, unsigned int max_result)
{
......@@ -52,5 +51,4 @@ int main (void)
return main1 (0, 15);
}
/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { xfail *-*-* } } } */
/* { dg-final { scan-tree-dump-times "not vectorized: unsupported use in stmt." 3 "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { xfail {! vect_reduction} } } } */
......@@ -8,7 +8,6 @@
#define DIFF 242
/* Test vectorization of reduction of signed-int. */
/* Not supported yet. */
int main1 (int x, int max_result)
{
......@@ -50,5 +49,4 @@ int main (void)
return main1 (0, 15);
}
/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { xfail *-*-* } } } */
/* { dg-final { scan-tree-dump-times "not vectorized: unsupported use in stmt." 3 "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { xfail {! vect_reduction} } } } */
......@@ -8,7 +8,6 @@
/* Test vectorization of reduction of unsigned-int in the presence
of unknown-loop-bound. */
/* Not supported yet. */
int main1 (int n)
{
......@@ -37,5 +36,4 @@ int main (void)
return main1 (N-1);
}
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail *-*-* } } } */
/* { dg-final { scan-tree-dump-times "not vectorized: unsupported use in stmt." 1 "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail {! vect_reduction} } } } */
......@@ -988,6 +988,23 @@ proc check_effective_target_vect_int_mult { } {
return $et_vect_int_mult_saved
}
# Return 1 if the target supports vector reduction
proc check_effective_target_vect_reduction { } {
global et_vect_reduction_saved
if [info exists et_vect_reduction_saved] {
verbose "check_effective_target_vect_reduction: using cached result" 2
} else {
set et_vect_reduction_saved 0
if { [istarget powerpc*-*-*] } {
set et_vect_reduction_saved 1
}
}
verbose "check_effective_target_vect_reduction: returning $et_vect_reduction_saved" 2
return $et_vect_reduction_saved
}
# Return 1 if the target supports atomic operations on "int" and "long".
proc check_effective_target_sync_int_long { } {
......
......@@ -1736,6 +1736,10 @@ estimate_num_insns_1 (tree *tp, int *walk_subtrees, void *data)
case REALIGN_LOAD_EXPR:
case REDUC_MAX_EXPR:
case REDUC_MIN_EXPR:
case REDUC_PLUS_EXPR:
case RESX_EXPR:
*count += 1;
break;
......
......@@ -1535,6 +1535,24 @@ dump_generic_node (pretty_printer *buffer, tree node, int spc, int flags,
pp_string (buffer, " > ");
break;
case REDUC_MAX_EXPR:
pp_string (buffer, " REDUC_MAX_EXPR < ");
dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false);
pp_string (buffer, " > ");
break;
case REDUC_MIN_EXPR:
pp_string (buffer, " REDUC_MIN_EXPR < ");
dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false);
pp_string (buffer, " > ");
break;
case REDUC_PLUS_EXPR:
pp_string (buffer, " REDUC_PLUS_EXPR < ");
dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false);
pp_string (buffer, " > ");
break;
default:
NIY;
}
......@@ -1817,6 +1835,9 @@ op_prio (tree op)
case ABS_EXPR:
case REALPART_EXPR:
case IMAGPART_EXPR:
case REDUC_MAX_EXPR:
case REDUC_MIN_EXPR:
case REDUC_PLUS_EXPR:
return 16;
case SAVE_EXPR:
......@@ -1907,6 +1928,9 @@ op_symbol (tree op)
case PLUS_EXPR:
return "+";
case REDUC_PLUS_EXPR:
return "r+";
case NEGATE_EXPR:
case MINUS_EXPR:
return "-";
......
......@@ -413,10 +413,8 @@ vect_determine_vectorization_factor (loop_vec_info loop_vinfo)
else
vectorization_factor = nunits;
#ifdef ENABLE_CHECKING
gcc_assert (GET_MODE_SIZE (TYPE_MODE (scalar_type))
* vectorization_factor == UNITS_PER_SIMD_WORD);
#endif
}
}
......@@ -483,8 +481,16 @@ vect_analyze_operations (loop_vec_info loop_vinfo)
return false;
}
gcc_assert (!STMT_VINFO_RELEVANT_P (stmt_info));
}
if (STMT_VINFO_RELEVANT_P (stmt_info))
{
/* Most likely a reduction-like computation that is used
in the loop. */
if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS,
LOOP_LOC (loop_vinfo)))
fprintf (vect_dump, "not vectorized: unsupported pattern.");
return false;
}
}
for (si = bsi_start (bb); !bsi_end_p (si); bsi_next (&si))
{
......@@ -541,7 +547,12 @@ vect_analyze_operations (loop_vec_info loop_vinfo)
if (STMT_VINFO_LIVE_P (stmt_info))
{
ok = vectorizable_live_operation (stmt, NULL, NULL);
ok = vectorizable_reduction (stmt, NULL, NULL);
if (ok)
need_to_vectorize = true;
else
ok = vectorizable_live_operation (stmt, NULL, NULL);
if (!ok)
{
......@@ -2148,13 +2159,13 @@ vect_mark_relevant (VEC(tree,heap) **worklist, tree stmt,
fprintf (vect_dump, "mark relevant %d, live %d.",relevant_p, live_p);
STMT_VINFO_LIVE_P (stmt_info) |= live_p;
STMT_VINFO_RELEVANT_P (stmt_info) |= relevant_p;
if (TREE_CODE (stmt) == PHI_NODE)
/* Don't mark as relevant because it's not going to vectorized. */
/* Don't put phi-nodes in the worklist. Phis that are marked relevant
or live will fail vectorization later on. */
return;
STMT_VINFO_RELEVANT_P (stmt_info) |= relevant_p;
if (STMT_VINFO_RELEVANT_P (stmt_info) == save_relevant_p
&& STMT_VINFO_LIVE_P (stmt_info) == save_live_p)
{
......@@ -2337,19 +2348,33 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info loop_vinfo)
Exceptions:
- if USE is used only for address computations (e.g. array indexing),
(case 1)
If USE is used only for address computations (e.g. array indexing),
which does not need to be directly vectorized, then the
liveness/relevance of the respective DEF_STMT is left unchanged.
- if STMT has been identified as defining a reduction variable, then:
STMT_VINFO_LIVE_P (DEF_STMT_info) <-- false
STMT_VINFO_RELEVANT_P (DEF_STMT_info) <-- true
because even though STMT is classified as live (since it defines a
value that is used across loop iterations) and irrelevant (since it
is not used inside the loop), it will be vectorized, and therefore
the corresponding DEF_STMTs need to be marked as relevant.
(case 2)
If STMT has been identified as defining a reduction variable, then
we have two cases:
(case 2.1)
The last use of STMT is the reduction-variable, which is defined
by a loop-header-phi. We don't want to mark the phi as live or
relevant (because it does not need to be vectorized, it is handled
as part of the vectorization of the reduction), so in this case we
skip the call to vect_mark_relevant.
(case 2.2)
The rest of the uses of STMT are defined in the loop body. For
the def_stmt of these uses we want to set liveness/relevance
as follows:
STMT_VINFO_LIVE_P (DEF_STMT_info) <-- false
STMT_VINFO_RELEVANT_P (DEF_STMT_info) <-- true
because even though STMT is classified as live (since it defines a
value that is used across loop iterations) and irrelevant (since it
is not used inside the loop), it will be vectorized, and therefore
the corresponding DEF_STMTs need to be marked as relevant.
*/
/* case 2.2: */
if (STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_reduction_def)
{
gcc_assert (!relevant_p && live_p);
......@@ -2359,42 +2384,42 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info loop_vinfo)
FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
{
/* We are only interested in uses that need to be vectorized. Uses
that are used for address computation are not considered relevant.
/* case 1: we are only interested in uses that need to be vectorized.
Uses that are used for address computation are not considered
relevant.
*/
if (exist_non_indexing_operands_for_use_p (use, stmt))
{
if (!vect_is_simple_use (use, loop_vinfo, &def_stmt, &def, &dt))
{
if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS,
LOOP_LOC (loop_vinfo)))
fprintf (vect_dump,
"not vectorized: unsupported use in stmt.");
VEC_free (tree, heap, worklist);
return false;
}
if (!exist_non_indexing_operands_for_use_p (use, stmt))
continue;
if (!def_stmt || IS_EMPTY_STMT (def_stmt))
continue;
if (!vect_is_simple_use (use, loop_vinfo, &def_stmt, &def, &dt))
{
if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS,
LOOP_LOC (loop_vinfo)))
fprintf (vect_dump, "not vectorized: unsupported use in stmt.");
VEC_free (tree, heap, worklist);
return false;
}
if (vect_print_dump_info (REPORT_DETAILS, UNKNOWN_LOC))
{
fprintf (vect_dump, "worklist: examine use %d: ", i);
print_generic_expr (vect_dump, use, TDF_SLIM);
}
if (!def_stmt || IS_EMPTY_STMT (def_stmt))
continue;
bb = bb_for_stmt (def_stmt);
if (!flow_bb_inside_loop_p (loop, bb))
continue;
if (vect_print_dump_info (REPORT_DETAILS, UNKNOWN_LOC))
{
fprintf (vect_dump, "worklist: examine use %d: ", i);
print_generic_expr (vect_dump, use, TDF_SLIM);
}
if (vect_print_dump_info (REPORT_DETAILS, UNKNOWN_LOC))
{
fprintf (vect_dump, "def_stmt: ");
print_generic_expr (vect_dump, def_stmt, TDF_SLIM);
}
bb = bb_for_stmt (def_stmt);
if (!flow_bb_inside_loop_p (loop, bb))
continue;
vect_mark_relevant (&worklist, def_stmt, relevant_p, live_p);
}
/* case 2.1: the reduction-use does not mark the defining-phi
as relevant. */
if (STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_reduction_def
&& TREE_CODE (def_stmt) == PHI_NODE)
continue;
vect_mark_relevant (&worklist, def_stmt, relevant_p, live_p);
}
} /* while worklist */
......@@ -2445,6 +2470,15 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
continue;
}
/* Skip reduction phis. */
if (STMT_VINFO_DEF_TYPE (vinfo_for_stmt (phi)) == vect_reduction_def)
{
if (vect_print_dump_info (REPORT_DETAILS, UNKNOWN_LOC))
fprintf (vect_dump, "reduc phi. skip.");
continue;
}
/* Analyze the evolution function. */
access_fn = instantiate_parameters
......
......@@ -575,9 +575,7 @@ slpeel_update_phi_nodes_for_guard1 (edge guard_edge, struct loop *loop,
if (!current_new_name)
continue;
}
#ifdef ENABLE_CHECKING
gcc_assert (get_current_def (current_new_name) == NULL_TREE);
#endif
set_current_def (current_new_name, PHI_RESULT (new_phi));
bitmap_set_bit (*defs, SSA_NAME_VERSION (current_new_name));
......@@ -761,9 +759,7 @@ slpeel_make_loop_iterate_ntimes (struct loop *loop, tree niters)
LOC loop_loc;
orig_cond = get_loop_exit_condition (loop);
#ifdef ENABLE_CHECKING
gcc_assert (orig_cond);
#endif
loop_cond_bsi = bsi_for_stmt (orig_cond);
standard_iv_increment_position (loop, &incr_bsi, &insert_after);
......@@ -1354,6 +1350,7 @@ new_stmt_vec_info (tree stmt, loop_vec_info loop_vinfo)
STMT_VINFO_VECT_STEP (res) = NULL_TREE;
STMT_VINFO_VECT_BASE_ALIGNED_P (res) = false;
STMT_VINFO_VECT_MISALIGNMENT (res) = NULL_TREE;
STMT_VINFO_SAME_ALIGN_REFS (res) = VEC_alloc (dr_p, heap, 5);
return res;
}
......@@ -1744,9 +1741,44 @@ vect_is_simple_use (tree operand, loop_vec_info loop_vinfo, tree *def_stmt,
}
/* Function reduction_code_for_scalar_code
Input:
CODE - tree_code of a reduction operation.
Output:
REDUC_CODE - the corresponding tree-code to be used to reduce the
vector of partial results into a single scalar result (which
will also reside in a vector).
Return TRUE if a corresponding REDUC_CODE was found, FALSE otherwise. */
bool
reduction_code_for_scalar_code (enum tree_code code,
enum tree_code *reduc_code)
{
switch (code)
{
case MAX_EXPR:
*reduc_code = REDUC_MAX_EXPR;
return true;
case MIN_EXPR:
*reduc_code = REDUC_MIN_EXPR;
return true;
case PLUS_EXPR:
*reduc_code = REDUC_PLUS_EXPR;
return true;
default:
return false;
}
}
/* Function vect_is_simple_reduction
TODO:
Detect a cross-iteration def-use cycle that represents a simple
reduction computation. We look for the following pattern:
......@@ -1756,18 +1788,189 @@ vect_is_simple_use (tree operand, loop_vec_info loop_vinfo, tree *def_stmt,
a2 = operation (a3, a1)
such that:
1. operation is...
2. no uses for a2 in the loop (elsewhere) */
1. operation is commutative and associative and it is safe to
change the order of the computation.
2. no uses for a2 in the loop (a2 is used out of the loop)
3. no uses of a1 in the loop besides the reduction operation.
Condition 1 is tested here.
Conditions 2,3 are tested in vect_mark_stmts_to_be_vectorized. */
tree
vect_is_simple_reduction (struct loop *loop ATTRIBUTE_UNUSED,
tree phi ATTRIBUTE_UNUSED)
{
/* FORNOW */
if (vect_print_dump_info (REPORT_DETAILS, UNKNOWN_LOC))
fprintf (vect_dump, "reduction: unknown pattern.");
edge latch_e = loop_latch_edge (loop);
tree loop_arg = PHI_ARG_DEF_FROM_EDGE (phi, latch_e);
tree def_stmt, def1, def2;
enum tree_code code;
int op_type;
tree operation, op1, op2;
tree type;
if (TREE_CODE (loop_arg) != SSA_NAME)
{
if (vect_print_dump_info (REPORT_DETAILS, UNKNOWN_LOC))
{
fprintf (vect_dump, "reduction: not ssa_name: ");
print_generic_expr (vect_dump, loop_arg, TDF_SLIM);
}
return NULL_TREE;
}
return NULL_TREE;
def_stmt = SSA_NAME_DEF_STMT (loop_arg);
if (!def_stmt)
{
if (vect_print_dump_info (REPORT_DETAILS, UNKNOWN_LOC))
fprintf (vect_dump, "reduction: no def_stmt.");
return NULL_TREE;
}
if (TREE_CODE (def_stmt) != MODIFY_EXPR)
{
if (vect_print_dump_info (REPORT_DETAILS, UNKNOWN_LOC))
{
print_generic_expr (vect_dump, def_stmt, TDF_SLIM);
}
return NULL_TREE;
}
operation = TREE_OPERAND (def_stmt, 1);
code = TREE_CODE (operation);
if (!commutative_tree_code (code) || !associative_tree_code (code))
{
if (vect_print_dump_info (REPORT_DETAILS, UNKNOWN_LOC))
{
fprintf (vect_dump, "reduction: not commutative/associative: ");
print_generic_expr (vect_dump, operation, TDF_SLIM);
}
return NULL_TREE;
}
op_type = TREE_CODE_LENGTH (code);
if (op_type != binary_op)
{
if (vect_print_dump_info (REPORT_DETAILS, UNKNOWN_LOC))
{
fprintf (vect_dump, "reduction: not binary operation: ");
print_generic_expr (vect_dump, operation, TDF_SLIM);
}
return NULL_TREE;
}
op1 = TREE_OPERAND (operation, 0);
op2 = TREE_OPERAND (operation, 1);
if (TREE_CODE (op1) != SSA_NAME || TREE_CODE (op2) != SSA_NAME)
{
if (vect_print_dump_info (REPORT_DETAILS, UNKNOWN_LOC))
{
fprintf (vect_dump, "reduction: uses not ssa_names: ");
print_generic_expr (vect_dump, operation, TDF_SLIM);
}
return NULL_TREE;
}
/* Check that it's ok to change the order of the computation */
type = TREE_TYPE (operation);
if (type != TREE_TYPE (op1) || type != TREE_TYPE (op2))
{
if (vect_print_dump_info (REPORT_DETAILS, UNKNOWN_LOC))
{
fprintf (vect_dump, "reduction: multiple types: operation type: ");
print_generic_expr (vect_dump, type, TDF_SLIM);
fprintf (vect_dump, ", operands types: ");
print_generic_expr (vect_dump, TREE_TYPE (op1), TDF_SLIM);
fprintf (vect_dump, ",");
print_generic_expr (vect_dump, TREE_TYPE (op2), TDF_SLIM);
}
return NULL_TREE;
}
/* CHECKME: check for !flag_finite_math_only too? */
if (SCALAR_FLOAT_TYPE_P (type) && !flag_unsafe_math_optimizations)
{
/* Changing the order of operations changes the semantics. */
if (vect_print_dump_info (REPORT_DETAILS, UNKNOWN_LOC))
{
fprintf (vect_dump, "reduction: unsafe fp math optimization: ");
print_generic_expr (vect_dump, operation, TDF_SLIM);
}
return NULL_TREE;
}
else if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type) && flag_trapv)
{
/* Changing the order of operations changes the semantics. */
if (vect_print_dump_info (REPORT_DETAILS, UNKNOWN_LOC))
{
fprintf (vect_dump, "reduction: unsafe int math optimization: ");
print_generic_expr (vect_dump, operation, TDF_SLIM);
}
return NULL_TREE;
}
/* reduction is safe. we're dealing with one of the following:
1) integer arithmetic and no trapv
2) floating point arithmetic, and special flags permit this optimization.
*/
def1 = SSA_NAME_DEF_STMT (op1);
def2 = SSA_NAME_DEF_STMT (op2);
if (!def1 || !def2)
{
if (vect_print_dump_info (REPORT_DETAILS, UNKNOWN_LOC))
{
fprintf (vect_dump, "reduction: no defs for operands: ");
print_generic_expr (vect_dump, operation, TDF_SLIM);
}
return NULL_TREE;
}
if (TREE_CODE (def1) == MODIFY_EXPR
&& flow_bb_inside_loop_p (loop, bb_for_stmt (def1))
&& def2 == phi)
{
if (vect_print_dump_info (REPORT_DETAILS, UNKNOWN_LOC))
{
fprintf (vect_dump, "detected reduction:");
print_generic_expr (vect_dump, operation, TDF_SLIM);
}
return def_stmt;
}
else if (TREE_CODE (def2) == MODIFY_EXPR
&& flow_bb_inside_loop_p (loop, bb_for_stmt (def2))
&& def1 == phi)
{
use_operand_p use;
ssa_op_iter iter;
/* Swap operands (just for simplicity - so that the rest of the code
can assume that the reduction variable is always the last (second)
argument). */
if (vect_print_dump_info (REPORT_DETAILS, UNKNOWN_LOC))
{
fprintf (vect_dump, "detected reduction: need to swap operands:");
print_generic_expr (vect_dump, operation, TDF_SLIM);
}
/* CHECKME */
FOR_EACH_SSA_USE_OPERAND (use, def_stmt, iter, SSA_OP_USE)
{
tree tuse = USE_FROM_PTR (use);
if (tuse == op1)
SET_USE (use, op2);
else if (tuse == op2)
SET_USE (use, op1);
}
return def_stmt;
}
else
{
if (vect_print_dump_info (REPORT_DETAILS, UNKNOWN_LOC))
{
fprintf (vect_dump, "reduction: unknown pattern.");
print_generic_expr (vect_dump, operation, TDF_SLIM);
}
return NULL_TREE;
}
}
......
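vect_is_simple_reduction accepts a reduction only when the order of the computation may be changed, because a vectorized loop accumulates one partial result per lane and combines the lanes after the loop. The sketch below illustrates that reordering for an unsigned-int sum, where wraparound arithmetic makes any order equivalent. It is a scalar model under stated assumptions: VF = 4 lanes, hypothetical function names, and a scalar tail loop standing in for the iterations left over when the trip count is not a multiple of VF. Floating-point sums are deliberately left out, since reordering them can change the result, which is why the check above requires flag_unsafe_math_optimizations.

```c
#include <assert.h>
#include <stddef.h>

#define VF 4   /* assumed vectorization factor: four 32-bit lanes */

/* The original scalar reduction: a single accumulator.  */
static unsigned int
sum_scalar (const unsigned int *a, size_t n)
{
  unsigned int sum = 0;
  assert (a != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    sum += a[i];
  return sum;
}

/* Reordered reduction: VF partial sums in the loop body, combined by an
   epilog, followed by a scalar tail for the leftover iterations.  */
static unsigned int
sum_partial (const unsigned int *a, size_t n)
{
  unsigned int lanes[VF] = { 0, 0, 0, 0 };   /* neutral element for plus */
  unsigned int result = 0;
  size_t i = 0;

  assert (a != NULL || n == 0);

  /* Loop body: each lane accumulates every VF-th element.  */
  for (; i + VF <= n; i += VF)
    for (size_t l = 0; l < VF; l++)
      lanes[l] += a[i + l];

  /* Epilog: reduce the vector of partial sums to one scalar.  */
  for (size_t l = 0; l < VF; l++)
    result += lanes[l];

  /* Scalar tail for the remaining n % VF elements.  */
  for (; i < n; i++)
    result += a[i];

  return result;
}
```

For any input the two functions return the same value, including inputs whose sum wraps around, because unsigned addition is commutative and associative modulo 2^N.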
......@@ -39,7 +39,8 @@ Software Foundation, 59 Temple Place - Suite 330, Boston, MA
/* Used for naming of new temporaries. */
enum vect_var_kind {
vect_simple_var,
vect_pointer_var
vect_pointer_var,
vect_scalar_var
};
/* Defines type of operation: unary or binary. */
......@@ -155,7 +156,8 @@ enum stmt_vec_info_type {
store_vec_info_type,
op_vec_info_type,
assignment_vec_info_type,
condition_vec_info_type
condition_vec_info_type,
reduc_vec_info_type
};
typedef struct data_reference *dr_p;
......@@ -345,6 +347,8 @@ extern tree vect_is_simple_reduction (struct loop *, tree);
extern bool vect_can_force_dr_alignment_p (tree, unsigned int);
extern enum dr_alignment_support vect_supportable_dr_alignment
(struct data_reference *);
extern bool reduction_code_for_scalar_code (enum tree_code, enum tree_code *);
/* Creation and deletion of loop and stmt info structs. */
extern loop_vec_info new_loop_vec_info (struct loop *loop);
extern void destroy_loop_vec_info (loop_vec_info);
......@@ -363,6 +367,7 @@ extern bool vectorizable_operation (tree, block_stmt_iterator *, tree *);
extern bool vectorizable_assignment (tree, block_stmt_iterator *, tree *);
extern bool vectorizable_condition (tree, block_stmt_iterator *, tree *);
extern bool vectorizable_live_operation (tree, block_stmt_iterator *, tree *);
extern bool vectorizable_reduction (tree, block_stmt_iterator *, tree *);
/* Driver for transformation stage. */
extern void vect_transform_loop (loop_vec_info, struct loops *);
......
......@@ -947,6 +947,16 @@ DEFTREECODE (REALIGN_LOAD_EXPR, "realign_load", tcc_expression, 3)
DEFTREECODE (TARGET_MEM_REF, "target_mem_ref", tcc_reference, 7)
/* Reduction operations.
Operations that take a vector of elements and "reduce" it to a scalar
result (e.g. summing the elements of the vector, finding the minimum over
the vector elements, etc).
Operand 0 is a vector; the first element in the vector has the result.
Operand 1 is a vector. */
DEFTREECODE (REDUC_MAX_EXPR, "reduc_max_expr", tcc_unary, 1)
DEFTREECODE (REDUC_MIN_EXPR, "reduc_min_expr", tcc_unary, 1)
DEFTREECODE (REDUC_PLUS_EXPR, "reduc_plus_expr", tcc_unary, 1)
/*
Local variables:
mode:c
......