Commit 0bbe50f6 by Changpeng Fang Committed by Sebastian Pop

pr44576 Avoid un-necessary prefetch analysis by distributing the cost models

2010-07-09  Changpeng Fang  <changpeng.fang@amd.com>

	PR tree-optimization/44576
	* tree-ssa-loop-prefetch.c (trip_count_to_ahead_ratio_too_small_p):
	New.  Pull out from is_loop_prefetching_profitable to implement
	the trip count to ahead ratio heuristic.
	(mem_ref_count_reasonable_p): New.  Pull out from
	is_loop_prefetching_profitable to implement the instruction to
	memory reference ratio heuristic.  Also consider not reasonable if
	the memory reference count is above a threshold (to avoid
	explosive compilation time.
	(insn_to_prefetch_ratio_too_small_p): New.  Pull out from
	is_loop_prefetching_profitable to implement the instruction to
	prefetch ratio heuristic.
	(is_loop_prefetching_profitable): Removed.
	(loop_prefetch_arrays): Distribute the cost analysis across the
	function to allow early exit of the prefetch analysis.
	is_loop_prefetching_profitable is splitted into three functions,
	with each one called as early as possible.
	(PREFETCH_MAX_MEM_REFS_PER_LOOP): New.  Threshold above which the
	number of memory references in a loop is considered too many.

From-SVN: r162023
parent 95da5cf9
2010-07-09 Changpeng Fang <changpeng.fang@amd.com>
PR tree-optimization/44576
* tree-ssa-loop-prefetch.c (trip_count_to_ahead_ratio_too_small_p):
New. Pull out from is_loop_prefetching_profitable to implement
the trip count to ahead ratio heuristic.
(mem_ref_count_reasonable_p): New. Pull out from
is_loop_prefetching_profitable to implement the instruction to
memory reference ratio heuristic. Also consider not reasonable if
the memory reference count is above a threshold (to avoid
explosive compilation time.
(insn_to_prefetch_ratio_too_small_p): New. Pull out from
is_loop_prefetching_profitable to implement the instruction to
prefetch ratio heuristic.
(is_loop_prefetching_profitable): Removed.
(loop_prefetch_arrays): Distribute the cost analysis across the
function to allow early exit of the prefetch analysis.
is_loop_prefetching_profitable is splitted into three functions,
with each one called as early as possible.
(PREFETCH_MAX_MEM_REFS_PER_LOOP): New. Threshold above which the
number of memory references in a loop is considered too many.
2010-07-09 Bernd Schmidt <bernds@codesourcery.com> 2010-07-09 Bernd Schmidt <bernds@codesourcery.com>
* reload.c (find_reloads): Don't clear badop if we have a * reload.c (find_reloads): Don't clear badop if we have a
......
...@@ -110,19 +110,29 @@ along with GCC; see the file COPYING3. If not see ...@@ -110,19 +110,29 @@ along with GCC; see the file COPYING3. If not see
prefetch instructions with guards in cases where 5) was not sufficient prefetch instructions with guards in cases where 5) was not sufficient
to satisfy the constraints? to satisfy the constraints?
The function is_loop_prefetching_profitable() implements a cost model A cost model is implemented to determine whether or not prefetching is
to determine if prefetching is profitable for a given loop. The cost profitable for a given loop. The cost model has three heuristics:
model has two heuristcs:
1. A heuristic that determines whether the given loop has enough CPU 1. Function trip_count_to_ahead_ratio_too_small_p implements a
ops that can be overlapped with cache missing memory ops. heuristic that determines whether or not the loop has too few
If not, the loop won't benefit from prefetching. This is implemented iterations (compared to ahead). Prefetching is not likely to be
by requirung the ratio between the instruction count and the mem ref beneficial if the trip count to ahead ratio is below a certain
count to be above a certain minimum. minimum.
2. A heuristic that disables prefetching in a loop with an unknown trip
count if the prefetching cost is above a certain limit. The relative 2. Function mem_ref_count_reasonable_p implements a heuristic that
prefetching cost is estimated by taking the ratio between the determines whether the given loop has enough CPU ops that can be
prefetch count and the total intruction count (this models the I-cache overlapped with cache missing memory ops. If not, the loop
cost). won't benefit from prefetching. In the implementation,
prefetching is not considered beneficial if the ratio between
the instruction count and the mem ref count is below a certain
minimum.
3. Function insn_to_prefetch_ratio_too_small_p implements a
heuristic that disables prefetching in a loop if the prefetching
cost is above a certain limit. The relative prefetching cost is
estimated by taking the ratio between the prefetch count and the
total intruction count (this models the I-cache cost).
The limits used in these heuristics are defined as parameters with The limits used in these heuristics are defined as parameters with
reasonable default values. Machine-specific default values will be reasonable default values. Machine-specific default values will be
added later. added later.
...@@ -238,6 +248,14 @@ struct mem_ref_group ...@@ -238,6 +248,14 @@ struct mem_ref_group
#define PREFETCH_MOD_TO_UNROLL_FACTOR_RATIO 4 #define PREFETCH_MOD_TO_UNROLL_FACTOR_RATIO 4
#endif #endif
/* Some of the prefetch computations have quadratic complexity. We want to
avoid huge compile times and, therefore, want to limit the amount of
memory references per loop where we consider prefetching. */
#ifndef PREFETCH_MAX_MEM_REFS_PER_LOOP
#define PREFETCH_MAX_MEM_REFS_PER_LOOP 200
#endif
/* The memory reference. */ /* The memory reference. */
struct mem_ref struct mem_ref
...@@ -1640,24 +1658,51 @@ determine_loop_nest_reuse (struct loop *loop, struct mem_ref_group *refs, ...@@ -1640,24 +1658,51 @@ determine_loop_nest_reuse (struct loop *loop, struct mem_ref_group *refs,
} }
} }
/* Do a cost-benefit analysis to determine if prefetching is profitable /* Determine whether or not the trip count to ahead ratio is too small based
for the current loop given the following parameters: on prefitablility consideration.
AHEAD: the iteration ahead distance, AHEAD: the iteration ahead distance,
EST_NITER: the estimated trip count, EST_NITER: the estimated trip count. */
static bool
trip_count_to_ahead_ratio_too_small_p (unsigned ahead, HOST_WIDE_INT est_niter)
{
/* Assume trip count to ahead ratio is big enough if the trip count could not
be estimated at compile time. */
if (est_niter < 0)
return false;
if (est_niter < (HOST_WIDE_INT) (TRIP_COUNT_TO_AHEAD_RATIO * ahead))
{
if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file,
"Not prefetching -- loop estimated to roll only %d times\n",
(int) est_niter);
return true;
}
return false;
}
/* Determine whether or not the number of memory references in the loop is
reasonable based on the profitablity and compilation time considerations.
NINSNS: estimated number of instructions in the loop, NINSNS: estimated number of instructions in the loop,
PREFETCH_COUNT: an estimate of the number of prefetches
MEM_REF_COUNT: total number of memory references in the loop. */ MEM_REF_COUNT: total number of memory references in the loop. */
static bool static bool
is_loop_prefetching_profitable (unsigned ahead, HOST_WIDE_INT est_niter, mem_ref_count_reasonable_p (unsigned ninsns, unsigned mem_ref_count)
unsigned ninsns, unsigned prefetch_count,
unsigned mem_ref_count, unsigned unroll_factor)
{ {
int insn_to_mem_ratio, insn_to_prefetch_ratio; int insn_to_mem_ratio;
if (mem_ref_count == 0) if (mem_ref_count == 0)
return false; return false;
/* Miss rate computation (is_miss_rate_acceptable) and dependence analysis
(compute_all_dependences) have high costs based on quadratic complexity.
To avoid huge compilation time, we give up prefetching if mem_ref_count
is too large. */
if (mem_ref_count > PREFETCH_MAX_MEM_REFS_PER_LOOP)
return false;
/* Prefetching improves performance by overlapping cache missing /* Prefetching improves performance by overlapping cache missing
memory accesses with CPU operations. If the loop does not have memory accesses with CPU operations. If the loop does not have
enough CPU operations to overlap with memory operations, prefetching enough CPU operations to overlap with memory operations, prefetching
...@@ -1678,6 +1723,21 @@ is_loop_prefetching_profitable (unsigned ahead, HOST_WIDE_INT est_niter, ...@@ -1678,6 +1723,21 @@ is_loop_prefetching_profitable (unsigned ahead, HOST_WIDE_INT est_niter,
return false; return false;
} }
return true;
}
/* Determine whether or not the instruction to prefetch ratio in the loop is
too small based on the profitablity consideration.
NINSNS: estimated number of instructions in the loop,
PREFETCH_COUNT: an estimate of the number of prefetches,
UNROLL_FACTOR: the factor to unroll the loop if prefetching. */
static bool
insn_to_prefetch_ratio_too_small_p (unsigned ninsns, unsigned prefetch_count,
unsigned unroll_factor)
{
int insn_to_prefetch_ratio;
/* Prefetching most likely causes performance degradation when the instruction /* Prefetching most likely causes performance degradation when the instruction
to prefetch ratio is too small. Too many prefetch instructions in a loop to prefetch ratio is too small. Too many prefetch instructions in a loop
may reduce the I-cache performance. may reduce the I-cache performance.
...@@ -1697,23 +1757,10 @@ is_loop_prefetching_profitable (unsigned ahead, HOST_WIDE_INT est_niter, ...@@ -1697,23 +1757,10 @@ is_loop_prefetching_profitable (unsigned ahead, HOST_WIDE_INT est_niter,
fprintf (dump_file, fprintf (dump_file,
"Not prefetching -- instruction to prefetch ratio (%d) too small\n", "Not prefetching -- instruction to prefetch ratio (%d) too small\n",
insn_to_prefetch_ratio); insn_to_prefetch_ratio);
return false; return true;
} }
/* Could not do further estimation if the trip count is unknown. Just assume return false;
prefetching is profitable. Too aggressive??? */
if (est_niter < 0)
return true;
if (est_niter < (HOST_WIDE_INT) (TRIP_COUNT_TO_AHEAD_RATIO * ahead))
{
if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file,
"Not prefetching -- loop estimated to roll only %d times\n",
(int) est_niter);
return false;
}
return true;
} }
...@@ -1738,9 +1785,31 @@ loop_prefetch_arrays (struct loop *loop) ...@@ -1738,9 +1785,31 @@ loop_prefetch_arrays (struct loop *loop)
return false; return false;
} }
/* FIXME: the time should be weighted by the probabilities of the blocks in
the loop body. */
time = tree_num_loop_insns (loop, &eni_time_weights);
if (time == 0)
return false;
ahead = (PREFETCH_LATENCY + time - 1) / time;
est_niter = estimated_loop_iterations_int (loop, false);
/* Prefetching is not likely to be profitable if the trip count to ahead
ratio is too small. */
if (trip_count_to_ahead_ratio_too_small_p (ahead, est_niter))
return false;
ninsns = tree_num_loop_insns (loop, &eni_size_weights);
/* Step 1: gather the memory references. */ /* Step 1: gather the memory references. */
refs = gather_memory_references (loop, &no_other_refs, &mem_ref_count); refs = gather_memory_references (loop, &no_other_refs, &mem_ref_count);
/* Give up prefetching if the number of memory references in the
loop is not reasonable based on profitablity and compilation time
considerations. */
if (!mem_ref_count_reasonable_p (ninsns, mem_ref_count))
goto fail;
/* Step 2: estimate the reuse effects. */ /* Step 2: estimate the reuse effects. */
prune_by_reuse (refs); prune_by_reuse (refs);
...@@ -1749,15 +1818,7 @@ loop_prefetch_arrays (struct loop *loop) ...@@ -1749,15 +1818,7 @@ loop_prefetch_arrays (struct loop *loop)
determine_loop_nest_reuse (loop, refs, no_other_refs); determine_loop_nest_reuse (loop, refs, no_other_refs);
/* Step 3: determine the ahead and unroll factor. */ /* Step 3: determine unroll factor. */
/* FIXME: the time should be weighted by the probabilities of the blocks in
the loop body. */
time = tree_num_loop_insns (loop, &eni_time_weights);
ahead = (PREFETCH_LATENCY + time - 1) / time;
est_niter = estimated_loop_iterations_int (loop, false);
ninsns = tree_num_loop_insns (loop, &eni_size_weights);
unroll_factor = determine_unroll_factor (loop, refs, ninsns, &desc, unroll_factor = determine_unroll_factor (loop, refs, ninsns, &desc,
est_niter); est_niter);
...@@ -1773,8 +1834,10 @@ loop_prefetch_arrays (struct loop *loop) ...@@ -1773,8 +1834,10 @@ loop_prefetch_arrays (struct loop *loop)
ahead, unroll_factor, est_niter, ahead, unroll_factor, est_niter,
ninsns, mem_ref_count, prefetch_count); ninsns, mem_ref_count, prefetch_count);
if (!is_loop_prefetching_profitable (ahead, est_niter, ninsns, prefetch_count, /* Prefetching is not likely to be profitable if the instruction to prefetch
mem_ref_count, unroll_factor)) ratio is too small. */
if (insn_to_prefetch_ratio_too_small_p (ninsns, prefetch_count,
unroll_factor))
goto fail; goto fail;
mark_nontemporal_stores (loop, refs); mark_nontemporal_stores (loop, refs);
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment