Commit bc23502b by Paolo Bonzini

re PR tree-optimization/23109 (compiler generates wrong code leading to spurious…

re PR tree-optimization/23109 (compiler generates wrong code leading to spurious division by zero with -funsafe-math-optimizations (instead of -ftrapping-math))

gcc:
2006-01-11  Paolo Bonzini  <bonzini@gnu.org>

	PR tree-optimization/23109
	PR tree-optimization/23948
	PR tree-optimization/24123

	* Makefile.in (tree-ssa-math-opts.o): Adjust dependencies.
        * tree-cfg.c (single_noncomplex_succ): New.
        * tree-flow.h (single_noncomplex_succ): Declare it.
        * tree-ssa-math-opts.c (enum place_reciprocal): Remove.
        (struct occurrence, occ_head, occ_pool, is_divide_by, compute_merit,
	insert_bb, register_division_in, insert_reciprocals,
	replace_reciprocal, free_bb): New.
        (execute_cse_reciprocals_1): Rewritten.
        (execute_cse_reciprocals): Adjust calls to execute_cse_reciprocals_1.
        Do not commit any edge insertion.  Always compute dominators and
        create the allocation pool.
        * target-def.h (TARGET_MIN_DIVISIONS_FOR_RECIP_MUL): New.
	* target.h (struct gcc_target): Add min_divisions_for_recip_mul.
	* targhooks.c (default_min_divisions_for_recip_mul): New.
	* targhooks.h (default_min_divisions_for_recip_mul): New prototype.
        * passes.c (init_optimization_passes): Run recip after tree loop
        optimizations.
        * doc/tm.texi (Misc): Document TARGET_MIN_DIVISIONS_FOR_RECIP_MUL.

gcc/testsuite:
2006-01-11  Paolo Bonzini  <bonzini@gnu.org>
        
        PR tree-optimization/23109
        PR tree-optimization/23948
        PR tree-optimization/24123

        * gcc.dg/tree-ssa/recip-3.c, gcc.dg/tree-ssa/recip-4.c,
        gcc.dg/tree-ssa/recip-5.c, gcc.dg/tree-ssa/recip-6.c,
        gcc.dg/tree-ssa/recip-7.c, gcc.dg/tree-ssa/pr23109.c,
        g++.dg/tree-ssa/pr23948.C: New testcases.
        * gcc.dg/tree-ssa/recip-2.c, gcc.dg/tree-ssa/pr23234.c: Provide
	three divisions in order to do the optimization.

From-SVN: r109578
parent 4d779342
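
As a rough illustration of what the rewritten recip pass is after, the C
sketch below contrasts three divisions by the same divisor with the form
that uses one reciprocal and three multiplications.  It is a hand-written
sketch, not compiler output: the function names are hypothetical, and
sketch_min_divisions only mirrors the documented default of the new hook
(3 when the machine has a division instruction, 2 when it does not).  The
temporary is named reciptmp after the name the new testcases scan for.

```c
#include <assert.h>

/* Hypothetical stand-in for the documented default of
   TARGET_MIN_DIVISIONS_FOR_RECIP_MUL: 3 if the machine has a division
   instruction, 2 otherwise.  Not the real hook.  */
unsigned int
sketch_min_divisions (int have_div_insn)
{
  return have_div_insn ? 3 : 2;
}

/* Source form: three divisions by the same divisor D.  */
double
sum_with_divisions (double a, double b, double c, double d)
{
  return a / d + b / d + c / d;
}

/* Hand-transformed form: one division computes the reciprocal, and the
   three original divisions become multiplications.  For most divisors
   this rounds differently from the form above, which is why the
   transformation needs unsafe math optimizations to be enabled.  */
double
sum_with_reciprocal (double a, double b, double c, double d)
{
  double reciptmp = 1.0 / d;
  return a * reciptmp + b * reciptmp + c * reciptmp;
}

/* Self-check with a power-of-two divisor, where 1.0 / d is exact and
   both forms give the same result.  */
void
check_sketch (void)
{
  assert (sum_with_divisions (1.0, 2.0, 3.0, 4.0)
          == sum_with_reciprocal (1.0, 2.0, 3.0, 4.0));
}
```

With three divisions by d, the count reaches the default threshold on a
machine with a division instruction, so this is the shape of code the pass
is meant to rewrite; with only two such divisions it would be left alone
on such a machine.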
2006-01-11 Paolo Bonzini <bonzini@gnu.org>
PR tree-optimization/23109
PR tree-optimization/23948
PR tree-optimization/24123
* Makefile.in (tree-ssa-math-opts.o): Adjust dependencies.
* tree-cfg.c (single_noncomplex_succ): New.
* tree-flow.h (single_noncomplex_succ): Declare it.
* tree-ssa-math-opts.c (enum place_reciprocal): Remove.
(struct occurrence, occ_head, occ_pool, is_divide_by, compute_merit,
insert_bb, register_division_in, insert_reciprocals,
replace_reciprocal, free_bb): New.
(execute_cse_reciprocals_1): Rewritten.
(execute_cse_reciprocals): Adjust calls to execute_cse_reciprocals_1.
Do not commit any edge insertion. Always compute dominators and
create the allocation pool.
* target-def.h (TARGET_MIN_DIVISIONS_FOR_RECIP_MUL): New.
* target.h (struct gcc_target): Add min_divisions_for_recip_mul.
* targhooks.c (default_min_divisions_for_recip_mul): New.
* targhooks.h (default_min_divisions_for_recip_mul): New prototype.
* passes.c (init_optimization_passes): Run recip after tree loop
optimizations.
* doc/tm.texi (Misc): Document TARGET_MIN_DIVISIONS_FOR_RECIP_MUL.
2005-01-11 Danny Berlin <dberlin@dberlin.org>
Kenneth Zadeck <zadeck@naturalbridge.com>
......@@ -151,31 +177,31 @@
2006-01-10 John David Anglin <dave.anglin@nrc-cnrc.gc.ca>
PR target/20754
* pa.md: Create separate 32 and 64-bit move patterns for SI, DI, SF
and DF modes. Add alternatives to copy between general and floating
point registers to the 32-bit patterns.
* pa-64.h (SECONDARY_MEMORY_NEEDED_RTX): Delete undefine.
* pa.h (SECONDARY_MEMORY_NEEDED_RTX): Delete define.
* config/pa/pa.md: Create separate 32 and 64-bit move patterns
for SI, DI, SF and DF modes. Add alternatives to copy between
general and floating point registers to the 32-bit patterns.
* config/pa/pa-64.h (SECONDARY_MEMORY_NEEDED_RTX): Delete undefine.
* config/pa/pa.h (SECONDARY_MEMORY_NEEDED_RTX): Delete define.
(SECONDARY_MEMORY_NEEDED): Secondary memory is only needed when
generating 64-bit code.
* pa.c (output_move_double): Handle copies between general and
floating registers.
* config/pa/pa.c (output_move_double): Handle copies between general
and floating registers.
2006-01-10 Stuart Hastings <stuart@apple.com>
* gcc/config/i386/i386.md (set_got): Update.
* config/i386/i386.md (set_got): Update.
(set_got_labelled): New. (UNSPEC_LD_MPIC): New.
(builtin_setjmp_receiver): Mach-O support.
* gcc/config/i386/darwin.h (TARGET_ASM_FILE_END) Define.
* config/i386/darwin.h (TARGET_ASM_FILE_END) Define.
(GOT_SYMBOL_NAME): Define.
(FORCE_PREFERRED_STACK_BOUNDARY_IN_MAIN): New.
(TARGET_DEEP_BRANCH_PREDICTION): Remove.
* gcc/config/i386/i386.c (override_options): Revise for Darwin.
* config/i386/i386.c (override_options): Revise for Darwin.
(USE_HIDDEN_LINKONCE): Enable for Mach-O. (ix86_file_end): Mach-O
support. (darwin_x86_file_end): New. (output_set_got): Add label
parameter, revise for Mach-O. (x86_output_mi_thunk): Likewise.
* gcc/config/i386/i386-protos.h (output_set_got): Likewise.
* gcc/config/darwin.c (machopic_legitimize_pic_address): Update
* config/i386/i386-protos.h (output_set_got): Likewise.
* config/darwin.c (machopic_legitimize_pic_address): Update
regs_ever_live[].
2006-01-10 Kaz Kojima <kkojima@gcc.gnu.org>
......@@ -604,7 +630,7 @@
2006-01-03 Adrian Straetling <straetling@de.ibm.com>
* gcc/builtins.c (get_builtin_sync_mem): New function.
* builtins.c (get_builtin_sync_mem): New function.
(expand_builtin_sync_operation, expand_builtin_compare_and_swap,
expand_builtin_lock_test_and_set, expand_builtin_lock_release):
Call get_builtin_sync_mem to generate mem rtx.
......
......@@ -1970,7 +1970,8 @@ tree-ssa-loop-im.o : tree-ssa-loop-im.c $(TREE_FLOW_H) $(CONFIG_H) \
$(TREE_DUMP_H) tree-pass.h $(FLAGS_H) real.h $(BASIC_BLOCK_H) \
hard-reg-set.h
tree-ssa-math-opts.o : tree-ssa-math-opts.c $(TREE_FLOW_H) $(CONFIG_H) \
$(SYSTEM_H) $(TREE_H) $(TIMEVAR_H) tree-pass.h $(TM_H) $(FLAGS_H)
$(SYSTEM_H) $(TREE_H) $(TIMEVAR_H) tree-pass.h $(TM_H) $(FLAGS_H) \
alloc-pool.h $(BASIC_BLOCK_H) $(TARGET_H)
tree-ssa-alias.o : tree-ssa-alias.c $(TREE_FLOW_H) $(CONFIG_H) $(SYSTEM_H) \
$(RTL_H) $(TREE_H) $(TM_P_H) $(EXPR_H) $(GGC_H) tree-inline.h $(FLAGS_H) \
function.h $(TIMEVAR_H) convert.h $(TM_H) coretypes.h langhooks.h \
......
......@@ -8893,6 +8893,15 @@ point number to a signed fixed point number also convert validly to an
unsigned one.
@end defmac
@deftypefn {Target Hook} int TARGET_MIN_DIVISIONS_FOR_RECIP_MUL (enum machine_mode @var{mode})
When @option{-ffast-math} is in effect, GCC tries to optimize
divisions by the same divisor, by turning them into multiplications by
the reciprocal. This target hook specifies the minimum number of divisions
that should be there for GCC to perform the optimization for a variable
of mode @var{mode}. The default implementation returns 3 if the machine
has an instruction for the division, and 2 if it does not.
@end deftypefn
@defmac MOVE_MAX
The maximum number of bytes that a single instruction can move quickly
between memory and registers or between two memory locations.
......
......@@ -551,12 +551,12 @@ init_optimization_passes (void)
we add may_alias right after fold builtins
which can create arbitrary GIMPLE. */
NEXT_PASS (pass_may_alias);
NEXT_PASS (pass_cse_reciprocals);
NEXT_PASS (pass_split_crit_edges);
NEXT_PASS (pass_pre);
NEXT_PASS (pass_may_alias);
NEXT_PASS (pass_sink_code);
NEXT_PASS (pass_tree_loop);
NEXT_PASS (pass_cse_reciprocals);
NEXT_PASS (pass_reassoc);
NEXT_PASS (pass_dominator);
......
......@@ -336,6 +336,10 @@ Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
#define TARGET_SHIFT_TRUNCATION_MASK default_shift_truncation_mask
#endif
#ifndef TARGET_MIN_DIVISIONS_FOR_RECIP_MUL
#define TARGET_MIN_DIVISIONS_FOR_RECIP_MUL default_min_divisions_for_recip_mul
#endif
#ifndef TARGET_VALID_POINTER_MODE
#define TARGET_VALID_POINTER_MODE default_valid_pointer_mode
#endif
......@@ -588,6 +592,7 @@ Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
TARGET_ENCODE_SECTION_INFO, \
TARGET_STRIP_NAME_ENCODING, \
TARGET_SHIFT_TRUNCATION_MASK, \
TARGET_MIN_DIVISIONS_FOR_RECIP_MUL, \
TARGET_VALID_POINTER_MODE, \
TARGET_SCALAR_MODE_SUPPORTED_P, \
TARGET_VECTOR_MODE_SUPPORTED_P, \
......
......@@ -440,6 +440,11 @@ struct gcc_target
return the mask that they apply. Return 0 otherwise. */
unsigned HOST_WIDE_INT (* shift_truncation_mask) (enum machine_mode mode);
/* Return the number of divisions in the given MODE that should be present,
so that it is profitable to turn the division into a multiplication by
the reciprocal. */
unsigned int (* min_divisions_for_recip_mul) (enum machine_mode mode);
/* True if MODE is valid for a pointer in __attribute__((mode("MODE"))). */
bool (* valid_pointer_mode) (enum machine_mode mode);
......
......@@ -148,6 +148,14 @@ default_shift_truncation_mask (enum machine_mode mode)
return SHIFT_COUNT_TRUNCATED ? GET_MODE_BITSIZE (mode) - 1 : 0;
}
/* The default implementation of TARGET_MIN_DIVISIONS_FOR_RECIP_MUL. */
unsigned int
default_min_divisions_for_recip_mul (enum machine_mode mode ATTRIBUTE_UNUSED)
{
return have_insn_for (DIV, mode) ? 3 : 2;
}
/* Generic hook that takes a CUMULATIVE_ARGS pointer and returns true. */
bool
......
......@@ -33,6 +33,7 @@ extern bool default_pretend_outgoing_varargs_named (CUMULATIVE_ARGS *);
extern enum machine_mode default_eh_return_filter_mode (void);
extern unsigned HOST_WIDE_INT default_shift_truncation_mask
(enum machine_mode);
extern unsigned int default_min_divisions_for_recip_mul (enum machine_mode);
extern tree default_stack_protect_guard (void);
extern tree default_external_stack_protect_fail (void);
......
2006-01-11 Paolo Bonzini <bonzini@gnu.org>
PR tree-optimization/23109
PR tree-optimization/23948
PR tree-optimization/24123
* gcc.dg/tree-ssa/recip-3.c, gcc.dg/tree-ssa/recip-4.c,
gcc.dg/tree-ssa/recip-5.c, gcc.dg/tree-ssa/recip-6.c,
gcc.dg/tree-ssa/recip-7.c, gcc.dg/tree-ssa/pr23109.c,
g++.dg/tree-ssa/pr23948.C: New testcases.
* gcc.dg/tree-ssa/recip-2.c, gcc.dg/tree-ssa/pr23234.c: Provide
three divisions in order to do the optimization.
2005-01-11 Zdenek Dvorak <dvorakz@suse.cz>
PR c++/25632
/* { dg-options "-O1 -ffast-math -fdump-tree-recip" } */
/* { dg-do compile } */
struct MIOFILE {
~MIOFILE();
};
double potentially_runnable_resource_share();
void f1(double);
int make_scheduler_request(double a, double b)
{
MIOFILE mf;
double prrs = potentially_runnable_resource_share();
f1(a/prrs);
f1(1/prrs);
f1(b/prrs);
}
/* { dg-final { scan-tree-dump-times " / " 1 "recip" } } */
/* { dg-final { cleanup-tree-dump "recip" } } */
/* { dg-do compile } */
/* { dg-options "-O2 -funsafe-math-optimizations -fdump-tree-recip -fdump-tree-lim" } */
double F[2] = { 0., 0. }, e = 0.;
int main()
{
int i;
double E, W, P, d;
/* make sure the program crashes on FP exception */
unsigned short int Mask;
W = 1.;
d = 2.*e;
E = 1. - d;
for( i=0; i < 2; i++ )
if( d > 0.01 )
{
P = ( W < E ) ? (W - E)/d : (E - W)/d;
F[i] += P;
}
return 0;
}
/* LIM only performs the transformation in the no-trapping-math case. In
the future we will do it for trapping-math as well in recip, check that
this is not wrongly optimized. */
/* { dg-final { scan-tree-dump-not "reciptmp" "lim" } } */
/* { dg-final { scan-tree-dump-not "reciptmp" "recip" } } */
/* { dg-final { cleanup-tree-dump "recip" } } */
......@@ -9,6 +9,7 @@ double
f1 (double a, double b, double c)
{
double y0;
double y1;
if (a == 0.0)
{
......@@ -16,7 +17,8 @@ f1 (double a, double b, double c)
return y0;
}
y0 = c / b;
return y0;
y1 = a / b;
return y0 * y1;
}
/* Labels may end up in the middle of a block. Also bad. */
......@@ -24,6 +26,7 @@ double
f2 (double a, double b, double c)
{
double y0;
double y1;
a_label:
another_label:
......@@ -33,7 +36,8 @@ another_label:
return y0;
}
y0 = c / b;
return y0;
y1 = a / b;
return y0 * y1;
}
/* Uses must still be dominated by their defs. */
......@@ -41,6 +45,7 @@ double
f3 (double a, double b, double c)
{
double y0;
double y1;
y0 = -c / b;
if (a == 0.0)
......@@ -48,5 +53,6 @@ f3 (double a, double b, double c)
return y0;
}
y0 = c / b;
return y0;
y1 = a / b;
return y0 * y1;
}
......@@ -10,14 +10,19 @@ float e(float a, float b, float c, float d, float e, float f)
}
/* The PHI nodes for these divisions should be combined. */
d = d / a;
e = e / a;
f = f / a;
a = a / c;
b = b / c;
return a + b + e + f;
/* This should not be left as a multiplication. */
c = 1 / c;
return a + b + c + d + e + f;
}
/* { dg-final { scan-tree-dump-times " / " 2 "recip" } } */
/* { dg-final { scan-tree-dump-times " \\* " 5 "recip" } } */
/* { dg-final { cleanup-tree-dump "recip" } } */
/* { dg-do compile } */
/* { dg-options "-O1 -fno-trapping-math -funsafe-math-optimizations -fdump-tree-recip" } */
double F[2] = { 0.0, 0.0 }, e;
/* In this case the optimization is interesting. */
float h ()
{
int i;
double E, W, P, d;
W = 1.;
d = 2.*e;
E = 1. - d;
for( i=0; i < 2; i++ )
if( d > 0.01 )
{
P = ( W < E ) ? (W - E)/d : (E - W)/d;
F[i] += P;
}
F[0] += E / d;
}
/* { dg-final { scan-tree-dump-times " / " 1 "recip" } } */
/* { dg-final { cleanup-tree-dump "recip" } } */
/* { dg-do compile } */
/* { dg-options "-O1 -fno-trapping-math -funsafe-math-optimizations -fdump-tree-recip" } */
/* based on the test case in pr23109 */
double F[2] = { 0., 0. }, e = 0.;
/* Nope, we cannot prove the optimization is worthwhile in this case. */
void f ()
{
int i;
double E, W, P, d;
W = 1.;
d = 2.*e;
E = 1. - d;
if( d > 0.01 )
{
P = ( W < E ) ? (W - E)/d : (E - W)/d;
F[i] += P;
}
}
/* We also cannot prove the optimization is worthwhile in this case. */
float g ()
{
int i;
double E, W, P, d;
W = 1.;
d = 2.*e;
E = 1. - d;
if( d > 0.01 )
{
P = ( W < E ) ? (W - E)/d : (E - W)/d;
F[i] += P;
}
return 1.0 / d;
}
/* { dg-final { scan-tree-dump-not "reciptmp" "recip" } } */
/* { dg-final { cleanup-tree-dump "recip" } } */
/* { dg-options "-O1 -funsafe-math-optimizations -ftrapping-math -fdump-tree-recip -fdump-tree-optimized" } */
/* { dg-do compile } */
/* Test the reciprocal optimizations together with trapping math. */
extern int f2();
double f1(double y, double z, double w, double j, double k)
{
double b, c, d, e, f, g;
if (f2 ())
/* inserts one division here */
b = 1 / y, c = z / y, d = j / y;
else
/* one division here */
b = 3 / y, c = w / y, d = k / y;
/* and one here, that should be removed afterwards but is not right now */
e = b / y;
f = c / y;
g = d / y;
return e + f + g;
}
/* { dg-final { scan-tree-dump-times " / " 3 "recip" } } */
/* { dg-final { scan-tree-dump-times " / " 2 "optimized" { xfail *-*-* } } } */
/* { dg-final { cleanup-tree-dump "recip" } } */
/* { dg-final { cleanup-tree-dump "optimized" } } */
/* { dg-options "-O1 -funsafe-math-optimizations -fno-trapping-math -fdump-tree-recip" } */
/* { dg-do compile } */
/* Test inserting in a block that does not contain a division. */
extern int f2();
double f1(double y, double z, double w)
{
double b, c, d, e, f;
if (g ())
b = 1 / y, c = z / y;
else
b = 3 / y, c = w / y;
d = b / y;
e = c / y;
f = 1 / y;
return d + e + f;
}
/* { dg-final { scan-tree-dump-times " / " 1 "recip" } } */
/* { dg-final { cleanup-tree-dump "recip" } } */
/* { dg-options "-O1 -funsafe-math-optimizations -fno-trapping-math -fdump-tree-recip" } */
/* { dg-do compile } */
/* Test inserting in a block that does not contain a division. */
extern double h();
double f(int x, double z, double w)
{
double b, c, d, e, f;
double y = h ();
if (x)
b = 1 / y, c = z / y;
else
b = 3 / y, c = w / y;
d = b / y;
e = c / y;
f = 1 / y;
return d + e + f;
}
/* { dg-final { scan-tree-dump-times " / " 1 "recip" } } */
/* { dg-final { cleanup-tree-dump "recip" } } */
......@@ -1389,6 +1389,30 @@ tree_merge_blocks (basic_block a, basic_block b)
}
/* Return the one of two successors of BB that is not reached by a
complex edge, if there is one. Else, return BB. We use
this in optimizations that use post-dominators for their heuristics,
to catch the cases in C++ where function calls are involved. */
basic_block
single_noncomplex_succ (basic_block bb)
{
edge e0, e1;
if (EDGE_COUNT (bb->succs) != 2)
return bb;
e0 = EDGE_SUCC (bb, 0);
e1 = EDGE_SUCC (bb, 1);
if (e0->flags & EDGE_COMPLEX)
return e1->dest;
if (e1->flags & EDGE_COMPLEX)
return e0->dest;
return bb;
}
/* Walk the function tree removing unnecessary statements.
* Empty statement nodes are removed
......
......@@ -487,6 +487,7 @@ extern bool is_ctrl_stmt (tree);
extern bool is_ctrl_altering_stmt (tree);
extern bool computed_goto_p (tree);
extern bool simple_goto_p (tree);
extern basic_block single_noncomplex_succ (basic_block bb);
extern void tree_dump_bb (basic_block, FILE *, int);
extern void debug_tree_bb (basic_block);
extern basic_block debug_tree_bb_n (int);
......