Commit 370c2ebe by Richard Sandiford; committed by Richard Sandiford

[14/n] PR85694: Rework overwidening detection

This patch is the main part of PR85694.  The aim is to recognise at least:

  signed char *a, *b, *c;
  ...
  for (int i = 0; i < 2048; i++)
    c[i] = (a[i] + b[i]) >> 1;

as an over-widening pattern, since the addition and shift can be done
on shorts rather than ints.  However, it ended up being a lot more
general than that.
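
For concreteness, here is a sketch (mine, not from the patch) of the
narrower computation the vectorizer is being allowed to use for the loop
above.  The sum of two signed chars needs at most 9 bits, so the addition
and the shift fit in a 16-bit type; at the C level the usual promotions
still apply, the sketch only illustrates the width the vectorizer can use
internally:

  signed char *a, *b, *c;
  ...
  for (int i = 0; i < 2048; i++)
    c[i] = (signed char) (((short) a[i] + (short) b[i]) >> 1);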

The current over-widening pattern detection is limited to a few simple
cases: logical ops with immediate second operands, and shifts by a
constant.  These cases are enough for common pixel-format conversion
and can be detected in a peephole way.

The loop above requires two generalisations of the current code: support
for addition as well as logical ops, and support for non-constant second
operands.  These are harder to detect in the same peephole way, so the
patch tries to take a more global approach.

The idea is to get information about the minimum operation width
in two ways:

(1) by using the range information attached to the SSA_NAMEs
    (effectively a forward walk, since the range info is
    context-independent).

(2) by back-propagating the number of output bits required by
    users of the result.
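
Two of the new tests illustrate these two sources of information
(sketches adapted from the tests added below; a, b and c are pointers to
signed char):

  /* (1) Range information: b[i] + c[i] is known to be in [-256, 254],
     so the addition and division fit in 16 bits even though they are
     written in int (vect-over-widen-13.c).  */
  a[i] = (b[i] + c[i]) / 2;

  /* (2) Back-propagation: the store to the signed char array a only
     needs the low 8 bits of the final value, and that requirement
     propagates back through the shifts and the addition
     (vect-over-widen-9.c).  */
  int res = b[i] + c[i];
  a[i] = (res + (res >> 1)) >> 2;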

As explained in the comments, there's a balance to be struck between
narrowing an individual operation and fitting in with the surrounding
code.  The approach is pretty conservative: if we could narrow an
operation to N bits without changing its semantics, it's OK to do that if:

- no operations later in the chain require more than N bits; or

- all internally-defined inputs are extended from N bits or fewer,
  and at least one of them is single-use.

See the comments for the rationale.
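
As a sketch of why the single-use condition matters (adapted from one of
the new vect-over-widen tests, with unsigned int *a, *b, *c and unsigned
char *d, *e):

  unsigned int di = d[i];
  unsigned int ei = e[i];
  a[i] = di;
  b[i] = ei;
  c[i] = di + ei;

Here di + ei could be computed on shorts, but the char-to-int extensions
of d[i] and e[i] are needed anyway for the stores to a and b, so neither
extension is single-use.  Narrowing the addition would simply introduce a
third extension, and the pass deliberately keeps the addition in int.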

I didn't bother adding STMT_VINFO_* wrappers for the new fields
since the code seemed more readable without.

2018-06-20  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* poly-int.h (print_hex): New function.
	* dumpfile.h (dump_dec, dump_hex): Declare.
	* dumpfile.c (dump_dec, dump_hex): New poly_wide_int functions.
	* tree-vectorizer.h (_stmt_vec_info): Add min_output_precision,
	min_input_precision, operation_precision and operation_sign.
	* tree-vect-patterns.c (vect_get_range_info): New function.
	(vect_same_loop_or_bb_p, vect_single_imm_use)
	(vect_operation_fits_smaller_type): Delete.
	(vect_look_through_possible_promotion): Add an optional
	single_use_p parameter.
	(vect_recog_over_widening_pattern): Rewrite to use new
	stmt_vec_info information.  Handle one operation at a time.
	(vect_recog_cast_forwprop_pattern, vect_narrowable_type_p)
	(vect_truncatable_operation_p, vect_set_operation_type)
	(vect_set_min_input_precision): New functions.
	(vect_determine_min_output_precision_1): Likewise.
	(vect_determine_min_output_precision): Likewise.
	(vect_determine_precisions_from_range): Likewise.
	(vect_determine_precisions_from_users): Likewise.
	(vect_determine_stmt_precisions, vect_determine_precisions): Likewise.
	(vect_vect_recog_func_ptrs): Put over_widening first.
	Add cast_forwprop.
	(vect_pattern_recog): Call vect_determine_precisions.

gcc/testsuite/
	* gcc.dg/vect/vect-widen-mult-u8-u32.c: Check specifically for a
	widen_mult pattern.
	* gcc.dg/vect/vect-over-widen-1.c: Update the scan tests for new
	over-widening messages.
	* gcc.dg/vect/vect-over-widen-1-big-array.c: Likewise.
	* gcc.dg/vect/vect-over-widen-2.c: Likewise.
	* gcc.dg/vect/vect-over-widen-2-big-array.c: Likewise.
	* gcc.dg/vect/vect-over-widen-3.c: Likewise.
	* gcc.dg/vect/vect-over-widen-3-big-array.c: Likewise.
	* gcc.dg/vect/vect-over-widen-4.c: Likewise.
	* gcc.dg/vect/vect-over-widen-4-big-array.c: Likewise.
	* gcc.dg/vect/bb-slp-over-widen-1.c: New test.
	* gcc.dg/vect/bb-slp-over-widen-2.c: Likewise.
	* gcc.dg/vect/vect-over-widen-5.c: Likewise.
	* gcc.dg/vect/vect-over-widen-6.c: Likewise.
	* gcc.dg/vect/vect-over-widen-7.c: Likewise.
	* gcc.dg/vect/vect-over-widen-8.c: Likewise.
	* gcc.dg/vect/vect-over-widen-9.c: Likewise.
	* gcc.dg/vect/vect-over-widen-10.c: Likewise.
	* gcc.dg/vect/vect-over-widen-11.c: Likewise.
	* gcc.dg/vect/vect-over-widen-12.c: Likewise.
	* gcc.dg/vect/vect-over-widen-13.c: Likewise.
	* gcc.dg/vect/vect-over-widen-14.c: Likewise.
	* gcc.dg/vect/vect-over-widen-15.c: Likewise.
	* gcc.dg/vect/vect-over-widen-16.c: Likewise.
	* gcc.dg/vect/vect-over-widen-17.c: Likewise.
	* gcc.dg/vect/vect-over-widen-18.c: Likewise.
	* gcc.dg/vect/vect-over-widen-19.c: Likewise.
	* gcc.dg/vect/vect-over-widen-20.c: Likewise.
	* gcc.dg/vect/vect-over-widen-21.c: Likewise.

From-SVN: r262333
2018-07-03  Richard Sandiford  <richard.sandiford@arm.com>
* poly-int.h (print_hex): New function.
* dumpfile.h (dump_dec, dump_hex): Declare.
* dumpfile.c (dump_dec, dump_hex): New poly_wide_int functions.
* tree-vectorizer.h (_stmt_vec_info): Add min_output_precision,
min_input_precision, operation_precision and operation_sign.
* tree-vect-patterns.c (vect_get_range_info): New function.
(vect_same_loop_or_bb_p, vect_single_imm_use)
(vect_operation_fits_smaller_type): Delete.
(vect_look_through_possible_promotion): Add an optional
single_use_p parameter.
(vect_recog_over_widening_pattern): Rewrite to use new
stmt_vec_info information. Handle one operation at a time.
(vect_recog_cast_forwprop_pattern, vect_narrowable_type_p)
(vect_truncatable_operation_p, vect_set_operation_type)
(vect_set_min_input_precision): New functions.
(vect_determine_min_output_precision_1): Likewise.
(vect_determine_min_output_precision): Likewise.
(vect_determine_precisions_from_range): Likewise.
(vect_determine_precisions_from_users): Likewise.
(vect_determine_stmt_precisions, vect_determine_precisions): Likewise.
(vect_vect_recog_func_ptrs): Put over_widening first.
Add cast_forwprop.
(vect_pattern_recog): Call vect_determine_precisions.
2018-07-03 Richard Sandiford <richard.sandiford@arm.com>
* tree-vect-patterns.c (vect_mark_pattern_stmts): Remove pattern
statements that have been replaced by further pattern statements.
(vect_pattern_recog_1): Clear STMT_VINFO_PATTERN_DEF_SEQ on failure.
......
@@ -633,6 +633,28 @@ template void dump_dec (dump_flags_t, const poly_uint64 &);
template void dump_dec (dump_flags_t, const poly_offset_int &);
template void dump_dec (dump_flags_t, const poly_widest_int &);
void
dump_dec (dump_flags_t dump_kind, const poly_wide_int &value, signop sgn)
{
if (dump_file && (dump_kind & pflags))
print_dec (value, dump_file, sgn);
if (alt_dump_file && (dump_kind & alt_flags))
print_dec (value, alt_dump_file, sgn);
}
/* Output VALUE in hexadecimal to appropriate dump streams. */
void
dump_hex (dump_flags_t dump_kind, const poly_wide_int &value)
{
if (dump_file && (dump_kind & pflags))
print_hex (value, dump_file);
if (alt_dump_file && (dump_kind & alt_flags))
print_hex (value, alt_dump_file);
}
/* The current dump scope-nesting depth. */
static int dump_scope_depth;
......
@@ -439,6 +439,8 @@ extern bool enable_rtl_dump_file (void);
template<unsigned int N, typename C>
void dump_dec (dump_flags_t, const poly_int<N, C> &);
extern void dump_dec (dump_flags_t, const poly_wide_int &, signop);
extern void dump_hex (dump_flags_t, const poly_wide_int &);
/* In tree-dump.c */
extern void dump_node (const_tree, dump_flags_t, FILE *);
......
@@ -2420,6 +2420,25 @@ print_dec (const poly_int_pod<N, C> &value, FILE *file)
poly_coeff_traits<C>::signedness ? SIGNED : UNSIGNED);
}
/* Use print_hex to print VALUE to FILE. */
template<unsigned int N, typename C>
void
print_hex (const poly_int_pod<N, C> &value, FILE *file)
{
if (value.is_constant ())
print_hex (value.coeffs[0], file);
else
{
fprintf (file, "[");
for (unsigned int i = 0; i < N; ++i)
{
print_hex (value.coeffs[i], file);
fputc (i == N - 1 ? ']' : ',', file);
}
}
}
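
A possible usage sketch for the new print_hex overload (debug_poly_hex is
a hypothetical helper, not part of the patch; it assumes GCC's internal
headers are already included):

void
debug_poly_hex (const poly_wide_int &value)
{
  /* A compile-time-constant value prints as a single hex number such as
     0x10; a value that is not a compile-time constant prints all of its
     coefficients in brackets, e.g. [0x10,0x4].  */
  print_hex (value, stderr);
  fputc ('\n', stderr);
}
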
/* Helper for calculating the distance between two points P1 and P2,
in cases where known_le (P1, P2).  T1 and T2 are the types of the
two positions, in either order.  The coefficients of P2 - P1 have
......
2018-07-03  Richard Sandiford  <richard.sandiford@arm.com>
* gcc.dg/vect/vect-widen-mult-u8-u32.c: Check specifically for a
widen_mult pattern.
* gcc.dg/vect/vect-over-widen-1.c: Update the scan tests for new
over-widening messages.
* gcc.dg/vect/vect-over-widen-1-big-array.c: Likewise.
* gcc.dg/vect/vect-over-widen-2.c: Likewise.
* gcc.dg/vect/vect-over-widen-2-big-array.c: Likewise.
* gcc.dg/vect/vect-over-widen-3.c: Likewise.
* gcc.dg/vect/vect-over-widen-3-big-array.c: Likewise.
* gcc.dg/vect/vect-over-widen-4.c: Likewise.
* gcc.dg/vect/vect-over-widen-4-big-array.c: Likewise.
* gcc.dg/vect/bb-slp-over-widen-1.c: New test.
* gcc.dg/vect/bb-slp-over-widen-2.c: Likewise.
* gcc.dg/vect/vect-over-widen-5.c: Likewise.
* gcc.dg/vect/vect-over-widen-6.c: Likewise.
* gcc.dg/vect/vect-over-widen-7.c: Likewise.
* gcc.dg/vect/vect-over-widen-8.c: Likewise.
* gcc.dg/vect/vect-over-widen-9.c: Likewise.
* gcc.dg/vect/vect-over-widen-10.c: Likewise.
* gcc.dg/vect/vect-over-widen-11.c: Likewise.
* gcc.dg/vect/vect-over-widen-12.c: Likewise.
* gcc.dg/vect/vect-over-widen-13.c: Likewise.
* gcc.dg/vect/vect-over-widen-14.c: Likewise.
* gcc.dg/vect/vect-over-widen-15.c: Likewise.
* gcc.dg/vect/vect-over-widen-16.c: Likewise.
* gcc.dg/vect/vect-over-widen-17.c: Likewise.
* gcc.dg/vect/vect-over-widen-18.c: Likewise.
* gcc.dg/vect/vect-over-widen-19.c: Likewise.
* gcc.dg/vect/vect-over-widen-20.c: Likewise.
* gcc.dg/vect/vect-over-widen-21.c: Likewise.
2018-07-03 Richard Sandiford <richard.sandiford@arm.com>
* gcc.dg/vect/vect-mixed-size-cond-1.c: New test.

2018-07-02  Jim Wilson  <jimw@sifive.com>
......
/* { dg-require-effective-target vect_int } */
/* { dg-require-effective-target vect_shift } */
/* { dg-require-effective-target vect_pack_trunc } */
/* { dg-require-effective-target vect_unpack } */
#include "tree-vect.h"
/* Deliberate use of signed >>. */
#define DEF_LOOP(SIGNEDNESS) \
void __attribute__ ((noipa)) \
f_##SIGNEDNESS (SIGNEDNESS char *restrict a, \
SIGNEDNESS char *restrict b, \
SIGNEDNESS char *restrict c) \
{ \
a[0] = (b[0] + c[0]) >> 1; \
a[1] = (b[1] + c[1]) >> 1; \
a[2] = (b[2] + c[2]) >> 1; \
a[3] = (b[3] + c[3]) >> 1; \
a[4] = (b[4] + c[4]) >> 1; \
a[5] = (b[5] + c[5]) >> 1; \
a[6] = (b[6] + c[6]) >> 1; \
a[7] = (b[7] + c[7]) >> 1; \
a[8] = (b[8] + c[8]) >> 1; \
a[9] = (b[9] + c[9]) >> 1; \
a[10] = (b[10] + c[10]) >> 1; \
a[11] = (b[11] + c[11]) >> 1; \
a[12] = (b[12] + c[12]) >> 1; \
a[13] = (b[13] + c[13]) >> 1; \
a[14] = (b[14] + c[14]) >> 1; \
a[15] = (b[15] + c[15]) >> 1; \
}
DEF_LOOP (signed)
DEF_LOOP (unsigned)
#define N 16
#define TEST_LOOP(SIGNEDNESS, BASE_B, BASE_C) \
{ \
SIGNEDNESS char a[N], b[N], c[N]; \
for (int i = 0; i < N; ++i) \
{ \
b[i] = BASE_B + i * 15; \
c[i] = BASE_C + i * 14; \
asm volatile ("" ::: "memory"); \
} \
f_##SIGNEDNESS (a, b, c); \
for (int i = 0; i < N; ++i) \
if (a[i] != (BASE_B + BASE_C + i * 29) >> 1) \
__builtin_abort (); \
}
int
main (void)
{
check_vect ();
TEST_LOOP (signed, -128, -120);
TEST_LOOP (unsigned, 4, 10);
return 0;
}
/* { dg-final { scan-tree-dump "demoting int to signed short" "slp2" { target { ! vect_widen_shift } } } } */
/* { dg-final { scan-tree-dump "demoting int to unsigned short" "slp2" { target { ! vect_widen_shift } } } } */
/* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "vect" } } */
/* { dg-require-effective-target vect_int } */
/* { dg-require-effective-target vect_shift } */
/* { dg-require-effective-target vect_pack_trunc } */
/* { dg-require-effective-target vect_unpack } */
#include "tree-vect.h"
/* Deliberate use of signed >>. */
#define DEF_LOOP(SIGNEDNESS) \
void __attribute__ ((noipa)) \
f_##SIGNEDNESS (SIGNEDNESS char *restrict a, \
SIGNEDNESS char *restrict b, \
SIGNEDNESS char c) \
{ \
a[0] = (b[0] + c) >> 1; \
a[1] = (b[1] + c) >> 1; \
a[2] = (b[2] + c) >> 1; \
a[3] = (b[3] + c) >> 1; \
a[4] = (b[4] + c) >> 1; \
a[5] = (b[5] + c) >> 1; \
a[6] = (b[6] + c) >> 1; \
a[7] = (b[7] + c) >> 1; \
a[8] = (b[8] + c) >> 1; \
a[9] = (b[9] + c) >> 1; \
a[10] = (b[10] + c) >> 1; \
a[11] = (b[11] + c) >> 1; \
a[12] = (b[12] + c) >> 1; \
a[13] = (b[13] + c) >> 1; \
a[14] = (b[14] + c) >> 1; \
a[15] = (b[15] + c) >> 1; \
}
DEF_LOOP (signed)
DEF_LOOP (unsigned)
#define N 16
#define TEST_LOOP(SIGNEDNESS, BASE_B, C) \
{ \
SIGNEDNESS char a[N], b[N], c[N]; \
for (int i = 0; i < N; ++i) \
{ \
b[i] = BASE_B + i * 15; \
asm volatile ("" ::: "memory"); \
} \
f_##SIGNEDNESS (a, b, C); \
for (int i = 0; i < N; ++i) \
if (a[i] != (BASE_B + C + i * 15) >> 1) \
__builtin_abort (); \
}
int
main (void)
{
check_vect ();
TEST_LOOP (signed, -128, -120);
TEST_LOOP (unsigned, 4, 250);
return 0;
}
/* { dg-final { scan-tree-dump "demoting int to signed short" "slp2" { target { ! vect_widen_shift } } } } */
/* { dg-final { scan-tree-dump "demoting int to unsigned short" "slp2" { target { ! vect_widen_shift } } } } */
/* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "vect" } } */
@@ -58,7 +58,9 @@ int main (void)
}

/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { ! vect_widen_shift } } } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
@@ -62,8 +62,9 @@ int main (void)
}

/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { { ! vect_sizes_32B_16B } && { ! vect_widen_shift } } } } } */
/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 8 "vect" { target vect_sizes_32B_16B } } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
/* { dg-require-effective-target vect_int } */
/* { dg-require-effective-target vect_shift } */
/* { dg-require-effective-target vect_pack_trunc } */
/* { dg-require-effective-target vect_unpack } */
#ifndef SIGNEDNESS
#define SIGNEDNESS unsigned
#define BASE_B 4
#define BASE_C 40
#endif
#include "vect-over-widen-9.c"
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(unsigned char\)} "vect" } } */
/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
/* { dg-require-effective-target vect_int } */
/* { dg-require-effective-target vect_shift } */
/* { dg-require-effective-target vect_pack_trunc } */
/* { dg-require-effective-target vect_unpack } */
#include "tree-vect.h"
#ifndef SIGNEDNESS
#define SIGNEDNESS signed
#define BASE_B -128
#define BASE_C -100
#endif
#define N 50
/* Both range analysis and backward propagation from the truncation show
that these calculations can be done in SIGNEDNESS short, with "res"
being extended for the store to d[i]. */
void __attribute__ ((noipa))
f (SIGNEDNESS char *restrict a, SIGNEDNESS char *restrict b,
SIGNEDNESS char *restrict c, int *restrict d)
{
for (int i = 0; i < N; ++i)
{
/* Deliberate use of signed >>. */
int res = b[i] + c[i];
a[i] = (res + (res >> 1)) >> 2;
d[i] = res;
}
}
int
main (void)
{
check_vect ();
SIGNEDNESS char a[N], b[N], c[N];
int d[N];
for (int i = 0; i < N; ++i)
{
b[i] = BASE_B + i * 5;
c[i] = BASE_C + i * 4;
asm volatile ("" ::: "memory");
}
f (a, b, c, d);
for (int i = 0; i < N; ++i)
{
int res = BASE_B + BASE_C + i * 9;
if (a[i] != ((res + (res >> 1)) >> 2))
__builtin_abort ();
if (d[i] != res)
__builtin_abort ();
}
return 0;
}
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(signed char\)} "vect" } } */
/* { dg-final { scan-tree-dump {vector[^ ]* int} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
/* { dg-require-effective-target vect_int } */
/* { dg-require-effective-target vect_shift } */
/* { dg-require-effective-target vect_pack_trunc } */
/* { dg-require-effective-target vect_unpack } */
#ifndef SIGNEDNESS
#define SIGNEDNESS unsigned
#define BASE_B 4
#define BASE_C 40
#endif
#include "vect-over-widen-11.c"
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(unsigned char\)} "vect" } } */
/* { dg-final { scan-tree-dump {vector[^ ]* int} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
/* { dg-require-effective-target vect_int } */
/* { dg-require-effective-target vect_shift } */
/* { dg-require-effective-target vect_pack_trunc } */
/* { dg-require-effective-target vect_unpack } */
#include "tree-vect.h"
#ifndef SIGNEDNESS
#define SIGNEDNESS signed
#define BASE_B -128
#define BASE_C -120
#endif
#define N 50
/* We rely on range analysis to show that these calculations can be done
in SIGNEDNESS short. */
void __attribute__ ((noipa))
f (SIGNEDNESS char *restrict a, SIGNEDNESS char *restrict b,
SIGNEDNESS char *restrict c)
{
for (int i = 0; i < N; ++i)
a[i] = (b[i] + c[i]) / 2;
}
int
main (void)
{
check_vect ();
SIGNEDNESS char a[N], b[N], c[N];
for (int i = 0; i < N; ++i)
{
b[i] = BASE_B + i * 5;
c[i] = BASE_C + i * 4;
asm volatile ("" ::: "memory");
}
f (a, b, c);
for (int i = 0; i < N; ++i)
if (a[i] != (BASE_B + BASE_C + i * 9) / 2)
__builtin_abort ();
return 0;
}
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* / 2} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* = \(signed char\)} "vect" } } */
/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
/* { dg-require-effective-target vect_int } */
/* { dg-require-effective-target vect_shift } */
/* { dg-require-effective-target vect_pack_trunc } */
/* { dg-require-effective-target vect_unpack } */
#ifndef SIGNEDNESS
#define SIGNEDNESS unsigned
#define BASE_B 4
#define BASE_C 40
#endif
#include "vect-over-widen-13.c"
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* = \(unsigned char\)} "vect" } } */
/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
/* { dg-require-effective-target vect_int } */
/* { dg-require-effective-target vect_shift } */
/* { dg-require-effective-target vect_pack_trunc } */
/* { dg-require-effective-target vect_unpack } */
#include "tree-vect.h"
#ifndef SIGNEDNESS
#define SIGNEDNESS signed
#define BASE_B -128
#define BASE_C -120
#endif
#define N 50
/* We rely on range analysis to show that these calculations can be done
in SIGNEDNESS short, with the result being extended to int for the
store. */
void __attribute__ ((noipa))
f (int *restrict a, SIGNEDNESS char *restrict b,
SIGNEDNESS char *restrict c)
{
for (int i = 0; i < N; ++i)
a[i] = (b[i] + c[i]) / 2;
}
int
main (void)
{
check_vect ();
int a[N];
SIGNEDNESS char b[N], c[N];
for (int i = 0; i < N; ++i)
{
b[i] = BASE_B + i * 5;
c[i] = BASE_C + i * 4;
asm volatile ("" ::: "memory");
}
f (a, b, c);
for (int i = 0; i < N; ++i)
if (a[i] != (BASE_B + BASE_C + i * 9) / 2)
__builtin_abort ();
return 0;
}
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* / 2} "vect" } } */
/* { dg-final { scan-tree-dump-not {vect_recog_cast_forwprop_pattern: detected} "vect" } } */
/* { dg-final { scan-tree-dump {vector[^ ]* int} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
/* { dg-require-effective-target vect_int } */
/* { dg-require-effective-target vect_shift } */
/* { dg-require-effective-target vect_pack_trunc } */
/* { dg-require-effective-target vect_unpack } */
#ifndef SIGNEDNESS
#define SIGNEDNESS unsigned
#define BASE_B 4
#define BASE_C 40
#endif
#include "vect-over-widen-15.c"
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
/* { dg-final { scan-tree-dump-not {vect_recog_cast_forwprop_pattern: detected} "vect" } } */
/* { dg-final { scan-tree-dump {vector[^ ]* int} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
/* { dg-require-effective-target vect_int } */
/* { dg-require-effective-target vect_shift } */
/* { dg-require-effective-target vect_pack_trunc } */
/* { dg-require-effective-target vect_unpack } */
#include "tree-vect.h"
#define N 1024
/* This should not be treated as an over-widening pattern, even though
"(b[i] & 0xef) | 0x80)" could be done in unsigned chars. */
void __attribute__ ((noipa))
f (unsigned short *restrict a, unsigned short *restrict b)
{
for (__INTPTR_TYPE__ i = 0; i < N; ++i)
{
unsigned short foo = ((b[i] & 0xef) | 0x80) + (a[i] << 4);
a[i] = foo;
}
}
int
main (void)
{
check_vect ();
unsigned short a[N], b[N];
for (int i = 0; i < N; ++i)
{
a[i] = i;
b[i] = i * 3;
asm volatile ("" ::: "memory");
}
f (a, b);
for (int i = 0; i < N; ++i)
if (a[i] != ((((i * 3) & 0xef) | 0x80) + (i << 4)))
__builtin_abort ();
return 0;
}
/* { dg-final { scan-tree-dump-not {vect_recog_over_widening_pattern: detected} "vect" } } */
/* { dg-final { scan-tree-dump-not {vector[^\n]*char} "vect" } } */
/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
/* { dg-require-effective-target vect_int } */
/* { dg-require-effective-target vect_shift } */
/* { dg-require-effective-target vect_pack_trunc } */
/* { dg-require-effective-target vect_unpack } */
#include "tree-vect.h"
#define N 1024
/* This should be treated as an over-widening pattern: we can truncate
b to unsigned char after loading it and do all the computation in
unsigned char. */
void __attribute__ ((noipa))
f (unsigned char *restrict a, unsigned short *restrict b)
{
for (__INTPTR_TYPE__ i = 0; i < N; ++i)
{
unsigned short foo = ((b[i] & 0xef) | 0x80) + (a[i] << 4);
a[i] = foo;
}
}
int
main (void)
{
check_vect ();
unsigned char a[N];
unsigned short b[N];
for (int i = 0; i < N; ++i)
{
a[i] = i;
b[i] = i * 3;
asm volatile ("" ::: "memory");
}
f (a, b);
for (int i = 0; i < N; ++i)
if (a[i] != (unsigned char) ((((i * 3) & 0xef) | 0x80) + (i << 4)))
__builtin_abort ();
return 0;
}
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* &} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* |} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* <<} "vect" } } */
/* { dg-final { scan-tree-dump {vector[^\n]*char} "vect" } } */
/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
/* { dg-require-effective-target vect_int } */
/* { dg-require-effective-target vect_shift } */
/* { dg-require-effective-target vect_pack_trunc } */
/* { dg-require-effective-target vect_unpack } */
#include "tree-vect.h"
#define N 111
/* This shouldn't be treated as an over-widening operation: it's better
to reuse the extensions of di and ei for di + ei than to add them
as shorts and introduce a third extension. */
void __attribute__ ((noipa))
f (unsigned int *restrict a, unsigned int *restrict b,
unsigned int *restrict c, unsigned char *restrict d,
unsigned char *restrict e)
{
for (__INTPTR_TYPE__ i = 0; i < N; ++i)
{
unsigned int di = d[i];
unsigned int ei = e[i];
a[i] = di;
b[i] = ei;
c[i] = di + ei;
}
}
int
main (void)
{
check_vect ();
unsigned int a[N], b[N], c[N];
unsigned char d[N], e[N];
for (int i = 0; i < N; ++i)
{
d[i] = i * 2 + 3;
e[i] = i + 100;
asm volatile ("" ::: "memory");
}
f (a, b, c, d, e);
for (int i = 0; i < N; ++i)
if (a[i] != i * 2 + 3
|| b[i] != i + 100
|| c[i] != i * 3 + 103)
__builtin_abort ();
return 0;
}
/* { dg-final { scan-tree-dump-not {vect_recog_over_widening_pattern: detected} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
@@ -57,7 +57,12 @@ int main (void)
return 0;
}

/* Final value stays in int, so no over-widening is detected at the moment.  */
/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 0 "vect" } } */

/* This is an over-widening even though the final result is still an int.
It's better to do one vector of ops on chars and then widen than to
widen and then do 4 vectors of ops on ints.  */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
@@ -57,7 +57,12 @@ int main (void)
return 0;
}

/* Final value stays in int, so no over-widening is detected at the moment.  */
/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 0 "vect" } } */

/* This is an over-widening even though the final result is still an int.
It's better to do one vector of ops on chars and then widen than to
widen and then do 4 vectors of ops on ints.  */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
/* { dg-require-effective-target vect_int } */
/* { dg-require-effective-target vect_shift } */
/* { dg-require-effective-target vect_pack_trunc } */
/* { dg-require-effective-target vect_unpack } */
#include "tree-vect.h"
#define N 111
/* This shouldn't be treated as an over-widening operation: it's better
to reuse the extensions of di and ei for di + ei than to add them
as shorts and introduce a third extension. */
void __attribute__ ((noipa))
f (unsigned int *restrict a, unsigned int *restrict b,
unsigned int *restrict c, unsigned char *restrict d,
unsigned char *restrict e)
{
for (__INTPTR_TYPE__ i = 0; i < N; ++i)
{
int di = d[i];
int ei = e[i];
a[i] = di;
b[i] = ei;
c[i] = di + ei;
}
}
int
main (void)
{
check_vect ();
unsigned int a[N], b[N], c[N];
unsigned char d[N], e[N];
for (int i = 0; i < N; ++i)
{
d[i] = i * 2 + 3;
e[i] = i + 100;
asm volatile ("" ::: "memory");
}
f (a, b, c, d, e);
for (int i = 0; i < N; ++i)
if (a[i] != i * 2 + 3
|| b[i] != i + 100
|| c[i] != i * 3 + 103)
__builtin_abort ();
return 0;
}
/* { dg-final { scan-tree-dump-not {vect_recog_over_widening_pattern: detected} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
/* { dg-require-effective-target vect_int } */
/* { dg-require-effective-target vect_shift } */
/* { dg-require-effective-target vect_pack_trunc } */
/* { dg-require-effective-target vect_unpack } */
#include "tree-vect.h"
#define N 111
/* This shouldn't be treated as an over-widening operation: it's better
to reuse the extensions of di and ei for di + ei than to add them
as shorts and introduce a third extension. */
void __attribute__ ((noipa))
f (unsigned int *restrict a, unsigned int *restrict b,
unsigned int *restrict c, unsigned char *restrict d,
unsigned char *restrict e)
{
for (__INTPTR_TYPE__ i = 0; i < N; ++i)
{
a[i] = d[i];
b[i] = e[i];
c[i] = d[i] + e[i];
}
}
int
main (void)
{
check_vect ();
unsigned int a[N], b[N], c[N];
unsigned char d[N], e[N];
for (int i = 0; i < N; ++i)
{
d[i] = i * 2 + 3;
e[i] = i + 100;
asm volatile ("" ::: "memory");
}
f (a, b, c, d, e);
for (int i = 0; i < N; ++i)
if (a[i] != i * 2 + 3
|| b[i] != i + 100
|| c[i] != i * 3 + 103)
__builtin_abort ();
return 0;
}
/* { dg-final { scan-tree-dump-not {vect_recog_over_widening_pattern: detected} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
@@ -59,7 +59,9 @@ int main (void)
return 0;
}

/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target { ! vect_widen_shift } } } } */
/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 1 "vect" { target vect_widen_shift } } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 8} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 9} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
@@ -57,6 +57,9 @@ int main (void)
return 0;
}

/* { dg-final { scan-tree-dump "vect_recog_over_widening_pattern: detected" "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 8} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 9} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
@@ -62,7 +62,9 @@ int main (void)
}

/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { ! vect_widen_shift } } } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
@@ -66,8 +66,9 @@ int main (void)
}

/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { { ! vect_sizes_32B_16B } && { ! vect_widen_shift } } } } } */
/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 8 "vect" { target vect_sizes_32B_16B } } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
/* { dg-require-effective-target vect_int } */
/* { dg-require-effective-target vect_shift } */
/* { dg-require-effective-target vect_pack_trunc } */
/* { dg-require-effective-target vect_unpack } */
#include "tree-vect.h"
#ifndef SIGNEDNESS
#define SIGNEDNESS signed
#define BASE_B -128
#define BASE_C -100
#endif
#define N 50
/* Both range analysis and backward propagation from the truncation show
that these calculations can be done in SIGNEDNESS short. */
void __attribute__ ((noipa))
f (SIGNEDNESS char *restrict a, SIGNEDNESS char *restrict b,
SIGNEDNESS char *restrict c)
{
/* Deliberate use of signed >>. */
for (int i = 0; i < N; ++i)
a[i] = (b[i] + c[i]) >> 1;
}
int
main (void)
{
check_vect ();
SIGNEDNESS char a[N], b[N], c[N];
for (int i = 0; i < N; ++i)
{
b[i] = BASE_B + i * 5;
c[i] = BASE_C + i * 4;
asm volatile ("" ::: "memory");
}
f (a, b, c);
for (int i = 0; i < N; ++i)
if (a[i] != (BASE_B + BASE_C + i * 9) >> 1)
__builtin_abort ();
return 0;
}
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(signed char\)} "vect" } } */
/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
/* { dg-require-effective-target vect_int } */
/* { dg-require-effective-target vect_shift } */
/* { dg-require-effective-target vect_pack_trunc } */
/* { dg-require-effective-target vect_unpack } */
#define SIGNEDNESS unsigned
#define BASE_B 4
#define BASE_C 40
#include "vect-over-widen-5.c"
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(unsigned char\)} "vect" } } */
/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
/* { dg-require-effective-target vect_int } */
/* { dg-require-effective-target vect_shift } */
/* { dg-require-effective-target vect_pack_trunc } */
/* { dg-require-effective-target vect_unpack } */
#include "tree-vect.h"
#ifndef SIGNEDNESS
#define SIGNEDNESS signed
#define BASE_B -128
#define BASE_C -100
#define D -120
#endif
#define N 50
/* Both range analysis and backward propagation from the truncation show
that these calculations can be done in SIGNEDNESS short. */
void __attribute__ ((noipa))
f (SIGNEDNESS char *restrict a, SIGNEDNESS char *restrict b,
SIGNEDNESS char *restrict c, SIGNEDNESS char d)
{
int promoted_d = d;
for (int i = 0; i < N; ++i)
/* Deliberate use of signed >>. */
a[i] = (b[i] + c[i] + promoted_d) >> 2;
}
int
main (void)
{
check_vect ();
SIGNEDNESS char a[N], b[N], c[N];
for (int i = 0; i < N; ++i)
{
b[i] = BASE_B + i * 5;
c[i] = BASE_C + i * 4;
asm volatile ("" ::: "memory");
}
f (a, b, c, D);
for (int i = 0; i < N; ++i)
if (a[i] != (BASE_B + BASE_C + D + i * 9) >> 2)
__builtin_abort ();
return 0;
}
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(signed char\)} "vect" } } */
/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
/* { dg-require-effective-target vect_int } */
/* { dg-require-effective-target vect_shift } */
/* { dg-require-effective-target vect_pack_trunc } */
/* { dg-require-effective-target vect_unpack } */
#ifndef SIGNEDNESS
#define SIGNEDNESS unsigned
#define BASE_B 4
#define BASE_C 40
#define D 251
#endif
#include "vect-over-widen-7.c"
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(unsigned char\)} "vect" } } */
/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
/* { dg-require-effective-target vect_int } */
/* { dg-require-effective-target vect_shift } */
/* { dg-require-effective-target vect_pack_trunc } */
/* { dg-require-effective-target vect_unpack } */
#include "tree-vect.h"
#ifndef SIGNEDNESS
#define SIGNEDNESS signed
#define BASE_B -128
#define BASE_C -100
#endif
#define N 50
/* Both range analysis and backward propagation from the truncation show
that these calculations can be done in SIGNEDNESS short. */
void __attribute__ ((noipa))
f (SIGNEDNESS char *restrict a, SIGNEDNESS char *restrict b,
SIGNEDNESS char *restrict c)
{
for (int i = 0; i < N; ++i)
{
/* Deliberate use of signed >>. */
int res = b[i] + c[i];
a[i] = (res + (res >> 1)) >> 2;
}
}
int
main (void)
{
check_vect ();
SIGNEDNESS char a[N], b[N], c[N];
for (int i = 0; i < N; ++i)
{
b[i] = BASE_B + i * 5;
c[i] = BASE_C + i * 4;
asm volatile ("" ::: "memory");
}
f (a, b, c);
for (int i = 0; i < N; ++i)
{
int res = BASE_B + BASE_C + i * 9;
if (a[i] != ((res + (res >> 1)) >> 2))
__builtin_abort ();
}
return 0;
}
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(signed char\)} "vect" } } */
/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
@@ -43,5 +43,5 @@ int main (void)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_widen_mult_qi_to_hi || vect_unpack } } } } */
/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected" 1 "vect" { target vect_widen_mult_qi_to_hi_pattern } } } */
/* { dg-final { scan-tree-dump-times "widen_mult pattern recognized" 1 "vect" { target vect_widen_mult_qi_to_hi_pattern } } } */
@@ -47,6 +47,40 @@ along with GCC; see the file COPYING3.  If not see
#include "omp-simd-clone.h"
#include "predict.h"
/* Return true if we have a useful VR_RANGE range for VAR, storing it
in *MIN_VALUE and *MAX_VALUE if so. Note the range in the dump files. */
static bool
vect_get_range_info (tree var, wide_int *min_value, wide_int *max_value)
{
value_range_type vr_type = get_range_info (var, min_value, max_value);
wide_int nonzero = get_nonzero_bits (var);
signop sgn = TYPE_SIGN (TREE_TYPE (var));
if (intersect_range_with_nonzero_bits (vr_type, min_value, max_value,
nonzero, sgn) == VR_RANGE)
{
if (dump_enabled_p ())
{
dump_generic_expr_loc (MSG_NOTE, vect_location, TDF_SLIM, var);
dump_printf (MSG_NOTE, " has range [");
dump_hex (MSG_NOTE, *min_value);
dump_printf (MSG_NOTE, ", ");
dump_hex (MSG_NOTE, *max_value);
dump_printf (MSG_NOTE, "]\n");
}
return true;
}
else
{
if (dump_enabled_p ())
{
dump_generic_expr_loc (MSG_NOTE, vect_location, TDF_SLIM, var);
dump_printf (MSG_NOTE, " has no range info\n");
}
return false;
}
}
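
A hedged sketch of how a caller might combine this with wi::min_precision
(the real consumer is vect_determine_precisions_from_range, which is not
visible in this hunk; the snippet below is only an illustration):

  /* Bound the number of bits that the result of STMT really needs,
     based on the range of its lhs.  */
  tree lhs = gimple_get_lhs (stmt);
  wide_int min_value, max_value;
  if (lhs
      && TREE_CODE (lhs) == SSA_NAME
      && vect_get_range_info (lhs, &min_value, &max_value))
    {
      signop sgn = TYPE_SIGN (TREE_TYPE (lhs));
      unsigned int precision = MAX (wi::min_precision (min_value, sgn),
                                    wi::min_precision (max_value, sgn));
      /* The operation only needs PRECISION bits.  */
    }
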
/* Report that we've found an instance of pattern PATTERN in
statement STMT.  */

@@ -190,40 +224,6 @@ vect_supportable_direct_optab_p (tree otype, tree_code code,
return true;
}
/* Check whether STMT2 is in the same loop or basic block as STMT1.
Which of the two applies depends on whether we're currently doing
loop-based or basic-block-based vectorization, as determined by
the vinfo_for_stmt for STMT1 (which must be defined).
If this returns true, vinfo_for_stmt for STMT2 is guaranteed
to be defined as well. */
static bool
vect_same_loop_or_bb_p (gimple *stmt1, gimple *stmt2)
{
stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt1);
return vect_stmt_in_region_p (stmt_vinfo->vinfo, stmt2);
}
/* If the LHS of DEF_STMT has a single use, and that statement is
in the same loop or basic block, return it. */
static gimple *
vect_single_imm_use (gimple *def_stmt)
{
tree lhs = gimple_assign_lhs (def_stmt);
use_operand_p use_p;
gimple *use_stmt;
if (!single_imm_use (lhs, &use_p, &use_stmt))
return NULL;
if (!vect_same_loop_or_bb_p (def_stmt, use_stmt))
return NULL;
return use_stmt;
}
/* Round bit precision PRECISION up to a full element.  */
static unsigned int
@@ -347,7 +347,9 @@ vect_unpromoted_value::set_op (tree op_in, vect_def_type dt_in,
is possible to convert OP' back to OP using a possible sign change
followed by a possible promotion P.  Return this OP', or null if OP is
not a vectorizable SSA name.  If there is a promotion P, describe its
input in UNPROM, otherwise describe OP' in UNPROM.  If SINGLE_USE_P
is nonnull, set *SINGLE_USE_P to false if any of the SSA names involved
have more than one user.

A successful return means that it is possible to go from OP' to OP
via UNPROM.  The cast from OP' to UNPROM is at most a sign change,

@@ -374,7 +376,8 @@ vect_unpromoted_value::set_op (tree op_in, vect_def_type dt_in,
static tree
vect_look_through_possible_promotion (vec_info *vinfo, tree op,
vect_unpromoted_value *unprom,
bool *single_use_p = NULL)
{
tree res = NULL_TREE;
tree op_type = TREE_TYPE (op);
@@ -420,7 +423,14 @@ vect_look_through_possible_promotion (vec_info *vinfo, tree op,
if (!def_stmt)
break;
if (dt == vect_internal_def)
{
caster = vinfo_for_stmt (def_stmt);
/* Ignore pattern statements, since we don't link uses for them.  */
if (single_use_p
&& !STMT_VINFO_RELATED_STMT (caster)
&& !has_single_use (res))
*single_use_p = false;
}
else
caster = NULL;
gassign *assign = dyn_cast <gassign *> (def_stmt);
@@ -1371,293 +1381,228 @@ vect_recog_widen_sum_pattern (vec<gimple *> *stmts, tree *type_out)
return pattern_stmt;
}
/* Return TRUE if the operation in STMT can be performed on a smaller type.

Input:
STMT - a statement to check.
DEF - we support operations with two operands, one of which is constant.
The other operand can be defined by a demotion operation, or by a
previous statement in a sequence of over-promoted operations.  In the
later case DEF is used to replace that operand.  (It is defined by a
pattern statement we created for the previous statement in the
sequence).

Input/output:
NEW_TYPE - Output: a smaller type that we are trying to use.  Input: if not
NULL, it's the type of DEF.
STMTS - additional pattern statements.  If a pattern statement (type
conversion) is created in this function, its original statement is
added to STMTS.

Output:
OP0, OP1 - if the operation fits a smaller type, OP0 and OP1 are the new
operands to use in the new pattern statement for STMT (will be created
in vect_recog_over_widening_pattern ()).
NEW_DEF_STMT - in case DEF has to be promoted, we create two pattern
statements for STMT: the first one is a type promotion and the second
one is the operation itself.  We return the type promotion statement
in NEW_DEF_STMT and further store it in STMT_VINFO_PATTERN_DEF_SEQ of
the second pattern statement.  */

static bool
vect_operation_fits_smaller_type (gimple *stmt, tree def, tree *new_type,
tree *op0, tree *op1, gimple **new_def_stmt,
vec<gimple *> *stmts)
{
enum tree_code code;
tree const_oprnd, oprnd;
tree interm_type = NULL_TREE, half_type, new_oprnd, type;
gimple *def_stmt, *new_stmt;
bool first = false;
bool promotion;

*op0 = NULL_TREE;
*op1 = NULL_TREE;
*new_def_stmt = NULL;

if (!is_gimple_assign (stmt))
return false;

code = gimple_assign_rhs_code (stmt);
if (code != LSHIFT_EXPR && code != RSHIFT_EXPR
&& code != BIT_IOR_EXPR && code != BIT_XOR_EXPR && code != BIT_AND_EXPR)
return false;

oprnd = gimple_assign_rhs1 (stmt);
const_oprnd = gimple_assign_rhs2 (stmt);
type = gimple_expr_type (stmt);

if (TREE_CODE (oprnd) != SSA_NAME
|| TREE_CODE (const_oprnd) != INTEGER_CST)
return false;

/* If oprnd has other uses besides that in stmt we cannot mark it
as being part of a pattern only.  */
if (!has_single_use (oprnd))
return false;

/* If we are in the middle of a sequence, we use DEF from a previous
statement.  Otherwise, OPRND has to be a result of type promotion.  */
if (*new_type)
{
half_type = *new_type;
oprnd = def;
}
else
{
first = true;
if (!type_conversion_p (oprnd, stmt, false, &half_type, &def_stmt,
&promotion)
|| !promotion
|| !vect_same_loop_or_bb_p (stmt, def_stmt))
return false;
}

/* Can we perform the operation on a smaller type?  */
switch (code)
{
case BIT_IOR_EXPR:
case BIT_XOR_EXPR:
case BIT_AND_EXPR:
if (!int_fits_type_p (const_oprnd, half_type))
{
/* HALF_TYPE is not enough.  Try a bigger type if possible.  */
if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
return false;

interm_type = build_nonstandard_integer_type (
TYPE_PRECISION (half_type) * 2, TYPE_UNSIGNED (type));
if (!int_fits_type_p (const_oprnd, interm_type))
return false;
}

break;

case LSHIFT_EXPR:
/* Try intermediate type - HALF_TYPE is not enough for sure.  */
if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
return false;

/* Check that HALF_TYPE size + shift amount <= INTERM_TYPE size.
(e.g., if the original value was char, the shift amount is at most 8
if we want to use short).  */
if (compare_tree_int (const_oprnd, TYPE_PRECISION (half_type)) == 1)
return false;

interm_type = build_nonstandard_integer_type (
TYPE_PRECISION (half_type) * 2, TYPE_UNSIGNED (type));
if (!vect_supportable_shift (code, interm_type))
return false;

break;

case RSHIFT_EXPR:
if (vect_supportable_shift (code, half_type))
break;

/* Try intermediate type - HALF_TYPE is not supported.  */
if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
return false;

interm_type = build_nonstandard_integer_type (
TYPE_PRECISION (half_type) * 2, TYPE_UNSIGNED (type));
if (!vect_supportable_shift (code, interm_type))
return false;

break;

default:
gcc_unreachable ();
}

/* There are four possible cases:
1. OPRND is defined by a type promotion (in that case FIRST is TRUE, it's
the first statement in the sequence)
a. The original, HALF_TYPE, is not enough - we replace the promotion
from HALF_TYPE to TYPE with a promotion to INTERM_TYPE.
b. HALF_TYPE is sufficient, OPRND is set as the RHS of the original
promotion.
2. OPRND is defined by a pattern statement we created.
a. Its type is not sufficient for the operation, we create a new stmt:
a type conversion for OPRND from HALF_TYPE to INTERM_TYPE.  We store
this statement in NEW_DEF_STMT, and it is later put in
STMT_VINFO_PATTERN_DEF_SEQ of the pattern statement for STMT.
b. OPRND is good to use in the new statement.  */
if (first)
{
if (interm_type)
{
/* Replace the original type conversion HALF_TYPE->TYPE with
HALF_TYPE->INTERM_TYPE.  */
if (STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt)))
{
new_stmt = STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt));
/* Check if the already created pattern stmt is what we need.  */
if (!is_gimple_assign (new_stmt)
|| !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (new_stmt))
|| TREE_TYPE (gimple_assign_lhs (new_stmt)) != interm_type)
return false;

stmts->safe_push (def_stmt);
oprnd = gimple_assign_lhs (new_stmt);
}
else
{
/* Create NEW_OPRND = (INTERM_TYPE) OPRND.  */
oprnd = gimple_assign_rhs1 (def_stmt);
new_oprnd = make_ssa_name (interm_type);
new_stmt = gimple_build_assign (new_oprnd, NOP_EXPR, oprnd);
STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt)) = new_stmt;
stmts->safe_push (def_stmt);
oprnd = new_oprnd;
}
}
else
{
/* Retrieve the operand before the type promotion.  */
oprnd = gimple_assign_rhs1 (def_stmt);
}
}
else
{
if (interm_type)
{
/* Create a type conversion HALF_TYPE->INTERM_TYPE.  */
new_oprnd = make_ssa_name (interm_type);
new_stmt = gimple_build_assign (new_oprnd, NOP_EXPR, oprnd);
oprnd = new_oprnd;
*new_def_stmt = new_stmt;
}

/* Otherwise, OPRND is already set.  */
}

if (interm_type)
*new_type = interm_type;
else
*new_type = half_type;

*op0 = oprnd;
*op1 = fold_convert (*new_type, const_oprnd);

return true;
}

/* Try to find a statement or a sequence of statements that can be performed

/* Recognize cases in which an operation is performed in one type WTYPE
but could be done more efficiently in a narrower type NTYPE.  For example,
if we have:

ATYPE a;  // narrower than NTYPE
BTYPE b;  // narrower than NTYPE
WTYPE aw = (WTYPE) a;
WTYPE bw = (WTYPE) b;
WTYPE res = aw + bw;  // only uses of aw and bw

then it would be more efficient to do:

NTYPE an = (NTYPE) a;
NTYPE bn = (NTYPE) b;
NTYPE resn = an + bn;
WTYPE res = (WTYPE) resn;

Other situations include things like:

ATYPE a;  // NTYPE or narrower
WTYPE aw = (WTYPE) a;
WTYPE res = aw + b;

when only "(NTYPE) res" is significant.  In that case it's more efficient
to truncate "b" and do the operation on NTYPE instead:

NTYPE an = (NTYPE) a;
NTYPE bn = (NTYPE) b;  // truncation
NTYPE resn = an + bn;
WTYPE res = (WTYPE) resn;

All users of "res" should then use "resn" instead, making the final
statement dead (not marked as relevant).  The final statement is still
needed to maintain the type correctness of the IR.

vect_determine_precisions has already determined the minimum
precison of the operation and the minimum precision required
by users of the result.  */

static gimple *
vect_recog_over_widening_pattern (vec<gimple *> *stmts, tree *type_out)
{
gassign *last_stmt = dyn_cast <gassign *> (stmts->pop ());
if (!last_stmt)
return NULL;

/* See whether we have found that this operation can be done on a
narrower type without changing its semantics.  */
stmt_vec_info last_stmt_info = vinfo_for_stmt (last_stmt);
unsigned int new_precision = last_stmt_info->operation_precision;
if (!new_precision)
return NULL;

vec_info *vinfo = last_stmt_info->vinfo;
tree lhs = gimple_assign_lhs (last_stmt);
tree type = TREE_TYPE (lhs);
tree_code code = gimple_assign_rhs_code (last_stmt);

/* Keep the first operand of a COND_EXPR as-is: only the other two
operands are interesting.  */
unsigned int first_op = (code == COND_EXPR ? 2 : 1);

/* Check the operands.  */
unsigned int nops = gimple_num_ops (last_stmt) - first_op;
auto_vec <vect_unpromoted_value, 3> unprom (nops);
unprom.quick_grow (nops);
unsigned int min_precision = 0;
bool single_use_p = false;
for (unsigned int i = 0; i < nops; ++i)
{
tree op = gimple_op (last_stmt, first_op + i);
if (TREE_CODE (op) == INTEGER_CST)
unprom[i].set_op (op, vect_constant_def);
else if (TREE_CODE (op) == SSA_NAME)
{
bool op_single_use_p = true;
if (!vect_look_through_possible_promotion (vinfo, op, &unprom[i],
&op_single_use_p))
return NULL;
/* If:

(1) N bits of the result are needed;
(2) all inputs are widened from M<N bits; and
(3) one operand OP is a single-use SSA name

we can shift the M->N widening from OP to the output
without changing the number or type of extensions involved.
This then reduces the number of copies of STMT_INFO.

If instead of (3) more than one operand is a single-use SSA name,
shifting the extension to the output is even more of a win.

If instead:

(1) N bits of the result are needed;
(2) one operand OP2 is widened from M2<N bits;
(3) another operand OP1 is widened from M1<M2 bits; and
(4) both OP1 and OP2 are single-use

the choice is between:

(a) truncating OP2 to M1, doing the operation on M1,
and then widening the result to N

(b) widening OP1 to M2, doing the operation on M2, and then
widening the result to N

Both shift the M2->N widening of the inputs to the output.
(a) additionally shifts the M1->M2 widening to the output;
it requires fewer copies of STMT_INFO but requires an extra
M2->M1 truncation.

Which is better will depend on the complexity and cost of
STMT_INFO, which is hard to predict at this stage.  However,
a clear tie-breaker in favor of (b) is the fact that the
truncation in (a) increases the length of the operation chain.

If instead of (4) only one of OP1 or OP2 is single-use,
(b) is still a win over doing the operation in N bits:
it still shifts the M2->N widening on the single-use operand
to the output and reduces the number of STMT_INFO copies.

If neither operand is single-use then operating on fewer than
N bits might lead to more extensions overall.  Whether it does
or not depends on global information about the vectorization
region, and whether that's a good trade-off would again
depend on the complexity and cost of the statements involved,
as well as things like register pressure that are not normally
modelled at this stage.  We therefore ignore these cases
and just optimize the clear single-use wins above.

Thus we take the maximum precision of the unpromoted operands
and record whether any operand is single-use.  */
if (unprom[i].dt == vect_internal_def)
{
min_precision = MAX (min_precision,
TYPE_PRECISION (unprom[i].type));
single_use_p |= op_single_use_p;
}
}
}

/* Although the operation could be done in operation_precision, we have
to balance that against introducing extra truncations or extensions.
Calculate the minimum precision that can be handled efficiently.

The loop above determined that the operation could be handled
efficiently in MIN_PRECISION if SINGLE_USE_P; this would shift an
extension from the inputs to the output without introducing more
instructions, and would reduce the number of instructions required
for STMT_INFO itself.

vect_determine_precisions has also determined that the result only
needs min_output_precision bits.  Truncating by a factor of N times
requires a tree of N - 1 instructions, so if TYPE is N times wider
than min_output_precision, doing the operation in TYPE and truncating
the result requires N + (N - 1) = 2N - 1 instructions per output vector.
In contrast:

- truncating the input to a unary operation and doing the operation
in the new type requires at most N - 1 + 1 = N instructions per
output vector

- doing the same for a binary operation requires at most
on a smaller type: (N - 1) * 2 + 1 = 2N - 1 instructions per output vector
type x_t; Both unary and binary operations require fewer instructions than
TYPE x_T, res0_T, res1_T; this if the operands were extended from a suitable truncated form.
loop: Thus there is usually nothing to lose by doing operations in
S1 x_t = *p; min_output_precision bits, but there can be something to gain. */
S2 x_T = (TYPE) x_t; if (!single_use_p)
S3 res0_T = op (x_T, C0); min_precision = last_stmt_info->min_output_precision;
S4 res1_T = op (res0_T, C1); else
S5 ... = () res1_T; - type demotion min_precision = MIN (min_precision, last_stmt_info->min_output_precision);
where type 'TYPE' is at least double the size of type 'type', C0 and C1 are
constants.
Check if S3 and S4 can be done on a smaller type than 'TYPE', it can either
be 'type' or some intermediate type. For now, we expect S5 to be a type
demotion operation. We also check that S3 and S4 have only one use. */
static gimple *
vect_recog_over_widening_pattern (vec<gimple *> *stmts, tree *type_out)
{
gimple *stmt = stmts->pop ();
gimple *pattern_stmt = NULL, *new_def_stmt, *prev_stmt = NULL,
*use_stmt = NULL;
tree op0, op1, vectype = NULL_TREE, use_lhs, use_type;
tree var = NULL_TREE, new_type = NULL_TREE, new_oprnd;
bool first;
tree type = NULL;
first = true; /* Apply the minimum efficient precision we just calculated. */
while (1) if (new_precision < min_precision)
{ new_precision = min_precision;
if (!vinfo_for_stmt (stmt) if (new_precision >= TYPE_PRECISION (type))
|| STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (stmt)))
return NULL; return NULL;
new_def_stmt = NULL; vect_pattern_detected ("vect_recog_over_widening_pattern", last_stmt);
if (!vect_operation_fits_smaller_type (stmt, var, &new_type,
&op0, &op1, &new_def_stmt,
stmts))
{
if (first)
return NULL;
else
break;
}
/* STMT can be performed on a smaller type. Check its uses. */ *type_out = get_vectype_for_scalar_type (type);
use_stmt = vect_single_imm_use (stmt); if (!*type_out)
if (!use_stmt || !is_gimple_assign (use_stmt))
return NULL; return NULL;
/* Create pattern statement for STMT. */ /* We've found a viable pattern. Get the new type of the operation. */
vectype = get_vectype_for_scalar_type (new_type); bool unsigned_p = (last_stmt_info->operation_sign == UNSIGNED);
if (!vectype) tree new_type = build_nonstandard_integer_type (new_precision, unsigned_p);
/* We specifically don't check here whether the target supports the
new operation, since it might be something that a later pattern
wants to rewrite anyway. If targets have a minimum element size
for some optabs, we should pattern-match smaller ops to larger ops
where beneficial. */
tree new_vectype = get_vectype_for_scalar_type (new_type);
if (!new_vectype)
return NULL; return NULL;
/* We want to collect all the statements for which we create pattern if (dump_enabled_p ())
statetments, except for the case when the last statement in the {
sequence doesn't have a corresponding pattern statement. In such dump_printf_loc (MSG_NOTE, vect_location, "demoting ");
case we associate the last pattern statement with the last statement dump_generic_expr (MSG_NOTE, TDF_SLIM, type);
in the sequence. Therefore, we only add the original statement to dump_printf (MSG_NOTE, " to ");
the list if we know that it is not the last. */ dump_generic_expr (MSG_NOTE, TDF_SLIM, new_type);
if (prev_stmt) dump_printf (MSG_NOTE, "\n");
stmts->safe_push (prev_stmt); }
var = vect_recog_temp_ssa_var (new_type, NULL); /* Calculate the rhs operands for an operation on NEW_TYPE. */
pattern_stmt STMT_VINFO_PATTERN_DEF_SEQ (last_stmt_info) = NULL;
= gimple_build_assign (var, gimple_assign_rhs_code (stmt), op0, op1); tree ops[3] = {};
STMT_VINFO_RELATED_STMT (vinfo_for_stmt (stmt)) = pattern_stmt; for (unsigned int i = 1; i < first_op; ++i)
new_pattern_def_seq (vinfo_for_stmt (stmt), new_def_stmt); ops[i - 1] = gimple_op (last_stmt, i);
vect_convert_inputs (last_stmt_info, nops, &ops[first_op - 1],
new_type, &unprom[0], new_vectype);
/* Use the operation to produce a result of type NEW_TYPE. */
tree new_var = vect_recog_temp_ssa_var (new_type, NULL);
gimple *pattern_stmt = gimple_build_assign (new_var, code,
ops[0], ops[1], ops[2]);
gimple_set_location (pattern_stmt, gimple_location (last_stmt));
if (dump_enabled_p ()) if (dump_enabled_p ())
{ {
...@@ -1666,68 +1611,88 @@ vect_recog_over_widening_pattern (vec<gimple *> *stmts, tree *type_out) ...@@ -1666,68 +1611,88 @@ vect_recog_over_widening_pattern (vec<gimple *> *stmts, tree *type_out)
dump_gimple_stmt (MSG_NOTE, TDF_SLIM, pattern_stmt, 0); dump_gimple_stmt (MSG_NOTE, TDF_SLIM, pattern_stmt, 0);
} }
type = gimple_expr_type (stmt); pattern_stmt = vect_convert_output (last_stmt_info, type,
prev_stmt = stmt; pattern_stmt, new_vectype);
stmt = use_stmt;
first = false; stmts->safe_push (last_stmt);
} return pattern_stmt;
}
/* We got a sequence. We expect it to end with a type demotion operation. /* Recognize cases in which the input to a cast is wider than its
Otherwise, we quit (for now). There are three possible cases: the output, and the input is fed by a widening operation. Fold this
conversion is to NEW_TYPE (we don't do anything), the conversion is to by removing the unnecessary intermediate widening. E.g.:
a type bigger than NEW_TYPE and/or the signedness of USE_TYPE and
NEW_TYPE differs (we create a new conversion statement). */ unsigned char a;
if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (use_stmt))) unsigned int b = (unsigned int) a;
{ unsigned short c = (unsigned short) b;
use_lhs = gimple_assign_lhs (use_stmt);
use_type = TREE_TYPE (use_lhs); -->
/* Support only type demotion or signedess change. */
if (!INTEGRAL_TYPE_P (use_type) unsigned short c = (unsigned short) a;
|| TYPE_PRECISION (type) <= TYPE_PRECISION (use_type))
Although this is rare in input IR, it is an expected side-effect
of the over-widening pattern above.
This is beneficial also for integer-to-float conversions, if the
widened integer has more bits than the float, and if the unwidened
input doesn't. */
static gimple *
vect_recog_cast_forwprop_pattern (vec<gimple *> *stmts, tree *type_out)
{
/* Check for a cast, including an integer-to-float conversion. */
gassign *last_stmt = dyn_cast <gassign *> (stmts->pop ());
if (!last_stmt)
return NULL;
tree_code code = gimple_assign_rhs_code (last_stmt);
if (!CONVERT_EXPR_CODE_P (code) && code != FLOAT_EXPR)
return NULL; return NULL;
/* Check that NEW_TYPE is not bigger than the conversion result. */ /* Make sure that the rhs is a scalar with a natural bitsize. */
if (TYPE_PRECISION (new_type) > TYPE_PRECISION (use_type)) tree lhs = gimple_assign_lhs (last_stmt);
if (!lhs)
return NULL;
tree lhs_type = TREE_TYPE (lhs);
scalar_mode lhs_mode;
if (VECT_SCALAR_BOOLEAN_TYPE_P (lhs_type)
|| !is_a <scalar_mode> (TYPE_MODE (lhs_type), &lhs_mode))
return NULL; return NULL;
if (TYPE_UNSIGNED (new_type) != TYPE_UNSIGNED (use_type) /* Check for a narrowing operation (from a vector point of view). */
|| TYPE_PRECISION (new_type) != TYPE_PRECISION (use_type)) tree rhs = gimple_assign_rhs1 (last_stmt);
{ tree rhs_type = TREE_TYPE (rhs);
*type_out = get_vectype_for_scalar_type (use_type); if (!INTEGRAL_TYPE_P (rhs_type)
if (!*type_out) || VECT_SCALAR_BOOLEAN_TYPE_P (rhs_type)
|| TYPE_PRECISION (rhs_type) <= GET_MODE_BITSIZE (lhs_mode))
return NULL; return NULL;
/* Create NEW_TYPE->USE_TYPE conversion. */ /* Try to find an unpromoted input. */
new_oprnd = make_ssa_name (use_type); stmt_vec_info last_stmt_info = vinfo_for_stmt (last_stmt);
pattern_stmt = gimple_build_assign (new_oprnd, NOP_EXPR, var); vec_info *vinfo = last_stmt_info->vinfo;
STMT_VINFO_RELATED_STMT (vinfo_for_stmt (use_stmt)) = pattern_stmt; vect_unpromoted_value unprom;
if (!vect_look_through_possible_promotion (vinfo, rhs, &unprom)
|| TYPE_PRECISION (unprom.type) >= TYPE_PRECISION (rhs_type))
return NULL;
/* We created a pattern statement for the last statement in the /* If the bits above RHS_TYPE matter, make sure that they're the
sequence, so we don't need to associate it with the pattern same when extending from UNPROM as they are when extending from RHS. */
statement created for PREV_STMT. Therefore, we add PREV_STMT if (!INTEGRAL_TYPE_P (lhs_type)
to the list in order to mark it later in vect_pattern_recog_1. */ && TYPE_SIGN (rhs_type) != TYPE_SIGN (unprom.type))
if (prev_stmt) return NULL;
stmts->safe_push (prev_stmt);
}
else
{
if (prev_stmt)
STMT_VINFO_PATTERN_DEF_SEQ (vinfo_for_stmt (use_stmt))
= STMT_VINFO_PATTERN_DEF_SEQ (vinfo_for_stmt (prev_stmt));
*type_out = vectype; /* We can get the same result by casting UNPROM directly, to avoid
} the unnecessary widening and narrowing. */
vect_pattern_detected ("vect_recog_cast_forwprop_pattern", last_stmt);
stmts->safe_push (use_stmt); *type_out = get_vectype_for_scalar_type (lhs_type);
} if (!*type_out)
else
/* TODO: support general case, create a conversion to the correct type. */
return NULL; return NULL;
/* Pattern detected. */ tree new_var = vect_recog_temp_ssa_var (lhs_type, NULL);
vect_pattern_detected ("vect_recog_over_widening_pattern", stmts->last ()); gimple *pattern_stmt = gimple_build_assign (new_var, code, unprom.op);
gimple_set_location (pattern_stmt, gimple_location (last_stmt));
stmts->safe_push (last_stmt);
return pattern_stmt; return pattern_stmt;
} }
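To make the cast-forwprop fold above concrete, here is a small standalone check (my own illustration, not part of the patch) that exhaustively verifies the source-level example from the comment: casting an unsigned char through unsigned int and then to unsigned short always gives the same value as casting directly, which is why the intermediate widening can be dropped.

#include <assert.h>
#include <limits.h>

int
main (void)
{
  for (unsigned int i = 0; i <= UCHAR_MAX; i++)
    {
      unsigned char a = (unsigned char) i;
      unsigned int b = (unsigned int) a;            /* widening cast */
      unsigned short via_wide = (unsigned short) b; /* narrowing cast */
      unsigned short direct = (unsigned short) a;   /* folded form */
      assert (via_wide == direct);
    }
  return 0;
}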
...
@@ -4205,6 +4170,390 @@ vect_recog_gather_scatter_pattern (vec<gimple *> *stmts, tree *type_out)
  return pattern_stmt;
}
/* Return true if TYPE is a non-boolean integer type. These are the types
that we want to consider for narrowing. */
static bool
vect_narrowable_type_p (tree type)
{
return INTEGRAL_TYPE_P (type) && !VECT_SCALAR_BOOLEAN_TYPE_P (type);
}
/* Return true if the operation given by CODE can be truncated to N bits
when only N bits of the output are needed. This is only true if bit N+1
of the inputs has no effect on the low N bits of the result. */
static bool
vect_truncatable_operation_p (tree_code code)
{
switch (code)
{
case PLUS_EXPR:
case MINUS_EXPR:
case MULT_EXPR:
case BIT_AND_EXPR:
case BIT_IOR_EXPR:
case BIT_XOR_EXPR:
case COND_EXPR:
return true;
default:
return false;
}
}
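The truncation property this predicate encodes can be checked directly in C.  The sketch below (mine, not from the patch) shows that for addition, multiplication and AND the low 8 bits of the result depend only on the low 8 bits of the operands, whereas a right shift does not have this property, which is why RSHIFT_EXPR is absent from the switch.

#include <assert.h>

int
main (void)
{
  unsigned int a = 0x1234, b = 0x00ff;

  /* Truncatable: doing the operation on the truncated operands gives
     the same low byte as truncating the full-width result.  */
  assert ((unsigned char) (a + b)
	  == (unsigned char) ((unsigned char) a + (unsigned char) b));
  assert ((unsigned char) (a * b)
	  == (unsigned char) ((unsigned char) a * (unsigned char) b));
  assert ((unsigned char) (a & b)
	  == (unsigned char) ((unsigned char) a & (unsigned char) b));

  /* Not truncatable: the low byte of a right shift depends on operand
     bits above the low byte.  */
  assert ((unsigned char) (a >> 4)
	  != (unsigned char) ((unsigned char) a >> 4));

  return 0;
}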
/* Record that STMT_INFO could be changed from operating on TYPE to
operating on a type with the precision and sign given by PRECISION
and SIGN respectively. PRECISION is an arbitrary bit precision;
it might not be a whole number of bytes. */
static void
vect_set_operation_type (stmt_vec_info stmt_info, tree type,
unsigned int precision, signop sign)
{
/* Round the precision up to a whole number of bytes. */
precision = vect_element_precision (precision);
if (precision < TYPE_PRECISION (type)
&& (!stmt_info->operation_precision
|| stmt_info->operation_precision > precision))
{
stmt_info->operation_precision = precision;
stmt_info->operation_sign = sign;
}
}
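To give a couple of illustrative numbers of my own: under the byte rounding above, a computed precision of 3 or 5 bits is recorded as an 8-bit operation, and anything from 9 to 16 bits is recorded as 16 bits; the rounded value is then only kept if it is still smaller than the precision of TYPE and smaller than any previously recorded operation_precision.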
/* Record that STMT_INFO only requires MIN_INPUT_PRECISION from its
non-boolean inputs, all of which have type TYPE. MIN_INPUT_PRECISION
is an arbitrary bit precision; it might not be a whole number of bytes. */
static void
vect_set_min_input_precision (stmt_vec_info stmt_info, tree type,
unsigned int min_input_precision)
{
/* This operation in isolation only requires the inputs to have
MIN_INPUT_PRECISION of precision.  However, that doesn't mean
that MIN_INPUT_PRECISION is a natural precision for the chain
as a whole. E.g. consider something like:
unsigned short *x, *y;
*y = ((*x & 0xf0) >> 4) | (*y << 4);
The right shift can be done on unsigned chars, and only requires the
result of "*x & 0xf0" to be done on unsigned chars. But taking that
approach would mean turning a natural chain of single-vector unsigned
short operations into one that truncates "*x" and then extends
"(*x & 0xf0) >> 4", with two vectors for each unsigned short
operation and one vector for each unsigned char operation.
This would be a significant pessimization.
Instead only propagate the maximum of this precision and the precision
required by the users of the result. This means that we don't pessimize
the case above but continue to optimize things like:
unsigned char *y;
unsigned short *x;
*y = ((*x & 0xf0) >> 4) | (*y << 4);
Here we would truncate two vectors of *x to a single vector of
unsigned chars and use single-vector unsigned char operations for
everything else, rather than doing two unsigned short copies of
"(*x & 0xf0) >> 4" and then truncating the result. */
min_input_precision = MAX (min_input_precision,
stmt_info->min_output_precision);
if (min_input_precision < TYPE_PRECISION (type)
&& (!stmt_info->min_input_precision
|| stmt_info->min_input_precision > min_input_precision))
stmt_info->min_input_precision = min_input_precision;
}
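As a concrete instance of the MAX rule (illustrative numbers of my own): with a 16-bit TYPE, if the statement in isolation only needs 8-bit inputs but the users of its result need all 16 bits, MAX (8, 16) = 16 is not smaller than TYPE_PRECISION and nothing is recorded; only when the users themselves need at most 8 bits does MAX (8, 8) = 8 get recorded as min_input_precision, allowing the whole chain to be narrowed.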
/* Subroutine of vect_determine_min_output_precision. Return true if
we can calculate a reduced number of output bits for STMT_INFO,
whose result is LHS. */
static bool
vect_determine_min_output_precision_1 (stmt_vec_info stmt_info, tree lhs)
{
/* Take the maximum precision required by users of the result. */
unsigned int precision = 0;
imm_use_iterator iter;
use_operand_p use;
FOR_EACH_IMM_USE_FAST (use, iter, lhs)
{
gimple *use_stmt = USE_STMT (use);
if (is_gimple_debug (use_stmt))
continue;
if (!vect_stmt_in_region_p (stmt_info->vinfo, use_stmt))
return false;
stmt_vec_info use_stmt_info = vinfo_for_stmt (use_stmt);
if (!use_stmt_info->min_input_precision)
return false;
precision = MAX (precision, use_stmt_info->min_input_precision);
}
if (dump_enabled_p ())
{
dump_printf_loc (MSG_NOTE, vect_location, "only the low %d bits of ",
precision);
dump_generic_expr (MSG_NOTE, TDF_SLIM, lhs);
dump_printf (MSG_NOTE, " are significant\n");
}
stmt_info->min_output_precision = precision;
return true;
}
/* Calculate min_output_precision for STMT_INFO. */
static void
vect_determine_min_output_precision (stmt_vec_info stmt_info)
{
/* We're only interested in statements with a narrowable result. */
tree lhs = gimple_get_lhs (stmt_info->stmt);
if (!lhs
|| TREE_CODE (lhs) != SSA_NAME
|| !vect_narrowable_type_p (TREE_TYPE (lhs)))
return;
if (!vect_determine_min_output_precision_1 (stmt_info, lhs))
stmt_info->min_output_precision = TYPE_PRECISION (TREE_TYPE (lhs));
}
/* Use range information to decide whether STMT (described by STMT_INFO)
could be done in a narrower type. This is effectively a forward
propagation, since it uses context-independent information that applies
to all users of an SSA name. */
static void
vect_determine_precisions_from_range (stmt_vec_info stmt_info, gassign *stmt)
{
tree lhs = gimple_assign_lhs (stmt);
if (!lhs || TREE_CODE (lhs) != SSA_NAME)
return;
tree type = TREE_TYPE (lhs);
if (!vect_narrowable_type_p (type))
return;
/* First see whether we have any useful range information for the result. */
unsigned int precision = TYPE_PRECISION (type);
signop sign = TYPE_SIGN (type);
wide_int min_value, max_value;
if (!vect_get_range_info (lhs, &min_value, &max_value))
return;
tree_code code = gimple_assign_rhs_code (stmt);
unsigned int nops = gimple_num_ops (stmt);
if (!vect_truncatable_operation_p (code))
  return;
/* Check that all relevant input operands are compatible, and update
[MIN_VALUE, MAX_VALUE] to include their ranges. */
for (unsigned int i = 1; i < nops; ++i)
{
tree op = gimple_op (stmt, i);
if (TREE_CODE (op) == INTEGER_CST)
{
/* Don't require the integer to have RHS_TYPE (which it might
not for things like shift amounts, etc.), but do require it
to fit the type. */
if (!int_fits_type_p (op, type))
return;
min_value = wi::min (min_value, wi::to_wide (op, precision), sign);
max_value = wi::max (max_value, wi::to_wide (op, precision), sign);
}
else if (TREE_CODE (op) == SSA_NAME)
{
/* Ignore codes that don't take uniform arguments. */
if (!types_compatible_p (TREE_TYPE (op), type))
return;
wide_int op_min_value, op_max_value;
if (!vect_get_range_info (op, &op_min_value, &op_max_value))
return;
min_value = wi::min (min_value, op_min_value, sign);
max_value = wi::max (max_value, op_max_value, sign);
}
else
return;
}
/* Try to switch signed types for unsigned types if we can.
This is better for two reasons. First, unsigned ops tend
to be cheaper than signed ops. Second, it means that we can
handle things like:
signed char c;
int res = (int) c & 0xff00; // range [0x0000, 0xff00]
as:
signed char c;
unsigned short res_1 = (unsigned short) c & 0xff00;
int res = (int) res_1;
where the intermediate result res_1 has unsigned rather than
signed type. */
if (sign == SIGNED && !wi::neg_p (min_value))
sign = UNSIGNED;
/* See what precision is required for MIN_VALUE and MAX_VALUE. */
unsigned int precision1 = wi::min_precision (min_value, sign);
unsigned int precision2 = wi::min_precision (max_value, sign);
unsigned int value_precision = MAX (precision1, precision2);
if (value_precision >= precision)
return;
if (dump_enabled_p ())
{
dump_printf_loc (MSG_NOTE, vect_location, "can narrow to %s:%d"
" without loss of precision: ",
sign == SIGNED ? "signed" : "unsigned",
value_precision);
dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
}
vect_set_operation_type (stmt_info, type, value_precision, sign);
vect_set_min_input_precision (stmt_info, type, value_precision);
}
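The signed-to-unsigned example in the comment can be sanity-checked with a small standalone program (my own, not part of the patch): it confirms that the result range really is [0, 0xff00] for every signed char input, and that the unsigned short form computes the same value.

#include <assert.h>
#include <limits.h>

int
main (void)
{
  for (int i = SCHAR_MIN; i <= SCHAR_MAX; i++)
    {
      signed char c = (signed char) i;
      int res = (int) c & 0xff00;
      assert (res >= 0 && res <= 0xff00);

      unsigned short res_1 = (unsigned short) c & 0xff00;
      assert ((int) res_1 == res);
    }
  return 0;
}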
/* Use information about the users of STMT's result to decide whether
STMT (described by STMT_INFO) could be done in a narrower type.
This is effectively a backward propagation. */
static void
vect_determine_precisions_from_users (stmt_vec_info stmt_info, gassign *stmt)
{
tree_code code = gimple_assign_rhs_code (stmt);
unsigned int opno = (code == COND_EXPR ? 2 : 1);
tree type = TREE_TYPE (gimple_op (stmt, opno));
if (!vect_narrowable_type_p (type))
return;
unsigned int precision = TYPE_PRECISION (type);
unsigned int operation_precision, min_input_precision;
switch (code)
{
CASE_CONVERT:
/* Only the bits that contribute to the output matter. Don't change
the precision of the operation itself. */
operation_precision = precision;
min_input_precision = stmt_info->min_output_precision;
break;
case LSHIFT_EXPR:
case RSHIFT_EXPR:
{
tree shift = gimple_assign_rhs2 (stmt);
if (TREE_CODE (shift) != INTEGER_CST
|| !wi::ltu_p (wi::to_widest (shift), precision))
return;
unsigned int const_shift = TREE_INT_CST_LOW (shift);
if (code == LSHIFT_EXPR)
{
/* We need CONST_SHIFT fewer bits of the input. */
operation_precision = stmt_info->min_output_precision;
min_input_precision = (MAX (operation_precision, const_shift)
- const_shift);
}
else
{
/* We need CONST_SHIFT extra bits to do the operation. */
operation_precision = (stmt_info->min_output_precision
+ const_shift);
min_input_precision = operation_precision;
}
break;
}
default:
if (vect_truncatable_operation_p (code))
{
/* Input bit N has no effect on output bits N-1 and lower. */
operation_precision = stmt_info->min_output_precision;
min_input_precision = operation_precision;
break;
}
return;
}
if (operation_precision < precision)
{
if (dump_enabled_p ())
{
dump_printf_loc (MSG_NOTE, vect_location, "can narrow to %s:%d"
" without affecting users: ",
TYPE_UNSIGNED (type) ? "unsigned" : "signed",
operation_precision);
dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
}
vect_set_operation_type (stmt_info, type, operation_precision,
TYPE_SIGN (type));
}
vect_set_min_input_precision (stmt_info, type, min_input_precision);
}
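A quick standalone check of the bit counting in the two shift cases (my own illustration, assuming the usual C integer promotions): if only the low 8 bits of a left shift by 3 are needed, only the low 8 - 3 = 5 input bits matter, while producing 8 significant bits of a right shift by 3 needs 8 + 3 = 11 input bits.

#include <assert.h>

int
main (void)
{
  for (unsigned int x = 0; x <= 0xffff; x++)
    {
      /* LSHIFT_EXPR case: only the low 8 - 3 = 5 bits of X can affect
	 the low 8 bits of "x << 3".  */
      assert ((unsigned char) (x << 3) == (unsigned char) ((x & 0x1f) << 3));

      /* RSHIFT_EXPR case: the low 8 bits of "x >> 3" depend on at most
	 the low 8 + 3 = 11 bits of X.  */
      assert ((unsigned char) (x >> 3) == (unsigned char) ((x & 0x7ff) >> 3));
    }
  return 0;
}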
/* Handle vect_determine_precisions for STMT_INFO, given that we
have already done so for the users of its result. */
void
vect_determine_stmt_precisions (stmt_vec_info stmt_info)
{
vect_determine_min_output_precision (stmt_info);
if (gassign *stmt = dyn_cast <gassign *> (stmt_info->stmt))
{
vect_determine_precisions_from_range (stmt_info, stmt);
vect_determine_precisions_from_users (stmt_info, stmt);
}
}
/* Walk backwards through the vectorizable region to determine the
values of these fields:
- min_output_precision
- min_input_precision
- operation_precision
- operation_sign. */
void
vect_determine_precisions (vec_info *vinfo)
{
DUMP_VECT_SCOPE ("vect_determine_precisions");
if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
{
struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
unsigned int nbbs = loop->num_nodes;
for (unsigned int i = 0; i < nbbs; i++)
{
basic_block bb = bbs[nbbs - i - 1];
for (gimple_stmt_iterator si = gsi_last_bb (bb);
!gsi_end_p (si); gsi_prev (&si))
vect_determine_stmt_precisions (vinfo_for_stmt (gsi_stmt (si)));
}
}
else
{
bb_vec_info bb_vinfo = as_a <bb_vec_info> (vinfo);
gimple_stmt_iterator si = bb_vinfo->region_end;
gimple *stmt;
do
{
if (!gsi_stmt (si))
si = gsi_last_bb (bb_vinfo->bb);
else
gsi_prev (&si);
stmt = gsi_stmt (si);
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
if (stmt_info && STMT_VINFO_VECTORIZABLE (stmt_info))
vect_determine_stmt_precisions (stmt_info);
}
while (stmt != gsi_stmt (bb_vinfo->region_begin));
}
}
typedef gimple *(*vect_recog_func_ptr) (vec<gimple *> *, tree *);

struct vect_recog_func
@@ -4217,13 +4566,14 @@ struct vect_recog_func
   taken which means usually the more complex one needs to preceed the
   less comples onex (widen_sum only after dot_prod or sad for example). */
static vect_recog_func vect_vect_recog_func_ptrs[] = {
  { vect_recog_over_widening_pattern, "over_widening" },	/* added */
  { vect_recog_cast_forwprop_pattern, "cast_forwprop" },	/* added */
  { vect_recog_widen_mult_pattern, "widen_mult" },
  { vect_recog_dot_prod_pattern, "dot_prod" },
  { vect_recog_sad_pattern, "sad" },
  { vect_recog_widen_sum_pattern, "widen_sum" },
  { vect_recog_pow_pattern, "pow" },
  { vect_recog_widen_shift_pattern, "widen_shift" },
  /* removed: the old over_widening entry previously sat here */
  { vect_recog_rotate_pattern, "rotate" },
  { vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
  { vect_recog_divmod_pattern, "divmod" },
@@ -4502,6 +4852,8 @@ vect_pattern_recog (vec_info *vinfo)
  unsigned int i, j;
  auto_vec<gimple *, 1> stmts_to_replace;

  vect_determine_precisions (vinfo);

  DUMP_VECT_SCOPE ("vect_pattern_recog");

  if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
...
@@ -899,6 +899,21 @@ typedef struct _stmt_vec_info {
  /* The number of scalar stmt references from active SLP instances.  */
  unsigned int num_slp_uses;

  /* If nonzero, the lhs of the statement could be truncated to this
     many bits without affecting any users of the result.  */
  unsigned int min_output_precision;

  /* If nonzero, all non-boolean input operands have the same precision,
     and they could each be truncated to this many bits without changing
     the result.  */
  unsigned int min_input_precision;

  /* If OPERATION_BITS is nonzero, the statement could be performed on
     an integer with the sign and number of bits given by OPERATION_SIGN
     and OPERATION_BITS without changing the result.  */
  unsigned int operation_precision;
  signop operation_sign;
} *stmt_vec_info;

/* Information about a gather/scatter call.  */
...