Commit 370c2ebe by Richard Sandiford, committed by Richard Sandiford

[14/n] PR85694: Rework overwidening detection

This patch is the main part of PR85694.  The aim is to recognise at least:

  signed char *a, *b, *c;
  ...
  for (int i = 0; i < 2048; i++)
    c[i] = (a[i] + b[i]) >> 1;

as an over-widening pattern, since the addition and shift can be done
on shorts rather than ints.  However, it ended up being a lot more
general than that.

The current over-widening pattern detection is limited to a few simple
cases: logical ops with immediate second operands, and shifts by a
constant.  These cases are enough for common pixel-format conversion
and can be detected in a peephole way.
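
For instance (illustrative only, not taken from the patch), the existing
code can already narrow something like:

  unsigned char x, y;
  ...
  y = (x >> 3) & 0x1f;

back down from int to unsigned char, because the shift amount and the
AND mask are both constants.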

The loop above requires two generalisations of the current code: support
for addition as well as logical ops, and support for non-constant second
operands.  These are harder to detect in the same peephole way, so the
patch tries to take a more global approach.

The idea is to get information about the minimum operation width
in two ways:

(1) by using the range information attached to the SSA_NAMEs
    (effectively a forward walk, since the range info is
    context-independent).

(2) by back-propagating the number of output bits required by
    users of the result.
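
For example, the new vect-over-widen-15.c test below relies purely on (1):
nothing truncates the result (it is stored to an int array), yet the range
information for the loaded chars shows that the addition and division can
be done on shorts (condensed from the test, which defines N as 50):

  void
  f (int *restrict a, signed char *restrict b, signed char *restrict c)
  {
    for (int i = 0; i < N; ++i)
      a[i] = (b[i] + c[i]) / 2;
  }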

As explained in the comments, there's a balance to be struck between
narrowing an individual operation and fitting in with the surrounding
code.  The approach is pretty conservative: if we could narrow an
operation to N bits without changing its semantics, it's OK to do that if:

- no operations later in the chain require more than N bits; or

- all internally-defined inputs are extended from N bits or fewer,
  and at least one of them is single-use.

See the comments for the rationale.
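
For example, some of the new tests (the "di + ei" variants below, in which
a[], b[] and c[] are unsigned int arrays and d[] and e[] are unsigned char
arrays) check the flip side: di and ei are extended anyway for the stores
to a[] and b[], so neither input of di + ei is single-use and the addition
is deliberately left in ints rather than narrowed to shorts:

  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
    {
      unsigned int di = d[i];
      unsigned int ei = e[i];
      a[i] = di;
      b[i] = ei;
      c[i] = di + ei;
    }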

I didn't bother adding STMT_VINFO_* wrappers for the new fields
since the code seemed more readable without.

2018-06-20  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* poly-int.h (print_hex): New function.
	* dumpfile.h (dump_dec, dump_hex): Declare.
	* dumpfile.c (dump_dec, dump_hex): New poly_wide_int functions.
	* tree-vectorizer.h (_stmt_vec_info): Add min_output_precision,
	min_input_precision, operation_precision and operation_sign.
	* tree-vect-patterns.c (vect_get_range_info): New function.
	(vect_same_loop_or_bb_p, vect_single_imm_use)
	(vect_operation_fits_smaller_type): Delete.
	(vect_look_through_possible_promotion): Add an optional
	single_use_p parameter.
	(vect_recog_over_widening_pattern): Rewrite to use new
	stmt_vec_info information.  Handle one operation at a time.
	(vect_recog_cast_forwprop_pattern, vect_narrowable_type_p)
	(vect_truncatable_operation_p, vect_set_operation_type)
	(vect_set_min_input_precision): New functions.
	(vect_determine_min_output_precision_1): Likewise.
	(vect_determine_min_output_precision): Likewise.
	(vect_determine_precisions_from_range): Likewise.
	(vect_determine_precisions_from_users): Likewise.
	(vect_determine_stmt_precisions, vect_determine_precisions): Likewise.
	(vect_vect_recog_func_ptrs): Put over_widening first.
	Add cast_forwprop.
	(vect_pattern_recog): Call vect_determine_precisions.

gcc/testsuite/
	* gcc.dg/vect/vect-widen-mult-u8-u32.c: Check specifically for a
	widen_mult pattern.
	* gcc.dg/vect/vect-over-widen-1.c: Update the scan tests for new
	over-widening messages.
	* gcc.dg/vect/vect-over-widen-1-big-array.c: Likewise.
	* gcc.dg/vect/vect-over-widen-2.c: Likewise.
	* gcc.dg/vect/vect-over-widen-2-big-array.c: Likewise.
	* gcc.dg/vect/vect-over-widen-3.c: Likewise.
	* gcc.dg/vect/vect-over-widen-3-big-array.c: Likewise.
	* gcc.dg/vect/vect-over-widen-4.c: Likewise.
	* gcc.dg/vect/vect-over-widen-4-big-array.c: Likewise.
	* gcc.dg/vect/bb-slp-over-widen-1.c: New test.
	* gcc.dg/vect/bb-slp-over-widen-2.c: Likewise.
	* gcc.dg/vect/vect-over-widen-5.c: Likewise.
	* gcc.dg/vect/vect-over-widen-6.c: Likewise.
	* gcc.dg/vect/vect-over-widen-7.c: Likewise.
	* gcc.dg/vect/vect-over-widen-8.c: Likewise.
	* gcc.dg/vect/vect-over-widen-9.c: Likewise.
	* gcc.dg/vect/vect-over-widen-10.c: Likewise.
	* gcc.dg/vect/vect-over-widen-11.c: Likewise.
	* gcc.dg/vect/vect-over-widen-12.c: Likewise.
	* gcc.dg/vect/vect-over-widen-13.c: Likewise.
	* gcc.dg/vect/vect-over-widen-14.c: Likewise.
	* gcc.dg/vect/vect-over-widen-15.c: Likewise.
	* gcc.dg/vect/vect-over-widen-16.c: Likewise.
	* gcc.dg/vect/vect-over-widen-17.c: Likewise.
	* gcc.dg/vect/vect-over-widen-18.c: Likewise.
	* gcc.dg/vect/vect-over-widen-19.c: Likewise.
	* gcc.dg/vect/vect-over-widen-20.c: Likewise.
	* gcc.dg/vect/vect-over-widen-21.c: Likewise.

From-SVN: r262333
@@ -633,6 +633,28 @@ template void dump_dec (dump_flags_t, const poly_uint64 &);
template void dump_dec (dump_flags_t, const poly_offset_int &);
template void dump_dec (dump_flags_t, const poly_widest_int &);
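/* Output VALUE in decimal to appropriate dump streams.  */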
void
dump_dec (dump_flags_t dump_kind, const poly_wide_int &value, signop sgn)
{
if (dump_file && (dump_kind & pflags))
print_dec (value, dump_file, sgn);
if (alt_dump_file && (dump_kind & alt_flags))
print_dec (value, alt_dump_file, sgn);
}
/* Output VALUE in hexadecimal to appropriate dump streams. */
void
dump_hex (dump_flags_t dump_kind, const poly_wide_int &value)
{
if (dump_file && (dump_kind & pflags))
print_hex (value, dump_file);
if (alt_dump_file && (dump_kind & alt_flags))
print_hex (value, alt_dump_file);
}
/* The current dump scope-nesting depth. */
static int dump_scope_depth;
@@ -439,6 +439,8 @@ extern bool enable_rtl_dump_file (void);
template<unsigned int N, typename C>
void dump_dec (dump_flags_t, const poly_int<N, C> &);
extern void dump_dec (dump_flags_t, const poly_wide_int &, signop);
extern void dump_hex (dump_flags_t, const poly_wide_int &);
/* In tree-dump.c */
extern void dump_node (const_tree, dump_flags_t, FILE *);
@@ -2420,6 +2420,25 @@ print_dec (const poly_int_pod<N, C> &value, FILE *file)
poly_coeff_traits<C>::signedness ? SIGNED : UNSIGNED);
}
/* Use print_hex to print VALUE to FILE. */
template<unsigned int N, typename C>
void
print_hex (const poly_int_pod<N, C> &value, FILE *file)
{
if (value.is_constant ())
print_hex (value.coeffs[0], file);
else
{
fprintf (file, "[");
for (unsigned int i = 0; i < N; ++i)
{
print_hex (value.coeffs[i], file);
fputc (i == N - 1 ? ']' : ',', file);
}
}
}
/* Helper for calculating the distance between two points P1 and P2,
in cases where known_le (P1, P2). T1 and T2 are the types of the
two positions, in either order. The coefficients of P2 - P1 have
/* { dg-require-effective-target vect_int } */
/* { dg-require-effective-target vect_shift } */
/* { dg-require-effective-target vect_pack_trunc } */
/* { dg-require-effective-target vect_unpack } */
#include "tree-vect.h"
/* Deliberate use of signed >>. */
#define DEF_LOOP(SIGNEDNESS) \
void __attribute__ ((noipa)) \
f_##SIGNEDNESS (SIGNEDNESS char *restrict a, \
SIGNEDNESS char *restrict b, \
SIGNEDNESS char *restrict c) \
{ \
a[0] = (b[0] + c[0]) >> 1; \
a[1] = (b[1] + c[1]) >> 1; \
a[2] = (b[2] + c[2]) >> 1; \
a[3] = (b[3] + c[3]) >> 1; \
a[4] = (b[4] + c[4]) >> 1; \
a[5] = (b[5] + c[5]) >> 1; \
a[6] = (b[6] + c[6]) >> 1; \
a[7] = (b[7] + c[7]) >> 1; \
a[8] = (b[8] + c[8]) >> 1; \
a[9] = (b[9] + c[9]) >> 1; \
a[10] = (b[10] + c[10]) >> 1; \
a[11] = (b[11] + c[11]) >> 1; \
a[12] = (b[12] + c[12]) >> 1; \
a[13] = (b[13] + c[13]) >> 1; \
a[14] = (b[14] + c[14]) >> 1; \
a[15] = (b[15] + c[15]) >> 1; \
}
DEF_LOOP (signed)
DEF_LOOP (unsigned)
#define N 16
#define TEST_LOOP(SIGNEDNESS, BASE_B, BASE_C) \
{ \
SIGNEDNESS char a[N], b[N], c[N]; \
for (int i = 0; i < N; ++i) \
{ \
b[i] = BASE_B + i * 15; \
c[i] = BASE_C + i * 14; \
asm volatile ("" ::: "memory"); \
} \
f_##SIGNEDNESS (a, b, c); \
for (int i = 0; i < N; ++i) \
if (a[i] != (BASE_B + BASE_C + i * 29) >> 1) \
__builtin_abort (); \
}
int
main (void)
{
check_vect ();
TEST_LOOP (signed, -128, -120);
TEST_LOOP (unsigned, 4, 10);
return 0;
}
/* { dg-final { scan-tree-dump "demoting int to signed short" "slp2" { target { ! vect_widen_shift } } } } */
/* { dg-final { scan-tree-dump "demoting int to unsigned short" "slp2" { target { ! vect_widen_shift } } } } */
/* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "vect" } } */
/* { dg-require-effective-target vect_int } */
/* { dg-require-effective-target vect_shift } */
/* { dg-require-effective-target vect_pack_trunc } */
/* { dg-require-effective-target vect_unpack } */
#include "tree-vect.h"
/* Deliberate use of signed >>. */
#define DEF_LOOP(SIGNEDNESS) \
void __attribute__ ((noipa)) \
f_##SIGNEDNESS (SIGNEDNESS char *restrict a, \
SIGNEDNESS char *restrict b, \
SIGNEDNESS char c) \
{ \
a[0] = (b[0] + c) >> 1; \
a[1] = (b[1] + c) >> 1; \
a[2] = (b[2] + c) >> 1; \
a[3] = (b[3] + c) >> 1; \
a[4] = (b[4] + c) >> 1; \
a[5] = (b[5] + c) >> 1; \
a[6] = (b[6] + c) >> 1; \
a[7] = (b[7] + c) >> 1; \
a[8] = (b[8] + c) >> 1; \
a[9] = (b[9] + c) >> 1; \
a[10] = (b[10] + c) >> 1; \
a[11] = (b[11] + c) >> 1; \
a[12] = (b[12] + c) >> 1; \
a[13] = (b[13] + c) >> 1; \
a[14] = (b[14] + c) >> 1; \
a[15] = (b[15] + c) >> 1; \
}
DEF_LOOP (signed)
DEF_LOOP (unsigned)
#define N 16
#define TEST_LOOP(SIGNEDNESS, BASE_B, C) \
{ \
SIGNEDNESS char a[N], b[N], c[N]; \
for (int i = 0; i < N; ++i) \
{ \
b[i] = BASE_B + i * 15; \
asm volatile ("" ::: "memory"); \
} \
f_##SIGNEDNESS (a, b, C); \
for (int i = 0; i < N; ++i) \
if (a[i] != (BASE_B + C + i * 15) >> 1) \
__builtin_abort (); \
}
int
main (void)
{
check_vect ();
TEST_LOOP (signed, -128, -120);
TEST_LOOP (unsigned, 4, 250);
return 0;
}
/* { dg-final { scan-tree-dump "demoting int to signed short" "slp2" { target { ! vect_widen_shift } } } } */
/* { dg-final { scan-tree-dump "demoting int to unsigned short" "slp2" { target { ! vect_widen_shift } } } } */
/* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "vect" } } */
@@ -58,7 +58,9 @@ int main (void)
}
/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { ! vect_widen_shift } } } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
@@ -62,8 +62,9 @@ int main (void)
}
/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { { ! vect_sizes_32B_16B } && { ! vect_widen_shift } } } } } */
/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 8 "vect" { target vect_sizes_32B_16B } } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
/* { dg-require-effective-target vect_int } */
/* { dg-require-effective-target vect_shift } */
/* { dg-require-effective-target vect_pack_trunc } */
/* { dg-require-effective-target vect_unpack } */
#ifndef SIGNEDNESS
#define SIGNEDNESS unsigned
#define BASE_B 4
#define BASE_C 40
#endif
#include "vect-over-widen-9.c"
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(unsigned char\)} "vect" } } */
/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
/* { dg-require-effective-target vect_int } */
/* { dg-require-effective-target vect_shift } */
/* { dg-require-effective-target vect_pack_trunc } */
/* { dg-require-effective-target vect_unpack } */
#include "tree-vect.h"
#ifndef SIGNEDNESS
#define SIGNEDNESS signed
#define BASE_B -128
#define BASE_C -100
#endif
#define N 50
/* Both range analysis and backward propagation from the truncation show
that these calculations can be done in SIGNEDNESS short, with "res"
being extended for the store to d[i]. */
void __attribute__ ((noipa))
f (SIGNEDNESS char *restrict a, SIGNEDNESS char *restrict b,
SIGNEDNESS char *restrict c, int *restrict d)
{
for (int i = 0; i < N; ++i)
{
/* Deliberate use of signed >>. */
int res = b[i] + c[i];
a[i] = (res + (res >> 1)) >> 2;
d[i] = res;
}
}
int
main (void)
{
check_vect ();
SIGNEDNESS char a[N], b[N], c[N];
int d[N];
for (int i = 0; i < N; ++i)
{
b[i] = BASE_B + i * 5;
c[i] = BASE_C + i * 4;
asm volatile ("" ::: "memory");
}
f (a, b, c, d);
for (int i = 0; i < N; ++i)
{
int res = BASE_B + BASE_C + i * 9;
if (a[i] != ((res + (res >> 1)) >> 2))
__builtin_abort ();
if (d[i] != res)
__builtin_abort ();
}
return 0;
}
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(signed char\)} "vect" } } */
/* { dg-final { scan-tree-dump {vector[^ ]* int} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
/* { dg-require-effective-target vect_int } */
/* { dg-require-effective-target vect_shift } */
/* { dg-require-effective-target vect_pack_trunc } */
/* { dg-require-effective-target vect_unpack } */
#ifndef SIGNEDNESS
#define SIGNEDNESS unsigned
#define BASE_B 4
#define BASE_C 40
#endif
#include "vect-over-widen-11.c"
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(unsigned char\)} "vect" } } */
/* { dg-final { scan-tree-dump {vector[^ ]* int} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
/* { dg-require-effective-target vect_int } */
/* { dg-require-effective-target vect_shift } */
/* { dg-require-effective-target vect_pack_trunc } */
/* { dg-require-effective-target vect_unpack } */
#include "tree-vect.h"
#ifndef SIGNEDNESS
#define SIGNEDNESS signed
#define BASE_B -128
#define BASE_C -120
#endif
#define N 50
/* We rely on range analysis to show that these calculations can be done
in SIGNEDNESS short. */
void __attribute__ ((noipa))
f (SIGNEDNESS char *restrict a, SIGNEDNESS char *restrict b,
SIGNEDNESS char *restrict c)
{
for (int i = 0; i < N; ++i)
a[i] = (b[i] + c[i]) / 2;
}
int
main (void)
{
check_vect ();
SIGNEDNESS char a[N], b[N], c[N];
for (int i = 0; i < N; ++i)
{
b[i] = BASE_B + i * 5;
c[i] = BASE_C + i * 4;
asm volatile ("" ::: "memory");
}
f (a, b, c);
for (int i = 0; i < N; ++i)
if (a[i] != (BASE_B + BASE_C + i * 9) / 2)
__builtin_abort ();
return 0;
}
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* / 2} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* = \(signed char\)} "vect" } } */
/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
/* { dg-require-effective-target vect_int } */
/* { dg-require-effective-target vect_shift } */
/* { dg-require-effective-target vect_pack_trunc } */
/* { dg-require-effective-target vect_unpack } */
#ifndef SIGNEDNESS
#define SIGNEDNESS unsigned
#define BASE_B 4
#define BASE_C 40
#endif
#include "vect-over-widen-13.c"
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* = \(unsigned char\)} "vect" } } */
/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
/* { dg-require-effective-target vect_int } */
/* { dg-require-effective-target vect_shift } */
/* { dg-require-effective-target vect_pack_trunc } */
/* { dg-require-effective-target vect_unpack } */
#include "tree-vect.h"
#ifndef SIGNEDNESS
#define SIGNEDNESS signed
#define BASE_B -128
#define BASE_C -120
#endif
#define N 50
/* We rely on range analysis to show that these calculations can be done
in SIGNEDNESS short, with the result being extended to int for the
store. */
void __attribute__ ((noipa))
f (int *restrict a, SIGNEDNESS char *restrict b,
SIGNEDNESS char *restrict c)
{
for (int i = 0; i < N; ++i)
a[i] = (b[i] + c[i]) / 2;
}
int
main (void)
{
check_vect ();
int a[N];
SIGNEDNESS char b[N], c[N];
for (int i = 0; i < N; ++i)
{
b[i] = BASE_B + i * 5;
c[i] = BASE_C + i * 4;
asm volatile ("" ::: "memory");
}
f (a, b, c);
for (int i = 0; i < N; ++i)
if (a[i] != (BASE_B + BASE_C + i * 9) / 2)
__builtin_abort ();
return 0;
}
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* / 2} "vect" } } */
/* { dg-final { scan-tree-dump-not {vect_recog_cast_forwprop_pattern: detected} "vect" } } */
/* { dg-final { scan-tree-dump {vector[^ ]* int} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
/* { dg-require-effective-target vect_int } */
/* { dg-require-effective-target vect_shift } */
/* { dg-require-effective-target vect_pack_trunc } */
/* { dg-require-effective-target vect_unpack } */
#ifndef SIGNEDNESS
#define SIGNEDNESS unsigned
#define BASE_B 4
#define BASE_C 40
#endif
#include "vect-over-widen-15.c"
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
/* { dg-final { scan-tree-dump-not {vect_recog_cast_forwprop_pattern: detected} "vect" } } */
/* { dg-final { scan-tree-dump {vector[^ ]* int} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
/* { dg-require-effective-target vect_int } */
/* { dg-require-effective-target vect_shift } */
/* { dg-require-effective-target vect_pack_trunc } */
/* { dg-require-effective-target vect_unpack } */
#include "tree-vect.h"
#define N 1024
/* This should not be treated as an over-widening pattern, even though
"(b[i] & 0xef) | 0x80)" could be done in unsigned chars. */
void __attribute__ ((noipa))
f (unsigned short *restrict a, unsigned short *restrict b)
{
for (__INTPTR_TYPE__ i = 0; i < N; ++i)
{
unsigned short foo = ((b[i] & 0xef) | 0x80) + (a[i] << 4);
a[i] = foo;
}
}
int
main (void)
{
check_vect ();
unsigned short a[N], b[N];
for (int i = 0; i < N; ++i)
{
a[i] = i;
b[i] = i * 3;
asm volatile ("" ::: "memory");
}
f (a, b);
for (int i = 0; i < N; ++i)
if (a[i] != ((((i * 3) & 0xef) | 0x80) + (i << 4)))
__builtin_abort ();
return 0;
}
/* { dg-final { scan-tree-dump-not {vect_recog_over_widening_pattern: detected} "vect" } } */
/* { dg-final { scan-tree-dump-not {vector[^\n]*char} "vect" } } */
/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
/* { dg-require-effective-target vect_int } */
/* { dg-require-effective-target vect_shift } */
/* { dg-require-effective-target vect_pack_trunc } */
/* { dg-require-effective-target vect_unpack } */
#include "tree-vect.h"
#define N 1024
/* This should be treated as an over-widening pattern: we can truncate
b to unsigned char after loading it and do all the computation in
unsigned char. */
void __attribute__ ((noipa))
f (unsigned char *restrict a, unsigned short *restrict b)
{
for (__INTPTR_TYPE__ i = 0; i < N; ++i)
{
unsigned short foo = ((b[i] & 0xef) | 0x80) + (a[i] << 4);
a[i] = foo;
}
}
int
main (void)
{
check_vect ();
unsigned char a[N];
unsigned short b[N];
for (int i = 0; i < N; ++i)
{
a[i] = i;
b[i] = i * 3;
asm volatile ("" ::: "memory");
}
f (a, b);
for (int i = 0; i < N; ++i)
if (a[i] != (unsigned char) ((((i * 3) & 0xef) | 0x80) + (i << 4)))
__builtin_abort ();
return 0;
}
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* &} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* |} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* <<} "vect" } } */
/* { dg-final { scan-tree-dump {vector[^\n]*char} "vect" } } */
/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
/* { dg-require-effective-target vect_int } */
/* { dg-require-effective-target vect_shift } */
/* { dg-require-effective-target vect_pack_trunc } */
/* { dg-require-effective-target vect_unpack } */
#include "tree-vect.h"
#define N 111
/* This shouldn't be treated as an over-widening operation: it's better
to reuse the extensions of di and ei for di + ei than to add them
as shorts and introduce a third extension. */
void __attribute__ ((noipa))
f (unsigned int *restrict a, unsigned int *restrict b,
unsigned int *restrict c, unsigned char *restrict d,
unsigned char *restrict e)
{
for (__INTPTR_TYPE__ i = 0; i < N; ++i)
{
unsigned int di = d[i];
unsigned int ei = e[i];
a[i] = di;
b[i] = ei;
c[i] = di + ei;
}
}
int
main (void)
{
check_vect ();
unsigned int a[N], b[N], c[N];
unsigned char d[N], e[N];
for (int i = 0; i < N; ++i)
{
d[i] = i * 2 + 3;
e[i] = i + 100;
asm volatile ("" ::: "memory");
}
f (a, b, c, d, e);
for (int i = 0; i < N; ++i)
if (a[i] != i * 2 + 3
|| b[i] != i + 100
|| c[i] != i * 3 + 103)
__builtin_abort ();
return 0;
}
/* { dg-final { scan-tree-dump-not {vect_recog_over_widening_pattern: detected} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
@@ -57,7 +57,12 @@ int main (void)
return 0;
}
/* Final value stays in int, so no over-widening is detected at the moment. */
/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 0 "vect" } } */
/* This is an over-widening even though the final result is still an int.
It's better to do one vector of ops on chars and then widen than to
widen and then do 4 vectors of ops on ints. */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
@@ -57,7 +57,12 @@ int main (void)
return 0;
}
/* Final value stays in int, so no over-widening is detected at the moment. */
/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 0 "vect" } } */
/* This is an over-widening even though the final result is still an int.
It's better to do one vector of ops on chars and then widen than to
widen and then do 4 vectors of ops on ints. */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
/* { dg-require-effective-target vect_int } */
/* { dg-require-effective-target vect_shift } */
/* { dg-require-effective-target vect_pack_trunc } */
/* { dg-require-effective-target vect_unpack } */
#include "tree-vect.h"
#define N 111
/* This shouldn't be treated as an over-widening operation: it's better
to reuse the extensions of di and ei for di + ei than to add them
as shorts and introduce a third extension. */
void __attribute__ ((noipa))
f (unsigned int *restrict a, unsigned int *restrict b,
unsigned int *restrict c, unsigned char *restrict d,
unsigned char *restrict e)
{
for (__INTPTR_TYPE__ i = 0; i < N; ++i)
{
int di = d[i];
int ei = e[i];
a[i] = di;
b[i] = ei;
c[i] = di + ei;
}
}
int
main (void)
{
check_vect ();
unsigned int a[N], b[N], c[N];
unsigned char d[N], e[N];
for (int i = 0; i < N; ++i)
{
d[i] = i * 2 + 3;
e[i] = i + 100;
asm volatile ("" ::: "memory");
}
f (a, b, c, d, e);
for (int i = 0; i < N; ++i)
if (a[i] != i * 2 + 3
|| b[i] != i + 100
|| c[i] != i * 3 + 103)
__builtin_abort ();
return 0;
}
/* { dg-final { scan-tree-dump-not {vect_recog_over_widening_pattern: detected} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
/* { dg-require-effective-target vect_int } */
/* { dg-require-effective-target vect_shift } */
/* { dg-require-effective-target vect_pack_trunc } */
/* { dg-require-effective-target vect_unpack } */
#include "tree-vect.h"
#define N 111
/* This shouldn't be treated as an over-widening operation: it's better
to reuse the extensions of di and ei for di + ei than to add them
as shorts and introduce a third extension. */
void __attribute__ ((noipa))
f (unsigned int *restrict a, unsigned int *restrict b,
unsigned int *restrict c, unsigned char *restrict d,
unsigned char *restrict e)
{
for (__INTPTR_TYPE__ i = 0; i < N; ++i)
{
a[i] = d[i];
b[i] = e[i];
c[i] = d[i] + e[i];
}
}
int
main (void)
{
check_vect ();
unsigned int a[N], b[N], c[N];
unsigned char d[N], e[N];
for (int i = 0; i < N; ++i)
{
d[i] = i * 2 + 3;
e[i] = i + 100;
asm volatile ("" ::: "memory");
}
f (a, b, c, d, e);
for (int i = 0; i < N; ++i)
if (a[i] != i * 2 + 3
|| b[i] != i + 100
|| c[i] != i * 3 + 103)
__builtin_abort ();
return 0;
}
/* { dg-final { scan-tree-dump-not {vect_recog_over_widening_pattern: detected} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
@@ -59,7 +59,9 @@ int main (void)
return 0;
}
/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target { ! vect_widen_shift } } } } */
/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 1 "vect" { target vect_widen_shift } } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 8} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 9} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
@@ -57,6 +57,9 @@ int main (void)
return 0;
}
/* { dg-final { scan-tree-dump "vect_recog_over_widening_pattern: detected" "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 8} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 9} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
@@ -62,7 +62,9 @@ int main (void)
}
/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { ! vect_widen_shift } } } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
@@ -66,8 +66,9 @@ int main (void)
}
/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { { ! vect_sizes_32B_16B } && { ! vect_widen_shift } } } } } */
/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 8 "vect" { target vect_sizes_32B_16B } } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 3} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 3} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* << 8} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 5} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
/* { dg-require-effective-target vect_int } */
/* { dg-require-effective-target vect_shift } */
/* { dg-require-effective-target vect_pack_trunc } */
/* { dg-require-effective-target vect_unpack } */
#include "tree-vect.h"
#ifndef SIGNEDNESS
#define SIGNEDNESS signed
#define BASE_B -128
#define BASE_C -100
#endif
#define N 50
/* Both range analysis and backward propagation from the truncation show
that these calculations can be done in SIGNEDNESS short. */
void __attribute__ ((noipa))
f (SIGNEDNESS char *restrict a, SIGNEDNESS char *restrict b,
SIGNEDNESS char *restrict c)
{
/* Deliberate use of signed >>. */
for (int i = 0; i < N; ++i)
a[i] = (b[i] + c[i]) >> 1;
}
int
main (void)
{
check_vect ();
SIGNEDNESS char a[N], b[N], c[N];
for (int i = 0; i < N; ++i)
{
b[i] = BASE_B + i * 5;
c[i] = BASE_C + i * 4;
asm volatile ("" ::: "memory");
}
f (a, b, c);
for (int i = 0; i < N; ++i)
if (a[i] != (BASE_B + BASE_C + i * 9) >> 1)
__builtin_abort ();
return 0;
}
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(signed char\)} "vect" } } */
/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
/* { dg-require-effective-target vect_int } */
/* { dg-require-effective-target vect_shift } */
/* { dg-require-effective-target vect_pack_trunc } */
/* { dg-require-effective-target vect_unpack } */
#define SIGNEDNESS unsigned
#define BASE_B 4
#define BASE_C 40
#include "vect-over-widen-5.c"
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(unsigned char\)} "vect" } } */
/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
/* { dg-require-effective-target vect_int } */
/* { dg-require-effective-target vect_shift } */
/* { dg-require-effective-target vect_pack_trunc } */
/* { dg-require-effective-target vect_unpack } */
#include "tree-vect.h"
#ifndef SIGNEDNESS
#define SIGNEDNESS signed
#define BASE_B -128
#define BASE_C -100
#define D -120
#endif
#define N 50
/* Both range analysis and backward propagation from the truncation show
that these calculations can be done in SIGNEDNESS short. */
void __attribute__ ((noipa))
f (SIGNEDNESS char *restrict a, SIGNEDNESS char *restrict b,
SIGNEDNESS char *restrict c, SIGNEDNESS char d)
{
int promoted_d = d;
for (int i = 0; i < N; ++i)
/* Deliberate use of signed >>. */
a[i] = (b[i] + c[i] + promoted_d) >> 2;
}
int
main (void)
{
check_vect ();
SIGNEDNESS char a[N], b[N], c[N];
for (int i = 0; i < N; ++i)
{
b[i] = BASE_B + i * 5;
c[i] = BASE_C + i * 4;
asm volatile ("" ::: "memory");
}
f (a, b, c, D);
for (int i = 0; i < N; ++i)
if (a[i] != (BASE_B + BASE_C + D + i * 9) >> 2)
__builtin_abort ();
return 0;
}
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(signed char\)} "vect" } } */
/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
/* { dg-require-effective-target vect_int } */
/* { dg-require-effective-target vect_shift } */
/* { dg-require-effective-target vect_pack_trunc } */
/* { dg-require-effective-target vect_unpack } */
#ifndef SIGNEDNESS
#define SIGNEDNESS unsigned
#define BASE_B 4
#define BASE_C 40
#define D 251
#endif
#include "vect-over-widen-7.c"
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(unsigned char\)} "vect" } } */
/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
/* { dg-require-effective-target vect_int } */
/* { dg-require-effective-target vect_shift } */
/* { dg-require-effective-target vect_pack_trunc } */
/* { dg-require-effective-target vect_unpack } */
#include "tree-vect.h"
#ifndef SIGNEDNESS
#define SIGNEDNESS signed
#define BASE_B -128
#define BASE_C -100
#endif
#define N 50
/* Both range analysis and backward propagation from the truncation show
that these calculations can be done in SIGNEDNESS short. */
void __attribute__ ((noipa))
f (SIGNEDNESS char *restrict a, SIGNEDNESS char *restrict b,
SIGNEDNESS char *restrict c)
{
for (int i = 0; i < N; ++i)
{
/* Deliberate use of signed >>. */
int res = b[i] + c[i];
a[i] = (res + (res >> 1)) >> 2;
}
}
int
main (void)
{
check_vect ();
SIGNEDNESS char a[N], b[N], c[N];
for (int i = 0; i < N; ++i)
{
b[i] = BASE_B + i * 5;
c[i] = BASE_C + i * 4;
asm volatile ("" ::: "memory");
}
f (a, b, c);
for (int i = 0; i < N; ++i)
{
int res = BASE_B + BASE_C + i * 9;
if (a[i] != ((res + (res >> 1)) >> 2))
__builtin_abort ();
}
return 0;
}
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* \+ } "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 1} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_over_widening_pattern: detected:[^\n]* >> 2} "vect" } } */
/* { dg-final { scan-tree-dump {vect_recog_cast_forwprop_pattern: detected:[^\n]* \(signed char\)} "vect" } } */
/* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */
@@ -43,5 +43,5 @@ int main (void)
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_widen_mult_qi_to_hi || vect_unpack } } } } */
/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected" 1 "vect" { target vect_widen_mult_qi_to_hi_pattern } } } */
/* { dg-final { scan-tree-dump-times "pattern recognized" 1 "vect" { target vect_widen_mult_qi_to_hi_pattern } } } */
/* { dg-final { scan-tree-dump-times "widen_mult pattern recognized" 1 "vect" { target vect_widen_mult_qi_to_hi_pattern } } } */
@@ -47,6 +47,40 @@ along with GCC; see the file COPYING3. If not see
#include "omp-simd-clone.h"
#include "predict.h"
/* Return true if we have a useful VR_RANGE range for VAR, storing it
in *MIN_VALUE and *MAX_VALUE if so. Note the range in the dump files. */
static bool
vect_get_range_info (tree var, wide_int *min_value, wide_int *max_value)
{
value_range_type vr_type = get_range_info (var, min_value, max_value);
wide_int nonzero = get_nonzero_bits (var);
signop sgn = TYPE_SIGN (TREE_TYPE (var));
if (intersect_range_with_nonzero_bits (vr_type, min_value, max_value,
nonzero, sgn) == VR_RANGE)
{
if (dump_enabled_p ())
{
dump_generic_expr_loc (MSG_NOTE, vect_location, TDF_SLIM, var);
dump_printf (MSG_NOTE, " has range [");
dump_hex (MSG_NOTE, *min_value);
dump_printf (MSG_NOTE, ", ");
dump_hex (MSG_NOTE, *max_value);
dump_printf (MSG_NOTE, "]\n");
}
return true;
}
else
{
if (dump_enabled_p ())
{
dump_generic_expr_loc (MSG_NOTE, vect_location, TDF_SLIM, var);
dump_printf (MSG_NOTE, " has no range info\n");
}
return false;
}
}
/* Report that we've found an instance of pattern PATTERN in
statement STMT. */
@@ -190,40 +224,6 @@ vect_supportable_direct_optab_p (tree otype, tree_code code,
return true;
}
/* Check whether STMT2 is in the same loop or basic block as STMT1.
Which of the two applies depends on whether we're currently doing
loop-based or basic-block-based vectorization, as determined by
the vinfo_for_stmt for STMT1 (which must be defined).
If this returns true, vinfo_for_stmt for STMT2 is guaranteed
to be defined as well. */
static bool
vect_same_loop_or_bb_p (gimple *stmt1, gimple *stmt2)
{
stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt1);
return vect_stmt_in_region_p (stmt_vinfo->vinfo, stmt2);
}
/* If the LHS of DEF_STMT has a single use, and that statement is
in the same loop or basic block, return it. */
static gimple *
vect_single_imm_use (gimple *def_stmt)
{
tree lhs = gimple_assign_lhs (def_stmt);
use_operand_p use_p;
gimple *use_stmt;
if (!single_imm_use (lhs, &use_p, &use_stmt))
return NULL;
if (!vect_same_loop_or_bb_p (def_stmt, use_stmt))
return NULL;
return use_stmt;
}
/* Round bit precision PRECISION up to a full element. */
static unsigned int
@@ -347,7 +347,9 @@ vect_unpromoted_value::set_op (tree op_in, vect_def_type dt_in,
is possible to convert OP' back to OP using a possible sign change
followed by a possible promotion P. Return this OP', or null if OP is
not a vectorizable SSA name. If there is a promotion P, describe its
input in UNPROM, otherwise describe OP' in UNPROM.
input in UNPROM, otherwise describe OP' in UNPROM. If SINGLE_USE_P
is nonnull, set *SINGLE_USE_P to false if any of the SSA names involved
have more than one user.
A successful return means that it is possible to go from OP' to OP
via UNPROM. The cast from OP' to UNPROM is at most a sign change,
@@ -374,7 +376,8 @@ vect_unpromoted_value::set_op (tree op_in, vect_def_type dt_in,
static tree
vect_look_through_possible_promotion (vec_info *vinfo, tree op,
vect_unpromoted_value *unprom)
vect_unpromoted_value *unprom,
bool *single_use_p = NULL)
{
tree res = NULL_TREE;
tree op_type = TREE_TYPE (op);
@@ -420,7 +423,14 @@ vect_look_through_possible_promotion (vec_info *vinfo, tree op,
if (!def_stmt)
break;
if (dt == vect_internal_def)
{
caster = vinfo_for_stmt (def_stmt);
/* Ignore pattern statements, since we don't link uses for them. */
if (single_use_p
&& !STMT_VINFO_RELATED_STMT (caster)
&& !has_single_use (res))
*single_use_p = false;
}
else
caster = NULL;
gassign *assign = dyn_cast <gassign *> (def_stmt);
@@ -1371,293 +1381,228 @@ vect_recog_widen_sum_pattern (vec<gimple *> *stmts, tree *type_out)
return pattern_stmt;
}
/* Recognize cases in which an operation is performed in one type WTYPE
but could be done more efficiently in a narrower type NTYPE. For example,
if we have:
/* Return TRUE if the operation in STMT can be performed on a smaller type.
ATYPE a; // narrower than NTYPE
BTYPE b; // narrower than NTYPE
WTYPE aw = (WTYPE) a;
WTYPE bw = (WTYPE) b;
WTYPE res = aw + bw; // only uses of aw and bw
Input:
STMT - a statement to check.
DEF - we support operations with two operands, one of which is constant.
The other operand can be defined by a demotion operation, or by a
previous statement in a sequence of over-promoted operations. In the
latter case DEF is used to replace that operand. (It is defined by a
pattern statement we created for the previous statement in the
sequence).
Input/output:
NEW_TYPE - Output: a smaller type that we are trying to use. Input: if not
NULL, it's the type of DEF.
STMTS - additional pattern statements. If a pattern statement (type
conversion) is created in this function, its original statement is
added to STMTS.
then it would be more efficient to do:
Output:
OP0, OP1 - if the operation fits a smaller type, OP0 and OP1 are the new
operands to use in the new pattern statement for STMT (will be created
in vect_recog_over_widening_pattern ()).
NEW_DEF_STMT - in case DEF has to be promoted, we create two pattern
statements for STMT: the first one is a type promotion and the second
one is the operation itself. We return the type promotion statement
in NEW_DEF_STMT and further store it in STMT_VINFO_PATTERN_DEF_SEQ of
the second pattern statement. */
NTYPE an = (NTYPE) a;
NTYPE bn = (NTYPE) b;
NTYPE resn = an + bn;
WTYPE res = (WTYPE) resn;
static bool
vect_operation_fits_smaller_type (gimple *stmt, tree def, tree *new_type,
tree *op0, tree *op1, gimple **new_def_stmt,
vec<gimple *> *stmts)
{
enum tree_code code;
tree const_oprnd, oprnd;
tree interm_type = NULL_TREE, half_type, new_oprnd, type;
gimple *def_stmt, *new_stmt;
bool first = false;
bool promotion;
Other situations include things like:
*op0 = NULL_TREE;
*op1 = NULL_TREE;
*new_def_stmt = NULL;
ATYPE a; // NTYPE or narrower
WTYPE aw = (WTYPE) a;
WTYPE res = aw + b;
if (!is_gimple_assign (stmt))
return false;
when only "(NTYPE) res" is significant. In that case it's more efficient
to truncate "b" and do the operation on NTYPE instead:
code = gimple_assign_rhs_code (stmt);
if (code != LSHIFT_EXPR && code != RSHIFT_EXPR
&& code != BIT_IOR_EXPR && code != BIT_XOR_EXPR && code != BIT_AND_EXPR)
return false;
NTYPE an = (NTYPE) a;
NTYPE bn = (NTYPE) b; // truncation
NTYPE resn = an + bn;
WTYPE res = (WTYPE) resn;
oprnd = gimple_assign_rhs1 (stmt);
const_oprnd = gimple_assign_rhs2 (stmt);
type = gimple_expr_type (stmt);
All users of "res" should then use "resn" instead, making the final
statement dead (not marked as relevant). The final statement is still
needed to maintain the type correctness of the IR.
if (TREE_CODE (oprnd) != SSA_NAME
|| TREE_CODE (const_oprnd) != INTEGER_CST)
return false;
vect_determine_precisions has already determined the minimum
precision of the operation and the minimum precision required
by users of the result. */
/* If oprnd has other uses besides that in stmt we cannot mark it
as being part of a pattern only. */
if (!has_single_use (oprnd))
return false;
static gimple *
vect_recog_over_widening_pattern (vec<gimple *> *stmts, tree *type_out)
{
gassign *last_stmt = dyn_cast <gassign *> (stmts->pop ());
if (!last_stmt)
return NULL;
/* If we are in the middle of a sequence, we use DEF from a previous
statement. Otherwise, OPRND has to be a result of type promotion. */
if (*new_type)
{
half_type = *new_type;
oprnd = def;
}
else
{
first = true;
if (!type_conversion_p (oprnd, stmt, false, &half_type, &def_stmt,
&promotion)
|| !promotion
|| !vect_same_loop_or_bb_p (stmt, def_stmt))
return false;
}
/* See whether we have found that this operation can be done on a
narrower type without changing its semantics. */
stmt_vec_info last_stmt_info = vinfo_for_stmt (last_stmt);
unsigned int new_precision = last_stmt_info->operation_precision;
if (!new_precision)
return NULL;
/* Can we perform the operation on a smaller type? */
switch (code)
{
case BIT_IOR_EXPR:
case BIT_XOR_EXPR:
case BIT_AND_EXPR:
if (!int_fits_type_p (const_oprnd, half_type))
{
/* HALF_TYPE is not enough. Try a bigger type if possible. */
if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
return false;
vec_info *vinfo = last_stmt_info->vinfo;
tree lhs = gimple_assign_lhs (last_stmt);
tree type = TREE_TYPE (lhs);
tree_code code = gimple_assign_rhs_code (last_stmt);
interm_type = build_nonstandard_integer_type (
TYPE_PRECISION (half_type) * 2, TYPE_UNSIGNED (type));
if (!int_fits_type_p (const_oprnd, interm_type))
return false;
}
/* Keep the first operand of a COND_EXPR as-is: only the other two
operands are interesting. */
unsigned int first_op = (code == COND_EXPR ? 2 : 1);
break;
/* Check the operands. */
unsigned int nops = gimple_num_ops (last_stmt) - first_op;
auto_vec <vect_unpromoted_value, 3> unprom (nops);
unprom.quick_grow (nops);
unsigned int min_precision = 0;
bool single_use_p = false;
for (unsigned int i = 0; i < nops; ++i)
{
tree op = gimple_op (last_stmt, first_op + i);
if (TREE_CODE (op) == INTEGER_CST)
unprom[i].set_op (op, vect_constant_def);
else if (TREE_CODE (op) == SSA_NAME)
{
bool op_single_use_p = true;
if (!vect_look_through_possible_promotion (vinfo, op, &unprom[i],
&op_single_use_p))
return NULL;
/* If:
case LSHIFT_EXPR:
/* Try intermediate type - HALF_TYPE is not enough for sure. */
if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
return false;
(1) N bits of the result are needed;
(2) all inputs are widened from M<N bits; and
(3) one operand OP is a single-use SSA name
/* Check that HALF_TYPE size + shift amount <= INTERM_TYPE size.
(e.g., if the original value was char, the shift amount is at most 8
if we want to use short). */
if (compare_tree_int (const_oprnd, TYPE_PRECISION (half_type)) == 1)
return false;
we can shift the M->N widening from OP to the output
without changing the number or type of extensions involved.
This then reduces the number of copies of STMT_INFO.
interm_type = build_nonstandard_integer_type (
TYPE_PRECISION (half_type) * 2, TYPE_UNSIGNED (type));
If instead of (3) more than one operand is a single-use SSA name,
shifting the extension to the output is even more of a win.
if (!vect_supportable_shift (code, interm_type))
return false;
If instead:
break;
(1) N bits of the result are needed;
(2) one operand OP2 is widened from M2<N bits;
(3) another operand OP1 is widened from M1<M2 bits; and
(4) both OP1 and OP2 are single-use
case RSHIFT_EXPR:
if (vect_supportable_shift (code, half_type))
break;
the choice is between:
/* Try intermediate type - HALF_TYPE is not supported. */
if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
return false;
(a) truncating OP2 to M1, doing the operation on M1,
and then widening the result to N
interm_type = build_nonstandard_integer_type (
TYPE_PRECISION (half_type) * 2, TYPE_UNSIGNED (type));
(b) widening OP1 to M2, doing the operation on M2, and then
widening the result to N
if (!vect_supportable_shift (code, interm_type))
return false;
Both shift the M2->N widening of the inputs to the output.
(a) additionally shifts the M1->M2 widening to the output;
it requires fewer copies of STMT_INFO but requires an extra
M2->M1 truncation.
break;
Which is better will depend on the complexity and cost of
STMT_INFO, which is hard to predict at this stage. However,
a clear tie-breaker in favor of (b) is the fact that the
truncation in (a) increases the length of the operation chain.
default:
gcc_unreachable ();
}
If instead of (4) only one of OP1 or OP2 is single-use,
(b) is still a win over doing the operation in N bits:
it still shifts the M2->N widening on the single-use operand
to the output and reduces the number of STMT_INFO copies.
/* There are four possible cases:
1. OPRND is defined by a type promotion (in that case FIRST is TRUE, it's
the first statement in the sequence)
a. The original, HALF_TYPE, is not enough - we replace the promotion
from HALF_TYPE to TYPE with a promotion to INTERM_TYPE.
b. HALF_TYPE is sufficient, OPRND is set as the RHS of the original
promotion.
2. OPRND is defined by a pattern statement we created.
a. Its type is not sufficient for the operation, we create a new stmt:
a type conversion for OPRND from HALF_TYPE to INTERM_TYPE. We store
this statement in NEW_DEF_STMT, and it is later put in
STMT_VINFO_PATTERN_DEF_SEQ of the pattern statement for STMT.
b. OPRND is good to use in the new statement. */
if (first)
{
if (interm_type)
{
/* Replace the original type conversion HALF_TYPE->TYPE with
HALF_TYPE->INTERM_TYPE. */
if (STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt)))
{
new_stmt = STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt));
/* Check if the already created pattern stmt is what we need. */
if (!is_gimple_assign (new_stmt)
|| !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (new_stmt))
|| TREE_TYPE (gimple_assign_lhs (new_stmt)) != interm_type)
return false;
If neither operand is single-use then operating on fewer than
N bits might lead to more extensions overall. Whether it does
or not depends on global information about the vectorization
region, and whether that's a good trade-off would again
depend on the complexity and cost of the statements involved,
as well as things like register pressure that are not normally
modelled at this stage. We therefore ignore these cases
and just optimize the clear single-use wins above.
stmts->safe_push (def_stmt);
oprnd = gimple_assign_lhs (new_stmt);
}
else
{
/* Create NEW_OPRND = (INTERM_TYPE) OPRND. */
oprnd = gimple_assign_rhs1 (def_stmt);
new_oprnd = make_ssa_name (interm_type);
new_stmt = gimple_build_assign (new_oprnd, NOP_EXPR, oprnd);
STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt)) = new_stmt;
stmts->safe_push (def_stmt);
oprnd = new_oprnd;
}
}
else
Thus we take the maximum precision of the unpromoted operands
and record whether any operand is single-use. */
if (unprom[i].dt == vect_internal_def)
{
/* Retrieve the operand before the type promotion. */
oprnd = gimple_assign_rhs1 (def_stmt);
}
min_precision = MAX (min_precision,
TYPE_PRECISION (unprom[i].type));
single_use_p |= op_single_use_p;
}
else
{
if (interm_type)
{
/* Create a type conversion HALF_TYPE->INTERM_TYPE. */
new_oprnd = make_ssa_name (interm_type);
new_stmt = gimple_build_assign (new_oprnd, NOP_EXPR, oprnd);
oprnd = new_oprnd;
*new_def_stmt = new_stmt;
}
/* Otherwise, OPRND is already set. */
}
if (interm_type)
*new_type = interm_type;
else
*new_type = half_type;
/* Although the operation could be done in operation_precision, we have
to balance that against introducing extra truncations or extensions.
Calculate the minimum precision that can be handled efficiently.
*op0 = oprnd;
*op1 = fold_convert (*new_type, const_oprnd);
The loop above determined that the operation could be handled
efficiently in MIN_PRECISION if SINGLE_USE_P; this would shift an
extension from the inputs to the output without introducing more
instructions, and would reduce the number of instructions required
for STMT_INFO itself.
return true;
}
vect_determine_precisions has also determined that the result only
needs min_output_precision bits. Truncating by a factor of N times
requires a tree of N - 1 instructions, so if TYPE is N times wider
than min_output_precision, doing the operation in TYPE and truncating
the result requires N + (N - 1) = 2N - 1 instructions per output vector.
In contrast:
- truncating the input to a unary operation and doing the operation
in the new type requires at most N - 1 + 1 = N instructions per
output vector
/* Try to find a statement or a sequence of statements that can be performed
on a smaller type:
- doing the same for a binary operation requires at most
(N - 1) * 2 + 1 = 2N - 1 instructions per output vector
type x_t;
TYPE x_T, res0_T, res1_T;
loop:
S1 x_t = *p;
S2 x_T = (TYPE) x_t;
S3 res0_T = op (x_T, C0);
S4 res1_T = op (res0_T, C1);
S5 ... = () res1_T; - type demotion
where type 'TYPE' is at least double the size of type 'type', C0 and C1 are
constants.
Check if S3 and S4 can be done on a smaller type than 'TYPE', it can either
be 'type' or some intermediate type. For now, we expect S5 to be a type
demotion operation. We also check that S3 and S4 have only one use. */
static gimple *
vect_recog_over_widening_pattern (vec<gimple *> *stmts, tree *type_out)
{
gimple *stmt = stmts->pop ();
gimple *pattern_stmt = NULL, *new_def_stmt, *prev_stmt = NULL,
*use_stmt = NULL;
tree op0, op1, vectype = NULL_TREE, use_lhs, use_type;
tree var = NULL_TREE, new_type = NULL_TREE, new_oprnd;
bool first;
tree type = NULL;
Both unary and binary operations require fewer instructions than
this if the operands were extended from a suitable truncated form.
Thus there is usually nothing to lose by doing operations in
min_output_precision bits, but there can be something to gain. */
if (!single_use_p)
min_precision = last_stmt_info->min_output_precision;
else
min_precision = MIN (min_precision, last_stmt_info->min_output_precision);
first = true;
while (1)
{
if (!vinfo_for_stmt (stmt)
|| STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (stmt)))
/* Apply the minimum efficient precision we just calculated. */
if (new_precision < min_precision)
new_precision = min_precision;
if (new_precision >= TYPE_PRECISION (type))
return NULL;
new_def_stmt = NULL;
if (!vect_operation_fits_smaller_type (stmt, var, &new_type,
&op0, &op1, &new_def_stmt,
stmts))
{
if (first)
return NULL;
else
break;
}
vect_pattern_detected ("vect_recog_over_widening_pattern", last_stmt);
/* STMT can be performed on a smaller type. Check its uses. */
use_stmt = vect_single_imm_use (stmt);
if (!use_stmt || !is_gimple_assign (use_stmt))
*type_out = get_vectype_for_scalar_type (type);
if (!*type_out)
return NULL;
/* Create pattern statement for STMT. */
vectype = get_vectype_for_scalar_type (new_type);
if (!vectype)
/* We've found a viable pattern. Get the new type of the operation. */
bool unsigned_p = (last_stmt_info->operation_sign == UNSIGNED);
tree new_type = build_nonstandard_integer_type (new_precision, unsigned_p);
/* We specifically don't check here whether the target supports the
new operation, since it might be something that a later pattern
wants to rewrite anyway. If targets have a minimum element size
for some optabs, we should pattern-match smaller ops to larger ops
where beneficial. */
tree new_vectype = get_vectype_for_scalar_type (new_type);
if (!new_vectype)
return NULL;
/* We want to collect all the statements for which we create pattern
statements, except for the case when the last statement in the
sequence doesn't have a corresponding pattern statement. In such
case we associate the last pattern statement with the last statement
in the sequence. Therefore, we only add the original statement to
the list if we know that it is not the last. */
if (prev_stmt)
stmts->safe_push (prev_stmt);
if (dump_enabled_p ())
{
dump_printf_loc (MSG_NOTE, vect_location, "demoting ");
dump_generic_expr (MSG_NOTE, TDF_SLIM, type);
dump_printf (MSG_NOTE, " to ");
dump_generic_expr (MSG_NOTE, TDF_SLIM, new_type);
dump_printf (MSG_NOTE, "\n");
}
var = vect_recog_temp_ssa_var (new_type, NULL);
pattern_stmt
= gimple_build_assign (var, gimple_assign_rhs_code (stmt), op0, op1);
STMT_VINFO_RELATED_STMT (vinfo_for_stmt (stmt)) = pattern_stmt;
new_pattern_def_seq (vinfo_for_stmt (stmt), new_def_stmt);
/* Calculate the rhs operands for an operation on NEW_TYPE. */
STMT_VINFO_PATTERN_DEF_SEQ (last_stmt_info) = NULL;
tree ops[3] = {};
for (unsigned int i = 1; i < first_op; ++i)
ops[i - 1] = gimple_op (last_stmt, i);
vect_convert_inputs (last_stmt_info, nops, &ops[first_op - 1],
new_type, &unprom[0], new_vectype);
/* Use the operation to produce a result of type NEW_TYPE. */
tree new_var = vect_recog_temp_ssa_var (new_type, NULL);
gimple *pattern_stmt = gimple_build_assign (new_var, code,
ops[0], ops[1], ops[2]);
gimple_set_location (pattern_stmt, gimple_location (last_stmt));
if (dump_enabled_p ())
{
@@ -1666,68 +1611,88 @@ vect_recog_over_widening_pattern (vec<gimple *> *stmts, tree *type_out)
dump_gimple_stmt (MSG_NOTE, TDF_SLIM, pattern_stmt, 0);
}
type = gimple_expr_type (stmt);
prev_stmt = stmt;
stmt = use_stmt;
pattern_stmt = vect_convert_output (last_stmt_info, type,
pattern_stmt, new_vectype);
first = false;
}
stmts->safe_push (last_stmt);
return pattern_stmt;
}
/* We got a sequence. We expect it to end with a type demotion operation.
Otherwise, we quit (for now). There are three possible cases: the
conversion is to NEW_TYPE (we don't do anything), the conversion is to
a type bigger than NEW_TYPE and/or the signedness of USE_TYPE and
NEW_TYPE differs (we create a new conversion statement). */
if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (use_stmt)))
{
use_lhs = gimple_assign_lhs (use_stmt);
use_type = TREE_TYPE (use_lhs);
/* Support only type demotion or signedness change. */
if (!INTEGRAL_TYPE_P (use_type)
|| TYPE_PRECISION (type) <= TYPE_PRECISION (use_type))
/* Recognize cases in which the input to a cast is wider than its
output, and the input is fed by a widening operation. Fold this
by removing the unnecessary intermediate widening. E.g.:
unsigned char a;
unsigned int b = (unsigned int) a;
unsigned short c = (unsigned short) b;
-->
unsigned short c = (unsigned short) a;
Although this is rare in input IR, it is an expected side-effect
of the over-widening pattern above.
This is beneficial also for integer-to-float conversions, if the
widened integer has more bits than the float, and if the unwidened
input doesn't. */
static gimple *
vect_recog_cast_forwprop_pattern (vec<gimple *> *stmts, tree *type_out)
{
/* Check for a cast, including an integer-to-float conversion. */
gassign *last_stmt = dyn_cast <gassign *> (stmts->pop ());
if (!last_stmt)
return NULL;
tree_code code = gimple_assign_rhs_code (last_stmt);
if (!CONVERT_EXPR_CODE_P (code) && code != FLOAT_EXPR)
return NULL;
/* Check that NEW_TYPE is not bigger than the conversion result. */
if (TYPE_PRECISION (new_type) > TYPE_PRECISION (use_type))
/* Make sure that the rhs is a scalar with a natural bitsize. */
tree lhs = gimple_assign_lhs (last_stmt);
if (!lhs)
return NULL;
tree lhs_type = TREE_TYPE (lhs);
scalar_mode lhs_mode;
if (VECT_SCALAR_BOOLEAN_TYPE_P (lhs_type)
|| !is_a <scalar_mode> (TYPE_MODE (lhs_type), &lhs_mode))
return NULL;
if (TYPE_UNSIGNED (new_type) != TYPE_UNSIGNED (use_type)
|| TYPE_PRECISION (new_type) != TYPE_PRECISION (use_type))
{
*type_out = get_vectype_for_scalar_type (use_type);
if (!*type_out)
/* Check for a narrowing operation (from a vector point of view). */
tree rhs = gimple_assign_rhs1 (last_stmt);
tree rhs_type = TREE_TYPE (rhs);
if (!INTEGRAL_TYPE_P (rhs_type)
|| VECT_SCALAR_BOOLEAN_TYPE_P (rhs_type)
|| TYPE_PRECISION (rhs_type) <= GET_MODE_BITSIZE (lhs_mode))
return NULL;
/* Create NEW_TYPE->USE_TYPE conversion. */
new_oprnd = make_ssa_name (use_type);
pattern_stmt = gimple_build_assign (new_oprnd, NOP_EXPR, var);
STMT_VINFO_RELATED_STMT (vinfo_for_stmt (use_stmt)) = pattern_stmt;
/* Try to find an unpromoted input. */
stmt_vec_info last_stmt_info = vinfo_for_stmt (last_stmt);
vec_info *vinfo = last_stmt_info->vinfo;
vect_unpromoted_value unprom;
if (!vect_look_through_possible_promotion (vinfo, rhs, &unprom)
|| TYPE_PRECISION (unprom.type) >= TYPE_PRECISION (rhs_type))
return NULL;
/* We created a pattern statement for the last statement in the
sequence, so we don't need to associate it with the pattern
statement created for PREV_STMT. Therefore, we add PREV_STMT
to the list in order to mark it later in vect_pattern_recog_1. */
if (prev_stmt)
stmts->safe_push (prev_stmt);
}
else
{
if (prev_stmt)
STMT_VINFO_PATTERN_DEF_SEQ (vinfo_for_stmt (use_stmt))
= STMT_VINFO_PATTERN_DEF_SEQ (vinfo_for_stmt (prev_stmt));
/* If the bits above RHS_TYPE matter, make sure that they're the
same when extending from UNPROM as they are when extending from RHS. */
if (!INTEGRAL_TYPE_P (lhs_type)
&& TYPE_SIGN (rhs_type) != TYPE_SIGN (unprom.type))
return NULL;
*type_out = vectype;
}
/* We can get the same result by casting UNPROM directly, to avoid
the unnecessary widening and narrowing. */
vect_pattern_detected ("vect_recog_cast_forwprop_pattern", last_stmt);
stmts->safe_push (use_stmt);
}
else
/* TODO: support general case, create a conversion to the correct type. */
*type_out = get_vectype_for_scalar_type (lhs_type);
if (!*type_out)
return NULL;
/* Pattern detected. */
vect_pattern_detected ("vect_recog_over_widening_pattern", stmts->last ());
tree new_var = vect_recog_temp_ssa_var (lhs_type, NULL);
gimple *pattern_stmt = gimple_build_assign (new_var, code, unprom.op);
gimple_set_location (pattern_stmt, gimple_location (last_stmt));
stmts->safe_push (last_stmt);
return pattern_stmt;
}
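The transformation that vect_recog_cast_forwprop_pattern performs can be checked at the source level; the standalone C snippet below is only an illustration and not part of the patch. It confirms that skipping the intermediate widening preserves the result both for the integer case in the comment above and for an integer-to-float conversion whose unwidened input fits in the float:

/* Illustrative only: dropping the unnecessary intermediate widening
   does not change the result.  */
#include <assert.h>

int
main (void)
{
  for (unsigned int i = 0; i < 256; i++)
    {
      unsigned char a = (unsigned char) i;
      unsigned int b = (unsigned int) a;          /* widening cast */
      assert ((unsigned short) b == (unsigned short) a);
      assert ((float) b == (float) a);
    }
  return 0;
}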
@@ -4205,6 +4170,390 @@ vect_recog_gather_scatter_pattern (vec<gimple *> *stmts, tree *type_out)
return pattern_stmt;
}
/* Return true if TYPE is a non-boolean integer type. These are the types
that we want to consider for narrowing. */
static bool
vect_narrowable_type_p (tree type)
{
return INTEGRAL_TYPE_P (type) && !VECT_SCALAR_BOOLEAN_TYPE_P (type);
}
/* Return true if the operation given by CODE can be truncated to N bits
when only N bits of the output are needed. This is only true if bit N+1
of the inputs has no effect on the low N bits of the result. */
static bool
vect_truncatable_operation_p (tree_code code)
{
switch (code)
{
case PLUS_EXPR:
case MINUS_EXPR:
case MULT_EXPR:
case BIT_AND_EXPR:
case BIT_IOR_EXPR:
case BIT_XOR_EXPR:
case COND_EXPR:
return true;
default:
return false;
}
}
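As a rough sanity check of the property vect_truncatable_operation_p relies on (the low N bits of the result depend only on the low N bits of the inputs), the following standalone C snippet, which is not part of the patch, contrasts addition with a right shift:

/* Illustrative only: PLUS_EXPR is truncatable (the low 8 bits of the sum
   depend only on the low 8 bits of the inputs), RSHIFT_EXPR is not.  */
#include <assert.h>
#include <stdint.h>

int
main (void)
{
  uint32_t a = 0x1f0, b = 0x23;
  assert ((uint8_t) (a + b) == (uint8_t) ((uint8_t) a + (uint8_t) b));
  assert ((uint8_t) (a >> 4) != (uint8_t) ((uint8_t) a >> 4));
  return 0;
}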
/* Record that STMT_INFO could be changed from operating on TYPE to
operating on a type with the precision and sign given by PRECISION
and SIGN respectively. PRECISION is an arbitrary bit precision;
it might not be a whole number of bytes. */
static void
vect_set_operation_type (stmt_vec_info stmt_info, tree type,
unsigned int precision, signop sign)
{
/* Round the precision up to a whole number of bytes. */
precision = vect_element_precision (precision);
if (precision < TYPE_PRECISION (type)
&& (!stmt_info->operation_precision
|| stmt_info->operation_precision > precision))
{
stmt_info->operation_precision = precision;
stmt_info->operation_sign = sign;
}
}
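vect_element_precision is assumed here to round a bit count up to a precision that a vector element can actually have; the sketch below is only an illustration of that assumption (next power of two, at least one byte), not the GCC helper itself:

/* Sketch only, assuming the rounding behaves as described above.  */
#include <assert.h>

static unsigned int
element_precision_sketch (unsigned int precision)
{
  unsigned int p = 8;
  while (p < precision)
    p *= 2;
  return p;
}

int
main (void)
{
  assert (element_precision_sketch (5) == 8);
  assert (element_precision_sketch (12) == 16);
  assert (element_precision_sketch (16) == 16);
  return 0;
}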
/* Record that STMT_INFO only requires MIN_INPUT_PRECISION from its
non-boolean inputs, all of which have type TYPE. MIN_INPUT_PRECISION
is an arbitrary bit precision; it might not be a whole number of bytes. */
static void
vect_set_min_input_precision (stmt_vec_info stmt_info, tree type,
unsigned int min_input_precision)
{
/* This operation in isolation only requires the inputs to have
MIN_INPUT_PRECISION of precision.  However, that doesn't mean
that MIN_INPUT_PRECISION is a natural precision for the chain
as a whole. E.g. consider something like:
unsigned short *x, *y;
*y = ((*x & 0xf0) >> 4) | (*y << 4);
The right shift can be done on unsigned chars, and only requires the
result of "*x & 0xf0" to be done on unsigned chars. But taking that
approach would mean turning a natural chain of single-vector unsigned
short operations into one that truncates "*x" and then extends
"(*x & 0xf0) >> 4", with two vectors for each unsigned short
operation and one vector for each unsigned char operation.
This would be a significant pessimization.
Instead only propagate the maximum of this precision and the precision
required by the users of the result. This means that we don't pessimize
the case above but continue to optimize things like:
unsigned char *y;
unsigned short *x;
*y = ((*x & 0xf0) >> 4) | (*y << 4);
Here we would truncate two vectors of *x to a single vector of
unsigned chars and use single-vector unsigned char operations for
everything else, rather than doing two unsigned short copies of
"(*x & 0xf0) >> 4" and then truncating the result. */
min_input_precision = MAX (min_input_precision,
stmt_info->min_output_precision);
if (min_input_precision < TYPE_PRECISION (type)
&& (!stmt_info->min_input_precision
|| stmt_info->min_input_precision > min_input_precision))
stmt_info->min_input_precision = min_input_precision;
}
/* Subroutine of vect_determine_min_output_precision. Return true if
we can calculate a reduced number of output bits for STMT_INFO,
whose result is LHS. */
static bool
vect_determine_min_output_precision_1 (stmt_vec_info stmt_info, tree lhs)
{
/* Take the maximum precision required by users of the result. */
unsigned int precision = 0;
imm_use_iterator iter;
use_operand_p use;
FOR_EACH_IMM_USE_FAST (use, iter, lhs)
{
gimple *use_stmt = USE_STMT (use);
if (is_gimple_debug (use_stmt))
continue;
if (!vect_stmt_in_region_p (stmt_info->vinfo, use_stmt))
return false;
stmt_vec_info use_stmt_info = vinfo_for_stmt (use_stmt);
if (!use_stmt_info->min_input_precision)
return false;
precision = MAX (precision, use_stmt_info->min_input_precision);
}
if (dump_enabled_p ())
{
dump_printf_loc (MSG_NOTE, vect_location, "only the low %d bits of ",
precision);
dump_generic_expr (MSG_NOTE, TDF_SLIM, lhs);
dump_printf (MSG_NOTE, " are significant\n");
}
stmt_info->min_output_precision = precision;
return true;
}
/* Calculate min_output_precision for STMT_INFO. */
static void
vect_determine_min_output_precision (stmt_vec_info stmt_info)
{
/* We're only interested in statements with a narrowable result. */
tree lhs = gimple_get_lhs (stmt_info->stmt);
if (!lhs
|| TREE_CODE (lhs) != SSA_NAME
|| !vect_narrowable_type_p (TREE_TYPE (lhs)))
return;
if (!vect_determine_min_output_precision_1 (stmt_info, lhs))
stmt_info->min_output_precision = TYPE_PRECISION (TREE_TYPE (lhs));
}
/* Use range information to decide whether STMT (described by STMT_INFO)
could be done in a narrower type. This is effectively a forward
propagation, since it uses context-independent information that applies
to all users of an SSA name. */
static void
vect_determine_precisions_from_range (stmt_vec_info stmt_info, gassign *stmt)
{
tree lhs = gimple_assign_lhs (stmt);
if (!lhs || TREE_CODE (lhs) != SSA_NAME)
return;
tree type = TREE_TYPE (lhs);
if (!vect_narrowable_type_p (type))
return;
/* First see whether we have any useful range information for the result. */
unsigned int precision = TYPE_PRECISION (type);
signop sign = TYPE_SIGN (type);
wide_int min_value, max_value;
if (!vect_get_range_info (lhs, &min_value, &max_value))
return;
tree_code code = gimple_assign_rhs_code (stmt);
unsigned int nops = gimple_num_ops (stmt);
if (!vect_truncatable_operation_p (code))
return;
/* Check that all relevant input operands are compatible, and update
[MIN_VALUE, MAX_VALUE] to include their ranges. */
for (unsigned int i = 1; i < nops; ++i)
{
tree op = gimple_op (stmt, i);
if (TREE_CODE (op) == INTEGER_CST)
{
/* Don't require the integer to have RHS_TYPE (which it might
not for things like shift amounts, etc.), but do require it
to fit the type. */
if (!int_fits_type_p (op, type))
return;
min_value = wi::min (min_value, wi::to_wide (op, precision), sign);
max_value = wi::max (max_value, wi::to_wide (op, precision), sign);
}
else if (TREE_CODE (op) == SSA_NAME)
{
/* Ignore codes that don't take uniform arguments. */
if (!types_compatible_p (TREE_TYPE (op), type))
return;
wide_int op_min_value, op_max_value;
if (!vect_get_range_info (op, &op_min_value, &op_max_value))
return;
min_value = wi::min (min_value, op_min_value, sign);
max_value = wi::max (max_value, op_max_value, sign);
}
else
return;
}
/* Try to switch signed types for unsigned types if we can.
This is better for two reasons. First, unsigned ops tend
to be cheaper than signed ops. Second, it means that we can
handle things like:
signed char c;
int res = (int) c & 0xff00; // range [0x0000, 0xff00]
as:
signed char c;
unsigned short res_1 = (unsigned short) c & 0xff00;
int res = (int) res_1;
where the intermediate result res_1 has unsigned rather than
signed type. */
if (sign == SIGNED && !wi::neg_p (min_value))
sign = UNSIGNED;
/* See what precision is required for MIN_VALUE and MAX_VALUE. */
unsigned int precision1 = wi::min_precision (min_value, sign);
unsigned int precision2 = wi::min_precision (max_value, sign);
unsigned int value_precision = MAX (precision1, precision2);
if (value_precision >= precision)
return;
if (dump_enabled_p ())
{
dump_printf_loc (MSG_NOTE, vect_location, "can narrow to %s:%d"
" without loss of precision: ",
sign == SIGNED ? "signed" : "unsigned",
value_precision);
dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
}
vect_set_operation_type (stmt_info, type, value_precision, sign);
vect_set_min_input_precision (stmt_info, type, value_precision);
}
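For a concrete check of the signed-to-unsigned switch described in the comment above, this standalone snippet (not part of the patch) verifies that the narrowed unsigned short computation matches the original int computation for every signed char value:

/* Illustrative only: the narrowed unsigned short form of the example in
   the comment above gives the same value as the original int form.  */
#include <assert.h>

int
main (void)
{
  for (int i = -128; i < 128; i++)
    {
      signed char c = (signed char) i;
      int res = (int) c & 0xff00;                         /* original */
      unsigned short res_1 = (unsigned short) c & 0xff00; /* narrowed */
      assert (res == (int) res_1);
    }
  return 0;
}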
/* Use information about the users of STMT's result to decide whether
STMT (described by STMT_INFO) could be done in a narrower type.
This is effectively a backward propagation. */
static void
vect_determine_precisions_from_users (stmt_vec_info stmt_info, gassign *stmt)
{
tree_code code = gimple_assign_rhs_code (stmt);
unsigned int opno = (code == COND_EXPR ? 2 : 1);
tree type = TREE_TYPE (gimple_op (stmt, opno));
if (!vect_narrowable_type_p (type))
return;
unsigned int precision = TYPE_PRECISION (type);
unsigned int operation_precision, min_input_precision;
switch (code)
{
CASE_CONVERT:
/* Only the bits that contribute to the output matter. Don't change
the precision of the operation itself. */
operation_precision = precision;
min_input_precision = stmt_info->min_output_precision;
break;
case LSHIFT_EXPR:
case RSHIFT_EXPR:
{
tree shift = gimple_assign_rhs2 (stmt);
if (TREE_CODE (shift) != INTEGER_CST
|| !wi::ltu_p (wi::to_widest (shift), precision))
return;
unsigned int const_shift = TREE_INT_CST_LOW (shift);
if (code == LSHIFT_EXPR)
{
/* We need CONST_SHIFT fewer bits of the input. */
operation_precision = stmt_info->min_output_precision;
min_input_precision = (MAX (operation_precision, const_shift)
- const_shift);
}
else
{
/* We need CONST_SHIFT extra bits to do the operation. */
operation_precision = (stmt_info->min_output_precision
+ const_shift);
min_input_precision = operation_precision;
}
break;
}
default:
if (vect_truncatable_operation_p (code))
{
/* Input bit N has no effect on output bits N-1 and lower. */
operation_precision = stmt_info->min_output_precision;
min_input_precision = operation_precision;
break;
}
return;
}
if (operation_precision < precision)
{
if (dump_enabled_p ())
{
dump_printf_loc (MSG_NOTE, vect_location, "can narrow to %s:%d"
" without affecting users: ",
TYPE_UNSIGNED (type) ? "unsigned" : "signed",
operation_precision);
dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
}
vect_set_operation_type (stmt_info, type, operation_precision,
TYPE_SIGN (type));
}
vect_set_min_input_precision (stmt_info, type, min_input_precision);
}
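As a worked instance of the RSHIFT_EXPR case above: if users only need the low 8 bits of x >> 4, then only the low 8 + 4 = 12 bits of x can affect them, which is what the back-propagation computes. The standalone snippet below (not part of the patch) checks that claim for one value:

/* Illustrative only: if only the low 8 bits of (x >> 4) are used,
   only the low 12 bits of x matter.  */
#include <assert.h>
#include <stdint.h>

int
main (void)
{
  uint32_t x = 0xabcdef;
  uint32_t x_low12 = x & 0xfff;
  assert ((uint8_t) (x >> 4) == (uint8_t) (x_low12 >> 4));
  return 0;
}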
/* Handle vect_determine_precisions for STMT_INFO, given that we
have already done so for the users of its result. */
void
vect_determine_stmt_precisions (stmt_vec_info stmt_info)
{
vect_determine_min_output_precision (stmt_info);
if (gassign *stmt = dyn_cast <gassign *> (stmt_info->stmt))
{
vect_determine_precisions_from_range (stmt_info, stmt);
vect_determine_precisions_from_users (stmt_info, stmt);
}
}
/* Walk backwards through the vectorizable region to determine the
values of these fields:
- min_output_precision
- min_input_precision
- operation_precision
- operation_sign. */
void
vect_determine_precisions (vec_info *vinfo)
{
DUMP_VECT_SCOPE ("vect_determine_precisions");
if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
{
struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
unsigned int nbbs = loop->num_nodes;
for (unsigned int i = 0; i < nbbs; i++)
{
basic_block bb = bbs[nbbs - i - 1];
for (gimple_stmt_iterator si = gsi_last_bb (bb);
!gsi_end_p (si); gsi_prev (&si))
vect_determine_stmt_precisions (vinfo_for_stmt (gsi_stmt (si)));
}
}
else
{
bb_vec_info bb_vinfo = as_a <bb_vec_info> (vinfo);
gimple_stmt_iterator si = bb_vinfo->region_end;
gimple *stmt;
do
{
if (!gsi_stmt (si))
si = gsi_last_bb (bb_vinfo->bb);
else
gsi_prev (&si);
stmt = gsi_stmt (si);
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
if (stmt_info && STMT_VINFO_VECTORIZABLE (stmt_info))
vect_determine_stmt_precisions (stmt_info);
}
while (stmt != gsi_stmt (bb_vinfo->region_begin));
}
}
typedef gimple *(*vect_recog_func_ptr) (vec<gimple *> *, tree *);
struct vect_recog_func
@@ -4217,13 +4566,14 @@ struct vect_recog_func
taken, which means usually the more complex one needs to precede the
less complex ones (widen_sum only after dot_prod or sad for example). */
static vect_recog_func vect_vect_recog_func_ptrs[] = {
{ vect_recog_over_widening_pattern, "over_widening" },
{ vect_recog_cast_forwprop_pattern, "cast_forwprop" },
{ vect_recog_widen_mult_pattern, "widen_mult" },
{ vect_recog_dot_prod_pattern, "dot_prod" },
{ vect_recog_sad_pattern, "sad" },
{ vect_recog_widen_sum_pattern, "widen_sum" },
{ vect_recog_pow_pattern, "pow" },
{ vect_recog_widen_shift_pattern, "widen_shift" },
{ vect_recog_over_widening_pattern, "over_widening" },
{ vect_recog_rotate_pattern, "rotate" },
{ vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
{ vect_recog_divmod_pattern, "divmod" },
@@ -4502,6 +4852,8 @@ vect_pattern_recog (vec_info *vinfo)
unsigned int i, j;
auto_vec<gimple *, 1> stmts_to_replace;
vect_determine_precisions (vinfo);
DUMP_VECT_SCOPE ("vect_pattern_recog");
if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
@@ -899,6 +899,21 @@ typedef struct _stmt_vec_info {
/* The number of scalar stmt references from active SLP instances. */
unsigned int num_slp_uses;
/* If nonzero, the lhs of the statement could be truncated to this
many bits without affecting any users of the result. */
unsigned int min_output_precision;
/* If nonzero, all non-boolean input operands have the same precision,
and they could each be truncated to this many bits without changing
the result. */
unsigned int min_input_precision;
/* If OPERATION_PRECISION is nonzero, the statement could be performed on
an integer with the sign and number of bits given by OPERATION_SIGN
and OPERATION_PRECISION without changing the result. */
unsigned int operation_precision;
signop operation_sign;
} *stmt_vec_info;
/* Information about a gather/scatter call. */