Commit 87a80d27 by Richard Sandiford

[AArch64] Pattern-match SVE extending gather loads

This patch pattern-matches a partial gather load followed by a sign or
zero extension into an extending gather load.  (The partial gather load
is already an extending load; we just don't rely on the upper bits of
the elements.)
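
As an illustration, this is the kind of loop the new patterns target,
modelled on the new gather_load_extend_*.c tests (the function name and
the options -O2 -ftree-vectorize with SVE enabled are illustrative, not
part of the patch):

#include <stdint.h>

/* The gathered int8_t elements are sign-extended to 64 bits.  Before the
   patch this would typically vectorize to a partial gather (ld1b into .d
   lanes) followed by a separate sxtb; the expectation now is a single
   extending gather, ld1sb, with no separate sxt* instruction (see the
   scan-assembler directives in the new tests).  */
void
f (int64_t *restrict dst, int64_t *restrict src1,
   int8_t *restrict src2, uint64_t *restrict index, int n)
{
  for (int i = 0; i < n; ++i)
    dst[i] += src1[i] + src2[index[i]];
}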

2019-11-16  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* config/aarch64/iterators.md (SVE_2BHSI, SVE_2HSDI, SVE_4BHI)
	(SVE_4HSI): New mode iterators.
	(ANY_EXTEND2): New code iterator.
	* config/aarch64/aarch64-sve.md
	(@aarch64_gather_load_<ANY_EXTEND:optab><VNx4_WIDE:mode><VNx4_NARROW:mode>):
	Extend to...
	(@aarch64_gather_load_<ANY_EXTEND:optab><SVE_4HSI:mode><SVE_4BHI:mode>):
	...this, handling extension to partial modes as well as full modes.
	Describe the extension as a predicated rather than unpredicated
	extension.
	(@aarch64_gather_load_<ANY_EXTEND:optab><VNx2_WIDE:mode><VNx2_NARROW:mode>):
	Likewise extend to...
	(@aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode><SVE_2BHSI:mode>):
	...this, making the same adjustments.
	(*aarch64_gather_load_<ANY_EXTEND:optab><VNx2_WIDE:mode><VNx2_NARROW:mode>_sxtw):
	Likewise extend to...
	(*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode><SVE_2BHSI:mode>_sxtw):
	...this, making the same adjustments.
	(*aarch64_gather_load_<ANY_EXTEND:optab><VNx2_WIDE:mode><VNx2_NARROW:mode>_uxtw):
	Likewise extend to...
	(*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode><SVE_2BHSI:mode>_uxtw):
	...this, making the same adjustments.
	(*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode><SVE_2BHSI:mode>_<ANY_EXTEND2:su>xtw_unpacked):
	New pattern.
	(*aarch64_ldff1_gather<mode>_sxtw): Canonicalize to a constant
	extension predicate.
	(@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx4_WIDE:mode><VNx4_NARROW:mode>)
	(@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode><VNx2_NARROW:mode>)
	(*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode><VNx2_NARROW:mode>_uxtw):
	Describe the extension as a predicated rather than unpredicated
	extension.
	(*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode><VNx2_NARROW:mode>_sxtw):
	Likewise.  Canonicalize to a constant extension predicate.
	* config/aarch64/aarch64-sve-builtins-base.cc
	(svld1_gather_extend_impl::expand): Add an extra predicate for
	the extension.
	(svldff1_gather_extend_impl::expand): Likewise.

gcc/testsuite/
	* gcc.target/aarch64/sve/gather_load_extend_1.c: New test.
	* gcc.target/aarch64/sve/gather_load_extend_2.c: Likewise.
	* gcc.target/aarch64/sve/gather_load_extend_3.c: Likewise.
	* gcc.target/aarch64/sve/gather_load_extend_4.c: Likewise.
	* gcc.target/aarch64/sve/gather_load_extend_5.c: Likewise.
	* gcc.target/aarch64/sve/gather_load_extend_6.c: Likewise.
	* gcc.target/aarch64/sve/gather_load_extend_7.c: Likewise.
	* gcc.target/aarch64/sve/gather_load_extend_8.c: Likewise.
	* gcc.target/aarch64/sve/gather_load_extend_9.c: Likewise.
	* gcc.target/aarch64/sve/gather_load_extend_10.c: Likewise.
	* gcc.target/aarch64/sve/gather_load_extend_11.c: Likewise.
	* gcc.target/aarch64/sve/gather_load_extend_12.c: Likewise.

From-SVN: r278346
config/aarch64/aarch64-sve-builtins-base.cc:

@@ -1097,6 +1097,8 @@ public:
     /* Put the predicate last, since the extending gathers use the same
        operand order as mask_gather_load_optab.  */
     e.rotate_inputs_left (0, 5);
+    /* Add a constant predicate for the extension rtx.  */
+    e.args.quick_push (CONSTM1_RTX (VNx16BImode));
     insn_code icode = code_for_aarch64_gather_load (extend_rtx_code (),
                                                     e.vector_mode (0),
                                                     e.memory_vector_mode ());

@@ -1234,6 +1236,8 @@ public:
     /* Put the predicate last, since ldff1_gather uses the same operand
        order as mask_gather_load_optab.  */
     e.rotate_inputs_left (0, 5);
+    /* Add a constant predicate for the extension rtx.  */
+    e.args.quick_push (CONSTM1_RTX (VNx16BImode));
     insn_code icode = code_for_aarch64_ldff1_gather (extend_rtx_code (),
                                                      e.vector_mode (0),
                                                      e.memory_vector_mode ());
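
For reference, svld1_gather_extend_impl and svldff1_gather_extend_impl back
the ACLE extending-gather intrinsics; a minimal usage sketch of one such
intrinsic follows (the wrapper name is made up, and the intrinsic behaviour
is as documented in the SVE ACLE rather than something added by this patch):

#include <arm_sve.h>

/* Gather uint8_t elements at base + offsets[i] and zero-extend each active
   element to 64 bits.  Expanding this intrinsic now also pushes the all-true
   predicate CONSTM1_RTX (VNx16BImode) as the predicate of the extension
   rtx.  */
svuint64_t
gather_bytes (svbool_t pg, const uint8_t *base, svuint64_t offsets)
{
  return svld1ub_gather_u64offset_u64 (pg, base, offsets);
}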
config/aarch64/iterators.md:

@@ -371,9 +371,21 @@
 ;; SVE modes with 2 elements.
 (define_mode_iterator SVE_2 [VNx2QI VNx2HI VNx2HF VNx2SI VNx2SF VNx2DI VNx2DF])

+;; SVE integer modes with 2 elements, excluding the widest element.
+(define_mode_iterator SVE_2BHSI [VNx2QI VNx2HI VNx2SI])
+
+;; SVE integer modes with 2 elements, excluding the narrowest element.
+(define_mode_iterator SVE_2HSDI [VNx2HI VNx2SI VNx2DI])
+
 ;; SVE modes with 4 elements.
 (define_mode_iterator SVE_4 [VNx4QI VNx4HI VNx4HF VNx4SI VNx4SF])

+;; SVE integer modes with 4 elements, excluding the widest element.
+(define_mode_iterator SVE_4BHI [VNx4QI VNx4HI])
+
+;; SVE integer modes with 4 elements, excluding the narrowest element.
+(define_mode_iterator SVE_4HSI [VNx4HI VNx4SI])
+
 ;; Modes involved in extending or truncating SVE data, for 8 elements per
 ;; 128-bit block.
 (define_mode_iterator VNx8_NARROW [VNx8QI])

@@ -1443,6 +1455,7 @@
 ;; Code iterator for sign/zero extension
 (define_code_iterator ANY_EXTEND [sign_extend zero_extend])
+(define_code_iterator ANY_EXTEND2 [sign_extend zero_extend])

 ;; All division operations (signed/unsigned)
 (define_code_iterator ANY_DIV [div udiv])

/* { dg-options "-O2 -ftree-vectorize" } */

#include <stdint.h>

#define TEST_LOOP(TYPE1, TYPE2) \
  void \
  f_##TYPE1##_##TYPE2 (TYPE1 *restrict dst, TYPE1 *restrict src1, \
                       TYPE2 *restrict src2, uint32_t *restrict index, \
                       int n) \
  { \
    for (int i = 0; i < n; ++i) \
      dst[i] += src1[i] + src2[index[i]]; \
  }

#define TEST_ALL(T) \
  T (uint16_t, uint8_t) \
  T (uint32_t, uint8_t) \
  T (uint64_t, uint8_t) \
  T (uint32_t, uint16_t) \
  T (uint64_t, uint16_t) \
  T (uint64_t, uint32_t)

TEST_ALL (TEST_LOOP)

/* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, z[0-9]+\.s, uxtw\]\n} 2 } } */
/* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d\]\n} 1 } } */
/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, z[0-9]+\.s, uxtw 1\]\n} 1 } } */
/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d, lsl 1\]\n} 1 } } */
/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d, lsl 2\]\n} 1 } } */
/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, x[0-9]+, lsl 2\]\n} 7 } } */
/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, x[0-9]+, lsl 2\]\n} 3 } } */
/* { dg-final { scan-assembler-not {\tuxt.\t} } } */

/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=512" } */

#include <stdint.h>

void
f1 (int64_t *restrict dst, int16_t *src1, int8_t *src2, uint32_t *index)
{
  for (int i = 0; i < 7; ++i)
    dst[i] += (int16_t) (src1[i] + src2[index[i]]);
}

void
f2 (int64_t *restrict dst, int16_t *src1, int8_t *src2, uint64_t *index)
{
  for (int i = 0; i < 7; ++i)
    dst[i] += (int16_t) (src1[i] + src2[index[i]]);
}

void
f3 (int64_t *restrict dst, int16_t *src1, int8_t **src2)
{
  for (int i = 0; i < 7; ++i)
    dst[i] += (int16_t) (src1[i] + *src2[i]);
}

/* { dg-final { scan-assembler-times {\tld1sb\tz[0-9]+\.d, p[0-7]/z, \[x2, z[0-9]+\.d\]\n} 2 } } */
/* { dg-final { scan-assembler-times {\tld1sb\tz[0-9]+\.d, p[0-7]/z, \[z[0-9]+\.d\]\n} 1 } } */
/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.d, p[0-7]/z, \[x1\]\n} 3 } } */
/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d, p[0-7]/z, \[x3\]\n} 1 } } */
/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x0\]\n} 3 } } */
/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x2\]\n} 1 } } */
/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x3\]\n} 1 } } */
/* { dg-final { scan-assembler-times {\tadd\tz} 6 } } */
/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.h, z[0-9]+\.h, z[0-9]+\.h\n} 3 } } */
/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 3 } } */
/* { dg-final { scan-assembler-times {\tsxt.\t} 3 } } */
/* { dg-final { scan-assembler-times {\tsxth\tz[0-9]+\.d,} 3 } } */

/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=512" } */

#include <stdint.h>

void
f1 (int64_t *restrict dst, int32_t *src1, int8_t *src2, uint32_t *index)
{
  for (int i = 0; i < 7; ++i)
    dst[i] += (int32_t) (src1[i] + src2[index[i]]);
}

void
f2 (int64_t *restrict dst, int32_t *src1, int8_t *src2, uint64_t *index)
{
  for (int i = 0; i < 7; ++i)
    dst[i] += (int32_t) (src1[i] + src2[index[i]]);
}

void
f3 (int64_t *restrict dst, int32_t *src1, int8_t **src2)
{
  for (int i = 0; i < 7; ++i)
    dst[i] += (int32_t) (src1[i] + *src2[i]);
}

/* { dg-final { scan-assembler-times {\tld1sb\tz[0-9]+\.d, p[0-7]/z, \[x2, z[0-9]+\.d\]\n} 2 } } */
/* { dg-final { scan-assembler-times {\tld1sb\tz[0-9]+\.d, p[0-7]/z, \[z[0-9]+\.d\]\n} 1 } } */
/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d, p[0-7]/z, \[x1\]\n} 3 } } */
/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d, p[0-7]/z, \[x3\]\n} 1 } } */
/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x0\]\n} 3 } } */
/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x2\]\n} 1 } } */
/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x3\]\n} 1 } } */
/* { dg-final { scan-assembler-times {\tadd\tz} 6 } } */
/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 3 } } */
/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 3 } } */
/* { dg-final { scan-assembler-times {\tsxt.\t} 3 } } */
/* { dg-final { scan-assembler-times {\tsxtw\tz[0-9]+\.d,} 3 } } */

/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=512" } */

#include <stdint.h>

void
f1 (int64_t *restrict dst, int32_t *src1, int16_t *src2, uint32_t *index)
{
  for (int i = 0; i < 7; ++i)
    dst[i] += (int32_t) (src1[i] + src2[index[i]]);
}

void
f2 (int64_t *restrict dst, int32_t *src1, int16_t *src2, uint64_t *index)
{
  for (int i = 0; i < 7; ++i)
    dst[i] += (int32_t) (src1[i] + src2[index[i]]);
}

void
f3 (int64_t *restrict dst, int32_t *src1, int16_t **src2)
{
  for (int i = 0; i < 7; ++i)
    dst[i] += (int32_t) (src1[i] + *src2[i]);
}

/* { dg-final { scan-assembler-times {\tld1sh\tz[0-9]+\.d, p[0-7]/z, \[x2, z[0-9]+\.d, lsl 1\]\n} 2 } } */
/* { dg-final { scan-assembler-times {\tld1sh\tz[0-9]+\.d, p[0-7]/z, \[z[0-9]+\.d\]\n} 1 } } */
/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d, p[0-7]/z, \[x1\]\n} 3 } } */
/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d, p[0-7]/z, \[x3\]\n} 1 } } */
/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x0\]\n} 3 } } */
/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x2\]\n} 1 } } */
/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x3\]\n} 1 } } */
/* { dg-final { scan-assembler-times {\tadd\tz} 6 } } */
/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 3 } } */
/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 3 } } */
/* { dg-final { scan-assembler-times {\tsxt.\t} 3 } } */
/* { dg-final { scan-assembler-times {\tsxtw\tz[0-9]+\.d,} 3 } } */

/* { dg-options "-O2 -ftree-vectorize" } */

#include <stdint.h>

#define TEST_LOOP(TYPE1, TYPE2) \
  void \
  f_##TYPE1##_##TYPE2 (TYPE1 *restrict dst, TYPE1 *restrict src1, \
                       TYPE2 *restrict src2, uint32_t *restrict index, \
                       int n) \
  { \
    for (int i = 0; i < n; ++i) \
      dst[i] += src1[i] + src2[index[i]]; \
  }

#define TEST_ALL(T) \
  T (int16_t, int8_t) \
  T (int32_t, int8_t) \
  T (int64_t, int8_t) \
  T (int32_t, int16_t) \
  T (int64_t, int16_t) \
  T (int64_t, int32_t)

TEST_ALL (TEST_LOOP)

/* { dg-final { scan-assembler-times {\tld1sb\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, z[0-9]+\.s, uxtw\]\n} 2 } } */
/* { dg-final { scan-assembler-times {\tld1sb\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d\]\n} 1 } } */
/* { dg-final { scan-assembler-times {\tld1sh\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, z[0-9]+\.s, uxtw 1\]\n} 1 } } */
/* { dg-final { scan-assembler-times {\tld1sh\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d, lsl 1\]\n} 1 } } */
/* { dg-final { scan-assembler-times {\tld1sw\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d, lsl 2\]\n} 1 } } */
/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, x[0-9]+, lsl 2\]\n} 7 } } */
/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, x[0-9]+, lsl 2\]\n} 3 } } */
/* { dg-final { scan-assembler-not {\tsxt.\t} } } */

/* { dg-options "-O2 -ftree-vectorize" } */

#include <stdint.h>

#define TEST_LOOP(TYPE1, TYPE2) \
  void \
  f_##TYPE1##_##TYPE2 (TYPE1 *restrict dst, TYPE1 *restrict src1, \
                       TYPE2 *restrict src2, int32_t *restrict index, \
                       int n) \
  { \
    for (int i = 0; i < n; ++i) \
      dst[i] += src1[i] + src2[index[i]]; \
  }

#define TEST_ALL(T) \
  T (uint16_t, uint8_t) \
  T (uint32_t, uint8_t) \
  T (uint64_t, uint8_t) \
  T (uint32_t, uint16_t) \
  T (uint64_t, uint16_t) \
  T (uint64_t, uint32_t)

TEST_ALL (TEST_LOOP)

/* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, z[0-9]+\.s, sxtw\]\n} 2 } } */
/* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d\]\n} 1 } } */
/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, z[0-9]+\.s, sxtw 1\]\n} 1 } } */
/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d, lsl 1\]\n} 1 } } */
/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d, lsl 2\]\n} 1 } } */
/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, x[0-9]+, lsl 2\]\n} 7 } } */
/* { dg-final { scan-assembler-times {\tld1sw\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, x[0-9]+, lsl 2\]\n} 3 } } */
/* { dg-final { scan-assembler-not {\tuxt.\t} } } */

/* { dg-options "-O2 -ftree-vectorize" } */

#include <stdint.h>

#define TEST_LOOP(TYPE1, TYPE2) \
  void \
  f_##TYPE1##_##TYPE2 (TYPE1 *restrict dst, TYPE1 *restrict src1, \
                       TYPE2 *restrict src2, int32_t *restrict index, \
                       int n) \
  { \
    for (int i = 0; i < n; ++i) \
      dst[i] += src1[i] + src2[index[i]]; \
  }

#define TEST_ALL(T) \
  T (int16_t, int8_t) \
  T (int32_t, int8_t) \
  T (int64_t, int8_t) \
  T (int32_t, int16_t) \
  T (int64_t, int16_t) \
  T (int64_t, int32_t)

TEST_ALL (TEST_LOOP)

/* { dg-final { scan-assembler-times {\tld1sb\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, z[0-9]+\.s, sxtw\]\n} 2 } } */
/* { dg-final { scan-assembler-times {\tld1sb\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d\]\n} 1 } } */
/* { dg-final { scan-assembler-times {\tld1sh\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, z[0-9]+\.s, sxtw 1\]\n} 1 } } */
/* { dg-final { scan-assembler-times {\tld1sh\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d, lsl 1\]\n} 1 } } */
/* { dg-final { scan-assembler-times {\tld1sw\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d, lsl 2\]\n} 1 } } */
/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, x[0-9]+, lsl 2\]\n} 7 } } */
/* { dg-final { scan-assembler-times {\tld1sw\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, x[0-9]+, lsl 2\]\n} 3 } } */
/* { dg-final { scan-assembler-not {\tsxt.\t} } } */

/* { dg-options "-O2 -ftree-vectorize" } */

#include <stdint.h>

#define TEST_LOOP(TYPE1, TYPE2) \
  void \
  f_##TYPE1##_##TYPE2 (TYPE1 *restrict dst, TYPE1 *restrict src1, \
                       TYPE2 *restrict src2, uint64_t *restrict index, \
                       int n) \
  { \
    for (int i = 0; i < n; ++i) \
      dst[i] += src1[i] + src2[index[i]]; \
  }

#define TEST_ALL(T) \
  T (uint16_t, uint8_t) \
  T (uint32_t, uint8_t) \
  T (uint64_t, uint8_t) \
  T (uint32_t, uint16_t) \
  T (uint64_t, uint16_t) \
  T (uint64_t, uint32_t)

TEST_ALL (TEST_LOOP)

/* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d\]\n} 3 } } */
/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d, lsl 1\]\n} 2 } } */
/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d, lsl 2\]\n} 1 } } */
/* { dg-final { scan-assembler-not {\tuxt.\t} } } */

/* { dg-options "-O2 -ftree-vectorize" } */

#include <stdint.h>

#define TEST_LOOP(TYPE1, TYPE2) \
  void \
  f_##TYPE1##_##TYPE2 (TYPE1 *restrict dst, TYPE1 *restrict src1, \
                       TYPE2 *restrict src2, uint64_t *restrict index, \
                       int n) \
  { \
    for (int i = 0; i < n; ++i) \
      dst[i] += src1[i] + src2[index[i]]; \
  }

#define TEST_ALL(T) \
  T (int16_t, int8_t) \
  T (int32_t, int8_t) \
  T (int64_t, int8_t) \
  T (int32_t, int16_t) \
  T (int64_t, int16_t) \
  T (int64_t, int32_t)

TEST_ALL (TEST_LOOP)

/* { dg-final { scan-assembler-times {\tld1sb\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d\]\n} 3 } } */
/* { dg-final { scan-assembler-times {\tld1sh\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d, lsl 1\]\n} 2 } } */
/* { dg-final { scan-assembler-times {\tld1sw\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d, lsl 2\]\n} 1 } } */
/* { dg-final { scan-assembler-not {\tsxt.\t} } } */

/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=512" } */

#include <stdint.h>

void
f1 (uint64_t *restrict dst, uint16_t *src1, uint8_t *src2, uint32_t *index)
{
  for (int i = 0; i < 7; ++i)
    dst[i] += (uint16_t) (src1[i] + src2[index[i]]);
}

void
f2 (uint64_t *restrict dst, uint16_t *src1, uint8_t *src2, uint64_t *index)
{
  for (int i = 0; i < 7; ++i)
    dst[i] += (uint16_t) (src1[i] + src2[index[i]]);
}

void
f3 (uint64_t *restrict dst, uint16_t *src1, uint8_t **src2)
{
  for (int i = 0; i < 7; ++i)
    dst[i] += (uint16_t) (src1[i] + *src2[i]);
}

/* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.d, p[0-7]/z, \[x2, z[0-9]+\.d\]\n} 2 } } */
/* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.d, p[0-7]/z, \[z[0-9]+\.d\]\n} 1 } } */
/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.d, p[0-7]/z, \[x1\]\n} 3 } } */
/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d, p[0-7]/z, \[x3\]\n} 1 } } */
/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x0\]\n} 3 } } */
/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x2\]\n} 1 } } */
/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x3\]\n} 1 } } */
/* { dg-final { scan-assembler-times {\tadd\tz} 6 } } */
/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.h, z[0-9]+\.h, z[0-9]+\.h\n} 3 } } */
/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 3 } } */
/* { dg-final { scan-assembler-times {\tuxt.\t} 3 } } */
/* { dg-final { scan-assembler-times {\tuxth\tz[0-9]+\.d,} 3 } } */

/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=512" } */

#include <stdint.h>

void
f1 (uint64_t *restrict dst, uint32_t *src1, uint8_t *src2, uint32_t *index)
{
  for (int i = 0; i < 7; ++i)
    dst[i] += (uint32_t) (src1[i] + src2[index[i]]);
}

void
f2 (uint64_t *restrict dst, uint32_t *src1, uint8_t *src2, uint64_t *index)
{
  for (int i = 0; i < 7; ++i)
    dst[i] += (uint32_t) (src1[i] + src2[index[i]]);
}

void
f3 (uint64_t *restrict dst, uint32_t *src1, uint8_t **src2)
{
  for (int i = 0; i < 7; ++i)
    dst[i] += (uint32_t) (src1[i] + *src2[i]);
}

/* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.d, p[0-7]/z, \[x2, z[0-9]+\.d\]\n} 2 } } */
/* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.d, p[0-7]/z, \[z[0-9]+\.d\]\n} 1 } } */
/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d, p[0-7]/z, \[x1\]\n} 3 } } */
/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d, p[0-7]/z, \[x3\]\n} 1 } } */
/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x0\]\n} 3 } } */
/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x2\]\n} 1 } } */
/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x3\]\n} 1 } } */
/* { dg-final { scan-assembler-times {\tadd\tz} 6 } } */
/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 3 } } */
/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 3 } } */
/* { dg-final { scan-assembler-times {\tuxt.\t} 3 } } */
/* { dg-final { scan-assembler-times {\tuxtw\tz[0-9]+\.d,} 3 } } */

/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=512" } */

#include <stdint.h>

void
f1 (uint64_t *restrict dst, uint32_t *src1, uint16_t *src2, uint32_t *index)
{
  for (int i = 0; i < 7; ++i)
    dst[i] += (uint32_t) (src1[i] + src2[index[i]]);
}

void
f2 (uint64_t *restrict dst, uint32_t *src1, uint16_t *src2, uint64_t *index)
{
  for (int i = 0; i < 7; ++i)
    dst[i] += (uint32_t) (src1[i] + src2[index[i]]);
}

void
f3 (uint64_t *restrict dst, uint32_t *src1, uint16_t **src2)
{
  for (int i = 0; i < 7; ++i)
    dst[i] += (uint32_t) (src1[i] + *src2[i]);
}

/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.d, p[0-7]/z, \[x2, z[0-9]+\.d, lsl 1\]\n} 2 } } */
/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.d, p[0-7]/z, \[z[0-9]+\.d\]\n} 1 } } */
/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d, p[0-7]/z, \[x1\]\n} 3 } } */
/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d, p[0-7]/z, \[x3\]\n} 1 } } */
/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x0\]\n} 3 } } */
/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x2\]\n} 1 } } */
/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x3\]\n} 1 } } */
/* { dg-final { scan-assembler-times {\tadd\tz} 6 } } */
/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 3 } } */
/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 3 } } */
/* { dg-final { scan-assembler-times {\tuxt.\t} 3 } } */
/* { dg-final { scan-assembler-times {\tuxtw\tz[0-9]+\.d,} 3 } } */