Commit 4aeb1ba7, authored and committed by Richard Sandiford

[AArch64] Improve SVE constant moves

If there's no SVE instruction to load a given constant directly, this
patch instead tries to use an Advanced SIMD constant move and then
duplicates the constant to fill an SVE vector.  The main use of this
is to support constants in which each byte is in { 0, 0xff }.
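
For example (this mirrors the new const_1.c test added below), a store
loop whose constant has every byte equal to 0x00 or 0xff:

    #include <stdint.h>

    void
    set (uint64_t *dst, int count)
    {
      for (int i = 0; i < count; ++i)
        dst[i] = 0xffff00ff00ffff00ULL;   /* each byte is 0x00 or 0xff */
    }

can now materialise the constant with an Advanced SIMD MOVI into a
128-bit register followed by "dup z0.q, z0.q[0]" to fill the SVE vector,
instead of loading it from the constant pool.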

Also, the patch prefers a simple integer move followed by a duplicate
over a load from memory, like we already do for Advanced SIMD.  This is
a useful option to have and would be easy to turn off via a tuning
parameter if necessary.
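
For example (in the style of the new const_3.c test below; the function
name here is illustrative), broadcasting a 16-bit constant such as
0x1234, which has no single-instruction SVE or Advanced SIMD encoding:

    #include <stdint.h>

    void
    set_u16 (uint16_t *dst, int count)
    {
      for (int i = 0; i < count; ++i)
        dst[i] = 0x1234;
    }

is now expected to use "mov w0, 4660" followed by "mov z0.h, w0" rather
than an LD1RH from the constant pool.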

The patch also extends the handling of wide LD1Rs to big endian,
whereas previously we punted to a full LD1RQ.
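
For instance (a sketch in the spirit of the slp_2.c test; the constants
are purely illustrative), a loop that adds a repeating pair of 32-bit
constants:

    #include <stdint.h>

    void
    vec_slp (int32_t *restrict a, int n)
    {
      for (int i = 0; i < n; ++i)
        {
          a[i * 2] += 10;
          a[i * 2 + 1] += 17;
        }
    }

needs the constant vector { 10, 17, 10, 17, ... }, which is a broadcast
of a single 64-bit chunk.  Little endian already used LD1RD for this;
with this patch big endian does too, instead of falling back to LD1RQW.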

2019-08-13  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* machmode.h (opt_mode::else_mode): New function.
	(opt_mode::else_blk): Use it.
	* config/aarch64/aarch64-protos.h (aarch64_vq_mode): Declare.
	(aarch64_full_sve_mode, aarch64_sve_ld1rq_operand_p): Likewise.
	(aarch64_gen_stepped_int_parallel): Likewise.
	(aarch64_stepped_int_parallel_p): Likewise.
	(aarch64_expand_mov_immediate): Remove the optional gen_vec_duplicate
	argument.
	* config/aarch64/aarch64.c
	(aarch64_expand_sve_widened_duplicate): Delete.
	(aarch64_expand_sve_dupq, aarch64_expand_sve_ld1rq): New functions.
	(aarch64_expand_sve_const_vector): Rewrite to handle more cases.
	(aarch64_expand_mov_immediate): Remove the optional gen_vec_duplicate
	argument.  Use early returns in the !CONST_INT_P handling.
	Pass all SVE data vectors to aarch64_expand_sve_const_vector rather
	than handling some inline.
	(aarch64_full_sve_mode, aarch64_vq_mode): New functions, split out
	from...
	(aarch64_simd_container_mode): ...here.
	(aarch64_gen_stepped_int_parallel, aarch64_stepped_int_parallel_p)
	(aarch64_sve_ld1rq_operand_p): New functions.
	* config/aarch64/predicates.md (descending_int_parallel)
	(aarch64_sve_ld1rq_operand): New predicates.
	* config/aarch64/constraints.md (UtQ): New constraint.
	* config/aarch64/aarch64.md (UNSPEC_REINTERPRET): New unspec.
	* config/aarch64/aarch64-sve.md (mov<SVE_ALL:mode>): Remove the
	gen_vec_duplicate from call to aarch64_expand_mov_immediate.
	(@aarch64_sve_reinterpret<mode>): New expander.
	(*aarch64_sve_reinterpret<mode>): New pattern.
	(@aarch64_vec_duplicate_vq<mode>_le): New pattern.
	(@aarch64_vec_duplicate_vq<mode>_be): Likewise.
	(*sve_ld1rq<Vesize>): Replace with...
	(@aarch64_sve_ld1rq<mode>): ...this new pattern.

gcc/testsuite/
	* gcc.target/aarch64/sve/init_2.c: Expect ld1rd to be used
	instead of a full vector load.
	* gcc.target/aarch64/sve/init_4.c: Likewise.
	* gcc.target/aarch64/sve/ld1r_2.c: Remove constants that no longer
	need to be loaded from memory.
	* gcc.target/aarch64/sve/slp_2.c: Expect the same output for
	big and little endian.
	* gcc.target/aarch64/sve/slp_3.c: Likewise.  Expect 3 of the
	doubles to be moved via integer registers rather than loaded
	from memory.
	* gcc.target/aarch64/sve/slp_4.c: Likewise but for 4 doubles.
	* gcc.target/aarch64/sve/spill_4.c: Expect 16-bit constants to be
	loaded via an integer register rather than from memory.
	* gcc.target/aarch64/sve/const_1.c: New test.
	* gcc.target/aarch64/sve/const_2.c: Likewise.
	* gcc.target/aarch64/sve/const_3.c: Likewise.

From-SVN: r274375
Parent: 4e55aefa
gcc/config/aarch64/aarch64-protos.h

@@ -416,6 +416,8 @@ unsigned HOST_WIDE_INT aarch64_and_split_imm2 (HOST_WIDE_INT val_in);
 bool aarch64_and_bitmask_imm (unsigned HOST_WIDE_INT val_in, machine_mode mode);
 int aarch64_branch_cost (bool, bool);
 enum aarch64_symbol_type aarch64_classify_symbolic_expression (rtx);
+opt_machine_mode aarch64_vq_mode (scalar_mode);
+opt_machine_mode aarch64_full_sve_mode (scalar_mode);
 bool aarch64_can_const_movi_rtx_p (rtx x, machine_mode mode);
 bool aarch64_const_vec_all_same_int_p (rtx, HOST_WIDE_INT);
 bool aarch64_const_vec_all_same_in_range_p (rtx, HOST_WIDE_INT,
@@ -504,9 +506,12 @@ rtx aarch64_return_addr (int, rtx);
 rtx aarch64_simd_gen_const_vector_dup (machine_mode, HOST_WIDE_INT);
 bool aarch64_simd_mem_operand_p (rtx);
 bool aarch64_sve_ld1r_operand_p (rtx);
+bool aarch64_sve_ld1rq_operand_p (rtx);
 bool aarch64_sve_ldr_operand_p (rtx);
 bool aarch64_sve_struct_memory_operand_p (rtx);
 rtx aarch64_simd_vect_par_cnst_half (machine_mode, int, bool);
+rtx aarch64_gen_stepped_int_parallel (unsigned int, int, int);
+bool aarch64_stepped_int_parallel_p (rtx, int);
 rtx aarch64_tls_get_addr (void);
 tree aarch64_fold_builtin (tree, int, tree *, bool);
 unsigned aarch64_dbx_register_number (unsigned);
@@ -518,7 +523,7 @@ const char * aarch64_output_probe_stack_range (rtx, rtx);
 const char * aarch64_output_probe_sve_stack_clash (rtx, rtx, rtx, rtx);
 void aarch64_err_no_fpadvsimd (machine_mode);
 void aarch64_expand_epilogue (bool);
-void aarch64_expand_mov_immediate (rtx, rtx, rtx (*) (rtx, rtx) = 0);
+void aarch64_expand_mov_immediate (rtx, rtx);
 rtx aarch64_ptrue_reg (machine_mode);
 rtx aarch64_pfalse_reg (machine_mode);
 void aarch64_emit_sve_pred_move (rtx, rtx, rtx);
gcc/config/aarch64/aarch64-sve.md

@@ -207,8 +207,7 @@
     if (CONSTANT_P (operands[1]))
       {
-	aarch64_expand_mov_immediate (operands[0], operands[1],
-				      gen_vec_duplicate<mode>);
+	aarch64_expand_mov_immediate (operands[0], operands[1]);
 	DONE;
       }
@@ -326,6 +325,39 @@
   }
 )

+;; Reinterpret operand 1 in operand 0's mode, without changing its contents.
+;; This is equivalent to a subreg on little-endian targets but not for
+;; big-endian; see the comment at the head of the file for details.
+(define_expand "@aarch64_sve_reinterpret<mode>"
+  [(set (match_operand:SVE_ALL 0 "register_operand")
+	(unspec:SVE_ALL [(match_operand 1 "aarch64_any_register_operand")]
+			UNSPEC_REINTERPRET))]
+  "TARGET_SVE"
+  {
+    if (!BYTES_BIG_ENDIAN)
+      {
+	emit_move_insn (operands[0], gen_lowpart (<MODE>mode, operands[1]));
+	DONE;
+      }
+  }
+)
+
+;; A pattern for handling type punning on big-endian targets.  We use a
+;; special predicate for operand 1 to reduce the number of patterns.
+(define_insn_and_split "*aarch64_sve_reinterpret<mode>"
+  [(set (match_operand:SVE_ALL 0 "register_operand" "=w")
+	(unspec:SVE_ALL [(match_operand 1 "aarch64_any_register_operand" "0")]
+			UNSPEC_REINTERPRET))]
+  "TARGET_SVE"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0) (match_dup 1))]
+  {
+    emit_note (NOTE_INSN_DELETED);
+    DONE;
+  }
+)
+
 ;; -------------------------------------------------------------------------
 ;; ---- Moves of multiple vectors
 ;; -------------------------------------------------------------------------
@@ -787,6 +819,39 @@
   [(set_attr "length" "4,4,8")]
 )

+;; Duplicate an Advanced SIMD vector to fill an SVE vector (LE version).
+(define_insn "@aarch64_vec_duplicate_vq<mode>_le"
+  [(set (match_operand:SVE_ALL 0 "register_operand" "=w")
+	(vec_duplicate:SVE_ALL
+	  (match_operand:<V128> 1 "register_operand" "w")))]
+  "TARGET_SVE && !BYTES_BIG_ENDIAN"
+  {
+    operands[1] = gen_rtx_REG (<MODE>mode, REGNO (operands[1]));
+    return "dup\t%0.q, %1.q[0]";
+  }
+)
+
+;; Duplicate an Advanced SIMD vector to fill an SVE vector (BE version).
+;; The SVE register layout puts memory lane N into (architectural)
+;; register lane N, whereas the Advanced SIMD layout puts the memory
+;; lsb into the register lsb.  We therefore have to describe this in rtl
+;; terms as a reverse of the V128 vector followed by a duplicate.
+(define_insn "@aarch64_vec_duplicate_vq<mode>_be"
+  [(set (match_operand:SVE_ALL 0 "register_operand" "=w")
+	(vec_duplicate:SVE_ALL
+	  (vec_select:<V128>
+	    (match_operand:<V128> 1 "register_operand" "w")
+	    (match_operand 2 "descending_int_parallel"))))]
+  "TARGET_SVE
+   && BYTES_BIG_ENDIAN
+   && known_eq (INTVAL (XVECEXP (operands[2], 0, 0)),
+		GET_MODE_NUNITS (<V128>mode) - 1)"
+  {
+    operands[1] = gen_rtx_REG (<MODE>mode, REGNO (operands[1]));
+    return "dup\t%0.q, %1.q[0]";
+  }
+)
+
 ;; This is used for vec_duplicate<mode>s from memory, but can also
 ;; be used by combine to optimize selects of a a vec_duplicate<mode>
 ;; with zero.
@@ -802,17 +867,19 @@
   "ld1r<Vesize>\t%0.<Vetype>, %1/z, %2"
 )

-;; Load 128 bits from memory and duplicate to fill a vector.  Since there
-;; are so few operations on 128-bit "elements", we don't define a VNx1TI
-;; and simply use vectors of bytes instead.
-(define_insn "*sve_ld1rq<Vesize>"
+;; Load 128 bits from memory under predicate control and duplicate to
+;; fill a vector.
+(define_insn "@aarch64_sve_ld1rq<mode>"
   [(set (match_operand:SVE_ALL 0 "register_operand" "=w")
 	(unspec:SVE_ALL
-	  [(match_operand:<VPRED> 1 "register_operand" "Upl")
-	   (match_operand:TI 2 "aarch64_sve_ld1r_operand" "Uty")]
+	  [(match_operand:<VPRED> 2 "register_operand" "Upl")
+	   (match_operand:<V128> 1 "aarch64_sve_ld1rq_operand" "UtQ")]
 	  UNSPEC_LD1RQ))]
   "TARGET_SVE"
-  "ld1rq<Vesize>\t%0.<Vetype>, %1/z, %2"
+  {
+    operands[1] = gen_rtx_MEM (<VEL>mode, XEXP (operands[1], 0));
+    return "ld1rq<Vesize>\t%0.<Vetype>, %2/z, %1";
+  }
 )

 ;; -------------------------------------------------------------------------
gcc/config/aarch64/aarch64.md

@@ -234,6 +234,7 @@
     UNSPEC_CLASTB
     UNSPEC_FADDA
     UNSPEC_REV_SUBREG
+    UNSPEC_REINTERPRET
     UNSPEC_SPECULATION_TRACKER
     UNSPEC_COPYSIGN
     UNSPEC_TTEST		; Represent transaction test.
gcc/config/aarch64/constraints.md

@@ -272,6 +272,12 @@
        (match_test "aarch64_legitimate_address_p (V2DImode,
						   XEXP (op, 0), 1)")))

+(define_memory_constraint "UtQ"
+  "@internal
+   An address valid for SVE LD1RQs."
+  (and (match_code "mem")
+       (match_test "aarch64_sve_ld1rq_operand_p (op)")))
+
 (define_memory_constraint "Uty"
   "@internal
    An address valid for SVE LD1Rs."
gcc/config/aarch64/predicates.md

@@ -431,6 +431,12 @@
   return aarch64_simd_check_vect_par_cnst_half (op, mode, false);
 })

+(define_predicate "descending_int_parallel"
+  (match_code "parallel")
+{
+  return aarch64_stepped_int_parallel_p (op, -1);
+})
+
 (define_special_predicate "aarch64_simd_lshift_imm"
   (match_code "const,const_vector")
 {
@@ -543,6 +549,10 @@
   (and (match_operand 0 "memory_operand")
        (match_test "aarch64_sve_ld1r_operand_p (op)")))

+(define_predicate "aarch64_sve_ld1rq_operand"
+  (and (match_code "mem")
+       (match_test "aarch64_sve_ld1rq_operand_p (op)")))
+
 ;; Like memory_operand, but restricted to addresses that are valid for
 ;; SVE LDR and STR instructions.
 (define_predicate "aarch64_sve_ldr_operand"
gcc/machmode.h

@@ -251,7 +251,8 @@ public:
   ALWAYS_INLINE opt_mode (from_int m) : m_mode (machine_mode (m)) {}

   machine_mode else_void () const;
-  machine_mode else_blk () const;
+  machine_mode else_blk () const { return else_mode (BLKmode); }
+  machine_mode else_mode (machine_mode) const;
   T require () const;
   bool exists () const;
@@ -271,13 +272,13 @@ opt_mode<T>::else_void () const
   return m_mode;
 }

-/* If the T exists, return its enum value, otherwise return E_BLKmode.  */
+/* If the T exists, return its enum value, otherwise return FALLBACK.  */
 template<typename T>
 inline machine_mode
-opt_mode<T>::else_blk () const
+opt_mode<T>::else_mode (machine_mode fallback) const
 {
-  return m_mode == E_VOIDmode ? E_BLKmode : m_mode;
+  return m_mode == E_VOIDmode ? fallback : m_mode;
 }

 /* Assert that the object contains a T and return it.  */
gcc/testsuite/gcc.target/aarch64/sve/const_1.c (new test)

/* { dg-do compile } */
/* { dg-options "-O3" } */

#include <stdint.h>

void
set (uint64_t *dst, int count)
{
  for (int i = 0; i < count; ++i)
    dst[i] = 0xffff00ff00ffff00ULL;
}

/* { dg-final { scan-assembler {\tmovi\tv([0-9]+)\.2d, 0xffff00ff00ffff00\n.*\tdup\tz[0-9]+\.q, z\1\.q\[0\]\n} } } */
gcc/testsuite/gcc.target/aarch64/sve/const_2.c (new test)

/* { dg-do compile } */
/* { dg-options "-O3" } */

#include <stdint.h>

#define TEST(TYPE, CONST)			\
  void						\
  set_##TYPE (TYPE *dst, int count)		\
  {						\
    for (int i = 0; i < count; ++i)		\
      dst[i] = CONST;				\
  }

TEST (uint16_t, 129)
TEST (uint32_t, 129)
TEST (uint64_t, 129)

/* { dg-final { scan-assembler {\tmovi\tv([0-9]+)\.8h, 0x81\n[^:]*\tdup\tz[0-9]+\.q, z\1\.q\[0\]\n} } } */
/* { dg-final { scan-assembler {\tmovi\tv([0-9]+)\.4s, 0x81\n[^:]*\tdup\tz[0-9]+\.q, z\1\.q\[0\]\n} } } */
/* { dg-final { scan-assembler {\tmov\t(x[0-9]+), 129\n[^:]*\tmov\tz[0-9]+\.d, \1\n} } } */
gcc/testsuite/gcc.target/aarch64/sve/const_3.c (new test)

/* { dg-do compile } */
/* { dg-options "-O3" } */

#include <stdint.h>

#define TEST(TYPE, CONST)			\
  void						\
  set_##TYPE (TYPE *dst, int count)		\
  {						\
    for (int i = 0; i < count; ++i)		\
      dst[i] = CONST;				\
  }

TEST (uint16_t, 0x1234)
TEST (uint32_t, 0x1234)
TEST (uint64_t, 0x1234)

/* { dg-final { scan-assembler {\tmov\t(w[0-9]+), 4660\n[^:]*\tmov\tz[0-9]+\.h, \1\n} } } */
/* { dg-final { scan-assembler {\tmov\t(w[0-9]+), 4660\n[^:]*\tmov\tz[0-9]+\.s, \1\n} } } */
/* { dg-final { scan-assembler {\tmov\t(x[0-9]+), 4660\n[^:]*\tmov\tz[0-9]+\.d, \1\n} } } */
gcc/testsuite/gcc.target/aarch64/sve/init_2.c

@@ -11,9 +11,9 @@ typedef int32_t vnx4si __attribute__((vector_size (32)));
 /*
 ** foo:
 **	...
-**	ld1w	(z[0-9]+\.s), p[0-9]+/z, \[x[0-9]+\]
-**	insr	\1, w1
-**	insr	\1, w0
+**	ld1rd	(z[0-9]+)\.d, p[0-9]+/z, \[x[0-9]+\]
+**	insr	\1\.s, w1
+**	insr	\1\.s, w0
 **	...
 */
 __attribute__((noipa))
gcc/testsuite/gcc.target/aarch64/sve/init_4.c

@@ -11,10 +11,10 @@ typedef int32_t vnx4si __attribute__((vector_size (32)));
 /*
 ** foo:
 **	...
-**	ld1w	(z[0-9]+\.s), p[0-9]+/z, \[x[0-9]+\]
-**	insr	\1, w1
-**	insr	\1, w0
-**	rev	\1, \1
+**	ld1rd	(z[0-9]+)\.d, p[0-9]+/z, \[x[0-9]+\]
+**	insr	\1\.s, w1
+**	insr	\1\.s, w0
+**	rev	\1\.s, \1\.s
 **	...
 */
 __attribute__((noipa))
gcc/testsuite/gcc.target/aarch64/sve/ld1r_2.c

@@ -28,22 +28,6 @@
   T (int64_t)

 #define FOR_EACH_LOAD_BROADCAST_IMM(T) \
-  T (int16_t, 129, imm_129) \
-  T (int32_t, 129, imm_129) \
-  T (int64_t, 129, imm_129) \
-\
-  T (int16_t, -130, imm_m130) \
-  T (int32_t, -130, imm_m130) \
-  T (int64_t, -130, imm_m130) \
-\
-  T (int16_t, 0x1234, imm_0x1234) \
-  T (int32_t, 0x1234, imm_0x1234) \
-  T (int64_t, 0x1234, imm_0x1234) \
-\
-  T (int16_t, 0xFEDC, imm_0xFEDC) \
-  T (int32_t, 0xFEDC, imm_0xFEDC) \
-  T (int64_t, 0xFEDC, imm_0xFEDC) \
-\
   T (int32_t, 0x12345678, imm_0x12345678) \
   T (int64_t, 0x12345678, imm_0x12345678) \
 \
@@ -56,6 +40,6 @@ FOR_EACH_LOAD_BROADCAST (DEF_LOAD_BROADCAST)
 FOR_EACH_LOAD_BROADCAST_IMM (DEF_LOAD_BROADCAST_IMM)

 /* { dg-final { scan-assembler-times {\tld1rb\tz[0-9]+\.b, p[0-7]/z, } 1 } } */
-/* { dg-final { scan-assembler-times {\tld1rh\tz[0-9]+\.h, p[0-7]/z, } 5 } } */
-/* { dg-final { scan-assembler-times {\tld1rw\tz[0-9]+\.s, p[0-7]/z, } 7 } } */
-/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, p[0-7]/z, } 8 } } */
+/* { dg-final { scan-assembler-times {\tld1rh\tz[0-9]+\.h, p[0-7]/z, } 1 } } */
+/* { dg-final { scan-assembler-times {\tld1rw\tz[0-9]+\.s, p[0-7]/z, } 3 } } */
+/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, p[0-7]/z, } 4 } } */
gcc/testsuite/gcc.target/aarch64/sve/slp_2.c

@@ -29,12 +29,9 @@ vec_slp_##TYPE (TYPE *restrict a, int n) \
 TEST_ALL (VEC_PERM)

-/* { dg-final { scan-assembler-times {\tld1rh\tz[0-9]+\.h, } 2 { target aarch64_little_endian } } } */
-/* { dg-final { scan-assembler-times {\tld1rqb\tz[0-9]+\.b, } 2 { target aarch64_big_endian } } } */
-/* { dg-final { scan-assembler-times {\tld1rw\tz[0-9]+\.s, } 3 { target aarch64_little_endian } } } */
-/* { dg-final { scan-assembler-times {\tld1rqh\tz[0-9]+\.h, } 3 { target aarch64_big_endian } } } */
-/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 3 { target aarch64_little_endian } } } */
-/* { dg-final { scan-assembler-times {\tld1rqw\tz[0-9]+\.s, } 3 { target aarch64_big_endian } } } */
+/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.h, w[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld1rw\tz[0-9]+\.s, } 3 } } */
+/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 3 } } */
 /* { dg-final { scan-assembler-times {\tld1rqd\tz[0-9]+\.d, } 3 } } */
 /* { dg-final { scan-assembler-not {\tzip1\t} } } */
 /* { dg-final { scan-assembler-not {\tzip2\t} } } */
gcc/testsuite/gcc.target/aarch64/sve/slp_3.c

@@ -32,18 +32,17 @@ vec_slp_##TYPE (TYPE *restrict a, int n) \
 TEST_ALL (VEC_PERM)

 /* 1 for each 8-bit type.  */
-/* { dg-final { scan-assembler-times {\tld1rw\tz[0-9]+\.s, } 2 { target aarch64_little_endian } } } */
-/* { dg-final { scan-assembler-times {\tld1rqb\tz[0-9]+\.b, } 2 { target aarch64_big_endian } } } */
-/* 1 for each 16-bit type and 4 for double.  */
-/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 7 { target aarch64_little_endian } } } */
-/* { dg-final { scan-assembler-times {\tld1rqh\tz[0-9]+\.h, } 3 { target aarch64_big_endian } } } */
-/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 4 { target aarch64_big_endian } } } */
+/* { dg-final { scan-assembler-times {\tld1rw\tz[0-9]+\.s, } 2 } } */
+/* 1 for each 16-bit type plus 1 for double.  */
+/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 4 } } */
 /* 1 for each 32-bit type.  */
 /* { dg-final { scan-assembler-times {\tld1rqw\tz[0-9]+\.s, } 3 } } */
 /* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #41\n} 2 } } */
 /* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #25\n} 2 } } */
 /* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #31\n} 2 } } */
 /* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #62\n} 2 } } */
+/* 3 for double.  */
+/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, x[0-9]+\n} 3 } } */

 /* The 64-bit types need:
    ZIP1 ZIP1 (2 ZIP2s optimized away)
gcc/testsuite/gcc.target/aarch64/sve/slp_4.c

@@ -35,10 +35,8 @@ vec_slp_##TYPE (TYPE *restrict a, int n) \
 TEST_ALL (VEC_PERM)

-/* 1 for each 8-bit type, 4 for each 32-bit type and 8 for double.  */
-/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 22 { target aarch64_little_endian } } } */
-/* { dg-final { scan-assembler-times {\tld1rqb\tz[0-9]+\.b, } 2 { target aarch64_big_endian } } } */
-/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 20 { target aarch64_big_endian } } } */
+/* 1 for each 8-bit type, 4 for each 32-bit type and 4 for double.  */
+/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 18 } } */
 /* 1 for each 16-bit type.  */
 /* { dg-final { scan-assembler-times {\tld1rqh\tz[0-9]\.h, } 3 } } */
 /* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #99\n} 2 } } */
@@ -49,6 +47,8 @@ TEST_ALL (VEC_PERM)
 /* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #37\n} 2 } } */
 /* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #24\n} 2 } } */
 /* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #81\n} 2 } } */
+/* 4 for double.  */
+/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, x[0-9]+\n} 4 } } */

 /* The 32-bit types need:
    ZIP1 ZIP1 (2 ZIP2s optimized away)
gcc/testsuite/gcc.target/aarch64/sve/spill_4.c

@@ -24,10 +24,10 @@ TEST_LOOP (uint16_t, 0x1234);
 TEST_LOOP (uint32_t, 0x12345);
 TEST_LOOP (uint64_t, 0x123456);

-/* { dg-final { scan-assembler-times {\tptrue\tp[0-9]+\.h,} 3 } } */
+/* { dg-final { scan-assembler-not {\tptrue\tp[0-9]+\.h,} } } */
 /* { dg-final { scan-assembler-times {\tptrue\tp[0-9]+\.s,} 3 } } */
 /* { dg-final { scan-assembler-times {\tptrue\tp[0-9]+\.d,} 3 } } */
-/* { dg-final { scan-assembler-times {\tld1rh\tz[0-9]+\.h,} 3 } } */
+/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.h, w[0-9]+\n} 3 } } */
 /* { dg-final { scan-assembler-times {\tld1rw\tz[0-9]+\.s,} 3 } } */
 /* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d,} 3 } } */
 /* { dg-final { scan-assembler-not {\tldr\tz[0-9]} } } */