Improve dup pattern
Improve the dup pattern to prefer vector registers. When doing a dup after a load, the register allocator thinks the costs are identical and chooses an integer load. However a dup from an integer register includes an int->fp transfer which is not modelled. Adding a '?' to the integer variant means the cost is increased slightly so we prefer using a vector register. This improves the following example: #include <arm_neon.h> void f(unsigned *a, uint32x4_t *b) { b[0] = vdupq_n_u32(a[1]); b[1] = vdupq_n_u32(a[2]); } to: ldr s0, [x0, 4] dup v0.4s, v0.s[0] str q0, [x1] ldr s0, [x0, 8] dup v0.4s, v0.s[0] str q0, [x1, 16] ret gcc/ * config/aarch64/aarch64-simd.md (aarch64_simd_dup): Swap alternatives, make integer dup more expensive. From-SVN: r249443
Showing
Please
register
or
sign in
to comment