This patch adds support to vectorize sum of absolute differences (SAD_EXPR)
using SVE.

Given this input code:

    int
    sum_abs (uint8_t *restrict x, uint8_t *restrict y, int n)
    {
      int sum = 0;

      for (int i = 0; i < n; i++)
        {
          sum += __builtin_abs (x[i] - y[i]);
        }

      return sum;
    }

The resulting SVE code is:

    0000000000000000 <sum_abs>:
       0:   7100005f    cmp     w2, #0x0
       4:   5400026d    b.le    50 <sum_abs+0x50>
       8:   d2800003    mov     x3, #0x0                // #0
       c:   93407c42    sxtw    x2, w2
      10:   2538c002    mov     z2.b, #0
      14:   25221fe0    whilelo p0.b, xzr, x2
      18:   2538c023    mov     z3.b, #1
      1c:   2518e3e1    ptrue   p1.b
      20:   a4034000    ld1b    {z0.b}, p0/z, [x0, x3]
      24:   a4034021    ld1b    {z1.b}, p0/z, [x1, x3]
      28:   0430e3e3    incb    x3
      2c:   0520c021    sel     z1.b, p0, z1.b, z0.b
      30:   25221c60    whilelo p0.b, x3, x2
      34:   040d0420    uabd    z0.b, p1/m, z0.b, z1.b
      38:   44830402    udot    z2.s, z0.b, z3.b
      3c:   54ffff21    b.ne    20 <sum_abs+0x20>  // b.any
      40:   2598e3e0    ptrue   p0.s
      44:   04812042    uaddv   d2, p0, z2.s
      48:   1e260040    fmov    w0, s2
      4c:   d65f03c0    ret
      50:   1e2703e2    fmov    s2, wzr
      54:   1e260040    fmov    w0, s2
      58:   d65f03c0    ret

Notice how udot is used inside a fully masked loop.

gcc/Changelog:

2019-05-07  Alejandro Martinez  <alejandro.martinezvicente@arm.com>

    * config/aarch64/aarch64-sve.md (<su>abd<mode>_3): New define_expand.
    (aarch64_<su>abd<mode>_3): Likewise.
    (*aarch64_<su>abd<mode>_3): New define_insn.
    (<sur>sad<vsi2qi>): New define_expand.
    * config/aarch64/iterators.md: Added MAX_OPP attribute.
    * tree-vect-loop.c (use_mask_by_cond_expr_p): Add SAD_EXPR.
    (build_vect_cond_expr): Likewise.

gcc/testsuite/Changelog:

2019-05-07  Alejandro Martinez  <alejandro.martinezvicente@arm.com>

    * gcc.target/aarch64/sve/sad_1.c: New test for sum of absolute
    differences.

From-SVN: r270975
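As a rough illustration of why udot is safe inside the fully masked loop, the
sketch below emulates the generated code lane by lane in plain C. It is not
part of the patch: the names sum_abs_ref and sum_abs_masked and the fixed
vector length VL are assumptions made for the example (real SVE vector lengths
are only known at run time). The key step mirrors the sel instruction:
inactive lanes of the second operand are replaced with the first operand, so
their absolute difference is zero and they add nothing to the accumulator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VL 16 /* Hypothetical vector length in bytes, for illustration only.  */

/* Scalar reference: the loop from the commit message.  */
int
sum_abs_ref (const uint8_t *x, const uint8_t *y, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += x[i] > y[i] ? x[i] - y[i] : y[i] - x[i];
  return sum;
}

/* Lane-by-lane emulation of the fully masked SVE loop.  */
int
sum_abs_masked (const uint8_t *x, const uint8_t *y, int n)
{
  uint32_t acc[VL / 4] = { 0 };        /* z2.s, the udot accumulator.  */
  uint8_t ones[VL];                    /* z3.b, all lanes set to 1.  */

  if (n <= 0)
    return 0;
  assert (x != NULL && y != NULL);

  for (int l = 0; l < VL; l++)
    ones[l] = 1;

  for (int base = 0; base < n; base += VL)
    {
      uint8_t vx[VL], vy[VL], diff[VL];

      for (int l = 0; l < VL; l++)
        {
          int active = base + l < n;           /* whilelo predicate p0.  */
          vx[l] = active ? x[base + l] : 0;    /* ld1b with zeroing.  */
          vy[l] = active ? y[base + l] : 0;
          vy[l] = active ? vy[l] : vx[l];      /* sel: inactive lanes get x.  */
          /* uabd runs on all lanes; inactive lanes yield 0.  */
          diff[l] = vx[l] > vy[l] ? vx[l] - vy[l] : vy[l] - vx[l];
        }

      /* udot: each 32-bit lane accumulates four byte products.  */
      for (int l = 0; l < VL; l++)
        acc[l / 4] += (uint32_t) diff[l] * ones[l];
    }

  uint32_t sum = 0;                    /* uaddv: horizontal reduction.  */
  for (int i = 0; i < VL / 4; i++)
    sum += acc[i];
  return (int) sum;
}
```

In this emulation the zeroing loads already make both operands zero in
inactive lanes, so the sel step looks redundant; it is kept to show the
masking that lets an unpredicated udot accumulate only the active elements.
For any n the two functions should return the same value, including when n is
not a multiple of VL.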
gcc/testsuite/gcc.target/aarch64/sve/sad_1.c (new file, mode 100644)