[AArch64] Improve popcount expansion
The popcount expansion uses umov to extend the result and move it back to the integer register file. If we model ADDV as a zero-extending operation, fmov can be used to move back to the integer side. This results in a ~0.5% speedup on deepsjeng on Cortex-A57. A typical __builtin_popcount expansion is now: fmov s0, w0 cnt v0.8b, v0.8b addv b0, v0.8b fmov w0, s0 gcc/ * config/aarch64/aarch64-simd.md (aarch64_zero_extend<GPI:mode>_reduc_plus_<VDQV_E:mode>): New pattern. * config/aarch64/aarch64.md (popcount<mode>2): Use it instead of generating separate ADDV and zero_extend patterns. * config/aarch64/iterators.md (VDQV_E): New iterator. testsuite/ * gcc.target/aarch64/popcnt2.c: New test.
Showing
gcc/testsuite/gcc.target/aarch64/popcnt2.c
0 → 100644
Please
register
or
sign in
to comment