This patch adds the +fp16fml extension, which enables some half-precision floating-point Advanced SIMD instructions, available through arm_neon.h intrinsics. The extension is on by default for armv8.4-a if fp16 is available, so it can be enabled with -march=armv8.4-a+fp16. fp16fml is also available for armv8.2-a and armv8.3-a through the +fp16fml option that is added for these architectures.

The new instructions that this patch adds support for are:

    vfmal.f16 Dr, Sm, Sn
    vfmal.f16 Qr, Dm, Dn
    vfmsl.f16 Dr, Sm, Sn
    vfmsl.f16 Qr, Dm, Dn

They interpret their input registers as vectors of half-precision floating-point values, extend them to single-precision vectors and perform a fused multiply-add or multiply-subtract of them with the destination vector.

This patch exposes these instructions through arm_neon.h intrinsics. The set of intrinsics allows us to perform the multiply-add/subtract operation on the low or top half of float16x4_t and float16x8_t values. This maps naturally in aarch64 to the FMLAL and FMLAL2 instructions, but on arm we have to use the fact that consecutive NEON registers overlap the wider register (i.e. d0 is s0 plus s1, q0 is d0 plus d1 etc). This just means we have to be careful to use the right subreg operand print code.

New arm-specific builtins are defined to expand to the new patterns. I've managed to compress the define_expands using code, mode and int iterators, but the define_insns don't compress very well without two-tiered iterators (iterator attributes expanding to iterators), which we don't support.

Bootstrapped and tested on arm-none-linux-gnueabihf and also on armeb-none-eabi.

* config/arm/arm-cpus.in (fp16fml): New feature.
(ALL_SIMD): Add fp16fml.
(armv8.2-a): Add fp16fml as an option.
(armv8.3-a): Likewise.
(armv8.4-a): Add fp16fml as part of fp16.
* config/arm/arm.h (TARGET_FP16FML): Define.
* config/arm/arm-c.c (arm_cpu_builtins): Define __ARM_FEATURE_FP16_FML when appropriate.
* config/arm/arm-modes.def (V2HF): Define.
* config/arm/arm_neon.h (vfmlal_low_u32, vfmlsl_low_u32, vfmlal_high_u32, vfmlsl_high_u32, vfmlalq_low_u32, vfmlslq_low_u32, vfmlalq_high_u32, vfmlslq_high_u32): Define.
* config/arm/arm_neon_builtins.def (vfmal_low, vfmal_high, vfmsl_low, vfmsl_high): New set of builtins.
* config/arm/iterators.md (PLUSMINUS): New code iterator.
(vfml_op): New code attribute.
(VFMLHALVES): New int iterator.
(VFML, VFMLSEL): New mode attributes.
(V_reg): Define mapping for V2HF.
(V_hi, V_lo): New mode attributes.
(VF_constraint): Likewise.
(vfml_half, vfml_half_selector): New int attributes.
* config/arm/neon.md (neon_vfm<vfml_op>l_<vfml_half><mode>): New define_expand.
(vfmal_low<mode>_intrinsic, vfmsl_high<mode>_intrinsic, vfmal_high<mode>_intrinsic, vfmsl_low<mode>_intrinsic): New define_insn.
* config/arm/t-arm-elf (v8_fps): Add fp16fml.
* config/arm/t-multilib (v8_2_a_simd_variants): Add fp16fml.
* config/arm/unspecs.md (UNSPEC_VFML_LO, UNSPEC_VFML_HI): New unspecs.
* doc/invoke.texi (ARM Options): Document fp16fml. Update armv8.4-a documentation.
* doc/sourcebuild.texi (arm_fp16fml_neon_ok, arm_fp16fml_neon): Document new effective target and option set.
* gcc.target/arm/multilib.exp: Add combination tests for fp16fml.
* gcc.target/arm/simd/fp16fml_high.c: New test.
* gcc.target/arm/simd/fp16fml_low.c: Likewise.
* lib/target-supports.exp (check_effective_target_arm_fp16fml_neon_ok_nocache, check_effective_target_arm_fp16fml_neon_ok, add_options_for_arm_fp16fml_neon): New procedures.

From-SVN: r256539
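As an illustration of the low/high-half selection and the register overlap described in the commit message, here is a minimal scalar sketch in portable C. It is not the patch's implementation and does not use arm_neon.h. The names fml_model, half_sel, q_to_low_d and d_to_low_s are hypothetical, and half-precision lanes are modelled as float values that are assumed to be exactly representable in half precision.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of the lower D register that overlaps
   Q register q (q0 is d0 plus d1, so qN starts at d(2N)).  */
static unsigned q_to_low_d(unsigned q)
{
    assert(q < 16);
    return 2 * q;
}

/* Hypothetical helper: number of the lower S register that overlaps
   D register d (d0 is s0 plus s1).  Only d0-d15 have S-register views.  */
static unsigned d_to_low_s(unsigned d)
{
    assert(d < 16);
    return 2 * d;
}

enum half_sel { HALF_LOW = 0, HALF_HIGH = 1 };

/* Scalar model (an assumption, not the real intrinsic) of the widening
   multiply-add/subtract on one half of the inputs.
   acc holds n_acc single-precision lanes; a and b hold 2 * n_acc lanes
   that stand in for half-precision values.  The selected half of a and b
   is widened, multiplied, and added to (or subtracted from) acc.
   If the inputs really are half-precision values, their product is exact
   in float, so only the final addition rounds.  */
static void fml_model(float *acc, const float *a, const float *b,
                      size_t n_acc, enum half_sel half, int subtract)
{
    size_t base = (half == HALF_HIGH) ? n_acc : 0;
    for (size_t i = 0; i < n_acc; i++) {
        float prod = a[base + i] * b[base + i];
        acc[i] = subtract ? acc[i] - prod : acc[i] + prod;
    }
}
```

With n_acc set to 2 the model corresponds to the 64-bit forms (four input lanes, two result lanes), and with n_acc set to 4 to the q forms (eight input lanes, four result lanes). HALF_LOW reads input lanes 0 to n_acc - 1 and HALF_HIGH reads the remaining lanes.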
Name |
---|
include |
avr-mmcu.texi |
bugreport.texi |
cfg.texi |
collect2.texi |
compat.texi |
configfiles.texi |
configterms.texi |
contrib.texi |
contribute.texi |
cpp.texi |
cppdiropts.texi |
cppenv.texi |
cppinternals.texi |
cppopts.texi |
cppwarnopts.texi |
extend.texi |
fragments.texi |
frontends.texi |
gcc.texi |
gccint.texi |
gcov-dump.texi |
gcov-tool.texi |
gcov.texi |
generic.texi |
gimple.texi |
gnu.texi |
gty.texi |
headerdirs.texi |
hostconfig.texi |
implement-c.texi |
implement-cxx.texi |
install-old.texi |
install.texi |
install.texi2html |
interface.texi |
invoke.texi |
languages.texi |
libgcc.texi |
loop.texi |
lto.texi |
makefile.texi |
match-and-simplify.texi |
md.texi |
objc.texi |
optinfo.texi |
options.texi |
passes.texi |
plugins.texi |
poly-int.texi |
portability.texi |
rtl.texi |
service.texi |
sourcebuild.texi |
standards.texi |
tm.texi |
tm.texi.in |
tree-ssa.texi |
trouble.texi |