gcc/doc · 346ac3a8a2de48d971c355cf82fe82421d2545b7 · lvzhengyang / riscv-gcc-1

[arm][2/3] Implement fp16fml extension for ARMv8.4-A · 06e95715

This patch adds the +fp16fml extension that enables some
half-precision floating-point Advanced SIMD instructions,
available through arm_neon.h intrinsics.

This extension is on by default for armv8.4-a
if fp16 is available, so it can be enabled by -march=armv8.4-a+fp16.

fp16fml is also available for armv8.2-a and armv8.3-a through the
+fp16fml option that is added for these architectures.
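
A minimal sketch of how user code might detect the extension at compile time. It relies only on the __ARM_FEATURE_FP16_FML macro named in the ChangeLog below; the function name have_fp16fml is made up for illustration and is not part of the patch.

```c
/* Returns 1 when the compiler was invoked with fp16fml enabled
   (for example -march=armv8.4-a+fp16, or -march=armv8.2-a+fp16fml),
   and 0 otherwise.  have_fp16fml is a hypothetical helper name.  */
int have_fp16fml(void)
{
#if defined(__ARM_FEATURE_FP16_FML)
    return 1;
#else
    return 0;
#endif
}
```

On a target or option set without the extension the function simply returns 0, so callers can fall back to a generic code path.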

The new instructions that this patch adds support for are:
vfmal.f16 Dr, Sm, Sn
vfmal.f16 Qr, Dm, Dn
vfmsl.f16 Dr, Sm, Sn
vfmsl.f16 Qr, Dm, Dn

Each instruction interprets its source registers as vectors of
half-precision floating-point values, widens them to single precision,
and performs a fused multiply-add (vfmal) or multiply-subtract (vfmsl)
into the destination vector.
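
The widening multiply-accumulate described above can be modelled in portable C. This is only a reference sketch of the arithmetic, not the code GCC emits: the names half_bits_to_float and fml_model, and the representation of half-precision lanes as raw uint16_t bit patterns, are assumptions made for illustration. It uses a plain multiply and add instead of fmaf because the product of two half-precision values is exactly representable in single precision, so only the final add rounds.

```c
#include <stdint.h>
#include <string.h>

/* Convert an IEEE 754 binary16 bit pattern to float (hypothetical helper).  */
static float half_bits_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp = (h >> 10) & 0x1Fu;
    uint32_t man = h & 0x3FFu;
    uint32_t bits;
    float f;

    if (exp == 0x1Fu) {
        /* Infinity or NaN.  */
        bits = sign | 0x7F800000u | (man << 13);
    } else if (exp != 0) {
        /* Normal number: rebias the exponent from 15 to 127.  */
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    } else if (man == 0) {
        /* Signed zero.  */
        bits = sign;
    } else {
        /* Subnormal: normalise the mantissa.  */
        exp = 113u;
        while ((man & 0x400u) == 0) {
            man <<= 1;
            exp--;
        }
        bits = sign | (exp << 23) | ((man & 0x3FFu) << 13);
    }
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Model of the 64-bit forms: acc holds two single-precision lanes,
   a and b hold four half-precision lanes each.  high selects lanes 2..3
   instead of lanes 0..1; subtract selects multiply-subtract.  */
void fml_model(float acc[2], const uint16_t a[4], const uint16_t b[4],
               int high, int subtract)
{
    int base = high ? 2 : 0;
    for (int i = 0; i < 2; i++) {
        float x = half_bits_to_float(a[base + i]);
        float y = half_bits_to_float(b[base + i]);
        if (subtract)
            x = -x;
        acc[i] += x * y;
    }
}
```

For example, with acc = {1.0, 10.0}, a = {1.0, 2.0, 3.0, 4.0} and b = {2.0, 2.0, 0.5, 0.5} (all given as half-precision bit patterns), the low multiply-add form leaves acc = {3.0, 14.0}.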

This patch exposes these instructions through arm_neon.h intrinsics.
The intrinsics perform the multiply-add/subtract operation on either
the low or the high half of float16x4_t and float16x8_t values.  On
aarch64 this maps naturally to the FMLAL and FMLAL2 instructions, but
on arm we have to rely on the fact that consecutive NEON registers
overlap the wider register (i.e. d0 is s0 plus s1, q0 is d0 plus d1
etc).  In practice this means we have to be careful to use the right
subreg operand print code.
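
The register overlap mentioned above can be written down as simple index arithmetic. The sketch below is illustrative only: the function names are hypothetical and it does not reproduce GCC's operand print codes. It assumes the usual AArch32 layout in which q0 to q15 overlay d0 to d31 and d0 to d15 overlay s0 to s31.

```c
/* Number of the D register holding the low (high == 0) or high
   (high != 0) half of Q register q, or -1 if q is out of range.  */
int d_reg_for_q_half(int q, int high)
{
    if (q < 0 || q > 15)
        return -1;
    return 2 * q + (high ? 1 : 0);
}

/* Number of the S register holding the low or high half of D register d.
   Only d0..d15 have S-register views, so other values yield -1.  */
int s_reg_for_d_half(int d, int high)
{
    if (d < 0 || d > 15)
        return -1;
    return 2 * d + (high ? 1 : 0);
}
```

Under these assumptions the high half of q3 is d7 and the high half of d0 is s1, which is the kind of sub-register an instruction pattern has to name when it selects one half of a wider operand.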

New arm-specific builtins are defined to expand to the new patterns.
I've managed to compress the define_expands using code, mode and int
iterators but the define_insns don't compress very well without two-tiered
iterators (iterator attributes expanding to iterators) which we
don't support.

Bootstrapped and tested on arm-none-linux-gnueabihf and also on
armeb-none-eabi.

	* config/arm/arm-cpus.in (fp16fml): New feature.
	(ALL_SIMD): Add fp16fml.
	(armv8.2-a): Add fp16fml as an option.
	(armv8.3-a): Likewise.
	(armv8.4-a): Add fp16fml as part of fp16.
	* config/arm/arm.h (TARGET_FP16FML): Define.
	* config/arm/arm-c.c (arm_cpu_builtins): Define __ARM_FEATURE_FP16_FML
	when appropriate.
	* config/arm/arm-modes.def (V2HF): Define.
	* config/arm/arm_neon.h (vfmlal_low_u32, vfmlsl_low_u32,
	vfmlal_high_u32, vfmlsl_high_u32, vfmlalq_low_u32,
	vfmlslq_low_u32, vfmlalq_high_u32, vfmlslq_high_u32): Define.
	* config/arm/arm_neon_builtins.def (vfmal_low, vfmal_high,
	vfmsl_low, vfmsl_high): New set of builtins.
	* config/arm/iterators.md (PLUSMINUS): New code iterator.
	(vfml_op): New code attribute.
	(VFMLHALVES): New int iterator.
	(VFML, VFMLSEL): New mode attributes.
	(V_reg): Define mapping for V2HF.
	(V_hi, V_lo): New mode attributes.
	(VF_constraint): Likewise.
	(vfml_half, vfml_half_selector): New int attributes.
	* config/arm/neon.md (neon_vfm<vfml_op>l_<vfml_half><mode>): New
	define_expand.
	(vfmal_low<mode>_intrinsic, vfmsl_high<mode>_intrinsic,
	vfmal_high<mode>_intrinsic, vfmsl_low<mode>_intrinsic):
	New define_insn.
	* config/arm/t-arm-elf (v8_fps): Add fp16fml.
	* config/arm/t-multilib (v8_2_a_simd_variants): Add fp16fml.
	* config/arm/unspecs.md (UNSPEC_VFML_LO, UNSPEC_VFML_HI): New unspecs.
	* doc/invoke.texi (ARM Options): Document fp16fml.  Update armv8.4-a
	documentation.
	* doc/sourcebuild.texi (arm_fp16fml_neon_ok, arm_fp16fml_neon):
	Document new effective target and option set.

	* gcc.target/arm/multilib.exp: Add combination tests for fp16fml.
	* gcc.target/arm/simd/fp16fml_high.c: New test.
	* gcc.target/arm/simd/fp16fml_low.c: Likewise.
	* lib/target-supports.exp
	(check_effective_target_arm_fp16fml_neon_ok_nocache,
	check_effective_target_arm_fp16fml_neon_ok,
	add_options_for_arm_fp16fml_neon): New procedures.

From-SVN: r256539

committed Jan 11, 2018

Name
include
avr-mmcu.texi
bugreport.texi
cfg.texi
collect2.texi
compat.texi
configfiles.texi
configterms.texi
contrib.texi
contribute.texi
cpp.texi
cppdiropts.texi
cppenv.texi
cppinternals.texi
cppopts.texi
cppwarnopts.texi
extend.texi
fragments.texi
frontends.texi
gcc.texi
gccint.texi
gcov-dump.texi
gcov-tool.texi
gcov.texi
generic.texi
gimple.texi
gnu.texi
gty.texi
headerdirs.texi
hostconfig.texi
implement-c.texi
implement-cxx.texi
install-old.texi
install.texi
install.texi2html
interface.texi
invoke.texi
languages.texi
libgcc.texi
loop.texi
lto.texi
makefile.texi
match-and-simplify.texi
md.texi
objc.texi
optinfo.texi
options.texi
passes.texi
plugins.texi
poly-int.texi
portability.texi
rtl.texi
service.texi
sourcebuild.texi
standards.texi
tm.texi
tm.texi.in
tree-ssa.texi
trouble.texi