[arm][3/3] Implement fp16fml lane intrinsics
This patch implements the lane-wise fp16fml intrinsics. There are quite a few of them, so I've split them out from the other, simpler fp16fml intrinsics.

These expose instructions such as:

vfmal.f16 Dd, Sn, Sm[<index>]    0 <= index <= 1
vfmal.f16 Qd, Dn, Dm[<index>]    0 <= index <= 3
vfmsl.f16 Dd, Sn, Sm[<index>]    0 <= index <= 1
vfmsl.f16 Qd, Dn, Dm[<index>]    0 <= index <= 3

These instructions extract a single half-precision floating-point value from one of the source registers and perform a vfmal/vfmsl operation with that value, as per the normal variant.

The nuance here is that some of the intrinsics want to do things like:

float32x2_t vfmlal_laneq_low_u32 (float32x2_t __r, float16x4_t __a, float16x8_t __b, const int __index)

where the float16x8_t value of '__b' is held in a Q register, so we need to be a bit smart about finding the right D or S sub-register and translating the lane number to a lane in that sub-register, instead of just passing the language-level const-int down to the assembly instruction. That's where most of the complexity of this patch comes from, but hopefully it's orthogonal enough to make sense.

Bootstrapped and tested on arm-none-linux-gnueabihf as well as armeb-none-eabi.

	* config/arm/arm_neon.h (vfmlal_lane_low_u32, vfmlal_lane_high_u32,
	vfmlalq_laneq_low_u32, vfmlalq_lane_low_u32, vfmlal_laneq_low_u32,
	vfmlalq_laneq_high_u32, vfmlalq_lane_high_u32, vfmlal_laneq_high_u32,
	vfmlsl_lane_low_u32, vfmlsl_lane_high_u32, vfmlslq_laneq_low_u32,
	vfmlslq_lane_low_u32, vfmlsl_laneq_low_u32, vfmlslq_laneq_high_u32,
	vfmlslq_lane_high_u32, vfmlsl_laneq_high_u32): Define.
	* config/arm/arm_neon_builtins.def (vfmal_lane_low,
	vfmal_lane_lowv4hf, vfmal_lane_lowv8hf, vfmal_lane_high,
	vfmal_lane_highv4hf, vfmal_lane_highv8hf, vfmsl_lane_low,
	vfmsl_lane_lowv4hf, vfmsl_lane_lowv8hf, vfmsl_lane_high,
	vfmsl_lane_highv4hf, vfmsl_lane_highv8hf): New sets of builtins.
	* config/arm/iterators.md (VFMLSEL2, vfmlsel2): New mode attributes.
	(V_lane_reg): Likewise.
	* config/arm/neon.md (neon_vfm<vfml_op>l_lane_<vfml_half><VCVTF:mode>):
	New define_expand.
	(neon_vfm<vfml_op>l_lane_<vfml_half><vfmlsel2><mode>): Likewise.
	(vfmal_lane_low<mode>_intrinsic,
	vfmal_lane_low<vfmlsel2><mode>_intrinsic,
	vfmal_lane_high<vfmlsel2><mode>_intrinsic,
	vfmal_lane_high<mode>_intrinsic, vfmsl_lane_low<mode>_intrinsic,
	vfmsl_lane_low<vfmlsel2><mode>_intrinsic,
	vfmsl_lane_high<vfmlsel2><mode>_intrinsic,
	vfmsl_lane_high<mode>_intrinsic): New define_insns.

	* gcc.target/arm/simd/fp16fml_lane_high.c: New test.
	* gcc.target/arm/simd/fp16fml_lane_low.c: New test.

From-SVN: r256540
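
The lane translation described in the message can be illustrated with a small C sketch. It is not the code this patch adds to neon.md: `split_lane` and `struct subreg_lane` are hypothetical names used only for illustration, and the sketch assumes little-endian lane numbering (big-endian targets such as armeb-none-eabi number lanes differently and would need extra handling). The idea is that the intrinsic-level index is divided by the number of half-precision lanes the instruction can address (2 for the Sm[<index>] form, 4 for the Dm[<index>] form) to pick the sub-register, and the remainder becomes the lane encoded in the instruction.

```c
#include <assert.h>

/* Hypothetical helper type for illustration; not part of GCC or arm_neon.h. */
struct subreg_lane {
    unsigned subreg; /* index of the S or D sub-register within the source register */
    unsigned lane;   /* lane within that sub-register, as the instruction would encode it */
};

/* Split an intrinsic-level lane index into (sub-register, lane).
   total_lanes:      number of f16 elements in the source vector (4 or 8).
   lanes_per_subreg: 2 when the instruction indexes an S register,
                     4 when it indexes a D register.
   Assumes little-endian lane numbering.  */
static struct subreg_lane
split_lane (unsigned index, unsigned total_lanes, unsigned lanes_per_subreg)
{
    assert (lanes_per_subreg != 0 && index < total_lanes);

    struct subreg_lane r;
    r.subreg = index / lanes_per_subreg;
    r.lane = index % lanes_per_subreg;
    return r;
}
```

Under these assumptions, an index of 5 into a float16x8_t source for the Dd, Sn, Sm[<index>] form maps to the third S sub-register (number 2) with lane 1, which stays within the instruction's 0 <= index <= 1 limit.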