Commit 55a9b91b by Matthew Wahab Committed by Matthew Wahab

[PATCH 9/17][ARM] Add NEON FP16 arithmetic instructions.

gcc/
2016-09-23  Matthew Wahab  <matthew.wahab@arm.com>

	* config/arm/iterators.md (VCVTHI): New.
	(NEON_VCMP): Add UNSPEC_VCLT and UNSPEC_VCLE.  Fix a long line.
	(NEON_VAGLTE): New.
	(VFM_LANE_AS): New.
	(VH_CVTTO): New.
	(V_reg): Add HF, V4HF and V8HF.  Fix white-space.
	(V_HALF): Add V4HF.  Fix white-space.
	(V_if_elem): Add HF, V4HF and V8HF.  Fix white-space.
	(V_s_elem): Likewise.
	(V_sz_elem): Fix white-space.
	(V_elem_ch): Likewise.
	(VH_elem_ch): New.
	(scalar_mul_constraint): Add V8HF and V4HF.
	(Is_float_mode): Fix white-space.
	(Is_d_reg): Add V4HF and V8HF.  Fix white-space.
	(q): Add HF.  Fix white-space.
	(float_sup): New.
	(float_SUP): New.
	(cmp_op_unsp): Add UNSPEC_VCALE and UNSPEC_VCALT.
	(neon_vfm_lane_as): New.
	* config/arm/neon.md (add<mode>3_fp16): New.
	(sub<mode>3_fp16): New.
	(mul<mode>3add<mode>_neon): New.
	(fma<VH:mode>4_intrinsic): New.
	(fmsub<VCVTF:mode>4_intrinsic): Fix white-space.
	(fmsub<VH:mode>4_intrinsic): New.
	(<absneg_str><mode>2): New.
	(neon_v<absneg_str><mode>): New.
	(neon_v<fp16_rnd_str><mode>): New.
	(neon_vrsqrte<mode>): New.
	(neon_vpaddv4hf): New.
	(neon_vadd<mode>): New.
	(neon_vsub<mode>): New.
	(neon_vmulf<mode>): New.
	(neon_vfma<VH:mode>): New.
	(neon_vfms<VH:mode>): New.
	(neon_vc<cmp_op><mode>): New.
	(neon_vc<cmp_op><mode>_fp16insn): New.
	(neon_vc<cmp_op_unsp><mode>_fp16insn_unspec): New.
	(neon_vca<cmp_op><mode>): New.
	(neon_vca<cmp_op><mode>_fp16insn): New.
	(neon_vca<cmp_op_unsp><mode>_fp16insn_unspec): New.
	(neon_vc<cmp_op>z<mode>): New.
	(neon_vabd<mode>): New.
	(neon_v<maxmin>f<mode>): New.
	(neon_vp<maxmin>fv4hf): New.
	(neon_<fmaxmin_op><mode>): New.
	(neon_vrecps<mode>): New.
	(neon_vrsqrts<mode>): New.
	(neon_vrecpe<mode>): New (VH variant).
	(neon_vdup_lane<mode>_internal): New.
	(neon_vdup_lane<mode>): New.
	(neon_vcvt<sup><mode>): New (VCVTHI variant).
	(neon_vcvt<sup><mode>): New (VH variant).
	(neon_vcvt<sup>_n<mode>): New (VH variant).
	(neon_vcvt<sup>_n<mode>): New (VCVTHI variant).
	(neon_vcvt<vcvth_op><sup><mode>): New.
	(neon_vmul_lane<mode>): New.
	(neon_vmul_n<mode>): New.
	* config/arm/unspecs.md (UNSPEC_VCALE): New.
	(UNSPEC_VCALT): New.
	(UNSPEC_VFMA_LANE): New.
	(UNSPEC_VFMS_LANE): New.

testsuite/
2016-09-23  Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/arm/armv8_2-fp16-arith-1.c: Use arm_v8_2a_fp16_neon
	options.  Add tests for float16x4_t and float16x8_t.

From-SVN: r240415
parent 64c744b9
2016-09-23 Matthew Wahab <matthew.wahab@arm.com>
* config/arm/iterators.md (VCVTHI): New.
(NEON_VCMP): Add UNSPEC_VCLT and UNSPEC_VCLE. Fix a long line.
(NEON_VAGLTE): New.
(VFM_LANE_AS): New.
(VH_CVTTO): New.
(V_reg): Add HF, V4HF and V8HF. Fix white-space.
(V_HALF): Add V4HF. Fix white-space.
(V_if_elem): Add HF, V4HF and V8HF. Fix white-space.
(V_s_elem): Likewise.
(V_sz_elem): Fix white-space.
(V_elem_ch): Likewise.
(VH_elem_ch): New.
(scalar_mul_constraint): Add V8HF and V4HF.
(Is_float_mode): Fix white-space.
(Is_d_reg): Add V4HF and V8HF. Fix white-space.
(q): Add HF. Fix white-space.
(float_sup): New.
(float_SUP): New.
(cmp_op_unsp): Add UNSPEC_VCALE and UNSPEC_VCALT.
(neon_vfm_lane_as): New.
* config/arm/neon.md (add<mode>3_fp16): New.
(sub<mode>3_fp16): New.
(mul<mode>3add<mode>_neon): New.
(fma<VH:mode>4_intrinsic): New.
(fmsub<VCVTF:mode>4_intrinsic): Fix white-space.
(fmsub<VH:mode>4_intrinsic): New.
(<absneg_str><mode>2): New.
(neon_v<absneg_str><mode>): New.
(neon_v<fp16_rnd_str><mode>): New.
(neon_vrsqrte<mode>): New.
(neon_vpaddv4hf): New.
(neon_vadd<mode>): New.
(neon_vsub<mode>): New.
(neon_vmulf<mode>): New.
(neon_vfma<VH:mode>): New.
(neon_vfms<VH:mode>): New.
(neon_vc<cmp_op><mode>): New.
(neon_vc<cmp_op><mode>_fp16insn): New.
(neon_vc<cmp_op_unsp><mode>_fp16insn_unspec): New.
(neon_vca<cmp_op><mode>): New.
(neon_vca<cmp_op><mode>_fp16insn): New.
(neon_vca<cmp_op_unsp><mode>_fp16insn_unspec): New.
(neon_vc<cmp_op>z<mode>): New.
(neon_vabd<mode>): New.
(neon_v<maxmin>f<mode>): New.
(neon_vp<maxmin>fv4hf): New.
(neon_<fmaxmin_op><mode>): New.
(neon_vrecps<mode>): New.
(neon_vrsqrts<mode>): New.
(neon_vrecpe<mode>): New (VH variant).
(neon_vdup_lane<mode>_internal): New.
(neon_vdup_lane<mode>): New.
(neon_vcvt<sup><mode>): New (VCVTHI variant).
(neon_vcvt<sup><mode>): New (VH variant).
(neon_vcvt<sup>_n<mode>): New (VH variant).
(neon_vcvt<sup>_n<mode>): New (VCVTHI variant).
(neon_vcvt<vcvth_op><sup><mode>): New.
(neon_vmul_lane<mode>): New.
(neon_vmul_n<mode>): New.
* config/arm/unspecs.md (UNSPEC_VCALE): New.
(UNSPEC_VCALT): New.
(UNSPEC_VFMA_LANE): New.
(UNSPEC_VFMS_LANE): New.
2016-09-23 Dominik Vogt <vogt@linux.vnet.ibm.com>
* config/s390/s390.md ("*extzv<mode>_zEC12", "*extzv<mode>_z10")
......
......@@ -145,6 +145,9 @@
;; Vector modes for int->float conversions.
(define_mode_iterator VCVTI [V2SI V4SI])
;; Vector modes for int->half conversions.
(define_mode_iterator VCVTHI [V4HI V8HI])
;; Vector modes for doubleword multiply-accumulate, etc. insns.
(define_mode_iterator VMD [V4HI V2SI V2SF])
......@@ -267,10 +270,14 @@
(define_int_iterator VRINT [UNSPEC_VRINTZ UNSPEC_VRINTP UNSPEC_VRINTM
UNSPEC_VRINTR UNSPEC_VRINTX UNSPEC_VRINTA])
(define_int_iterator NEON_VCMP [UNSPEC_VCEQ UNSPEC_VCGT UNSPEC_VCGE UNSPEC_VCLT UNSPEC_VCLE])
(define_int_iterator NEON_VCMP [UNSPEC_VCEQ UNSPEC_VCGT UNSPEC_VCGE
UNSPEC_VCLT UNSPEC_VCLE])
(define_int_iterator NEON_VACMP [UNSPEC_VCAGE UNSPEC_VCAGT])
(define_int_iterator NEON_VAGLTE [UNSPEC_VCAGE UNSPEC_VCAGT
UNSPEC_VCALE UNSPEC_VCALT])
(define_int_iterator VCVT [UNSPEC_VRINTP UNSPEC_VRINTM UNSPEC_VRINTA])
(define_int_iterator NEON_VRINT [UNSPEC_NVRINTP UNSPEC_NVRINTZ UNSPEC_NVRINTM
......@@ -398,6 +405,8 @@
(define_int_iterator VQRDMLH_AS [UNSPEC_VQRDMLAH UNSPEC_VQRDMLSH])
(define_int_iterator VFM_LANE_AS [UNSPEC_VFMA_LANE UNSPEC_VFMS_LANE])
;;----------------------------------------------------------------------------
;; Mode attributes
;;----------------------------------------------------------------------------
......@@ -416,6 +425,10 @@
(define_mode_attr V_cvtto [(V2SI "v2sf") (V2SF "v2si")
(V4SI "v4sf") (V4SF "v4si")])
;; (Opposite) mode to convert to/from for vector-half mode conversions.
(define_mode_attr VH_CVTTO [(V4HI "V4HF") (V4HF "V4HI")
(V8HI "V8HF") (V8HF "V8HI")])
;; Define element mode for each vector mode.
(define_mode_attr V_elem [(V8QI "QI") (V16QI "QI")
(V4HI "HI") (V8HI "HI")
......@@ -464,7 +477,8 @@
(V2SI "P") (V4SI "q")
(V2SF "P") (V4SF "q")
(DI "P") (V2DI "q")
(SF "") (DF "P")])
(SF "") (DF "P")
(HF "")])
;; Wider modes with the same number of elements.
(define_mode_attr V_widen [(V8QI "V8HI") (V4HI "V4SI") (V2SI "V2DI")])
......@@ -480,7 +494,7 @@
(define_mode_attr V_HALF [(V16QI "V8QI") (V8HI "V4HI")
(V8HF "V4HF") (V4SI "V2SI")
(V4SF "V2SF") (V2DF "DF")
(V2DI "DI")])
(V2DI "DI") (V4HF "HF")])
;; Same, but lower-case.
(define_mode_attr V_half [(V16QI "v8qi") (V8HI "v4hi")
......@@ -533,14 +547,18 @@
(V2SI "i32") (V4SI "i32")
(DI "i64") (V2DI "i64")
(V2SF "f32") (V4SF "f32")
(SF "f32") (DF "f64")])
(SF "f32") (DF "f64")
(HF "f16") (V4HF "f16")
(V8HF "f16")])
;; Same, but for operations which work on signed values.
(define_mode_attr V_s_elem [(V8QI "s8") (V16QI "s8")
(V4HI "s16") (V8HI "s16")
(V2SI "s32") (V4SI "s32")
(DI "s64") (V2DI "s64")
(V2SF "f32") (V4SF "f32")])
(V2SF "f32") (V4SF "f32")
(HF "f16") (V4HF "f16")
(V8HF "f16")])
;; Same, but for operations which work on unsigned values.
(define_mode_attr V_u_elem [(V8QI "u8") (V16QI "u8")
......@@ -567,8 +585,13 @@
(V4HI "h") (V8HI "h")
(V2SI "s") (V4SI "s")
(DI "d") (V2DI "d")
(V2SF "s") (V4SF "s")
(V2SF "s") (V4SF "s")])
(define_mode_attr VH_elem_ch [(V4HI "s") (V8HI "s")
(V4HF "s") (V8HF "s")
(HF "s")])
;; Element sizes for duplicating ARM registers to all elements of a vector.
(define_mode_attr VD_dup [(V8QI "8") (V4HI "16") (V2SI "32") (V2SF "32")])
......@@ -603,7 +626,8 @@
;; This mode attribute is used to obtain the correct register constraints.
(define_mode_attr scalar_mul_constraint [(V4HI "x") (V2SI "t") (V2SF "t")
(V8HI "x") (V4SI "t") (V4SF "t")])
(V8HI "x") (V4SI "t") (V4SF "t")
(V8HF "x") (V4HF "x")])
;; Predicates used for setting type for neon instructions
......@@ -674,8 +698,10 @@
(V2SI "") (V4SI "_q")
(V4HF "") (V8HF "_q")
(V2SF "") (V4SF "_q")
(V4HF "") (V8HF "_q")
(DI "") (V2DI "_q")
(DF "") (V2DF "_q")])
(DF "") (V2DF "_q")
(HF "")])
(define_mode_attr pf [(V8QI "p") (V16QI "p") (V2SF "f") (V4SF "f")])
......@@ -718,6 +744,10 @@
;; Conversions.
(define_code_attr FCVTI32typename [(unsigned_float "u32") (float "s32")])
(define_code_attr float_sup [(unsigned_float "u") (float "s")])
(define_code_attr float_SUP [(unsigned_float "U") (float "S")])
;;----------------------------------------------------------------------------
;; Int attributes
;;----------------------------------------------------------------------------
......@@ -792,7 +822,8 @@
(define_int_attr cmp_op_unsp [(UNSPEC_VCEQ "eq") (UNSPEC_VCGT "gt")
(UNSPEC_VCGE "ge") (UNSPEC_VCLE "le")
(UNSPEC_VCLT "lt") (UNSPEC_VCAGE "ge")
(UNSPEC_VCAGT "gt")])
(UNSPEC_VCAGT "gt") (UNSPEC_VCALE "le")
(UNSPEC_VCALT "lt")])
(define_int_attr r [
(UNSPEC_VRHADD_S "r") (UNSPEC_VRHADD_U "r")
......@@ -908,3 +939,7 @@
;; Attributes for VQRDMLAH/VQRDMLSH
(define_int_attr neon_rdma_as [(UNSPEC_VQRDMLAH "a") (UNSPEC_VQRDMLSH "s")])
;; Attributes for VFMA_LANE/ VFMS_LANE
(define_int_attr neon_vfm_lane_as
[(UNSPEC_VFMA_LANE "a") (UNSPEC_VFMS_LANE "s")])
......@@ -505,6 +505,20 @@
(const_string "neon_add<q>")))]
)
(define_insn "add<mode>3_fp16"
[(set
(match_operand:VH 0 "s_register_operand" "=w")
(plus:VH
(match_operand:VH 1 "s_register_operand" "w")
(match_operand:VH 2 "s_register_operand" "w")))]
"TARGET_NEON_FP16INST"
"vadd.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
[(set (attr "type")
(if_then_else (match_test "<Is_float_mode>")
(const_string "neon_fp_addsub_s<q>")
(const_string "neon_add<q>")))]
)
(define_insn "adddi3_neon"
[(set (match_operand:DI 0 "s_register_operand" "=w,?&r,?&r,?w,?&r,?&r,?&r")
(plus:DI (match_operand:DI 1 "s_register_operand" "%w,0,0,w,r,0,r")
......@@ -543,6 +557,17 @@
(const_string "neon_sub<q>")))]
)
(define_insn "sub<mode>3_fp16"
[(set
(match_operand:VH 0 "s_register_operand" "=w")
(minus:VH
(match_operand:VH 1 "s_register_operand" "w")
(match_operand:VH 2 "s_register_operand" "w")))]
"TARGET_NEON_FP16INST"
"vsub.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
[(set_attr "type" "neon_sub<q>")]
)
(define_insn "subdi3_neon"
[(set (match_operand:DI 0 "s_register_operand" "=w,?&r,?&r,?&r,?w")
(minus:DI (match_operand:DI 1 "s_register_operand" "w,0,r,0,w")
......@@ -591,6 +616,16 @@
(const_string "neon_mla_<V_elem_ch><q>")))]
)
(define_insn "mul<mode>3add<mode>_neon"
[(set (match_operand:VH 0 "s_register_operand" "=w")
(plus:VH (mult:VH (match_operand:VH 2 "s_register_operand" "w")
(match_operand:VH 3 "s_register_operand" "w"))
(match_operand:VH 1 "s_register_operand" "0")))]
"TARGET_NEON_FP16INST && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
"vmla.f16\t%<V_reg>0, %<V_reg>2, %<V_reg>3"
[(set_attr "type" "neon_fp_mla_s<q>")]
)
(define_insn "mul<mode>3neg<mode>add<mode>_neon"
[(set (match_operand:VDQW 0 "s_register_operand" "=w")
(minus:VDQW (match_operand:VDQW 1 "s_register_operand" "0")
......@@ -629,6 +664,19 @@
[(set_attr "type" "neon_fp_mla_s<q>")]
)
;; There is limited support for unsafe-math optimizations using the NEON FP16
;; arithmetic instructions, so only the intrinsic is currently supported.
(define_insn "fma<VH:mode>4_intrinsic"
[(set (match_operand:VH 0 "register_operand" "=w")
(fma:VH
(match_operand:VH 1 "register_operand" "w")
(match_operand:VH 2 "register_operand" "w")
(match_operand:VH 3 "register_operand" "0")))]
"TARGET_NEON_FP16INST"
"vfma.<V_if_elem>\\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
[(set_attr "type" "neon_fp_mla_s<q>")]
)
(define_insn "*fmsub<VCVTF:mode>4"
[(set (match_operand:VCVTF 0 "register_operand" "=w")
(fma:VCVTF (neg:VCVTF (match_operand:VCVTF 1 "register_operand" "w"))
......@@ -641,7 +689,8 @@
(define_insn "fmsub<VCVTF:mode>4_intrinsic"
[(set (match_operand:VCVTF 0 "register_operand" "=w")
(fma:VCVTF (neg:VCVTF (match_operand:VCVTF 1 "register_operand" "w"))
(fma:VCVTF
(neg:VCVTF (match_operand:VCVTF 1 "register_operand" "w"))
(match_operand:VCVTF 2 "register_operand" "w")
(match_operand:VCVTF 3 "register_operand" "0")))]
"TARGET_NEON && TARGET_FMA"
......@@ -649,6 +698,17 @@
[(set_attr "type" "neon_fp_mla_s<q>")]
)
(define_insn "fmsub<VH:mode>4_intrinsic"
[(set (match_operand:VH 0 "register_operand" "=w")
(fma:VH
(neg:VH (match_operand:VH 1 "register_operand" "w"))
(match_operand:VH 2 "register_operand" "w")
(match_operand:VH 3 "register_operand" "0")))]
"TARGET_NEON_FP16INST"
"vfms.<V_if_elem>\\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
[(set_attr "type" "neon_fp_mla_s<q>")]
)
(define_insn "neon_vrint<NEON_VRINT:nvrint_variant><VCVTF:mode>"
[(set (match_operand:VCVTF 0 "s_register_operand" "=w")
(unspec:VCVTF [(match_operand:VCVTF 1
......@@ -860,6 +920,44 @@
""
)
(define_insn "<absneg_str><mode>2"
[(set (match_operand:VH 0 "s_register_operand" "=w")
(ABSNEG:VH (match_operand:VH 1 "s_register_operand" "w")))]
"TARGET_NEON_FP16INST"
"v<absneg_str>.<V_s_elem>\t%<V_reg>0, %<V_reg>1"
[(set_attr "type" "neon_abs<q>")]
)
(define_expand "neon_v<absneg_str><mode>"
[(set
(match_operand:VH 0 "s_register_operand")
(ABSNEG:VH (match_operand:VH 1 "s_register_operand")))]
"TARGET_NEON_FP16INST"
{
emit_insn (gen_<absneg_str><mode>2 (operands[0], operands[1]));
DONE;
})
(define_insn "neon_v<fp16_rnd_str><mode>"
[(set (match_operand:VH 0 "s_register_operand" "=w")
(unspec:VH
[(match_operand:VH 1 "s_register_operand" "w")]
FP16_RND))]
"TARGET_NEON_FP16INST"
"<fp16_rnd_insn>.<V_s_elem>\t%<V_reg>0, %<V_reg>1"
[(set_attr "type" "neon_fp_round_s<q>")]
)
(define_insn "neon_vrsqrte<mode>"
[(set (match_operand:VH 0 "s_register_operand" "=w")
(unspec:VH
[(match_operand:VH 1 "s_register_operand" "w")]
UNSPEC_VRSQRTE))]
"TARGET_NEON_FP16INST"
"vrsqrte.f16\t%<V_reg>0, %<V_reg>1"
[(set_attr "type" "neon_fp_rsqrte_s<q>")]
)
(define_insn "*umin<mode>3_neon"
[(set (match_operand:VDQIW 0 "s_register_operand" "=w")
(umin:VDQIW (match_operand:VDQIW 1 "s_register_operand" "w")
......@@ -1601,6 +1699,17 @@
(const_string "neon_reduc_add<q>")))]
)
(define_insn "neon_vpaddv4hf"
[(set
(match_operand:V4HF 0 "s_register_operand" "=w")
(unspec:V4HF [(match_operand:V4HF 1 "s_register_operand" "w")
(match_operand:V4HF 2 "s_register_operand" "w")]
UNSPEC_VPADD))]
"TARGET_NEON_FP16INST"
"vpadd.f16\t%P0, %P1, %P2"
[(set_attr "type" "neon_reduc_add")]
)
(define_insn "neon_vpsmin<mode>"
[(set (match_operand:VD 0 "s_register_operand" "=w")
(unspec:VD [(match_operand:VD 1 "s_register_operand" "w")
......@@ -1949,6 +2058,26 @@
DONE;
})
(define_expand "neon_vadd<mode>"
[(match_operand:VH 0 "s_register_operand")
(match_operand:VH 1 "s_register_operand")
(match_operand:VH 2 "s_register_operand")]
"TARGET_NEON_FP16INST"
{
emit_insn (gen_add<mode>3_fp16 (operands[0], operands[1], operands[2]));
DONE;
})
(define_expand "neon_vsub<mode>"
[(match_operand:VH 0 "s_register_operand")
(match_operand:VH 1 "s_register_operand")
(match_operand:VH 2 "s_register_operand")]
"TARGET_NEON_FP16INST"
{
emit_insn (gen_sub<mode>3_fp16 (operands[0], operands[1], operands[2]));
DONE;
})
; Note that NEON operations don't support the full IEEE 754 standard: in
; particular, denormal values are flushed to zero. This means that GCC cannot
; use those instructions for autovectorization, etc. unless
......@@ -2040,6 +2169,17 @@
(const_string "neon_mul_<V_elem_ch><q>")))]
)
(define_insn "neon_vmulf<mode>"
[(set
(match_operand:VH 0 "s_register_operand" "=w")
(mult:VH
(match_operand:VH 1 "s_register_operand" "w")
(match_operand:VH 2 "s_register_operand" "w")))]
"TARGET_NEON_FP16INST"
"vmul.f16\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
[(set_attr "type" "neon_mul_<VH_elem_ch><q>")]
)
(define_expand "neon_vmla<mode>"
[(match_operand:VDQW 0 "s_register_operand" "=w")
(match_operand:VDQW 1 "s_register_operand" "0")
......@@ -2068,6 +2208,18 @@
DONE;
})
(define_expand "neon_vfma<VH:mode>"
[(match_operand:VH 0 "s_register_operand")
(match_operand:VH 1 "s_register_operand")
(match_operand:VH 2 "s_register_operand")
(match_operand:VH 3 "s_register_operand")]
"TARGET_NEON_FP16INST"
{
emit_insn (gen_fma<mode>4_intrinsic (operands[0], operands[2], operands[3],
operands[1]));
DONE;
})
(define_expand "neon_vfms<VCVTF:mode>"
[(match_operand:VCVTF 0 "s_register_operand")
(match_operand:VCVTF 1 "s_register_operand")
......@@ -2080,6 +2232,18 @@
DONE;
})
(define_expand "neon_vfms<VH:mode>"
[(match_operand:VH 0 "s_register_operand")
(match_operand:VH 1 "s_register_operand")
(match_operand:VH 2 "s_register_operand")
(match_operand:VH 3 "s_register_operand")]
"TARGET_NEON_FP16INST"
{
emit_insn (gen_fmsub<mode>4_intrinsic (operands[0], operands[2], operands[3],
operands[1]));
DONE;
})
; Used for intrinsics when flag_unsafe_math_optimizations is false.
(define_insn "neon_vmla<mode>_unspec"
......@@ -2380,6 +2544,72 @@
[(set_attr "type" "neon_fp_compare_s<q>")]
)
(define_expand "neon_vc<cmp_op><mode>"
[(match_operand:<V_cmp_result> 0 "s_register_operand")
(neg:<V_cmp_result>
(COMPARISONS:VH
(match_operand:VH 1 "s_register_operand")
(match_operand:VH 2 "reg_or_zero_operand")))]
"TARGET_NEON_FP16INST"
{
/* For FP comparisons use UNSPECS unless -funsafe-math-optimizations
is enabled. */
if (GET_MODE_CLASS (<MODE>mode) == MODE_VECTOR_FLOAT
&& !flag_unsafe_math_optimizations)
emit_insn
(gen_neon_vc<cmp_op><mode>_fp16insn_unspec
(operands[0], operands[1], operands[2]));
else
emit_insn
(gen_neon_vc<cmp_op><mode>_fp16insn
(operands[0], operands[1], operands[2]));
DONE;
})
(define_insn "neon_vc<cmp_op><mode>_fp16insn"
[(set (match_operand:<V_cmp_result> 0 "s_register_operand" "=w,w")
(neg:<V_cmp_result>
(COMPARISONS:<V_cmp_result>
(match_operand:VH 1 "s_register_operand" "w,w")
(match_operand:VH 2 "reg_or_zero_operand" "w,Dz"))))]
"TARGET_NEON_FP16INST
&& !(GET_MODE_CLASS (<MODE>mode) == MODE_VECTOR_FLOAT
&& !flag_unsafe_math_optimizations)"
{
char pattern[100];
sprintf (pattern, "vc<cmp_op>.%s%%#<V_sz_elem>\t%%<V_reg>0,"
" %%<V_reg>1, %s",
GET_MODE_CLASS (<MODE>mode) == MODE_VECTOR_FLOAT
? "f" : "<cmp_type>",
which_alternative == 0
? "%<V_reg>2" : "#0");
output_asm_insn (pattern, operands);
return "";
}
[(set (attr "type")
(if_then_else (match_operand 2 "zero_operand")
(const_string "neon_compare_zero<q>")
(const_string "neon_compare<q>")))])
(define_insn "neon_vc<cmp_op_unsp><mode>_fp16insn_unspec"
[(set
(match_operand:<V_cmp_result> 0 "s_register_operand" "=w,w")
(unspec:<V_cmp_result>
[(match_operand:VH 1 "s_register_operand" "w,w")
(match_operand:VH 2 "reg_or_zero_operand" "w,Dz")]
NEON_VCMP))]
"TARGET_NEON_FP16INST"
{
char pattern[100];
sprintf (pattern, "vc<cmp_op_unsp>.f%%#<V_sz_elem>\t%%<V_reg>0,"
" %%<V_reg>1, %s",
which_alternative == 0
? "%<V_reg>2" : "#0");
output_asm_insn (pattern, operands);
return "";
}
[(set_attr "type" "neon_fp_compare_s<q>")])
(define_insn "neon_vc<cmp_op>u<mode>"
[(set (match_operand:<V_cmp_result> 0 "s_register_operand" "=w")
(neg:<V_cmp_result>
......@@ -2431,6 +2661,60 @@
[(set_attr "type" "neon_fp_compare_s<q>")]
)
(define_expand "neon_vca<cmp_op><mode>"
[(set
(match_operand:<V_cmp_result> 0 "s_register_operand")
(neg:<V_cmp_result>
(GLTE:<V_cmp_result>
(abs:VH (match_operand:VH 1 "s_register_operand"))
(abs:VH (match_operand:VH 2 "s_register_operand")))))]
"TARGET_NEON_FP16INST"
{
if (flag_unsafe_math_optimizations)
emit_insn (gen_neon_vca<cmp_op><mode>_fp16insn
(operands[0], operands[1], operands[2]));
else
emit_insn (gen_neon_vca<cmp_op><mode>_fp16insn_unspec
(operands[0], operands[1], operands[2]));
DONE;
})
(define_insn "neon_vca<cmp_op><mode>_fp16insn"
[(set
(match_operand:<V_cmp_result> 0 "s_register_operand" "=w")
(neg:<V_cmp_result>
(GLTE:<V_cmp_result>
(abs:VH (match_operand:VH 1 "s_register_operand" "w"))
(abs:VH (match_operand:VH 2 "s_register_operand" "w")))))]
"TARGET_NEON_FP16INST && flag_unsafe_math_optimizations"
"vac<cmp_op>.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
[(set_attr "type" "neon_fp_compare_s<q>")]
)
(define_insn "neon_vca<cmp_op_unsp><mode>_fp16insn_unspec"
[(set (match_operand:<V_cmp_result> 0 "s_register_operand" "=w")
(unspec:<V_cmp_result>
[(match_operand:VH 1 "s_register_operand" "w")
(match_operand:VH 2 "s_register_operand" "w")]
NEON_VAGLTE))]
"TARGET_NEON"
"vac<cmp_op_unsp>.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
[(set_attr "type" "neon_fp_compare_s<q>")]
)
(define_expand "neon_vc<cmp_op>z<mode>"
[(set
(match_operand:<V_cmp_result> 0 "s_register_operand")
(COMPARISONS:<V_cmp_result>
(match_operand:VH 1 "s_register_operand")
(const_int 0)))]
"TARGET_NEON_FP16INST"
{
emit_insn (gen_neon_vc<cmp_op><mode> (operands[0], operands[1],
CONST0_RTX (<MODE>mode)));
DONE;
})
(define_insn "neon_vtst<mode>"
[(set (match_operand:VDQIW 0 "s_register_operand" "=w")
(unspec:VDQIW [(match_operand:VDQIW 1 "s_register_operand" "w")
......@@ -2451,6 +2735,16 @@
[(set_attr "type" "neon_abd<q>")]
)
(define_insn "neon_vabd<mode>"
[(set (match_operand:VH 0 "s_register_operand" "=w")
(unspec:VH [(match_operand:VH 1 "s_register_operand" "w")
(match_operand:VH 2 "s_register_operand" "w")]
UNSPEC_VABD_F))]
"TARGET_NEON_FP16INST"
"vabd.<V_s_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
[(set_attr "type" "neon_abd<q>")]
)
(define_insn "neon_vabdf<mode>"
[(set (match_operand:VCVTF 0 "s_register_operand" "=w")
(unspec:VCVTF [(match_operand:VCVTF 1 "s_register_operand" "w")
......@@ -2513,6 +2807,40 @@
[(set_attr "type" "neon_fp_minmax_s<q>")]
)
(define_insn "neon_v<maxmin>f<mode>"
[(set (match_operand:VH 0 "s_register_operand" "=w")
(unspec:VH
[(match_operand:VH 1 "s_register_operand" "w")
(match_operand:VH 2 "s_register_operand" "w")]
VMAXMINF))]
"TARGET_NEON_FP16INST"
"v<maxmin>.<V_s_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
[(set_attr "type" "neon_fp_minmax_s<q>")]
)
(define_insn "neon_vp<maxmin>fv4hf"
[(set (match_operand:V4HF 0 "s_register_operand" "=w")
(unspec:V4HF
[(match_operand:V4HF 1 "s_register_operand" "w")
(match_operand:V4HF 2 "s_register_operand" "w")]
VPMAXMINF))]
"TARGET_NEON_FP16INST"
"vp<maxmin>.f16\t%P0, %P1, %P2"
[(set_attr "type" "neon_reduc_minmax")]
)
(define_insn "neon_<fmaxmin_op><mode>"
[(set
(match_operand:VH 0 "s_register_operand" "=w")
(unspec:VH
[(match_operand:VH 1 "s_register_operand" "w")
(match_operand:VH 2 "s_register_operand" "w")]
VMAXMINFNM))]
"TARGET_NEON_FP16INST"
"<fmaxmin_op>.<V_s_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
[(set_attr "type" "neon_fp_minmax_s<q>")]
)
;; Vector forms for the IEEE-754 fmax()/fmin() functions
(define_insn "<fmaxmin><mode>3"
[(set (match_operand:VCVTF 0 "s_register_operand" "=w")
......@@ -2584,6 +2912,17 @@
[(set_attr "type" "neon_fp_recps_s<q>")]
)
(define_insn "neon_vrecps<mode>"
[(set
(match_operand:VH 0 "s_register_operand" "=w")
(unspec:VH [(match_operand:VH 1 "s_register_operand" "w")
(match_operand:VH 2 "s_register_operand" "w")]
UNSPEC_VRECPS))]
"TARGET_NEON_FP16INST"
"vrecps.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
[(set_attr "type" "neon_fp_recps_s<q>")]
)
(define_insn "neon_vrsqrts<mode>"
[(set (match_operand:VCVTF 0 "s_register_operand" "=w")
(unspec:VCVTF [(match_operand:VCVTF 1 "s_register_operand" "w")
......@@ -2594,6 +2933,17 @@
[(set_attr "type" "neon_fp_rsqrts_s<q>")]
)
(define_insn "neon_vrsqrts<mode>"
[(set
(match_operand:VH 0 "s_register_operand" "=w")
(unspec:VH [(match_operand:VH 1 "s_register_operand" "w")
(match_operand:VH 2 "s_register_operand" "w")]
UNSPEC_VRSQRTS))]
"TARGET_NEON_FP16INST"
"vrsqrts.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
[(set_attr "type" "neon_fp_rsqrts_s<q>")]
)
(define_expand "neon_vabs<mode>"
[(match_operand:VDQW 0 "s_register_operand" "")
(match_operand:VDQW 1 "s_register_operand" "")]
......@@ -2709,6 +3059,15 @@
})
(define_insn "neon_vrecpe<mode>"
[(set (match_operand:VH 0 "s_register_operand" "=w")
(unspec:VH [(match_operand:VH 1 "s_register_operand" "w")]
UNSPEC_VRECPE))]
"TARGET_NEON_FP16INST"
"vrecpe.f16\t%<V_reg>0, %<V_reg>1"
[(set_attr "type" "neon_fp_recpe_s<q>")]
)
(define_insn "neon_vrecpe<mode>"
[(set (match_operand:V32 0 "s_register_operand" "=w")
(unspec:V32 [(match_operand:V32 1 "s_register_operand" "w")]
UNSPEC_VRECPE))]
......@@ -3251,6 +3610,28 @@ if (BYTES_BIG_ENDIAN)
[(set_attr "type" "neon_fp_cvt_narrow_s_q")]
)
(define_insn "neon_vcvt<sup><mode>"
[(set
(match_operand:<VH_CVTTO> 0 "s_register_operand" "=w")
(unspec:<VH_CVTTO>
[(match_operand:VCVTHI 1 "s_register_operand" "w")]
VCVT_US))]
"TARGET_NEON_FP16INST"
"vcvt.f16.<sup>%#16\t%<V_reg>0, %<V_reg>1"
[(set_attr "type" "neon_int_to_fp_<VH_elem_ch><q>")]
)
(define_insn "neon_vcvt<sup><mode>"
[(set
(match_operand:<VH_CVTTO> 0 "s_register_operand" "=w")
(unspec:<VH_CVTTO>
[(match_operand:VH 1 "s_register_operand" "w")]
VCVT_US))]
"TARGET_NEON_FP16INST"
"vcvt.<sup>%#16.f16\t%<V_reg>0, %<V_reg>1"
[(set_attr "type" "neon_fp_to_int_<VH_elem_ch><q>")]
)
(define_insn "neon_vcvt<sup>_n<mode>"
[(set (match_operand:<V_CVTTO> 0 "s_register_operand" "=w")
(unspec:<V_CVTTO> [(match_operand:VCVTF 1 "s_register_operand" "w")
......@@ -3265,6 +3646,20 @@ if (BYTES_BIG_ENDIAN)
)
(define_insn "neon_vcvt<sup>_n<mode>"
[(set (match_operand:<VH_CVTTO> 0 "s_register_operand" "=w")
(unspec:<VH_CVTTO>
[(match_operand:VH 1 "s_register_operand" "w")
(match_operand:SI 2 "immediate_operand" "i")]
VCVT_US_N))]
"TARGET_NEON_FP16INST"
{
neon_const_bounds (operands[2], 0, 17);
return "vcvt.<sup>%#16.f16\t%<V_reg>0, %<V_reg>1, %2";
}
[(set_attr "type" "neon_fp_to_int_<VH_elem_ch><q>")]
)
(define_insn "neon_vcvt<sup>_n<mode>"
[(set (match_operand:<V_CVTTO> 0 "s_register_operand" "=w")
(unspec:<V_CVTTO> [(match_operand:VCVTI 1 "s_register_operand" "w")
(match_operand:SI 2 "immediate_operand" "i")]
......@@ -3277,6 +3672,31 @@ if (BYTES_BIG_ENDIAN)
[(set_attr "type" "neon_int_to_fp_<V_elem_ch><q>")]
)
(define_insn "neon_vcvt<sup>_n<mode>"
[(set (match_operand:<VH_CVTTO> 0 "s_register_operand" "=w")
(unspec:<VH_CVTTO>
[(match_operand:VCVTHI 1 "s_register_operand" "w")
(match_operand:SI 2 "immediate_operand" "i")]
VCVT_US_N))]
"TARGET_NEON_FP16INST"
{
neon_const_bounds (operands[2], 0, 17);
return "vcvt.f16.<sup>%#16\t%<V_reg>0, %<V_reg>1, %2";
}
[(set_attr "type" "neon_int_to_fp_<VH_elem_ch><q>")]
)
(define_insn "neon_vcvt<vcvth_op><sup><mode>"
[(set
(match_operand:<VH_CVTTO> 0 "s_register_operand" "=w")
(unspec:<VH_CVTTO>
[(match_operand:VH 1 "s_register_operand" "w")]
VCVT_HF_US))]
"TARGET_NEON_FP16INST"
"vcvt<vcvth_op>.<sup>%#16.f16\t%<V_reg>0, %<V_reg>1"
[(set_attr "type" "neon_fp_to_int_<VH_elem_ch><q>")]
)
(define_insn "neon_vmovn<mode>"
[(set (match_operand:<V_narrow> 0 "s_register_operand" "=w")
(unspec:<V_narrow> [(match_operand:VN 1 "s_register_operand" "w")]
......@@ -3347,6 +3767,18 @@ if (BYTES_BIG_ENDIAN)
(const_string "neon_mul_<V_elem_ch>_scalar<q>")))]
)
(define_insn "neon_vmul_lane<mode>"
[(set (match_operand:VH 0 "s_register_operand" "=w")
(unspec:VH [(match_operand:VH 1 "s_register_operand" "w")
(match_operand:V4HF 2 "s_register_operand"
"<scalar_mul_constraint>")
(match_operand:SI 3 "immediate_operand" "i")]
UNSPEC_VMUL_LANE))]
"TARGET_NEON_FP16INST"
"vmul.f16\t%<V_reg>0, %<V_reg>1, %P2[%c3]"
[(set_attr "type" "neon_fp_mul_s_scalar<q>")]
)
(define_insn "neon_vmull<sup>_lane<mode>"
[(set (match_operand:<V_widen> 0 "s_register_operand" "=w")
(unspec:<V_widen> [(match_operand:VMDI 1 "s_register_operand" "w")
......@@ -3601,6 +4033,19 @@ if (BYTES_BIG_ENDIAN)
DONE;
})
(define_expand "neon_vmul_n<mode>"
[(match_operand:VH 0 "s_register_operand")
(match_operand:VH 1 "s_register_operand")
(match_operand:<V_elem> 2 "s_register_operand")]
"TARGET_NEON_FP16INST"
{
rtx tmp = gen_reg_rtx (V4HFmode);
emit_insn (gen_neon_vset_lanev4hf (tmp, operands[2], tmp, const0_rtx));
emit_insn (gen_neon_vmul_lane<mode> (operands[0], operands[1], tmp,
const0_rtx));
DONE;
})
(define_expand "neon_vmulls_n<mode>"
[(match_operand:<V_widen> 0 "s_register_operand" "")
(match_operand:VMDI 1 "s_register_operand" "")
......
......@@ -191,6 +191,8 @@
UNSPEC_VBSL
UNSPEC_VCAGE
UNSPEC_VCAGT
UNSPEC_VCALE
UNSPEC_VCALT
UNSPEC_VCEQ
UNSPEC_VCGE
UNSPEC_VCGEU
......@@ -258,6 +260,8 @@
UNSPEC_VMLSL_S_LANE
UNSPEC_VMLSL_U_LANE
UNSPEC_VMLSL_LANE
UNSPEC_VFMA_LANE
UNSPEC_VFMS_LANE
UNSPEC_VMOVL_S
UNSPEC_VMOVL_U
UNSPEC_VMOVN
......@@ -387,4 +391,3 @@
UNSPEC_VRNDP
UNSPEC_VRNDX
])
2016-09-23 Matthew Wahab <matthew.wahab@arm.com>
* gcc.target/arm/armv8_2-fp16-arith-1.c: Use arm_v8_2a_fp16_neon
options. Add tests for float16x4_t and float16x8_t.
2016-09-23 Dominik Vogt <vogt@linux.vnet.ibm.com>
* gcc.target/s390/risbg-ll-1.c: Ported risbg tests from llvm.
......
/* { dg-do compile } */
/* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok } */
/* { dg-require-effective-target arm_v8_2a_fp16_neon_ok } */
/* { dg-options "-O2 -ffast-math" } */
/* { dg-add-options arm_v8_2a_fp16_scalar } */
/* { dg-add-options arm_v8_2a_fp16_neon } */
/* Test instructions generated for half-precision arithmetic. */
......@@ -9,6 +9,9 @@ typedef __fp16 float16_t;
typedef __simd64_float16_t float16x4_t;
typedef __simd128_float16_t float16x8_t;
typedef short int16x4_t __attribute__ ((vector_size (8)));
typedef short int int16x8_t __attribute__ ((vector_size (16)));
float16_t
fp16_abs (float16_t a)
{
......@@ -50,15 +53,49 @@ TEST_CMP (greaterthan, >, int, float16_t)
TEST_CMP (lessthanequal, <=, int, float16_t)
TEST_CMP (greaterthanqual, >=, int, float16_t)
/* Vectors of size 4. */
TEST_UNOP (neg, -, float16x4_t)
TEST_BINOP (add, +, float16x4_t)
TEST_BINOP (sub, -, float16x4_t)
TEST_BINOP (mult, *, float16x4_t)
TEST_BINOP (div, /, float16x4_t)
TEST_CMP (equal, ==, int16x4_t, float16x4_t)
TEST_CMP (unequal, !=, int16x4_t, float16x4_t)
TEST_CMP (lessthan, <, int16x4_t, float16x4_t)
TEST_CMP (greaterthan, >, int16x4_t, float16x4_t)
TEST_CMP (lessthanequal, <=, int16x4_t, float16x4_t)
TEST_CMP (greaterthanqual, >=, int16x4_t, float16x4_t)
/* Vectors of size 8. */
TEST_UNOP (neg, -, float16x8_t)
TEST_BINOP (add, +, float16x8_t)
TEST_BINOP (sub, -, float16x8_t)
TEST_BINOP (mult, *, float16x8_t)
TEST_BINOP (div, /, float16x8_t)
TEST_CMP (equal, ==, int16x8_t, float16x8_t)
TEST_CMP (unequal, !=, int16x8_t, float16x8_t)
TEST_CMP (lessthan, <, int16x8_t, float16x8_t)
TEST_CMP (greaterthan, >, int16x8_t, float16x8_t)
TEST_CMP (lessthanequal, <=, int16x8_t, float16x8_t)
TEST_CMP (greaterthanqual, >=, int16x8_t, float16x8_t)
/* { dg-final { scan-assembler-times {vneg\.f16\ts[0-9]+, s[0-9]+} 1 } } */
/* { dg-final { scan-assembler-times {vneg\.f16\td[0-9]+, d[0-9]+} 1 } } */
/* { dg-final { scan-assembler-times {vneg\.f16\tq[0-9]+, q[0-9]+} 1 } } */
/* { dg-final { scan-assembler-times {vabs\.f16\ts[0-9]+, s[0-9]+} 2 } } */
/* { dg-final { scan-assembler-times {vadd\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } } */
/* { dg-final { scan-assembler-times {vsub\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } } */
/* { dg-final { scan-assembler-times {vmul\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } } */
/* { dg-final { scan-assembler-times {vdiv\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } } */
/* { dg-final { scan-assembler-times {vcmp\.f32\ts[0-9]+, s[0-9]+} 2 } } */
/* { dg-final { scan-assembler-times {vcmpe\.f32\ts[0-9]+, s[0-9]+} 4 } } */
/* { dg-final { scan-assembler-times {vadd\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 13 } } */
/* { dg-final { scan-assembler-times {vsub\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 13 } } */
/* { dg-final { scan-assembler-times {vmul\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 13 } } */
/* { dg-final { scan-assembler-times {vdiv\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 13 } } */
/* { dg-final { scan-assembler-times {vcmp\.f32\ts[0-9]+, s[0-9]+} 26 } } */
/* { dg-final { scan-assembler-times {vcmpe\.f32\ts[0-9]+, s[0-9]+} 52 } } */
/* { dg-final { scan-assembler-not {vadd\.f32} } } */
/* { dg-final { scan-assembler-not {vsub\.f32} } } */
......