Commit 8e25bae5 by Peter Bergner

rs6000: Add MMA built-in function definitions and test cases

Add the Matrix-Multiply Assist (MMA) built-ins.  The MMA accumulators are
INOUT operands for most MMA instructions, but they are also very expensive
to move around.  For this reason, we have implemented a built-in API where
the accumulators are passed by reference/pointer, so the user won't use one
accumulator as input and another as output, which would entail a lot of
copies.  However, using pointers gives us poor code generation when we
expand the built-ins at normal expand time.  We therefore expand the MMA
built-ins early into gimple, converting the pass-by-reference calls to
internal built-ins that use a pass-by-value calling convention, where we
can enforce that the input and output accumulators are the same.  This
gives us much better code generation.
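The transformation described above can be sketched in plain C.  This is a
schematic analogy only: `acc_t` and `ger_update` are hypothetical stand-ins
for `__vector_quad` and the `__builtin_mma_*` functions (which need a
POWER10-capable compiler), chosen just to show the user-visible
pass-by-reference form and the pass-by-value "internal" form it is expanded
into:

```c
#include <assert.h>
#include <string.h>

/* Stand-in for the 512-bit __vector_quad accumulator (hypothetical).  */
typedef struct { int v[4]; } acc_t;

/* The user-visible API shape: the accumulator is passed by pointer and
   updated in place, so one accumulator serves as both input and output
   and no copy is implied.  */
void
ger_update (acc_t *acc, int a, int b)
{
  for (int i = 0; i < 4; i++)
    acc->v[i] += a * b;
}

/* What the early gimple expansion conceptually rewrites the call into:
   a pass-by-value internal form whose result is assigned back to the
   same object, which lets the compiler keep the accumulator in
   registers instead of forcing it through memory.  */
acc_t
ger_update_internal (acc_t acc, int a, int b)
{
  for (int i = 0; i < 4; i++)
    acc.v[i] += a * b;
  return acc;
}
```

A call such as `ger_update (&x, 2, 3)` thus becomes, in effect,
`x = ger_update_internal (x, 2, 3)`; because the input and output object
are enforced to be the same, both forms compute identical results.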

2020-06-20  Peter Bergner  <bergner@linux.ibm.com>

gcc/
	* config/rs6000/predicates.md (mma_assemble_input_operand): New.
	* config/rs6000/rs6000-builtin.def (BU_MMA_1, BU_MMA_V2, BU_MMA_3,
	BU_MMA_5, BU_MMA_6, BU_VSX_1): Add support macros for defining MMA
	built-in functions.
	(ASSEMBLE_ACC, ASSEMBLE_PAIR, DISASSEMBLE_ACC, DISASSEMBLE_PAIR,
	PMXVBF16GER2, PMXVBF16GER2NN, PMXVBF16GER2NP, PMXVBF16GER2PN,
	PMXVBF16GER2PP, PMXVF16GER2, PMXVF16GER2NN, PMXVF16GER2NP,
	PMXVF16GER2PN, PMXVF16GER2PP, PMXVF32GER, PMXVF32GERNN,
	PMXVF32GERNP, PMXVF32GERPN, PMXVF32GERPP, PMXVF64GER, PMXVF64GERNN,
	PMXVF64GERNP, PMXVF64GERPN, PMXVF64GERPP, PMXVI16GER2, PMXVI16GER2PP,
	PMXVI16GER2S, PMXVI16GER2SPP, PMXVI4GER8, PMXVI4GER8PP, PMXVI8GER4,
	PMXVI8GER4PP, PMXVI8GER4SPP, XVBF16GER2, XVBF16GER2NN, XVBF16GER2NP,
	XVBF16GER2PN, XVBF16GER2PP, XVCVBF16SP, XVCVSPBF16, XVF16GER2,
	XVF16GER2NN, XVF16GER2NP, XVF16GER2PN, XVF16GER2PP, XVF32GER,
	XVF32GERNN, XVF32GERNP, XVF32GERPN, XVF32GERPP, XVF64GER, XVF64GERNN,
	XVF64GERNP, XVF64GERPN, XVF64GERPP, XVI16GER2, XVI16GER2PP, XVI16GER2S,
	XVI16GER2SPP, XVI4GER8, XVI4GER8PP, XVI8GER4, XVI8GER4PP, XVI8GER4SPP,
	XXMFACC, XXMTACC, XXSETACCZ): Add MMA built-ins.
	* config/rs6000/rs6000.c (rs6000_emit_move): Use CONST_INT_P.
	Allow zero constants.
	(print_operand) <case 'A'>: New output modifier.
	(rs6000_split_multireg_move): Add support for inserting accumulator
	priming and depriming instructions.  Add support for splitting an
	assemble accumulator pattern.
	* config/rs6000/rs6000-call.c (mma_init_builtins, mma_expand_builtin,
	rs6000_gimple_fold_mma_builtin): New functions.
	(RS6000_BUILTIN_M): New macro.
	(def_builtin): Handle RS6000_BTC_QUAD and RS6000_BTC_PAIR attributes.
	(bdesc_mma): Add new MMA built-in support.
	(htm_expand_builtin): Use RS6000_BTC_OPND_MASK.
	(rs6000_invalid_builtin): Add handling of RS6000_BTM_FUTURE and
	RS6000_BTM_MMA.
	(rs6000_builtin_valid_without_lhs): Handle RS6000_BTC_VOID attribute.
	(rs6000_gimple_fold_builtin): Call rs6000_builtin_is_supported_p
	and rs6000_gimple_fold_mma_builtin.
	(rs6000_expand_builtin): Call mma_expand_builtin.
	Use RS6000_BTC_OPND_MASK.
	(rs6000_init_builtins): Adjust comment.  Call mma_init_builtins.
	(htm_init_builtins): Use RS6000_BTC_OPND_MASK.
	(builtin_function_type): Handle VSX_BUILTIN_XVCVSPBF16 and
	VSX_BUILTIN_XVCVBF16SP.
	* config/rs6000/rs6000.h (RS6000_BTC_QUINARY, RS6000_BTC_SENARY,
	RS6000_BTC_OPND_MASK, RS6000_BTC_QUAD, RS6000_BTC_PAIR,
	RS6000_BTC_QUADPAIR, RS6000_BTC_GIMPLE): New defines.
	(RS6000_BTC_PREDICATE, RS6000_BTC_ABS, RS6000_BTC_DST,
	RS6000_BTC_TYPE_MASK, RS6000_BTC_ATTR_MASK): Adjust values.
	* config/rs6000/mma.md (MAX_MMA_OPERANDS): New define_constant.
	(UNSPEC_MMA_ASSEMBLE_ACC, UNSPEC_MMA_PMXVBF16GER2,
	UNSPEC_MMA_PMXVBF16GER2NN, UNSPEC_MMA_PMXVBF16GER2NP,
	UNSPEC_MMA_PMXVBF16GER2PN, UNSPEC_MMA_PMXVBF16GER2PP,
	UNSPEC_MMA_PMXVF16GER2, UNSPEC_MMA_PMXVF16GER2NN,
	UNSPEC_MMA_PMXVF16GER2NP, UNSPEC_MMA_PMXVF16GER2PN,
	UNSPEC_MMA_PMXVF16GER2PP, UNSPEC_MMA_PMXVF32GER,
	UNSPEC_MMA_PMXVF32GERNN, UNSPEC_MMA_PMXVF32GERNP,
	UNSPEC_MMA_PMXVF32GERPN, UNSPEC_MMA_PMXVF32GERPP,
	UNSPEC_MMA_PMXVF64GER, UNSPEC_MMA_PMXVF64GERNN,
	UNSPEC_MMA_PMXVF64GERNP, UNSPEC_MMA_PMXVF64GERPN,
	UNSPEC_MMA_PMXVF64GERPP, UNSPEC_MMA_PMXVI16GER2,
	UNSPEC_MMA_PMXVI16GER2PP, UNSPEC_MMA_PMXVI16GER2S,
	UNSPEC_MMA_PMXVI16GER2SPP, UNSPEC_MMA_PMXVI4GER8,
	UNSPEC_MMA_PMXVI4GER8PP, UNSPEC_MMA_PMXVI8GER4,
	UNSPEC_MMA_PMXVI8GER4PP, UNSPEC_MMA_PMXVI8GER4SPP,
	UNSPEC_MMA_XVBF16GER2, UNSPEC_MMA_XVBF16GER2NN,
	UNSPEC_MMA_XVBF16GER2NP, UNSPEC_MMA_XVBF16GER2PN,
	UNSPEC_MMA_XVBF16GER2PP, UNSPEC_MMA_XVF16GER2, UNSPEC_MMA_XVF16GER2NN,
	UNSPEC_MMA_XVF16GER2NP, UNSPEC_MMA_XVF16GER2PN, UNSPEC_MMA_XVF16GER2PP,
	UNSPEC_MMA_XVF32GER, UNSPEC_MMA_XVF32GERNN, UNSPEC_MMA_XVF32GERNP,
	UNSPEC_MMA_XVF32GERPN, UNSPEC_MMA_XVF32GERPP, UNSPEC_MMA_XVF64GER,
	UNSPEC_MMA_XVF64GERNN, UNSPEC_MMA_XVF64GERNP, UNSPEC_MMA_XVF64GERPN,
	UNSPEC_MMA_XVF64GERPP, UNSPEC_MMA_XVI16GER2, UNSPEC_MMA_XVI16GER2PP,
	UNSPEC_MMA_XVI16GER2S, UNSPEC_MMA_XVI16GER2SPP, UNSPEC_MMA_XVI4GER8,
	UNSPEC_MMA_XVI4GER8PP, UNSPEC_MMA_XVI8GER4, UNSPEC_MMA_XVI8GER4PP,
	UNSPEC_MMA_XVI8GER4SPP, UNSPEC_MMA_XXMFACC, UNSPEC_MMA_XXMTACC): New.
	(MMA_ACC, MMA_VV, MMA_AVV, MMA_PV, MMA_APV, MMA_VVI4I4I8,
	MMA_AVVI4I4I8, MMA_VVI4I4I2, MMA_AVVI4I4I2, MMA_VVI4I4,
	MMA_AVVI4I4, MMA_PVI4I2, MMA_APVI4I2, MMA_VVI4I4I4,
	MMA_AVVI4I4I4): New define_int_iterator.
	(acc, vv, avv, pv, apv, vvi4i4i8, avvi4i4i8, vvi4i4i2,
	avvi4i4i2, vvi4i4, avvi4i4, pvi4i2, apvi4i2, vvi4i4i4,
	avvi4i4i4): New define_int_attr.
	(*movpxi): Add zero constant alternative.
	(mma_assemble_pair, mma_assemble_acc): New define_expand.
	(*mma_assemble_acc): New define_insn_and_split.
	(mma_<acc>, mma_xxsetaccz, mma_<vv>, mma_<avv>, mma_<pv>, mma_<apv>,
	mma_<vvi4i4i8>, mma_<avvi4i4i8>, mma_<vvi4i4i2>, mma_<avvi4i4i2>,
	mma_<vvi4i4>, mma_<avvi4i4>, mma_<pvi4i2>, mma_<apvi4i2>,
	mma_<vvi4i4i4>, mma_<avvi4i4i4>): New define_insn.
	* config/rs6000/rs6000.md (define_attr "type"): New type mma.
	* config/rs6000/vsx.md (UNSPEC_VSX_XVCVBF16SP): New.
	(UNSPEC_VSX_XVCVSPBF16): Likewise.
	(XVCVBF16): New define_int_iterator.
	(xvcvbf16): New define_int_attr.
	(vsx_<xvcvbf16>): New define_insn.
	* doc/extend.texi: Document the mma built-ins.

gcc/testsuite/
	* gcc.target/powerpc/mma-builtin-1.c: New test.
	* gcc.target/powerpc/mma-builtin-2.c: New test.
	* gcc.target/powerpc/mma-builtin-3.c: New test.
	* gcc.target/powerpc/mma-builtin-4.c: New test.
	* gcc.target/powerpc/mma-builtin-5.c: New test.
	* gcc.target/powerpc/mma-builtin-6.c: New test.

(cherry picked from commit 8ee2640bfdc62f835ec9740278f948034bc7d9f1)
parent 7e3896a4
@@ -1119,6 +1119,11 @@
return gpc_reg_operand (op, mode);
})
;; Return 1 if this operand is valid for a MMA assemble accumulator insn.
(define_special_predicate "mma_assemble_input_operand"
(match_test "(mode == V16QImode
&& (vsx_register_operand (op, mode) || MEM_P (op)))"))
;; Return true if operand is an operator used in rotate-and-mask instructions.
(define_predicate "rotate_mask_operator"
(match_code "rotate,ashift,lshiftrt"))
@@ -31,6 +31,7 @@
RS6000_BUILTIN_A -- ABS builtins
RS6000_BUILTIN_D -- DST builtins
RS6000_BUILTIN_H -- HTM builtins
RS6000_BUILTIN_M -- MMA builtins
RS6000_BUILTIN_P -- Altivec, VSX, ISA 2.07 vector predicate builtins
RS6000_BUILTIN_X -- special builtins
@@ -69,6 +70,10 @@
#error "RS6000_BUILTIN_H is not defined."
#endif
#ifndef RS6000_BUILTIN_M
#error "RS6000_BUILTIN_M is not defined."
#endif
#ifndef RS6000_BUILTIN_P
#error "RS6000_BUILTIN_P is not defined."
#endif
@@ -324,6 +329,82 @@
| RS6000_BTC_SPECIAL), \
CODE_FOR_nothing) /* ICODE */
/* MMA convenience macros. */
#define BU_MMA_1(ENUM, NAME, ATTR, ICODE) \
RS6000_BUILTIN_M (MMA_BUILTIN_ ## ENUM, /* ENUM */ \
"__builtin_mma_" NAME, /* NAME */ \
RS6000_BTM_MMA, /* MASK */ \
(RS6000_BTC_ ## ATTR /* ATTR */ \
| RS6000_BTC_UNARY \
| RS6000_BTC_VOID \
| RS6000_BTC_GIMPLE), \
CODE_FOR_nothing) /* ICODE */ \
RS6000_BUILTIN_M (MMA_BUILTIN_ ## ENUM ## _INTERNAL, /* ENUM */ \
"__builtin_mma_" NAME "_internal", /* NAME */ \
RS6000_BTM_MMA, /* MASK */ \
(RS6000_BTC_ ## ATTR /* ATTR */ \
| RS6000_BTC_UNARY), \
CODE_FOR_ ## ICODE) /* ICODE */
#define BU_MMA_V2(ENUM, NAME, ATTR, ICODE) \
RS6000_BUILTIN_M (MMA_BUILTIN_ ## ENUM, /* ENUM */ \
"__builtin_mma_" NAME, /* NAME */ \
RS6000_BTM_MMA, /* MASK */ \
(RS6000_BTC_ ## ATTR /* ATTR */ \
| RS6000_BTC_BINARY \
| RS6000_BTC_VOID \
| RS6000_BTC_GIMPLE), \
CODE_FOR_nothing) /* ICODE */
#define BU_MMA_3(ENUM, NAME, ATTR, ICODE) \
RS6000_BUILTIN_M (MMA_BUILTIN_ ## ENUM, /* ENUM */ \
"__builtin_mma_" NAME, /* NAME */ \
RS6000_BTM_MMA, /* MASK */ \
(RS6000_BTC_ ## ATTR /* ATTR */ \
| RS6000_BTC_TERNARY \
| RS6000_BTC_VOID \
| RS6000_BTC_GIMPLE), \
CODE_FOR_nothing) /* ICODE */ \
RS6000_BUILTIN_M (MMA_BUILTIN_ ## ENUM ## _INTERNAL, /* ENUM */ \
"__builtin_mma_" NAME "_internal", /* NAME */ \
RS6000_BTM_MMA, /* MASK */ \
(RS6000_BTC_ ## ATTR /* ATTR */ \
| RS6000_BTC_TERNARY), \
CODE_FOR_ ## ICODE) /* ICODE */
#define BU_MMA_5(ENUM, NAME, ATTR, ICODE) \
RS6000_BUILTIN_M (MMA_BUILTIN_ ## ENUM, /* ENUM */ \
"__builtin_mma_" NAME, /* NAME */ \
RS6000_BTM_MMA, /* MASK */ \
(RS6000_BTC_ ## ATTR /* ATTR */ \
| RS6000_BTC_QUINARY \
| RS6000_BTC_VOID \
| RS6000_BTC_GIMPLE), \
CODE_FOR_nothing) /* ICODE */ \
RS6000_BUILTIN_M (MMA_BUILTIN_ ## ENUM ## _INTERNAL, /* ENUM */ \
"__builtin_mma_" NAME "_internal", /* NAME */ \
RS6000_BTM_MMA, /* MASK */ \
(RS6000_BTC_ ## ATTR /* ATTR */ \
| RS6000_BTC_QUINARY), \
CODE_FOR_ ## ICODE) /* ICODE */
#define BU_MMA_6(ENUM, NAME, ATTR, ICODE) \
RS6000_BUILTIN_M (MMA_BUILTIN_ ## ENUM, /* ENUM */ \
"__builtin_mma_" NAME, /* NAME */ \
RS6000_BTM_MMA, /* MASK */ \
(RS6000_BTC_ ## ATTR /* ATTR */ \
| RS6000_BTC_SENARY \
| RS6000_BTC_VOID \
| RS6000_BTC_GIMPLE), \
CODE_FOR_nothing) /* ICODE */ \
RS6000_BUILTIN_M (MMA_BUILTIN_ ## ENUM ## _INTERNAL, /* ENUM */ \
"__builtin_mma_" NAME "_internal", /* NAME */ \
RS6000_BTM_MMA, /* MASK */ \
(RS6000_BTC_ ## ATTR /* ATTR */ \
| RS6000_BTC_SENARY), \
CODE_FOR_ ## ICODE) /* ICODE */
/* ISA 2.05 (power6) convenience macros. */
/* For functions that depend on the CMPB instruction */
#define BU_P6_2(ENUM, NAME, ATTR, ICODE) \
@@ -2611,3 +2692,77 @@ BU_SPECIAL_X (RS6000_BUILTIN_CPU_SUPPORTS, "__builtin_cpu_supports",
/* Darwin CfString builtin. */
BU_SPECIAL_X (RS6000_BUILTIN_CFSTRING, "__builtin_cfstring", RS6000_BTM_ALWAYS,
RS6000_BTC_MISC)
/* FUTURE MMA builtins. */
BU_VSX_1 (XVCVBF16SP, "xvcvbf16sp", MISC, vsx_xvcvbf16sp)
BU_VSX_1 (XVCVSPBF16, "xvcvspbf16", MISC, vsx_xvcvspbf16)
BU_MMA_1 (XXMFACC, "xxmfacc", QUAD, mma_xxmfacc)
BU_MMA_1 (XXMTACC, "xxmtacc", QUAD, mma_xxmtacc)
BU_MMA_1 (XXSETACCZ, "xxsetaccz", MISC, mma_xxsetaccz)
BU_MMA_V2 (DISASSEMBLE_ACC, "disassemble_acc", QUAD, nothing)
BU_MMA_V2 (DISASSEMBLE_PAIR,"disassemble_pair", PAIR, nothing)
BU_MMA_3 (ASSEMBLE_PAIR, "assemble_pair", MISC, mma_assemble_pair)
BU_MMA_3 (XVBF16GER2, "xvbf16ger2", MISC, mma_xvbf16ger2)
BU_MMA_3 (XVF16GER2, "xvf16ger2", MISC, mma_xvf16ger2)
BU_MMA_3 (XVF32GER, "xvf32ger", MISC, mma_xvf32ger)
BU_MMA_3 (XVF64GER, "xvf64ger", PAIR, mma_xvf64ger)
BU_MMA_3 (XVI4GER8, "xvi4ger8", MISC, mma_xvi4ger8)
BU_MMA_3 (XVI8GER4, "xvi8ger4", MISC, mma_xvi8ger4)
BU_MMA_3 (XVI16GER2, "xvi16ger2", MISC, mma_xvi16ger2)
BU_MMA_3 (XVI16GER2S, "xvi16ger2s", MISC, mma_xvi16ger2s)
BU_MMA_3 (XVBF16GER2NN, "xvbf16ger2nn", QUAD, mma_xvbf16ger2nn)
BU_MMA_3 (XVBF16GER2NP, "xvbf16ger2np", QUAD, mma_xvbf16ger2np)
BU_MMA_3 (XVBF16GER2PN, "xvbf16ger2pn", QUAD, mma_xvbf16ger2pn)
BU_MMA_3 (XVBF16GER2PP, "xvbf16ger2pp", QUAD, mma_xvbf16ger2pp)
BU_MMA_3 (XVF16GER2NN, "xvf16ger2nn", QUAD, mma_xvf16ger2nn)
BU_MMA_3 (XVF16GER2NP, "xvf16ger2np", QUAD, mma_xvf16ger2np)
BU_MMA_3 (XVF16GER2PN, "xvf16ger2pn", QUAD, mma_xvf16ger2pn)
BU_MMA_3 (XVF16GER2PP, "xvf16ger2pp", QUAD, mma_xvf16ger2pp)
BU_MMA_3 (XVF32GERNN, "xvf32gernn", QUAD, mma_xvf32gernn)
BU_MMA_3 (XVF32GERNP, "xvf32gernp", QUAD, mma_xvf32gernp)
BU_MMA_3 (XVF32GERPN, "xvf32gerpn", QUAD, mma_xvf32gerpn)
BU_MMA_3 (XVF32GERPP, "xvf32gerpp", QUAD, mma_xvf32gerpp)
BU_MMA_3 (XVF64GERNN, "xvf64gernn", QUADPAIR, mma_xvf64gernn)
BU_MMA_3 (XVF64GERNP, "xvf64gernp", QUADPAIR, mma_xvf64gernp)
BU_MMA_3 (XVF64GERPN, "xvf64gerpn", QUADPAIR, mma_xvf64gerpn)
BU_MMA_3 (XVF64GERPP, "xvf64gerpp", QUADPAIR, mma_xvf64gerpp)
BU_MMA_3 (XVI4GER8PP, "xvi4ger8pp", QUAD, mma_xvi4ger8pp)
BU_MMA_3 (XVI8GER4PP, "xvi8ger4pp", QUAD, mma_xvi8ger4pp)
BU_MMA_3 (XVI8GER4SPP, "xvi8ger4spp", QUAD, mma_xvi8ger4spp)
BU_MMA_3 (XVI16GER2PP, "xvi16ger2pp", QUAD, mma_xvi16ger2pp)
BU_MMA_3 (XVI16GER2SPP, "xvi16ger2spp", QUAD, mma_xvi16ger2spp)
BU_MMA_5 (ASSEMBLE_ACC, "assemble_acc", MISC, mma_assemble_acc)
BU_MMA_5 (PMXVF32GER, "pmxvf32ger", MISC, mma_pmxvf32ger)
BU_MMA_5 (PMXVF64GER, "pmxvf64ger", PAIR, mma_pmxvf64ger)
BU_MMA_5 (PMXVF32GERNN, "pmxvf32gernn", QUAD, mma_pmxvf32gernn)
BU_MMA_5 (PMXVF32GERNP, "pmxvf32gernp", QUAD, mma_pmxvf32gernp)
BU_MMA_5 (PMXVF32GERPN, "pmxvf32gerpn", QUAD, mma_pmxvf32gerpn)
BU_MMA_5 (PMXVF32GERPP, "pmxvf32gerpp", QUAD, mma_pmxvf32gerpp)
BU_MMA_5 (PMXVF64GERNN, "pmxvf64gernn", QUADPAIR, mma_pmxvf64gernn)
BU_MMA_5 (PMXVF64GERNP, "pmxvf64gernp", QUADPAIR, mma_pmxvf64gernp)
BU_MMA_5 (PMXVF64GERPN, "pmxvf64gerpn", QUADPAIR, mma_pmxvf64gerpn)
BU_MMA_5 (PMXVF64GERPP, "pmxvf64gerpp", QUADPAIR, mma_pmxvf64gerpp)
BU_MMA_6 (PMXVBF16GER2, "pmxvbf16ger2", MISC, mma_pmxvbf16ger2)
BU_MMA_6 (PMXVF16GER2, "pmxvf16ger2", MISC, mma_pmxvf16ger2)
BU_MMA_6 (PMXVI4GER8, "pmxvi4ger8", MISC, mma_pmxvi4ger8)
BU_MMA_6 (PMXVI8GER4, "pmxvi8ger4", MISC, mma_pmxvi8ger4)
BU_MMA_6 (PMXVI16GER2, "pmxvi16ger2", MISC, mma_pmxvi16ger2)
BU_MMA_6 (PMXVI16GER2S, "pmxvi16ger2s", MISC, mma_pmxvi16ger2s)
BU_MMA_6 (PMXVBF16GER2NN, "pmxvbf16ger2nn", QUAD, mma_pmxvbf16ger2nn)
BU_MMA_6 (PMXVBF16GER2NP, "pmxvbf16ger2np", QUAD, mma_pmxvbf16ger2np)
BU_MMA_6 (PMXVBF16GER2PN, "pmxvbf16ger2pn", QUAD, mma_pmxvbf16ger2pn)
BU_MMA_6 (PMXVBF16GER2PP, "pmxvbf16ger2pp", QUAD, mma_pmxvbf16ger2pp)
BU_MMA_6 (PMXVF16GER2NN, "pmxvf16ger2nn", QUAD, mma_pmxvf16ger2nn)
BU_MMA_6 (PMXVF16GER2NP, "pmxvf16ger2np", QUAD, mma_pmxvf16ger2np)
BU_MMA_6 (PMXVF16GER2PN, "pmxvf16ger2pn", QUAD, mma_pmxvf16ger2pn)
BU_MMA_6 (PMXVF16GER2PP, "pmxvf16ger2pp", QUAD, mma_pmxvf16ger2pp)
BU_MMA_6 (PMXVI4GER8PP, "pmxvi4ger8pp", QUAD, mma_pmxvi4ger8pp)
BU_MMA_6 (PMXVI8GER4PP, "pmxvi8ger4pp", QUAD, mma_pmxvi8ger4pp)
BU_MMA_6 (PMXVI8GER4SPP, "pmxvi8ger4spp", QUAD, mma_pmxvi8ger4spp)
BU_MMA_6 (PMXVI16GER2PP, "pmxvi16ger2pp", QUAD, mma_pmxvi16ger2pp)
BU_MMA_6 (PMXVI16GER2SPP, "pmxvi16ger2spp", QUAD, mma_pmxvi16ger2spp)
@@ -9935,7 +9935,7 @@ rs6000_emit_move (rtx dest, rtx source, machine_mode mode)
case E_POImode:
case E_PXImode:
if (CONSTANT_P (operands[1]))
if (CONST_INT_P (operands[1]) && INTVAL (operands[1]) != 0)
error ("%qs is an opaque type, and you can't set it to other values.",
(mode == POImode) ? "__vector_pair" : "__vector_quad");
break;
@@ -12847,6 +12847,14 @@ print_operand (FILE *file, rtx x, int code)
/* %c is output_addr_const if a CONSTANT_ADDRESS_P, otherwise
output_operand. */
case 'A':
/* Write the MMA accumulator number associated with VSX register X. */
if (!REG_P (x) || !FP_REGNO_P (REGNO (x)) || (REGNO (x) % 4) != 0)
output_operand_lossage ("invalid %%A value");
else
fprintf (file, "%d", (REGNO (x) - FIRST_FPR_REGNO) / 4);
return;
case 'D':
/* Like 'J' but get to the GT bit only. */
if (!REG_P (x) || !CR_REGNO_P (REGNO (x)))
@@ -15957,6 +15965,12 @@ rs6000_split_multireg_move (rtx dst, rtx src)
unsigned offset = 0;
unsigned size = GET_MODE_SIZE (reg_mode);
/* If we are reading an accumulator register, we have to
deprime it before we can access it. */
if (TARGET_MMA
&& GET_MODE (src) == PXImode && FP_REGNO_P (REGNO (src)))
emit_insn (gen_mma_xxmfacc (src, src));
for (int i = 0; i < nregs; i++)
{
unsigned subreg = (WORDS_BIG_ENDIAN)
@@ -15985,6 +15999,32 @@ rs6000_split_multireg_move (rtx dst, rtx src)
emit_insn (gen_rtx_SET (dst2, src2));
}
/* If we are writing an accumulator register, we have to
prime it after we've written it. */
if (TARGET_MMA
&& GET_MODE (dst) == PXImode && FP_REGNO_P (REGNO (dst)))
emit_insn (gen_mma_xxmtacc (dst, dst));
return;
}
if (GET_CODE (src) == UNSPEC)
{
gcc_assert (REG_P (dst)
&& FP_REGNO_P (REGNO (dst))
&& XINT (src, 1) == UNSPEC_MMA_ASSEMBLE_ACC);
reg_mode = GET_MODE (XVECEXP (src, 0, 0));
for (int i = 0; i < XVECLEN (src, 0); i++)
{
rtx dst_i = gen_rtx_REG (reg_mode, reg + i);
emit_insn (gen_rtx_SET (dst_i, XVECEXP (src, 0, i)));
}
/* We are writing an accumulator register, so we have to
prime it after we've written it. */
emit_insn (gen_mma_xxmtacc (dst, dst));
return;
}
@@ -15993,6 +16033,12 @@ rs6000_split_multireg_move (rtx dst, rtx src)
if (REG_P (src) && REG_P (dst) && (REGNO (src) < REGNO (dst)))
{
/* If we are reading an accumulator register, we have to
deprime it before we can access it. */
if (TARGET_MMA
&& GET_MODE (src) == PXImode && FP_REGNO_P (REGNO (src)))
emit_insn (gen_mma_xxmfacc (src, src));
/* Move register range backwards, if we might have destructive
overlap. */
int i;
@@ -16001,6 +16047,12 @@ rs6000_split_multireg_move (rtx dst, rtx src)
i * reg_mode_size),
simplify_gen_subreg (reg_mode, src, mode,
i * reg_mode_size)));
/* If we are writing an accumulator register, we have to
prime it after we've written it. */
if (TARGET_MMA
&& GET_MODE (dst) == PXImode && FP_REGNO_P (REGNO (dst)))
emit_insn (gen_mma_xxmtacc (dst, dst));
}
else
{
@@ -16133,6 +16185,12 @@ rs6000_split_multireg_move (rtx dst, rtx src)
gcc_assert (rs6000_offsettable_memref_p (dst, reg_mode, true));
}
/* If we are reading an accumulator register, we have to
deprime it before we can access it. */
if (TARGET_MMA && REG_P (src)
&& GET_MODE (src) == PXImode && FP_REGNO_P (REGNO (src)))
emit_insn (gen_mma_xxmfacc (src, src));
for (i = 0; i < nregs; i++)
{
/* Calculate index to next subword. */
@@ -16150,6 +16208,13 @@ rs6000_split_multireg_move (rtx dst, rtx src)
simplify_gen_subreg (reg_mode, src, mode,
j * reg_mode_size)));
}
/* If we are writing an accumulator register, we have to
prime it after we've written it. */
if (TARGET_MMA && REG_P (dst)
&& GET_MODE (dst) == PXImode && FP_REGNO_P (REGNO (dst)))
emit_insn (gen_mma_xxmtacc (dst, dst));
if (restore_basereg != NULL_RTX)
emit_insn (restore_basereg);
}
@@ -2251,15 +2251,24 @@ extern int frame_pointer_needed;
flags macros, but we've run out of bits, so we now map the options into new
settings used here. */
/* Builtin attributes. */
#define RS6000_BTC_SPECIAL 0x00000000 /* Special function. */
/* Builtin operand count. */
#define RS6000_BTC_UNARY 0x00000001 /* normal unary function. */
#define RS6000_BTC_BINARY 0x00000002 /* normal binary function. */
#define RS6000_BTC_TERNARY 0x00000003 /* normal ternary function. */
#define RS6000_BTC_PREDICATE 0x00000004 /* predicate function. */
#define RS6000_BTC_ABS 0x00000005 /* Altivec/VSX ABS function. */
#define RS6000_BTC_DST 0x00000007 /* Altivec DST function. */
#define RS6000_BTC_TYPE_MASK 0x0000000f /* Mask to isolate types */
#define RS6000_BTC_QUATERNARY 0x00000004 /* normal quaternary
function. */
#define RS6000_BTC_QUINARY 0x00000005 /* normal quinary function. */
#define RS6000_BTC_SENARY 0x00000006 /* normal senary function. */
#define RS6000_BTC_OPND_MASK 0x00000007 /* Mask to isolate operands. */
/* Builtin attributes. */
#define RS6000_BTC_SPECIAL 0x00000000 /* Special function. */
#define RS6000_BTC_PREDICATE 0x00000008 /* predicate function. */
#define RS6000_BTC_ABS 0x00000010 /* Altivec/VSX ABS
function. */
#define RS6000_BTC_DST 0x00000020 /* Altivec DST function. */
#define RS6000_BTC_TYPE_MASK 0x0000003f /* Mask to isolate types */
#define RS6000_BTC_MISC 0x00000000 /* No special attributes. */
#define RS6000_BTC_CONST 0x00000100 /* Neither uses, nor
@@ -2268,13 +2277,18 @@ extern int frame_pointer_needed;
state/mem and does
not modify global state. */
#define RS6000_BTC_FP 0x00000400 /* depends on rounding mode. */
#define RS6000_BTC_ATTR_MASK 0x00000700 /* Mask of the attributes. */
#define RS6000_BTC_QUAD 0x00000800 /* Uses a register quad. */
#define RS6000_BTC_PAIR 0x00001000 /* Uses a register pair. */
#define RS6000_BTC_QUADPAIR 0x00001800 /* Uses a quad and a pair. */
#define RS6000_BTC_ATTR_MASK 0x00001f00 /* Mask of the attributes. */
/* Miscellaneous information. */
#define RS6000_BTC_SPR 0x01000000 /* function references SPRs. */
#define RS6000_BTC_VOID 0x02000000 /* function has no return value. */
#define RS6000_BTC_CR 0x04000000 /* function references a CR. */
#define RS6000_BTC_OVERLOADED 0x08000000 /* function is overloaded. */
#define RS6000_BTC_GIMPLE 0x10000000 /* function should be expanded
into gimple. */
#define RS6000_BTC_MISC_MASK 0x1f000000 /* Mask of the misc info. */
/* Convenience macros to document the instruction type. */
@@ -2341,6 +2355,7 @@ extern int frame_pointer_needed;
#undef RS6000_BUILTIN_A
#undef RS6000_BUILTIN_D
#undef RS6000_BUILTIN_H
#undef RS6000_BUILTIN_M
#undef RS6000_BUILTIN_P
#undef RS6000_BUILTIN_X
@@ -2351,6 +2366,7 @@ extern int frame_pointer_needed;
#define RS6000_BUILTIN_A(ENUM, NAME, MASK, ATTR, ICODE) ENUM,
#define RS6000_BUILTIN_D(ENUM, NAME, MASK, ATTR, ICODE) ENUM,
#define RS6000_BUILTIN_H(ENUM, NAME, MASK, ATTR, ICODE) ENUM,
#define RS6000_BUILTIN_M(ENUM, NAME, MASK, ATTR, ICODE) ENUM,
#define RS6000_BUILTIN_P(ENUM, NAME, MASK, ATTR, ICODE) ENUM,
#define RS6000_BUILTIN_X(ENUM, NAME, MASK, ATTR, ICODE) ENUM,
@@ -2368,6 +2384,7 @@ enum rs6000_builtins
#undef RS6000_BUILTIN_A
#undef RS6000_BUILTIN_D
#undef RS6000_BUILTIN_H
#undef RS6000_BUILTIN_M
#undef RS6000_BUILTIN_P
#undef RS6000_BUILTIN_X
@@ -198,7 +198,7 @@
vecsimple,veccomplex,vecdiv,veccmp,veccmpsimple,vecperm,
vecfloat,vecfdiv,vecdouble,mffgpr,mftgpr,crypto,
veclogical,veccmpfx,vecexts,vecmove,
htm,htmsimple,dfp"
htm,htmsimple,dfp,mma"
(const_string "integer"))
;; What data size does this instruction work on?
@@ -295,6 +295,8 @@
UNSPEC_VSX_DIVUD
UNSPEC_VSX_MULSD
UNSPEC_VSX_SIGN_EXTEND
UNSPEC_VSX_XVCVBF16SP
UNSPEC_VSX_XVCVSPBF16
UNSPEC_VSX_XVCVSPSXDS
UNSPEC_VSX_VSLO
UNSPEC_VSX_EXTRACT
@@ -344,6 +346,12 @@
UNSPEC_VSX_FIRST_MISMATCH_EOS_INDEX
])
(define_int_iterator XVCVBF16 [UNSPEC_VSX_XVCVSPBF16
UNSPEC_VSX_XVCVBF16SP])
(define_int_attr xvcvbf16 [(UNSPEC_VSX_XVCVSPBF16 "xvcvspbf16")
(UNSPEC_VSX_XVCVBF16SP "xvcvbf16sp")])
;; VSX moves
;; The patterns for LE permuted loads and stores come before the general
@@ -5644,3 +5652,10 @@
DONE;
})
(define_insn "vsx_<xvcvbf16>"
[(set (match_operand:V16QI 0 "vsx_register_operand" "=wa")
(unspec:V16QI [(match_operand:V16QI 1 "vsx_register_operand" "wa")]
XVCVBF16))]
"TARGET_FUTURE"
"<xvcvbf16> %x0,%x1"
[(set_attr "type" "vecfloat")])
@@ -13754,6 +13754,7 @@ instructions, but allow the compiler to schedule those calls.
* PowerPC AltiVec/VSX Built-in Functions::
* PowerPC Hardware Transactional Memory Built-in Functions::
* PowerPC Atomic Memory Operation Functions::
* PowerPC Matrix-Multiply Assist Built-in Functions::
* RX Built-in Functions::
* S/390 System z Built-in Functions::
* SH Built-in Functions::
@@ -20971,6 +20972,100 @@ void amo_stdat_smax (int64_t *, int64_t);
void amo_stdat_smin (int64_t *, int64_t);
@end smallexample
@node PowerPC Matrix-Multiply Assist Built-in Functions
@subsection PowerPC Matrix-Multiply Assist Built-in Functions
ISA 3.1 of the PowerPC added new Matrix-Multiply Assist (MMA) instructions.
GCC provides support for these instructions through the following built-in
functions which are enabled with the @code{-mmma} option.  The @code{vec_t}
type below is defined to be a normal vector unsigned char type.  The
@code{uint2}, @code{uint4} and @code{uint8} parameters are 2-bit, 4-bit and
8-bit unsigned integer constants respectively.  The compiler will verify
that they are constants and that their values are within range.
The built-in functions supported are:
@smallexample
void __builtin_mma_xvi4ger8 (__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvi8ger4 (__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvi16ger2 (__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvi16ger2s (__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvf16ger2 (__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvbf16ger2 (__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvf32ger (__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvi4ger8pp (__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvi8ger4pp (__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvi8ger4spp(__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvi16ger2pp (__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvi16ger2spp (__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvf16ger2pp (__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvf16ger2pn (__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvf16ger2np (__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvf16ger2nn (__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvbf16ger2pp (__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvbf16ger2pn (__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvbf16ger2np (__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvbf16ger2nn (__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvf32gerpp (__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvf32gerpn (__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvf32gernp (__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvf32gernn (__vector_quad *, vec_t, vec_t);
void __builtin_mma_pmxvi4ger8 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint8);
void __builtin_mma_pmxvi4ger8pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint8);
void __builtin_mma_pmxvi8ger4 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint4);
void __builtin_mma_pmxvi8ger4pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint4);
void __builtin_mma_pmxvi8ger4spp(__vector_quad *, vec_t, vec_t, uint4, uint4, uint4);
void __builtin_mma_pmxvi16ger2 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
void __builtin_mma_pmxvi16ger2s (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
void __builtin_mma_pmxvf16ger2 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
void __builtin_mma_pmxvbf16ger2 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
void __builtin_mma_pmxvi16ger2pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
void __builtin_mma_pmxvi16ger2spp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
void __builtin_mma_pmxvf16ger2pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
void __builtin_mma_pmxvf16ger2pn (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
void __builtin_mma_pmxvf16ger2np (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
void __builtin_mma_pmxvf16ger2nn (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
void __builtin_mma_pmxvbf16ger2pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
void __builtin_mma_pmxvbf16ger2pn (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
void __builtin_mma_pmxvbf16ger2np (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
void __builtin_mma_pmxvbf16ger2nn (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
void __builtin_mma_pmxvf32ger (__vector_quad *, vec_t, vec_t, uint4, uint4);
void __builtin_mma_pmxvf32gerpp (__vector_quad *, vec_t, vec_t, uint4, uint4);
void __builtin_mma_pmxvf32gerpn (__vector_quad *, vec_t, vec_t, uint4, uint4);
void __builtin_mma_pmxvf32gernp (__vector_quad *, vec_t, vec_t, uint4, uint4);
void __builtin_mma_pmxvf32gernn (__vector_quad *, vec_t, vec_t, uint4, uint4);
void __builtin_mma_xvf64ger (__vector_quad *, __vector_pair, vec_t);
void __builtin_mma_xvf64gerpp (__vector_quad *, __vector_pair, vec_t);
void __builtin_mma_xvf64gerpn (__vector_quad *, __vector_pair, vec_t);
void __builtin_mma_xvf64gernp (__vector_quad *, __vector_pair, vec_t);
void __builtin_mma_xvf64gernn (__vector_quad *, __vector_pair, vec_t);
void __builtin_mma_pmxvf64ger (__vector_quad *, __vector_pair, vec_t, uint4, uint2);
void __builtin_mma_pmxvf64gerpp (__vector_quad *, __vector_pair, vec_t, uint4, uint2);
void __builtin_mma_pmxvf64gerpn (__vector_quad *, __vector_pair, vec_t, uint4, uint2);
void __builtin_mma_pmxvf64gernp (__vector_quad *, __vector_pair, vec_t, uint4, uint2);
void __builtin_mma_pmxvf64gernn (__vector_quad *, __vector_pair, vec_t, uint4, uint2);
void __builtin_mma_xxmtacc (__vector_quad *);
void __builtin_mma_xxmfacc (__vector_quad *);
void __builtin_mma_xxsetaccz (__vector_quad *);
void __builtin_mma_assemble_acc (__vector_quad *, vec_t, vec_t, vec_t, vec_t);
void __builtin_mma_disassemble_acc (void *, __vector_quad *);
void __builtin_mma_assemble_pair (__vector_pair *, vec_t, vec_t);
void __builtin_mma_disassemble_pair (void *, __vector_pair *);
vec_t __builtin_vsx_xvcvspbf16 (vec_t);
vec_t __builtin_vsx_xvcvbf16sp (vec_t);
@end smallexample
@node RX Built-in Functions
@subsection RX Built-in Functions
GCC supports some of the RX instructions which cannot be expressed in
/* { dg-do compile } */
/* { dg-require-effective-target powerpc_future_ok } */
/* { dg-options "-Wno-psabi -mdejagnu-cpu=future -O2" } */
typedef unsigned char vec_t __attribute__((vector_size(16)));
void
foo0 (__vector_quad *dst, vec_t *vec)
{
__vector_quad acc;
vec_t vec0 = vec[0];
vec_t vec1 = vec[1];
__builtin_mma_xvi4ger8 (&acc, vec0, vec1);
__builtin_mma_xvi4ger8pp (&acc, vec0, vec1);
dst[0] = acc;
}
void
foo1 (__vector_quad *dst, vec_t *vec)
{
__vector_quad acc;
vec_t vec0 = vec[0];
vec_t vec1 = vec[1];
__builtin_mma_xvi8ger4 (&acc, vec0, vec1);
__builtin_mma_xvi8ger4pp (&acc, vec0, vec1);
__builtin_mma_xvi8ger4spp(&acc, vec0, vec1);
dst[1] = acc;
}
void
foo2 (__vector_quad *dst, vec_t *vec)
{
__vector_quad acc;
vec_t vec0 = vec[0];
vec_t vec1 = vec[1];
__builtin_mma_xvi16ger2 (&acc, vec0, vec1);
__builtin_mma_xvi16ger2pp (&acc, vec0, vec1);
dst[2] = acc;
}
void
foo3 (__vector_quad *dst, vec_t *vec)
{
__vector_quad acc;
vec_t vec0 = vec[0];
vec_t vec1 = vec[1];
__builtin_mma_xvi16ger2s (&acc, vec0, vec1);
__builtin_mma_xvi16ger2spp (&acc, vec0, vec1);
dst[3] = acc;
}
void
foo4 (__vector_quad *dst, vec_t *vec)
{
  __vector_quad acc;
  vec_t vec0 = vec[0];
  vec_t vec1 = vec[1];
  __builtin_mma_xvf16ger2 (&acc, vec0, vec1);
  __builtin_mma_xvf16ger2pp (&acc, vec0, vec1);
  __builtin_mma_xvf16ger2pn (&acc, vec0, vec1);
  dst[4] = acc;
}

void
foo4b (__vector_quad *dst, __vector_quad *src, vec_t *vec)
{
  __vector_quad acc;
  vec_t vec0 = vec[0];
  vec_t vec1 = vec[1];
  acc = src[0];
  __builtin_mma_xvf16ger2np (&acc, vec0, vec1);
  __builtin_mma_xvf16ger2nn (&acc, vec0, vec1);
  dst[4] = acc;
}

void
foo5 (__vector_quad *dst, vec_t *vec)
{
  __vector_quad acc;
  vec_t vec0 = vec[0];
  vec_t vec1 = vec[1];
  __builtin_mma_xvbf16ger2 (&acc, vec0, vec1);
  __builtin_mma_xvbf16ger2pp (&acc, vec0, vec1);
  __builtin_mma_xvbf16ger2pn (&acc, vec0, vec1);
  dst[5] = acc;
}

void
foo5b (__vector_quad *dst, __vector_quad *src, vec_t *vec)
{
  __vector_quad acc;
  vec_t vec0 = vec[0];
  vec_t vec1 = vec[1];
  acc = src[0];
  __builtin_mma_xvbf16ger2np (&acc, vec0, vec1);
  __builtin_mma_xvbf16ger2nn (&acc, vec0, vec1);
  dst[5] = acc;
}

void
foo6 (__vector_quad *dst, vec_t *vec)
{
  __vector_quad acc;
  vec_t vec0 = vec[0];
  vec_t vec1 = vec[1];
  __builtin_mma_xvf32ger (&acc, vec0, vec1);
  __builtin_mma_xvf32gerpp (&acc, vec0, vec1);
  __builtin_mma_xvf32gerpn (&acc, vec0, vec1);
  dst[6] = acc;
}

void
foo6b (__vector_quad *dst, __vector_quad *src, vec_t *vec)
{
  __vector_quad acc;
  vec_t vec0 = vec[0];
  vec_t vec1 = vec[1];
  acc = src[0];
  __builtin_mma_xvf32gernp (&acc, vec0, vec1);
  __builtin_mma_xvf32gernn (&acc, vec0, vec1);
  dst[6] = acc;
}

void
foo7 (__vector_quad *dst, vec_t *vec)
{
  __vector_quad acc;
  vec_t vec0 = vec[0];
  vec_t vec1 = vec[1];
  __builtin_mma_pmxvi4ger8 (&acc, vec0, vec1, 15, 15, 255);
  __builtin_mma_pmxvi4ger8pp (&acc, vec0, vec1, 15, 15, 255);
  dst[7] = acc;
}

void
foo8 (__vector_quad *dst, vec_t *vec)
{
  __vector_quad acc;
  vec_t vec0 = vec[0];
  vec_t vec1 = vec[1];
  __builtin_mma_pmxvi8ger4 (&acc, vec0, vec1, 15, 15, 15);
  __builtin_mma_pmxvi8ger4pp (&acc, vec0, vec1, 15, 15, 15);
  __builtin_mma_pmxvi8ger4spp (&acc, vec0, vec1, 15, 15, 15);
  dst[8] = acc;
}

void
foo9 (__vector_quad *dst, vec_t *vec)
{
  __vector_quad acc;
  vec_t vec0 = vec[0];
  vec_t vec1 = vec[1];
  __builtin_mma_pmxvi16ger2 (&acc, vec0, vec1, 15, 15, 3);
  __builtin_mma_pmxvi16ger2pp (&acc, vec0, vec1, 15, 15, 3);
  dst[9] = acc;
}

void
foo10 (__vector_quad *dst, vec_t *vec)
{
  __vector_quad acc;
  vec_t vec0 = vec[0];
  vec_t vec1 = vec[1];
  __builtin_mma_pmxvi16ger2s (&acc, vec0, vec1, 15, 15, 3);
  __builtin_mma_pmxvi16ger2spp (&acc, vec0, vec1, 15, 15, 3);
  dst[10] = acc;
}

void
foo11 (__vector_quad *dst, vec_t *vec)
{
  __vector_quad acc;
  vec_t vec0 = vec[0];
  vec_t vec1 = vec[1];
  __builtin_mma_pmxvf16ger2 (&acc, vec0, vec1, 15, 15, 3);
  __builtin_mma_pmxvf16ger2pp (&acc, vec0, vec1, 15, 15, 3);
  __builtin_mma_pmxvf16ger2pn (&acc, vec0, vec1, 15, 15, 3);
  dst[11] = acc;
}

void
foo11b (__vector_quad *dst, __vector_quad *src, vec_t *vec)
{
  __vector_quad acc;
  vec_t vec0 = vec[0];
  vec_t vec1 = vec[1];
  acc = src[0];
  __builtin_mma_pmxvf16ger2np (&acc, vec0, vec1, 15, 15, 3);
  __builtin_mma_pmxvf16ger2nn (&acc, vec0, vec1, 15, 15, 3);
  dst[11] = acc;
}

void
foo12 (__vector_quad *dst, vec_t *vec)
{
  __vector_quad acc;
  vec_t vec0 = vec[0];
  vec_t vec1 = vec[1];
  __builtin_mma_pmxvbf16ger2 (&acc, vec0, vec1, 15, 15, 3);
  __builtin_mma_pmxvbf16ger2pp (&acc, vec0, vec1, 15, 15, 3);
  __builtin_mma_pmxvbf16ger2pn (&acc, vec0, vec1, 15, 15, 3);
  dst[12] = acc;
}

void
foo12b (__vector_quad *dst, __vector_quad *src, vec_t *vec)
{
  __vector_quad acc;
  vec_t vec0 = vec[0];
  vec_t vec1 = vec[1];
  acc = src[0];
  __builtin_mma_pmxvbf16ger2np (&acc, vec0, vec1, 15, 15, 3);
  __builtin_mma_pmxvbf16ger2nn (&acc, vec0, vec1, 15, 15, 3);
  dst[12] = acc;
}

void
foo13 (__vector_quad *dst, vec_t *vec)
{
  __vector_quad acc;
  vec_t vec0 = vec[0];
  vec_t vec1 = vec[1];
  __builtin_mma_pmxvf32ger (&acc, vec0, vec1, 15, 15);
  __builtin_mma_pmxvf32gerpp (&acc, vec0, vec1, 15, 15);
  __builtin_mma_pmxvf32gerpn (&acc, vec0, vec1, 15, 15);
  dst[13] = acc;
}

void
foo13b (__vector_quad *dst, __vector_quad *src, vec_t *vec)
{
  __vector_quad acc;
  vec_t vec0 = vec[0];
  vec_t vec1 = vec[1];
  acc = src[0];
  __builtin_mma_pmxvf32gernp (&acc, vec0, vec1, 15, 15);
  __builtin_mma_pmxvf32gernn (&acc, vec0, vec1, 15, 15);
  dst[13] = acc;
}
/* { dg-final { scan-assembler-times {\mlxv\M} 40 } } */
/* { dg-final { scan-assembler-times {\mlxvp\M} 12 } } */
/* { dg-final { scan-assembler-times {\mstxvp\M} 40 } } */
/* { dg-final { scan-assembler-times {\mxxmfacc\M} 20 } } */
/* { dg-final { scan-assembler-times {\mxxmtacc\M} 6 } } */
/* { dg-final { scan-assembler-times {\mxvbf16ger2\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvbf16ger2nn\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvbf16ger2np\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvbf16ger2pn\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvbf16ger2pp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvf16ger2\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvf16ger2nn\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvf16ger2np\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvf16ger2pn\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvf16ger2pp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvf32ger\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvf32gernn\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvf32gernp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvf32gerpn\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvf32gerpp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvi16ger2\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvi16ger2pp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvi16ger2s\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvi16ger2spp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvi4ger8\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvi4ger8pp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvi8ger4\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvi8ger4pp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvi8ger4spp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvbf16ger2\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvbf16ger2nn\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvbf16ger2np\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvbf16ger2pn\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvbf16ger2pp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvf16ger2\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvf16ger2nn\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvf16ger2np\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvf16ger2pn\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvf16ger2pp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvf32ger\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvf32gernn\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvf32gernp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvf32gerpn\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvf32gerpp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvi16ger2\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvi16ger2pp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvi16ger2s\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvi16ger2spp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvi4ger8\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvi4ger8pp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvi8ger4\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvi8ger4pp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvi8ger4spp\M} 1 } } */
/* { dg-do compile } */
/* { dg-require-effective-target powerpc_future_ok } */
/* { dg-options "-Wno-psabi -mdejagnu-cpu=future -O2" } */
typedef unsigned char vec_t __attribute__((vector_size(16)));
void
foo0 (__vector_quad *dst, vec_t *vec, __vector_pair *pvecp)
{
  __vector_quad acc;
  __vector_pair vecp0 = *pvecp;
  vec_t vec1 = vec[1];
  __builtin_mma_xvf64ger (&acc, vecp0, vec1);
  __builtin_mma_xvf64gerpp (&acc, vecp0, vec1);
  __builtin_mma_xvf64gerpn (&acc, vecp0, vec1);
  dst[0] = acc;
}

void
foo1 (__vector_quad *dst, __vector_quad *src, vec_t *vec, __vector_pair *pvecp)
{
  __vector_quad acc;
  __vector_pair vecp0 = *pvecp;
  vec_t vec1 = vec[1];
  acc = src[0];
  __builtin_mma_xvf64gernp (&acc, vecp0, vec1);
  __builtin_mma_xvf64gernn (&acc, vecp0, vec1);
  dst[0] = acc;
}

void
foo2 (__vector_quad *dst, vec_t *vec, __vector_pair *pvecp)
{
  __vector_quad acc;
  __vector_pair vecp0 = *pvecp;
  vec_t vec1 = vec[1];
  __builtin_mma_pmxvf64ger (&acc, vecp0, vec1, 15, 3);
  __builtin_mma_pmxvf64gerpp (&acc, vecp0, vec1, 15, 3);
  __builtin_mma_pmxvf64gerpn (&acc, vecp0, vec1, 15, 3);
  dst[1] = acc;
}

void
foo3 (__vector_quad *dst, __vector_quad *src, vec_t *vec, __vector_pair *pvecp)
{
  __vector_quad acc;
  __vector_pair vecp0 = *pvecp;
  vec_t vec1 = vec[1];
  acc = src[0];
  __builtin_mma_pmxvf64gernp (&acc, vecp0, vec1, 15, 3);
  __builtin_mma_pmxvf64gernn (&acc, vecp0, vec1, 15, 3);
  dst[1] = acc;
}
/* { dg-final { scan-assembler-times {\mxxmfacc\M} 4 } } */
/* { dg-final { scan-assembler-times {\mxxmtacc\M} 2 } } */
/* { dg-final { scan-assembler-times {\mlxv\M} 4 } } */
/* { dg-final { scan-assembler-times {\mlxvp\M} 8 } } */
/* { dg-final { scan-assembler-times {\mstxvp\M} 8 } } */
/* { dg-final { scan-assembler-times {\mxvf64ger\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvf64gerpp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvf64gerpn\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvf64gernp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvf64gernn\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvf64ger\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvf64gerpp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvf64gerpn\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvf64gernp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvf64gernn\M} 1 } } */
/* { dg-do compile } */
/* { dg-require-effective-target powerpc_future_ok } */
/* { dg-options "-Wno-psabi -mdejagnu-cpu=future -O2" } */
void
foo0 (void)
{
  __vector_quad acc;
  asm ("#..." : "=d" (acc));
  __builtin_mma_xxmtacc (&acc);
  __builtin_mma_xxmfacc (&acc);
  asm ("#..." :: "d" (acc));
}

typedef unsigned char vec_t __attribute__((vector_size(16)));

void
foo1 (vec_t *vec)
{
  vec[1] = __builtin_vsx_xvcvspbf16 (vec[0]);
  vec[3] = __builtin_vsx_xvcvbf16sp (vec[2]);
}
/* { dg-final { scan-assembler-times {\mxxmtacc\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxxmfacc\M} 1 } } */
/* { dg-final { scan-assembler-times {\mlxv\M} 2 } } */
/* { dg-final { scan-assembler-times {\mstxv\M} 2 } } */
/* { dg-final { scan-assembler-not {\mlxvp\M} } } */
/* { dg-final { scan-assembler-not {\mstxvp\M} } } */
/* { dg-final { scan-assembler-times {\mxvcvspbf16\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvcvbf16sp\M} 1 } } */
/* { dg-do compile } */
/* { dg-require-effective-target powerpc_future_ok } */
/* { dg-options "-Wno-psabi -mdejagnu-cpu=future -O2" } */
typedef unsigned char vec_t __attribute__((vector_size(16)));
void
foo (__vector_pair *dst, vec_t *src)
{
  __vector_pair pair;
  __builtin_mma_assemble_pair (&pair, src[0], src[4]);
  *dst = pair;
}

void
bar (vec_t *dst, __vector_pair *src)
{
  vec_t res[2];
  __builtin_mma_disassemble_pair (res, src);
  dst[0] = res[0];
  dst[4] = res[1];
}
/* { dg-final { scan-assembler-times {\mlxv\M} 2 } } */
/* { dg-final { scan-assembler-times {\mlxvp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mstxv\M} 2 } } */
/* { dg-final { scan-assembler-times {\mstxvp\M} 1 } } */
/* { dg-do compile } */
/* { dg-require-effective-target powerpc_future_ok } */
/* { dg-options "-Wno-psabi -mdejagnu-cpu=future -O2" } */
typedef unsigned char vec_t __attribute__((vector_size(16)));
void
foo (__vector_quad *dst, vec_t *src)
{
  __vector_quad acc;
  __builtin_mma_assemble_acc (&acc, src[0], src[4], src[8], src[12]);
  *dst = acc;
}

void
bar (vec_t *dst, __vector_quad *src)
{
  vec_t res[4];
  __builtin_mma_disassemble_acc (res, src);
  dst[0] = res[0];
  dst[4] = res[1];
  dst[8] = res[2];
  dst[12] = res[3];
}
/* { dg-final { scan-assembler-times {\mlxv\M} 4 } } */
/* { dg-final { scan-assembler-times {\mlxvp\M} 2 } } */
/* { dg-final { scan-assembler-times {\mstxv\M} 4 } } */
/* { dg-final { scan-assembler-times {\mstxvp\M} 2 } } */
/* { dg-final { scan-assembler-times {\mxxmfacc\M} 2 } } */
/* { dg-final { scan-assembler-times {\mxxmtacc\M} 2 } } */
/* { dg-do compile } */
/* { dg-require-effective-target powerpc_future_ok } */
/* { dg-options "-Wno-psabi -mdejagnu-cpu=future -O2" } */
void
foo (__vector_quad *dst)
{
  __vector_quad acc;
  __builtin_mma_xxsetaccz (&acc);
  *dst = acc;
}
/* { dg-final { scan-assembler-not {\mlxv\M} } } */
/* { dg-final { scan-assembler-not {\mlxvp\M} } } */
/* { dg-final { scan-assembler-not {\mxxmtacc\M} } } */
/* { dg-final { scan-assembler-times {\mxxsetaccz\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxxmfacc\M} 1 } } */
/* { dg-final { scan-assembler-times {\mstxvp\M} 2 } } */