Commit 8e25bae5 by Peter Bergner

rs6000: Add MMA built-in function definitions and test cases

Add the Matrix-Multiply Assist (MMA) built-ins.  The MMA accumulators are
INOUT operands for most MMA instructions, but they are also very expensive
to move around.  For this reason, we have implemented a built-in API where
the accumulators are passed by reference/pointer, so the user won't use one
accumulator as input and another as output, which would entail a lot of
copies.  However, using pointers gives us poor code generation when we
expand the built-ins at normal expand time.  We therefore expand the MMA
built-ins early into gimple, converting the pass-by-reference calls to
internal built-ins that use a pass-by-value calling convention, where we
can enforce that the input and output accumulators are the same.  This
gives us much better code generation.
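The transformation described above can be sketched in plain C.  This is a
schematic analogy only: `acc_t` and `ger_update` are hypothetical stand-ins
for `__vector_quad` and the `__builtin_mma_*` functions (which need a
POWER10-capable compiler), chosen just to show the user-visible
pass-by-reference form and the pass-by-value "internal" form it is expanded
into:

```c
#include <assert.h>
#include <string.h>

/* Stand-in for the 512-bit __vector_quad accumulator (hypothetical).  */
typedef struct { int v[4]; } acc_t;

/* The user-visible API shape: the accumulator is passed by pointer and
   updated in place, so one accumulator serves as both input and output
   and no copy is implied.  */
void
ger_update (acc_t *acc, int a, int b)
{
  for (int i = 0; i < 4; i++)
    acc->v[i] += a * b;
}

/* What the early gimple expansion conceptually rewrites the call into:
   a pass-by-value internal form whose result is assigned back to the
   same object, which lets the compiler keep the accumulator in
   registers instead of forcing it through memory.  */
acc_t
ger_update_internal (acc_t acc, int a, int b)
{
  for (int i = 0; i < 4; i++)
    acc.v[i] += a * b;
  return acc;
}
```

A call such as `ger_update (&x, 2, 3)` thus becomes, in effect,
`x = ger_update_internal (x, 2, 3)`; because the input and output object
are enforced to be the same, both forms compute identical results.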

2020-06-20  Peter Bergner  <bergner@linux.ibm.com>

gcc/
	* config/rs6000/predicates.md (mma_assemble_input_operand): New.
	* config/rs6000/rs6000-builtin.def (BU_MMA_1, BU_MMA_V2, BU_MMA_3,
	BU_MMA_5, BU_MMA_6, BU_VSX_1): Add support macros for defining MMA
	built-in functions.
	(ASSEMBLE_ACC, ASSEMBLE_PAIR, DISASSEMBLE_ACC, DISASSEMBLE_PAIR,
	PMXVBF16GER2, PMXVBF16GER2NN, PMXVBF16GER2NP, PMXVBF16GER2PN,
	PMXVBF16GER2PP, PMXVF16GER2, PMXVF16GER2NN, PMXVF16GER2NP,
	PMXVF16GER2PN, PMXVF16GER2PP, PMXVF32GER, PMXVF32GERNN,
	PMXVF32GERNP, PMXVF32GERPN, PMXVF32GERPP, PMXVF64GER, PMXVF64GERNN,
	PMXVF64GERNP, PMXVF64GERPN, PMXVF64GERPP, PMXVI16GER2, PMXVI16GER2PP,
	PMXVI16GER2S, PMXVI16GER2SPP, PMXVI4GER8, PMXVI4GER8PP, PMXVI8GER4,
	PMXVI8GER4PP, PMXVI8GER4SPP, XVBF16GER2, XVBF16GER2NN, XVBF16GER2NP,
	XVBF16GER2PN, XVBF16GER2PP, XVCVBF16SP, XVCVSPBF16, XVF16GER2,
	XVF16GER2NN, XVF16GER2NP, XVF16GER2PN, XVF16GER2PP, XVF32GER,
	XVF32GERNN, XVF32GERNP, XVF32GERPN, XVF32GERPP, XVF64GER, XVF64GERNN,
	XVF64GERNP, XVF64GERPN, XVF64GERPP, XVI16GER2, XVI16GER2PP, XVI16GER2S,
	XVI16GER2SPP, XVI4GER8, XVI4GER8PP, XVI8GER4, XVI8GER4PP, XVI8GER4SPP,
	XXMFACC, XXMTACC, XXSETACCZ): Add MMA built-ins.
	* config/rs6000/rs6000.c (rs6000_emit_move): Use CONST_INT_P.
	Allow zero constants.
	(print_operand) <case 'A'>: New output modifier.
	(rs6000_split_multireg_move): Add support for inserting accumulator
	priming and depriming instructions.  Add support for splitting an
	assemble accumulator pattern.
	* config/rs6000/rs6000-call.c (mma_init_builtins, mma_expand_builtin,
	rs6000_gimple_fold_mma_builtin): New functions.
	(RS6000_BUILTIN_M): New macro.
	(def_builtin): Handle RS6000_BTC_QUAD and RS6000_BTC_PAIR attributes.
	(bdesc_mma): Add new MMA built-in support.
	(htm_expand_builtin): Use RS6000_BTC_OPND_MASK.
	(rs6000_invalid_builtin): Add handling of RS6000_BTM_FUTURE and
	RS6000_BTM_MMA.
	(rs6000_builtin_valid_without_lhs): Handle RS6000_BTC_VOID attribute.
	(rs6000_gimple_fold_builtin): Call rs6000_builtin_is_supported_p
	and rs6000_gimple_fold_mma_builtin.
	(rs6000_expand_builtin): Call mma_expand_builtin.
	Use RS6000_BTC_OPND_MASK.
	(rs6000_init_builtins): Adjust comment.  Call mma_init_builtins.
	(htm_init_builtins): Use RS6000_BTC_OPND_MASK.
	(builtin_function_type): Handle VSX_BUILTIN_XVCVSPBF16 and
	VSX_BUILTIN_XVCVBF16SP.
	* config/rs6000/rs6000.h (RS6000_BTC_QUINARY, RS6000_BTC_SENARY,
	RS6000_BTC_OPND_MASK, RS6000_BTC_QUAD, RS6000_BTC_PAIR,
	RS6000_BTC_QUADPAIR, RS6000_BTC_GIMPLE): New defines.
	(RS6000_BTC_PREDICATE, RS6000_BTC_ABS, RS6000_BTC_DST,
	RS6000_BTC_TYPE_MASK, RS6000_BTC_ATTR_MASK): Adjust values.
	* config/rs6000/mma.md (MAX_MMA_OPERANDS): New define_constant.
	(UNSPEC_MMA_ASSEMBLE_ACC, UNSPEC_MMA_PMXVBF16GER2,
	UNSPEC_MMA_PMXVBF16GER2NN, UNSPEC_MMA_PMXVBF16GER2NP,
	UNSPEC_MMA_PMXVBF16GER2PN, UNSPEC_MMA_PMXVBF16GER2PP,
	UNSPEC_MMA_PMXVF16GER2, UNSPEC_MMA_PMXVF16GER2NN,
	UNSPEC_MMA_PMXVF16GER2NP, UNSPEC_MMA_PMXVF16GER2PN,
	UNSPEC_MMA_PMXVF16GER2PP, UNSPEC_MMA_PMXVF32GER,
	UNSPEC_MMA_PMXVF32GERNN, UNSPEC_MMA_PMXVF32GERNP,
	UNSPEC_MMA_PMXVF32GERPN, UNSPEC_MMA_PMXVF32GERPP,
	UNSPEC_MMA_PMXVF64GER, UNSPEC_MMA_PMXVF64GERNN,
	UNSPEC_MMA_PMXVF64GERNP, UNSPEC_MMA_PMXVF64GERPN,
	UNSPEC_MMA_PMXVF64GERPP, UNSPEC_MMA_PMXVI16GER2,
	UNSPEC_MMA_PMXVI16GER2PP, UNSPEC_MMA_PMXVI16GER2S,
	UNSPEC_MMA_PMXVI16GER2SPP, UNSPEC_MMA_PMXVI4GER8,
	UNSPEC_MMA_PMXVI4GER8PP, UNSPEC_MMA_PMXVI8GER4,
	UNSPEC_MMA_PMXVI8GER4PP, UNSPEC_MMA_PMXVI8GER4SPP,
	UNSPEC_MMA_XVBF16GER2, UNSPEC_MMA_XVBF16GER2NN,
	UNSPEC_MMA_XVBF16GER2NP, UNSPEC_MMA_XVBF16GER2PN,
	UNSPEC_MMA_XVBF16GER2PP, UNSPEC_MMA_XVF16GER2, UNSPEC_MMA_XVF16GER2NN,
	UNSPEC_MMA_XVF16GER2NP, UNSPEC_MMA_XVF16GER2PN, UNSPEC_MMA_XVF16GER2PP,
	UNSPEC_MMA_XVF32GER, UNSPEC_MMA_XVF32GERNN, UNSPEC_MMA_XVF32GERNP,
	UNSPEC_MMA_XVF32GERPN, UNSPEC_MMA_XVF32GERPP, UNSPEC_MMA_XVF64GER,
	UNSPEC_MMA_XVF64GERNN, UNSPEC_MMA_XVF64GERNP, UNSPEC_MMA_XVF64GERPN,
	UNSPEC_MMA_XVF64GERPP, UNSPEC_MMA_XVI16GER2, UNSPEC_MMA_XVI16GER2PP,
	UNSPEC_MMA_XVI16GER2S, UNSPEC_MMA_XVI16GER2SPP, UNSPEC_MMA_XVI4GER8,
	UNSPEC_MMA_XVI4GER8PP, UNSPEC_MMA_XVI8GER4, UNSPEC_MMA_XVI8GER4PP,
	UNSPEC_MMA_XVI8GER4SPP, UNSPEC_MMA_XXMFACC, UNSPEC_MMA_XXMTACC): New.
	(MMA_ACC, MMA_VV, MMA_AVV, MMA_PV, MMA_APV, MMA_VVI4I4I8,
	MMA_AVVI4I4I8, MMA_VVI4I4I2, MMA_AVVI4I4I2, MMA_VVI4I4,
	MMA_AVVI4I4, MMA_PVI4I2, MMA_APVI4I2, MMA_VVI4I4I4,
	MMA_AVVI4I4I4): New define_int_iterator.
	(acc, vv, avv, pv, apv, vvi4i4i8, avvi4i4i8, vvi4i4i2,
	avvi4i4i2, vvi4i4, avvi4i4, pvi4i2, apvi4i2, vvi4i4i4,
	avvi4i4i4): New define_int_attr.
	(*movpxi): Add zero constant alternative.
	(mma_assemble_pair, mma_assemble_acc): New define_expand.
	(*mma_assemble_acc): New define_insn_and_split.
	(mma_<acc>, mma_xxsetaccz, mma_<vv>, mma_<avv>, mma_<pv>, mma_<apv>,
	mma_<vvi4i4i8>, mma_<avvi4i4i8>, mma_<vvi4i4i2>, mma_<avvi4i4i2>,
	mma_<vvi4i4>, mma_<avvi4i4>, mma_<pvi4i2>, mma_<apvi4i2>,
	mma_<vvi4i4i4>, mma_<avvi4i4i4>): New define_insn.
	* config/rs6000/rs6000.md (define_attr "type"): New type mma.
	* config/rs6000/vsx.md (UNSPEC_VSX_XVCVBF16SP): New.
	(UNSPEC_VSX_XVCVSPBF16): Likewise.
	(XVCVBF16): New define_int_iterator.
	(xvcvbf16): New define_int_attr.
	(vsx_<xvcvbf16>): New define_insn.
	* doc/extend.texi: Document the mma built-ins.

gcc/testsuite/
	* gcc.target/powerpc/mma-builtin-1.c: New test.
	* gcc.target/powerpc/mma-builtin-2.c: New test.
	* gcc.target/powerpc/mma-builtin-3.c: New test.
	* gcc.target/powerpc/mma-builtin-4.c: New test.
	* gcc.target/powerpc/mma-builtin-5.c: New test.
	* gcc.target/powerpc/mma-builtin-6.c: New test.

(cherry picked from commit 8ee2640bfdc62f835ec9740278f948034bc7d9f1)
parent 7e3896a4
@@ -1119,6 +1119,11 @@
return gpc_reg_operand (op, mode);
})
;; Return 1 if this operand is valid for a MMA assemble accumulator insn.
(define_special_predicate "mma_assemble_input_operand"
(match_test "(mode == V16QImode
&& (vsx_register_operand (op, mode) || MEM_P (op)))"))
;; Return true if operand is an operator used in rotate-and-mask instructions.
(define_predicate "rotate_mask_operator"
(match_code "rotate,ashift,lshiftrt"))
@@ -31,6 +31,7 @@
RS6000_BUILTIN_A -- ABS builtins
RS6000_BUILTIN_D -- DST builtins
RS6000_BUILTIN_H -- HTM builtins
RS6000_BUILTIN_M -- MMA builtins
RS6000_BUILTIN_P -- Altivec, VSX, ISA 2.07 vector predicate builtins
RS6000_BUILTIN_X -- special builtins
@@ -69,6 +70,10 @@
#error "RS6000_BUILTIN_H is not defined."
#endif
#ifndef RS6000_BUILTIN_M
#error "RS6000_BUILTIN_M is not defined."
#endif
#ifndef RS6000_BUILTIN_P
#error "RS6000_BUILTIN_P is not defined."
#endif
@@ -324,6 +329,82 @@
| RS6000_BTC_SPECIAL), \
CODE_FOR_nothing) /* ICODE */
/* MMA convenience macros. */
#define BU_MMA_1(ENUM, NAME, ATTR, ICODE) \
RS6000_BUILTIN_M (MMA_BUILTIN_ ## ENUM, /* ENUM */ \
"__builtin_mma_" NAME, /* NAME */ \
RS6000_BTM_MMA, /* MASK */ \
(RS6000_BTC_ ## ATTR /* ATTR */ \
| RS6000_BTC_UNARY \
| RS6000_BTC_VOID \
| RS6000_BTC_GIMPLE), \
CODE_FOR_nothing) /* ICODE */ \
RS6000_BUILTIN_M (MMA_BUILTIN_ ## ENUM ## _INTERNAL, /* ENUM */ \
"__builtin_mma_" NAME "_internal", /* NAME */ \
RS6000_BTM_MMA, /* MASK */ \
(RS6000_BTC_ ## ATTR /* ATTR */ \
| RS6000_BTC_UNARY), \
CODE_FOR_ ## ICODE) /* ICODE */
#define BU_MMA_V2(ENUM, NAME, ATTR, ICODE) \
RS6000_BUILTIN_M (MMA_BUILTIN_ ## ENUM, /* ENUM */ \
"__builtin_mma_" NAME, /* NAME */ \
RS6000_BTM_MMA, /* MASK */ \
(RS6000_BTC_ ## ATTR /* ATTR */ \
| RS6000_BTC_BINARY \
| RS6000_BTC_VOID \
| RS6000_BTC_GIMPLE), \
CODE_FOR_nothing) /* ICODE */
#define BU_MMA_3(ENUM, NAME, ATTR, ICODE) \
RS6000_BUILTIN_M (MMA_BUILTIN_ ## ENUM, /* ENUM */ \
"__builtin_mma_" NAME, /* NAME */ \
RS6000_BTM_MMA, /* MASK */ \
(RS6000_BTC_ ## ATTR /* ATTR */ \
| RS6000_BTC_TERNARY \
| RS6000_BTC_VOID \
| RS6000_BTC_GIMPLE), \
CODE_FOR_nothing) /* ICODE */ \
RS6000_BUILTIN_M (MMA_BUILTIN_ ## ENUM ## _INTERNAL, /* ENUM */ \
"__builtin_mma_" NAME "_internal", /* NAME */ \
RS6000_BTM_MMA, /* MASK */ \
(RS6000_BTC_ ## ATTR /* ATTR */ \
| RS6000_BTC_TERNARY), \
CODE_FOR_ ## ICODE) /* ICODE */
#define BU_MMA_5(ENUM, NAME, ATTR, ICODE) \
RS6000_BUILTIN_M (MMA_BUILTIN_ ## ENUM, /* ENUM */ \
"__builtin_mma_" NAME, /* NAME */ \
RS6000_BTM_MMA, /* MASK */ \
(RS6000_BTC_ ## ATTR /* ATTR */ \
| RS6000_BTC_QUINARY \
| RS6000_BTC_VOID \
| RS6000_BTC_GIMPLE), \
CODE_FOR_nothing) /* ICODE */ \
RS6000_BUILTIN_M (MMA_BUILTIN_ ## ENUM ## _INTERNAL, /* ENUM */ \
"__builtin_mma_" NAME "_internal", /* NAME */ \
RS6000_BTM_MMA, /* MASK */ \
(RS6000_BTC_ ## ATTR /* ATTR */ \
| RS6000_BTC_QUINARY), \
CODE_FOR_ ## ICODE) /* ICODE */
#define BU_MMA_6(ENUM, NAME, ATTR, ICODE) \
RS6000_BUILTIN_M (MMA_BUILTIN_ ## ENUM, /* ENUM */ \
"__builtin_mma_" NAME, /* NAME */ \
RS6000_BTM_MMA, /* MASK */ \
(RS6000_BTC_ ## ATTR /* ATTR */ \
| RS6000_BTC_SENARY \
| RS6000_BTC_VOID \
| RS6000_BTC_GIMPLE), \
CODE_FOR_nothing) /* ICODE */ \
RS6000_BUILTIN_M (MMA_BUILTIN_ ## ENUM ## _INTERNAL, /* ENUM */ \
"__builtin_mma_" NAME "_internal", /* NAME */ \
RS6000_BTM_MMA, /* MASK */ \
(RS6000_BTC_ ## ATTR /* ATTR */ \
| RS6000_BTC_SENARY), \
CODE_FOR_ ## ICODE) /* ICODE */
/* ISA 2.05 (power6) convenience macros. */
/* For functions that depend on the CMPB instruction */
#define BU_P6_2(ENUM, NAME, ATTR, ICODE) \
@@ -2611,3 +2692,77 @@ BU_SPECIAL_X (RS6000_BUILTIN_CPU_SUPPORTS, "__builtin_cpu_supports",
/* Darwin CfString builtin. */
BU_SPECIAL_X (RS6000_BUILTIN_CFSTRING, "__builtin_cfstring", RS6000_BTM_ALWAYS,
RS6000_BTC_MISC)
/* FUTURE MMA builtins. */
BU_VSX_1 (XVCVBF16SP, "xvcvbf16sp", MISC, vsx_xvcvbf16sp)
BU_VSX_1 (XVCVSPBF16, "xvcvspbf16", MISC, vsx_xvcvspbf16)
BU_MMA_1 (XXMFACC, "xxmfacc", QUAD, mma_xxmfacc)
BU_MMA_1 (XXMTACC, "xxmtacc", QUAD, mma_xxmtacc)
BU_MMA_1 (XXSETACCZ, "xxsetaccz", MISC, mma_xxsetaccz)
BU_MMA_V2 (DISASSEMBLE_ACC, "disassemble_acc", QUAD, nothing)
BU_MMA_V2 (DISASSEMBLE_PAIR,"disassemble_pair", PAIR, nothing)
BU_MMA_3 (ASSEMBLE_PAIR, "assemble_pair", MISC, mma_assemble_pair)
BU_MMA_3 (XVBF16GER2, "xvbf16ger2", MISC, mma_xvbf16ger2)
BU_MMA_3 (XVF16GER2, "xvf16ger2", MISC, mma_xvf16ger2)
BU_MMA_3 (XVF32GER, "xvf32ger", MISC, mma_xvf32ger)
BU_MMA_3 (XVF64GER, "xvf64ger", PAIR, mma_xvf64ger)
BU_MMA_3 (XVI4GER8, "xvi4ger8", MISC, mma_xvi4ger8)
BU_MMA_3 (XVI8GER4, "xvi8ger4", MISC, mma_xvi8ger4)
BU_MMA_3 (XVI16GER2, "xvi16ger2", MISC, mma_xvi16ger2)
BU_MMA_3 (XVI16GER2S, "xvi16ger2s", MISC, mma_xvi16ger2s)
BU_MMA_3 (XVBF16GER2NN, "xvbf16ger2nn", QUAD, mma_xvbf16ger2nn)
BU_MMA_3 (XVBF16GER2NP, "xvbf16ger2np", QUAD, mma_xvbf16ger2np)
BU_MMA_3 (XVBF16GER2PN, "xvbf16ger2pn", QUAD, mma_xvbf16ger2pn)
BU_MMA_3 (XVBF16GER2PP, "xvbf16ger2pp", QUAD, mma_xvbf16ger2pp)
BU_MMA_3 (XVF16GER2NN, "xvf16ger2nn", QUAD, mma_xvf16ger2nn)
BU_MMA_3 (XVF16GER2NP, "xvf16ger2np", QUAD, mma_xvf16ger2np)
BU_MMA_3 (XVF16GER2PN, "xvf16ger2pn", QUAD, mma_xvf16ger2pn)
BU_MMA_3 (XVF16GER2PP, "xvf16ger2pp", QUAD, mma_xvf16ger2pp)
BU_MMA_3 (XVF32GERNN, "xvf32gernn", QUAD, mma_xvf32gernn)
BU_MMA_3 (XVF32GERNP, "xvf32gernp", QUAD, mma_xvf32gernp)
BU_MMA_3 (XVF32GERPN, "xvf32gerpn", QUAD, mma_xvf32gerpn)
BU_MMA_3 (XVF32GERPP, "xvf32gerpp", QUAD, mma_xvf32gerpp)
BU_MMA_3 (XVF64GERNN, "xvf64gernn", QUADPAIR, mma_xvf64gernn)
BU_MMA_3 (XVF64GERNP, "xvf64gernp", QUADPAIR, mma_xvf64gernp)
BU_MMA_3 (XVF64GERPN, "xvf64gerpn", QUADPAIR, mma_xvf64gerpn)
BU_MMA_3 (XVF64GERPP, "xvf64gerpp", QUADPAIR, mma_xvf64gerpp)
BU_MMA_3 (XVI4GER8PP, "xvi4ger8pp", QUAD, mma_xvi4ger8pp)
BU_MMA_3 (XVI8GER4PP, "xvi8ger4pp", QUAD, mma_xvi8ger4pp)
BU_MMA_3 (XVI8GER4SPP, "xvi8ger4spp", QUAD, mma_xvi8ger4spp)
BU_MMA_3 (XVI16GER2PP, "xvi16ger2pp", QUAD, mma_xvi16ger2pp)
BU_MMA_3 (XVI16GER2SPP, "xvi16ger2spp", QUAD, mma_xvi16ger2spp)
BU_MMA_5 (ASSEMBLE_ACC, "assemble_acc", MISC, mma_assemble_acc)
BU_MMA_5 (PMXVF32GER, "pmxvf32ger", MISC, mma_pmxvf32ger)
BU_MMA_5 (PMXVF64GER, "pmxvf64ger", PAIR, mma_pmxvf64ger)
BU_MMA_5 (PMXVF32GERNN, "pmxvf32gernn", QUAD, mma_pmxvf32gernn)
BU_MMA_5 (PMXVF32GERNP, "pmxvf32gernp", QUAD, mma_pmxvf32gernp)
BU_MMA_5 (PMXVF32GERPN, "pmxvf32gerpn", QUAD, mma_pmxvf32gerpn)
BU_MMA_5 (PMXVF32GERPP, "pmxvf32gerpp", QUAD, mma_pmxvf32gerpp)
BU_MMA_5 (PMXVF64GERNN, "pmxvf64gernn", QUADPAIR, mma_pmxvf64gernn)
BU_MMA_5 (PMXVF64GERNP, "pmxvf64gernp", QUADPAIR, mma_pmxvf64gernp)
BU_MMA_5 (PMXVF64GERPN, "pmxvf64gerpn", QUADPAIR, mma_pmxvf64gerpn)
BU_MMA_5 (PMXVF64GERPP, "pmxvf64gerpp", QUADPAIR, mma_pmxvf64gerpp)
BU_MMA_6 (PMXVBF16GER2, "pmxvbf16ger2", MISC, mma_pmxvbf16ger2)
BU_MMA_6 (PMXVF16GER2, "pmxvf16ger2", MISC, mma_pmxvf16ger2)
BU_MMA_6 (PMXVI4GER8, "pmxvi4ger8", MISC, mma_pmxvi4ger8)
BU_MMA_6 (PMXVI8GER4, "pmxvi8ger4", MISC, mma_pmxvi8ger4)
BU_MMA_6 (PMXVI16GER2, "pmxvi16ger2", MISC, mma_pmxvi16ger2)
BU_MMA_6 (PMXVI16GER2S, "pmxvi16ger2s", MISC, mma_pmxvi16ger2s)
BU_MMA_6 (PMXVBF16GER2NN, "pmxvbf16ger2nn", QUAD, mma_pmxvbf16ger2nn)
BU_MMA_6 (PMXVBF16GER2NP, "pmxvbf16ger2np", QUAD, mma_pmxvbf16ger2np)
BU_MMA_6 (PMXVBF16GER2PN, "pmxvbf16ger2pn", QUAD, mma_pmxvbf16ger2pn)
BU_MMA_6 (PMXVBF16GER2PP, "pmxvbf16ger2pp", QUAD, mma_pmxvbf16ger2pp)
BU_MMA_6 (PMXVF16GER2NN, "pmxvf16ger2nn", QUAD, mma_pmxvf16ger2nn)
BU_MMA_6 (PMXVF16GER2NP, "pmxvf16ger2np", QUAD, mma_pmxvf16ger2np)
BU_MMA_6 (PMXVF16GER2PN, "pmxvf16ger2pn", QUAD, mma_pmxvf16ger2pn)
BU_MMA_6 (PMXVF16GER2PP, "pmxvf16ger2pp", QUAD, mma_pmxvf16ger2pp)
BU_MMA_6 (PMXVI4GER8PP, "pmxvi4ger8pp", QUAD, mma_pmxvi4ger8pp)
BU_MMA_6 (PMXVI8GER4PP, "pmxvi8ger4pp", QUAD, mma_pmxvi8ger4pp)
BU_MMA_6 (PMXVI8GER4SPP, "pmxvi8ger4spp", QUAD, mma_pmxvi8ger4spp)
BU_MMA_6 (PMXVI16GER2PP, "pmxvi16ger2pp", QUAD, mma_pmxvi16ger2pp)
BU_MMA_6 (PMXVI16GER2SPP, "pmxvi16ger2spp", QUAD, mma_pmxvi16ger2spp)
@@ -9935,7 +9935,7 @@ rs6000_emit_move (rtx dest, rtx source, machine_mode mode)
case E_POImode:
case E_PXImode:
if (CONSTANT_P (operands[1]))
if (CONST_INT_P (operands[1]) && INTVAL (operands[1]) != 0)
error ("%qs is an opaque type, and you can't set it to other values.",
(mode == POImode) ? "__vector_pair" : "__vector_quad");
break;
@@ -12847,6 +12847,14 @@ print_operand (FILE *file, rtx x, int code)
/* %c is output_addr_const if a CONSTANT_ADDRESS_P, otherwise
output_operand. */
case 'A':
/* Write the MMA accumulator number associated with VSX register X. */
if (!REG_P (x) || !FP_REGNO_P (REGNO (x)) || (REGNO (x) % 4) != 0)
output_operand_lossage ("invalid %%A value");
else
fprintf (file, "%d", (REGNO (x) - FIRST_FPR_REGNO) / 4);
return;
case 'D':
/* Like 'J' but get to the GT bit only. */
if (!REG_P (x) || !CR_REGNO_P (REGNO (x)))
@@ -15957,6 +15965,12 @@ rs6000_split_multireg_move (rtx dst, rtx src)
unsigned offset = 0;
unsigned size = GET_MODE_SIZE (reg_mode);
/* If we are reading an accumulator register, we have to
deprime it before we can access it. */
if (TARGET_MMA
&& GET_MODE (src) == PXImode && FP_REGNO_P (REGNO (src)))
emit_insn (gen_mma_xxmfacc (src, src));
for (int i = 0; i < nregs; i++)
{
unsigned subreg = (WORDS_BIG_ENDIAN)
@@ -15985,6 +15999,32 @@ rs6000_split_multireg_move (rtx dst, rtx src)
emit_insn (gen_rtx_SET (dst2, src2));
}
/* If we are writing an accumulator register, we have to
prime it after we've written it. */
if (TARGET_MMA
&& GET_MODE (dst) == PXImode && FP_REGNO_P (REGNO (dst)))
emit_insn (gen_mma_xxmtacc (dst, dst));
return;
}
if (GET_CODE (src) == UNSPEC)
{
gcc_assert (REG_P (dst)
&& FP_REGNO_P (REGNO (dst))
&& XINT (src, 1) == UNSPEC_MMA_ASSEMBLE_ACC);
reg_mode = GET_MODE (XVECEXP (src, 0, 0));
for (int i = 0; i < XVECLEN (src, 0); i++)
{
rtx dst_i = gen_rtx_REG (reg_mode, reg + i);
emit_insn (gen_rtx_SET (dst_i, XVECEXP (src, 0, i)));
}
/* We are writing an accumulator register, so we have to
prime it after we've written it. */
emit_insn (gen_mma_xxmtacc (dst, dst));
return;
}
@@ -15993,6 +16033,12 @@ rs6000_split_multireg_move (rtx dst, rtx src)
if (REG_P (src) && REG_P (dst) && (REGNO (src) < REGNO (dst)))
{
/* If we are reading an accumulator register, we have to
deprime it before we can access it. */
if (TARGET_MMA
&& GET_MODE (src) == PXImode && FP_REGNO_P (REGNO (src)))
emit_insn (gen_mma_xxmfacc (src, src));
/* Move register range backwards, if we might have destructive
overlap. */
int i;
@@ -16001,6 +16047,12 @@ rs6000_split_multireg_move (rtx dst, rtx src)
i * reg_mode_size),
simplify_gen_subreg (reg_mode, src, mode,
i * reg_mode_size)));
/* If we are writing an accumulator register, we have to
prime it after we've written it. */
if (TARGET_MMA
&& GET_MODE (dst) == PXImode && FP_REGNO_P (REGNO (dst)))
emit_insn (gen_mma_xxmtacc (dst, dst));
}
else
{
@@ -16133,6 +16185,12 @@ rs6000_split_multireg_move (rtx dst, rtx src)
gcc_assert (rs6000_offsettable_memref_p (dst, reg_mode, true));
}
/* If we are reading an accumulator register, we have to
deprime it before we can access it. */
if (TARGET_MMA && REG_P (src)
&& GET_MODE (src) == PXImode && FP_REGNO_P (REGNO (src)))
emit_insn (gen_mma_xxmfacc (src, src));
for (i = 0; i < nregs; i++)
{
/* Calculate index to next subword. */
@@ -16150,6 +16208,13 @@ rs6000_split_multireg_move (rtx dst, rtx src)
simplify_gen_subreg (reg_mode, src, mode,
j * reg_mode_size)));
}
/* If we are writing an accumulator register, we have to
prime it after we've written it. */
if (TARGET_MMA && REG_P (dst)
&& GET_MODE (dst) == PXImode && FP_REGNO_P (REGNO (dst)))
emit_insn (gen_mma_xxmtacc (dst, dst));
if (restore_basereg != NULL_RTX)
emit_insn (restore_basereg);
}
@@ -2251,15 +2251,24 @@ extern int frame_pointer_needed;
flags macros, but we've run out of bits, so we now map the options into new
settings used here. */
/* Builtin attributes. */
#define RS6000_BTC_SPECIAL 0x00000000 /* Special function. */
/* Builtin operand count. */
#define RS6000_BTC_UNARY 0x00000001 /* normal unary function. */
#define RS6000_BTC_BINARY 0x00000002 /* normal binary function. */
#define RS6000_BTC_TERNARY 0x00000003 /* normal ternary function. */
#define RS6000_BTC_PREDICATE 0x00000004 /* predicate function. */
#define RS6000_BTC_ABS 0x00000005 /* Altivec/VSX ABS function. */
#define RS6000_BTC_DST 0x00000007 /* Altivec DST function. */
#define RS6000_BTC_TYPE_MASK 0x0000000f /* Mask to isolate types */
#define RS6000_BTC_QUATERNARY 0x00000004 /* normal quaternary
function. */
#define RS6000_BTC_QUINARY 0x00000005 /* normal quinary function. */
#define RS6000_BTC_SENARY 0x00000006 /* normal senary function. */
#define RS6000_BTC_OPND_MASK 0x00000007 /* Mask to isolate operands. */
/* Builtin attributes. */
#define RS6000_BTC_SPECIAL 0x00000000 /* Special function. */
#define RS6000_BTC_PREDICATE 0x00000008 /* predicate function. */
#define RS6000_BTC_ABS 0x00000010 /* Altivec/VSX ABS
function. */
#define RS6000_BTC_DST 0x00000020 /* Altivec DST function. */
#define RS6000_BTC_TYPE_MASK 0x0000003f /* Mask to isolate types */
#define RS6000_BTC_MISC 0x00000000 /* No special attributes. */
#define RS6000_BTC_CONST 0x00000100 /* Neither uses, nor
@@ -2268,13 +2277,18 @@ extern int frame_pointer_needed;
state/mem and does
not modify global state. */
#define RS6000_BTC_FP 0x00000400 /* depends on rounding mode. */
#define RS6000_BTC_ATTR_MASK 0x00000700 /* Mask of the attributes. */
#define RS6000_BTC_QUAD 0x00000800 /* Uses a register quad. */
#define RS6000_BTC_PAIR 0x00001000 /* Uses a register pair. */
#define RS6000_BTC_QUADPAIR 0x00001800 /* Uses a quad and a pair. */
#define RS6000_BTC_ATTR_MASK 0x00001f00 /* Mask of the attributes. */
/* Miscellaneous information. */
#define RS6000_BTC_SPR 0x01000000 /* function references SPRs. */
#define RS6000_BTC_VOID 0x02000000 /* function has no return value. */
#define RS6000_BTC_CR 0x04000000 /* function references a CR. */
#define RS6000_BTC_OVERLOADED 0x08000000 /* function is overloaded. */
#define RS6000_BTC_GIMPLE 0x10000000 /* function should be expanded
into gimple. */
#define RS6000_BTC_MISC_MASK 0x1f000000 /* Mask of the misc info. */
/* Convenience macros to document the instruction type. */
@@ -2341,6 +2355,7 @@ extern int frame_pointer_needed;
#undef RS6000_BUILTIN_A
#undef RS6000_BUILTIN_D
#undef RS6000_BUILTIN_H
#undef RS6000_BUILTIN_M
#undef RS6000_BUILTIN_P
#undef RS6000_BUILTIN_X
@@ -2351,6 +2366,7 @@ extern int frame_pointer_needed;
#define RS6000_BUILTIN_A(ENUM, NAME, MASK, ATTR, ICODE) ENUM,
#define RS6000_BUILTIN_D(ENUM, NAME, MASK, ATTR, ICODE) ENUM,
#define RS6000_BUILTIN_H(ENUM, NAME, MASK, ATTR, ICODE) ENUM,
#define RS6000_BUILTIN_M(ENUM, NAME, MASK, ATTR, ICODE) ENUM,
#define RS6000_BUILTIN_P(ENUM, NAME, MASK, ATTR, ICODE) ENUM,
#define RS6000_BUILTIN_X(ENUM, NAME, MASK, ATTR, ICODE) ENUM,
@@ -2368,6 +2384,7 @@ enum rs6000_builtins
#undef RS6000_BUILTIN_A
#undef RS6000_BUILTIN_D
#undef RS6000_BUILTIN_H
#undef RS6000_BUILTIN_M
#undef RS6000_BUILTIN_P
#undef RS6000_BUILTIN_X
@@ -198,7 +198,7 @@
vecsimple,veccomplex,vecdiv,veccmp,veccmpsimple,vecperm,
vecfloat,vecfdiv,vecdouble,mffgpr,mftgpr,crypto,
veclogical,veccmpfx,vecexts,vecmove,
htm,htmsimple,dfp"
htm,htmsimple,dfp,mma"
(const_string "integer"))
;; What data size does this instruction work on?
@@ -295,6 +295,8 @@
UNSPEC_VSX_DIVUD
UNSPEC_VSX_MULSD
UNSPEC_VSX_SIGN_EXTEND
UNSPEC_VSX_XVCVBF16SP
UNSPEC_VSX_XVCVSPBF16
UNSPEC_VSX_XVCVSPSXDS
UNSPEC_VSX_VSLO
UNSPEC_VSX_EXTRACT
@@ -344,6 +346,12 @@
UNSPEC_VSX_FIRST_MISMATCH_EOS_INDEX
])
(define_int_iterator XVCVBF16 [UNSPEC_VSX_XVCVSPBF16
UNSPEC_VSX_XVCVBF16SP])
(define_int_attr xvcvbf16 [(UNSPEC_VSX_XVCVSPBF16 "xvcvspbf16")
(UNSPEC_VSX_XVCVBF16SP "xvcvbf16sp")])
;; VSX moves
;; The patterns for LE permuted loads and stores come before the general
@@ -5644,3 +5652,10 @@
DONE;
})
(define_insn "vsx_<xvcvbf16>"
[(set (match_operand:V16QI 0 "vsx_register_operand" "=wa")
(unspec:V16QI [(match_operand:V16QI 1 "vsx_register_operand" "wa")]
XVCVBF16))]
"TARGET_FUTURE"
"<xvcvbf16> %x0,%x1"
[(set_attr "type" "vecfloat")])
@@ -13754,6 +13754,7 @@ instructions, but allow the compiler to schedule those calls.
* PowerPC AltiVec/VSX Built-in Functions::
* PowerPC Hardware Transactional Memory Built-in Functions::
* PowerPC Atomic Memory Operation Functions::
* PowerPC Matrix-Multiply Assist Built-in Functions::
* RX Built-in Functions::
* S/390 System z Built-in Functions::
* SH Built-in Functions::
@@ -20971,6 +20972,100 @@ void amo_stdat_smax (int64_t *, int64_t);
void amo_stdat_smin (int64_t *, int64_t);
@end smallexample
@node PowerPC Matrix-Multiply Assist Built-in Functions
@subsection PowerPC Matrix-Multiply Assist Built-in Functions
ISA 3.1 of the PowerPC added new Matrix-Multiply Assist (MMA) instructions.
GCC provides support for these instructions through the following built-in
functions which are enabled with the @code{-mmma} option.  The @code{vec_t}
type below is defined to be a normal vector unsigned char type.  The
@code{uint2}, @code{uint4} and @code{uint8} parameters are 2-bit, 4-bit and
8-bit unsigned integer constants respectively.  The compiler will verify
that they are constants and that their values are within range.
The built-in functions supported are:
@smallexample
void __builtin_mma_xvi4ger8 (__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvi8ger4 (__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvi16ger2 (__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvi16ger2s (__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvf16ger2 (__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvbf16ger2 (__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvf32ger (__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvi4ger8pp (__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvi8ger4pp (__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvi8ger4spp(__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvi16ger2pp (__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvi16ger2spp (__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvf16ger2pp (__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvf16ger2pn (__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvf16ger2np (__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvf16ger2nn (__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvbf16ger2pp (__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvbf16ger2pn (__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvbf16ger2np (__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvbf16ger2nn (__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvf32gerpp (__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvf32gerpn (__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvf32gernp (__vector_quad *, vec_t, vec_t);
void __builtin_mma_xvf32gernn (__vector_quad *, vec_t, vec_t);
void __builtin_mma_pmxvi4ger8 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint8);
void __builtin_mma_pmxvi4ger8pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint8);
void __builtin_mma_pmxvi8ger4 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint4);
void __builtin_mma_pmxvi8ger4pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint4);
void __builtin_mma_pmxvi8ger4spp(__vector_quad *, vec_t, vec_t, uint4, uint4, uint4);
void __builtin_mma_pmxvi16ger2 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
void __builtin_mma_pmxvi16ger2s (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
void __builtin_mma_pmxvf16ger2 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
void __builtin_mma_pmxvbf16ger2 (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
void __builtin_mma_pmxvi16ger2pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
void __builtin_mma_pmxvi16ger2spp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
void __builtin_mma_pmxvf16ger2pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
void __builtin_mma_pmxvf16ger2pn (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
void __builtin_mma_pmxvf16ger2np (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
void __builtin_mma_pmxvf16ger2nn (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
void __builtin_mma_pmxvbf16ger2pp (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
void __builtin_mma_pmxvbf16ger2pn (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
void __builtin_mma_pmxvbf16ger2np (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
void __builtin_mma_pmxvbf16ger2nn (__vector_quad *, vec_t, vec_t, uint4, uint4, uint2);
void __builtin_mma_pmxvf32ger (__vector_quad *, vec_t, vec_t, uint4, uint4);
void __builtin_mma_pmxvf32gerpp (__vector_quad *, vec_t, vec_t, uint4, uint4);
void __builtin_mma_pmxvf32gerpn (__vector_quad *, vec_t, vec_t, uint4, uint4);
void __builtin_mma_pmxvf32gernp (__vector_quad *, vec_t, vec_t, uint4, uint4);
void __builtin_mma_pmxvf32gernn (__vector_quad *, vec_t, vec_t, uint4, uint4);
void __builtin_mma_xvf64ger (__vector_quad *, __vector_pair, vec_t);
void __builtin_mma_xvf64gerpp (__vector_quad *, __vector_pair, vec_t);
void __builtin_mma_xvf64gerpn (__vector_quad *, __vector_pair, vec_t);
void __builtin_mma_xvf64gernp (__vector_quad *, __vector_pair, vec_t);
void __builtin_mma_xvf64gernn (__vector_quad *, __vector_pair, vec_t);
void __builtin_mma_pmxvf64ger (__vector_quad *, __vector_pair, vec_t, uint4, uint2);
void __builtin_mma_pmxvf64gerpp (__vector_quad *, __vector_pair, vec_t, uint4, uint2);
void __builtin_mma_pmxvf64gerpn (__vector_quad *, __vector_pair, vec_t, uint4, uint2);
void __builtin_mma_pmxvf64gernp (__vector_quad *, __vector_pair, vec_t, uint4, uint2);
void __builtin_mma_pmxvf64gernn (__vector_quad *, __vector_pair, vec_t, uint4, uint2);
void __builtin_mma_xxmtacc (__vector_quad *);
void __builtin_mma_xxmfacc (__vector_quad *);
void __builtin_mma_xxsetaccz (__vector_quad *);
void __builtin_mma_assemble_acc (__vector_quad *, vec_t, vec_t, vec_t, vec_t);
void __builtin_mma_disassemble_acc (void *, __vector_quad *);
void __builtin_mma_assemble_pair (__vector_pair *, vec_t, vec_t);
void __builtin_mma_disassemble_pair (void *, __vector_pair *);
vec_t __builtin_vsx_xvcvspbf16 (vec_t);
vec_t __builtin_vsx_xvcvbf16sp (vec_t);
@end smallexample
@node RX Built-in Functions
@subsection RX Built-in Functions
GCC supports some of the RX instructions which cannot be expressed in
/* { dg-do compile } */
/* { dg-require-effective-target powerpc_future_ok } */
/* { dg-options "-Wno-psabi -mdejagnu-cpu=future -O2" } */
typedef unsigned char vec_t __attribute__((vector_size(16)));
void
foo0 (__vector_quad *dst, vec_t *vec)
{
__vector_quad acc;
vec_t vec0 = vec[0];
vec_t vec1 = vec[1];
__builtin_mma_xvi4ger8 (&acc, vec0, vec1);
__builtin_mma_xvi4ger8pp (&acc, vec0, vec1);
dst[0] = acc;
}
void
foo1 (__vector_quad *dst, vec_t *vec)
{
__vector_quad acc;
vec_t vec0 = vec[0];
vec_t vec1 = vec[1];
__builtin_mma_xvi8ger4 (&acc, vec0, vec1);
__builtin_mma_xvi8ger4pp (&acc, vec0, vec1);
__builtin_mma_xvi8ger4spp(&acc, vec0, vec1);
dst[1] = acc;
}
void
foo2 (__vector_quad *dst, vec_t *vec)
{
__vector_quad acc;
vec_t vec0 = vec[0];
vec_t vec1 = vec[1];
__builtin_mma_xvi16ger2 (&acc, vec0, vec1);
__builtin_mma_xvi16ger2pp (&acc, vec0, vec1);
dst[2] = acc;
}
void
foo3 (__vector_quad *dst, vec_t *vec)
{
__vector_quad acc;
vec_t vec0 = vec[0];
vec_t vec1 = vec[1];
__builtin_mma_xvi16ger2s (&acc, vec0, vec1);
__builtin_mma_xvi16ger2spp (&acc, vec0, vec1);
dst[3] = acc;
}
void
foo4 (__vector_quad *dst, vec_t *vec)
{
  __vector_quad acc;
  vec_t vec0 = vec[0];
  vec_t vec1 = vec[1];
  __builtin_mma_xvf16ger2 (&acc, vec0, vec1);
  __builtin_mma_xvf16ger2pp (&acc, vec0, vec1);
  __builtin_mma_xvf16ger2pn (&acc, vec0, vec1);
  dst[4] = acc;
}

void
foo4b (__vector_quad *dst, __vector_quad *src, vec_t *vec)
{
  __vector_quad acc;
  vec_t vec0 = vec[0];
  vec_t vec1 = vec[1];
  acc = src[0];
  __builtin_mma_xvf16ger2np (&acc, vec0, vec1);
  __builtin_mma_xvf16ger2nn (&acc, vec0, vec1);
  dst[4] = acc;
}

void
foo5 (__vector_quad *dst, vec_t *vec)
{
  __vector_quad acc;
  vec_t vec0 = vec[0];
  vec_t vec1 = vec[1];
  __builtin_mma_xvbf16ger2 (&acc, vec0, vec1);
  __builtin_mma_xvbf16ger2pp (&acc, vec0, vec1);
  __builtin_mma_xvbf16ger2pn (&acc, vec0, vec1);
  dst[5] = acc;
}

void
foo5b (__vector_quad *dst, __vector_quad *src, vec_t *vec)
{
  __vector_quad acc;
  vec_t vec0 = vec[0];
  vec_t vec1 = vec[1];
  acc = src[0];
  __builtin_mma_xvbf16ger2np (&acc, vec0, vec1);
  __builtin_mma_xvbf16ger2nn (&acc, vec0, vec1);
  dst[5] = acc;
}

void
foo6 (__vector_quad *dst, vec_t *vec)
{
  __vector_quad acc;
  vec_t vec0 = vec[0];
  vec_t vec1 = vec[1];
  __builtin_mma_xvf32ger (&acc, vec0, vec1);
  __builtin_mma_xvf32gerpp (&acc, vec0, vec1);
  __builtin_mma_xvf32gerpn (&acc, vec0, vec1);
  dst[6] = acc;
}

void
foo6b (__vector_quad *dst, __vector_quad *src, vec_t *vec)
{
  __vector_quad acc;
  vec_t vec0 = vec[0];
  vec_t vec1 = vec[1];
  acc = src[0];
  __builtin_mma_xvf32gernp (&acc, vec0, vec1);
  __builtin_mma_xvf32gernn (&acc, vec0, vec1);
  dst[6] = acc;
}

void
foo7 (__vector_quad *dst, vec_t *vec)
{
  __vector_quad acc;
  vec_t vec0 = vec[0];
  vec_t vec1 = vec[1];
  __builtin_mma_pmxvi4ger8 (&acc, vec0, vec1, 15, 15, 255);
  __builtin_mma_pmxvi4ger8pp (&acc, vec0, vec1, 15, 15, 255);
  dst[7] = acc;
}

void
foo8 (__vector_quad *dst, vec_t *vec)
{
  __vector_quad acc;
  vec_t vec0 = vec[0];
  vec_t vec1 = vec[1];
  __builtin_mma_pmxvi8ger4 (&acc, vec0, vec1, 15, 15, 15);
  __builtin_mma_pmxvi8ger4pp (&acc, vec0, vec1, 15, 15, 15);
  __builtin_mma_pmxvi8ger4spp (&acc, vec0, vec1, 15, 15, 15);
  dst[8] = acc;
}

void
foo9 (__vector_quad *dst, vec_t *vec)
{
  __vector_quad acc;
  vec_t vec0 = vec[0];
  vec_t vec1 = vec[1];
  __builtin_mma_pmxvi16ger2 (&acc, vec0, vec1, 15, 15, 3);
  __builtin_mma_pmxvi16ger2pp (&acc, vec0, vec1, 15, 15, 3);
  dst[9] = acc;
}

void
foo10 (__vector_quad *dst, vec_t *vec)
{
  __vector_quad acc;
  vec_t vec0 = vec[0];
  vec_t vec1 = vec[1];
  __builtin_mma_pmxvi16ger2s (&acc, vec0, vec1, 15, 15, 3);
  __builtin_mma_pmxvi16ger2spp (&acc, vec0, vec1, 15, 15, 3);
  dst[10] = acc;
}

void
foo11 (__vector_quad *dst, vec_t *vec)
{
  __vector_quad acc;
  vec_t vec0 = vec[0];
  vec_t vec1 = vec[1];
  __builtin_mma_pmxvf16ger2 (&acc, vec0, vec1, 15, 15, 3);
  __builtin_mma_pmxvf16ger2pp (&acc, vec0, vec1, 15, 15, 3);
  __builtin_mma_pmxvf16ger2pn (&acc, vec0, vec1, 15, 15, 3);
  dst[11] = acc;
}

void
foo11b (__vector_quad *dst, __vector_quad *src, vec_t *vec)
{
  __vector_quad acc;
  vec_t vec0 = vec[0];
  vec_t vec1 = vec[1];
  acc = src[0];
  __builtin_mma_pmxvf16ger2np (&acc, vec0, vec1, 15, 15, 3);
  __builtin_mma_pmxvf16ger2nn (&acc, vec0, vec1, 15, 15, 3);
  dst[11] = acc;
}

void
foo12 (__vector_quad *dst, vec_t *vec)
{
  __vector_quad acc;
  vec_t vec0 = vec[0];
  vec_t vec1 = vec[1];
  __builtin_mma_pmxvbf16ger2 (&acc, vec0, vec1, 15, 15, 3);
  __builtin_mma_pmxvbf16ger2pp (&acc, vec0, vec1, 15, 15, 3);
  __builtin_mma_pmxvbf16ger2pn (&acc, vec0, vec1, 15, 15, 3);
  dst[12] = acc;
}

void
foo12b (__vector_quad *dst, __vector_quad *src, vec_t *vec)
{
  __vector_quad acc;
  vec_t vec0 = vec[0];
  vec_t vec1 = vec[1];
  acc = src[0];
  __builtin_mma_pmxvbf16ger2np (&acc, vec0, vec1, 15, 15, 3);
  __builtin_mma_pmxvbf16ger2nn (&acc, vec0, vec1, 15, 15, 3);
  dst[12] = acc;
}

void
foo13 (__vector_quad *dst, vec_t *vec)
{
  __vector_quad acc;
  vec_t vec0 = vec[0];
  vec_t vec1 = vec[1];
  __builtin_mma_pmxvf32ger (&acc, vec0, vec1, 15, 15);
  __builtin_mma_pmxvf32gerpp (&acc, vec0, vec1, 15, 15);
  __builtin_mma_pmxvf32gerpn (&acc, vec0, vec1, 15, 15);
  dst[13] = acc;
}

void
foo13b (__vector_quad *dst, __vector_quad *src, vec_t *vec)
{
  __vector_quad acc;
  vec_t vec0 = vec[0];
  vec_t vec1 = vec[1];
  acc = src[0];
  __builtin_mma_pmxvf32gernp (&acc, vec0, vec1, 15, 15);
  __builtin_mma_pmxvf32gernn (&acc, vec0, vec1, 15, 15);
  dst[13] = acc;
}
/* { dg-final { scan-assembler-times {\mlxv\M} 40 } } */
/* { dg-final { scan-assembler-times {\mlxvp\M} 12 } } */
/* { dg-final { scan-assembler-times {\mstxvp\M} 40 } } */
/* { dg-final { scan-assembler-times {\mxxmfacc\M} 20 } } */
/* { dg-final { scan-assembler-times {\mxxmtacc\M} 6 } } */
/* { dg-final { scan-assembler-times {\mxvbf16ger2\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvbf16ger2nn\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvbf16ger2np\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvbf16ger2pn\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvbf16ger2pp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvf16ger2\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvf16ger2nn\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvf16ger2np\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvf16ger2pn\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvf16ger2pp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvf32ger\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvf32gernn\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvf32gernp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvf32gerpn\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvf32gerpp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvi16ger2\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvi16ger2pp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvi16ger2s\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvi16ger2spp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvi4ger8\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvi4ger8pp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvi8ger4\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvi8ger4pp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvi8ger4spp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvbf16ger2\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvbf16ger2nn\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvbf16ger2np\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvbf16ger2pn\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvbf16ger2pp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvf16ger2\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvf16ger2nn\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvf16ger2np\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvf16ger2pn\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvf16ger2pp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvf32ger\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvf32gernn\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvf32gernp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvf32gerpn\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvf32gerpp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvi16ger2\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvi16ger2pp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvi16ger2s\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvi16ger2spp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvi4ger8\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvi4ger8pp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvi8ger4\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvi8ger4pp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvi8ger4spp\M} 1 } } */
/* { dg-do compile } */
/* { dg-require-effective-target powerpc_future_ok } */
/* { dg-options "-Wno-psabi -mdejagnu-cpu=future -O2" } */
typedef unsigned char vec_t __attribute__((vector_size(16)));
void
foo0 (__vector_quad *dst, vec_t *vec, __vector_pair *pvecp)
{
  __vector_quad acc;
  __vector_pair vecp0 = *pvecp;
  vec_t vec1 = vec[1];
  __builtin_mma_xvf64ger (&acc, vecp0, vec1);
  __builtin_mma_xvf64gerpp (&acc, vecp0, vec1);
  __builtin_mma_xvf64gerpn (&acc, vecp0, vec1);
  dst[0] = acc;
}

void
foo1 (__vector_quad *dst, __vector_quad *src, vec_t *vec, __vector_pair *pvecp)
{
  __vector_quad acc;
  __vector_pair vecp0 = *pvecp;
  vec_t vec1 = vec[1];
  acc = src[0];
  __builtin_mma_xvf64gernp (&acc, vecp0, vec1);
  __builtin_mma_xvf64gernn (&acc, vecp0, vec1);
  dst[0] = acc;
}

void
foo2 (__vector_quad *dst, vec_t *vec, __vector_pair *pvecp)
{
  __vector_quad acc;
  __vector_pair vecp0 = *pvecp;
  vec_t vec1 = vec[1];
  __builtin_mma_pmxvf64ger (&acc, vecp0, vec1, 15, 3);
  __builtin_mma_pmxvf64gerpp (&acc, vecp0, vec1, 15, 3);
  __builtin_mma_pmxvf64gerpn (&acc, vecp0, vec1, 15, 3);
  dst[1] = acc;
}

void
foo3 (__vector_quad *dst, __vector_quad *src, vec_t *vec, __vector_pair *pvecp)
{
  __vector_quad acc;
  __vector_pair vecp0 = *pvecp;
  vec_t vec1 = vec[1];
  acc = src[0];
  __builtin_mma_pmxvf64gernp (&acc, vecp0, vec1, 15, 3);
  __builtin_mma_pmxvf64gernn (&acc, vecp0, vec1, 15, 3);
  dst[1] = acc;
}
/* { dg-final { scan-assembler-times {\mxxmfacc\M} 4 } } */
/* { dg-final { scan-assembler-times {\mxxmtacc\M} 2 } } */
/* { dg-final { scan-assembler-times {\mlxv\M} 4 } } */
/* { dg-final { scan-assembler-times {\mlxvp\M} 8 } } */
/* { dg-final { scan-assembler-times {\mstxvp\M} 8 } } */
/* { dg-final { scan-assembler-times {\mxvf64ger\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvf64gerpp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvf64gerpn\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvf64gernp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvf64gernn\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvf64ger\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvf64gerpp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvf64gerpn\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvf64gernp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mpmxvf64gernn\M} 1 } } */
/* { dg-do compile } */
/* { dg-require-effective-target powerpc_future_ok } */
/* { dg-options "-Wno-psabi -mdejagnu-cpu=future -O2" } */
void
foo0 (void)
{
  __vector_quad acc;
  asm ("#..." : "=d" (acc));
  __builtin_mma_xxmtacc (&acc);
  __builtin_mma_xxmfacc (&acc);
  asm ("#..." :: "d" (acc));
}

typedef unsigned char vec_t __attribute__((vector_size(16)));

void
foo1 (vec_t *vec)
{
  vec[1] = __builtin_vsx_xvcvspbf16 (vec[0]);
  vec[3] = __builtin_vsx_xvcvbf16sp (vec[2]);
}
/* { dg-final { scan-assembler-times {\mxxmtacc\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxxmfacc\M} 1 } } */
/* { dg-final { scan-assembler-times {\mlxv\M} 2 } } */
/* { dg-final { scan-assembler-times {\mstxv\M} 2 } } */
/* { dg-final { scan-assembler-not {\mlxvp\M} } } */
/* { dg-final { scan-assembler-not {\mstxvp\M} } } */
/* { dg-final { scan-assembler-times {\mxvcvspbf16\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxvcvbf16sp\M} 1 } } */
/* { dg-do compile } */
/* { dg-require-effective-target powerpc_future_ok } */
/* { dg-options "-Wno-psabi -mdejagnu-cpu=future -O2" } */
typedef unsigned char vec_t __attribute__((vector_size(16)));
void
foo (__vector_pair *dst, vec_t *src)
{
  __vector_pair pair;
  __builtin_mma_assemble_pair (&pair, src[0], src[4]);
  *dst = pair;
}

void
bar (vec_t *dst, __vector_pair *src)
{
  vec_t res[2];
  __builtin_mma_disassemble_pair (res, src);
  dst[0] = res[0];
  dst[4] = res[1];
}
/* { dg-final { scan-assembler-times {\mlxv\M} 2 } } */
/* { dg-final { scan-assembler-times {\mlxvp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mstxv\M} 2 } } */
/* { dg-final { scan-assembler-times {\mstxvp\M} 1 } } */
/* { dg-do compile } */
/* { dg-require-effective-target powerpc_future_ok } */
/* { dg-options "-Wno-psabi -mdejagnu-cpu=future -O2" } */
typedef unsigned char vec_t __attribute__((vector_size(16)));
void
foo (__vector_quad *dst, vec_t *src)
{
  __vector_quad acc;
  __builtin_mma_assemble_acc (&acc, src[0], src[4], src[8], src[12]);
  *dst = acc;
}

void
bar (vec_t *dst, __vector_quad *src)
{
  vec_t res[4];
  __builtin_mma_disassemble_acc (res, src);
  dst[0] = res[0];
  dst[4] = res[1];
  dst[8] = res[2];
  dst[12] = res[3];
}
/* { dg-final { scan-assembler-times {\mlxv\M} 4 } } */
/* { dg-final { scan-assembler-times {\mlxvp\M} 2 } } */
/* { dg-final { scan-assembler-times {\mstxv\M} 4 } } */
/* { dg-final { scan-assembler-times {\mstxvp\M} 2 } } */
/* { dg-final { scan-assembler-times {\mxxmfacc\M} 2 } } */
/* { dg-final { scan-assembler-times {\mxxmtacc\M} 2 } } */
/* { dg-do compile } */
/* { dg-require-effective-target powerpc_future_ok } */
/* { dg-options "-Wno-psabi -mdejagnu-cpu=future -O2" } */
void
foo (__vector_quad *dst)
{
  __vector_quad acc;
  __builtin_mma_xxsetaccz (&acc);
  *dst = acc;
}
/* { dg-final { scan-assembler-not {\mlxv\M} } } */
/* { dg-final { scan-assembler-not {\mlxvp\M} } } */
/* { dg-final { scan-assembler-not {\mxxmtacc\M} } } */
/* { dg-final { scan-assembler-times {\mxxsetaccz\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxxmfacc\M} 1 } } */
/* { dg-final { scan-assembler-times {\mstxvp\M} 2 } } */