Commit 888814e9 by Peter Bergner

rs6000: Add base support and types for defining MMA built-ins

Add the new -mmma option as well as the initial MMA support, which includes
the target specific __vector_pair and __vector_quad types, the POImode and
PXImode partial integer modes they are mapped to, and their associated
move patterns.  Support for the restrictions on the registers these modes
can be assigned to has also been added.

2020-06-20  Peter Bergner  <bergner@linux.ibm.com>
	    Michael Meissner  <meissner@linux.ibm.com>

gcc/
	* config/rs6000/mma.md: New file.
	* config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Define
	__MMA__ for mma.
	* config/rs6000/rs6000-call.c (rs6000_init_builtins): Add support
	for __vector_pair and __vector_quad types.
	* config/rs6000/rs6000-cpus.def (OTHER_FUTURE_MASKS): Add
	OPTION_MASK_MMA.
	(POWERPC_MASKS): Likewise.
	* config/rs6000/rs6000-modes.def (OI, XI): New integer modes.
	(POI, PXI): New partial integer modes.
	* config/rs6000/rs6000.c (TARGET_INVALID_CONVERSION): Define.
	(rs6000_hard_regno_nregs_internal): Use VECTOR_ALIGNMENT_P.
	(rs6000_hard_regno_mode_ok_uncached): Likewise.
	Add support for POImode being allowed in VSX registers and PXImode
	being allowed in FP registers.
	(rs6000_modes_tieable_p): Adjust comment.
	Add support for POImode and PXImode.
	(rs6000_debug_reg_global) <print_tieable_modes>: Add OImode, POImode,
	XImode, PXImode, V2SImode, V2SFmode and CCFPmode.
	(rs6000_setup_reg_addr_masks): Use VECTOR_ALIGNMENT_P.
	Set up appropriate addr_masks for vector pair and vector quad addresses.
	(rs6000_init_hard_regno_mode_ok): Add support for vector pair and
	vector quad registers.  Set up reload handlers for POImode and PXImode.
	(rs6000_builtin_mask_calculate): Add support for RS6000_BTM_MMA.
	(rs6000_option_override_internal): Error if -mmma is specified
	without -mcpu=future.
	(rs6000_slow_unaligned_access): Use VECTOR_ALIGNMENT_P.
	(quad_address_p): Change size test to less than 16 bytes.
	(reg_offset_addressing_ok_p): Add support for ISA 3.1 vector pair
	and vector quad instructions.
	(avoiding_indexed_address_p): Likewise.
	(rs6000_emit_move): Disallow POImode and PXImode moves involving
	constants.
	(rs6000_preferred_reload_class): Prefer VSX registers for POImode
	and FP registers for PXImode.
	(rs6000_split_multireg_move): Support splitting POImode and PXImode
	move instructions.
	(rs6000_mangle_type): Adjust comment.  Add support for mangling
	__vector_pair and __vector_quad types.
	(rs6000_opt_masks): Add entry for mma.
	(rs6000_builtin_mask_names): Add RS6000_BTM_MMA and RS6000_BTM_FUTURE.
	(rs6000_function_value): Use VECTOR_ALIGNMENT_P.
	(address_to_insn_form): Likewise.
	(reg_to_non_prefixed): Likewise.
	(rs6000_invalid_conversion): New function.
	* config/rs6000/rs6000.h (MASK_MMA): Define.
	(BIGGEST_ALIGNMENT): Set to 512 if MMA support is enabled.
	(VECTOR_ALIGNMENT_P): New helper macro.
	(ALTIVEC_VECTOR_MODE): Use VECTOR_ALIGNMENT_P.
	(RS6000_BTM_MMA): Define.
	(RS6000_BTM_COMMON): Add RS6000_BTM_MMA and RS6000_BTM_FUTURE.
	(rs6000_builtin_type_index): Add RS6000_BTI_vector_pair and
	RS6000_BTI_vector_quad.
	(vector_pair_type_node): New.
	(vector_quad_type_node): New.
	* config/rs6000/rs6000.md: Include mma.md.
	(define_mode_iterator RELOAD): Add POI and PXI.
	* config/rs6000/t-rs6000 (MD_INCLUDES): Add mma.md.
	* config/rs6000/rs6000.opt (-mmma): New.
	* doc/invoke.texi: Document -mmma.

(cherry picked from commit f002c046e37d0027513af5297d9259e1fad29c27)
parent 554eb7d2
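
As a rough illustration of how user code might consume what this commit provides, the sketch below tests the `__MMA__` macro and reports the storage sizes of the two new types. The names `pair_storage`, `quad_storage`, `pair_bytes`, `quad_bytes`, `have_mma_types` and `check_mma_sizes` are hypothetical and not part of the patch. The fallback structs are stand-ins for compilers built or invoked without `-mmma`, and their sizes assume the 256-bit and 512-bit widths used in `rs6000_init_builtins`.

```c
#include <assert.h>
#include <stddef.h>

#ifdef __MMA__
/* Built with -mmma: the target-specific types are available.  */
typedef __vector_pair pair_storage;
typedef __vector_quad quad_storage;
#define HAVE_MMA_TYPES 1
#else
/* Hypothetical stand-ins with the same sizes, so the sketch also
   compiles where -mmma is unavailable.  */
typedef struct { unsigned char bytes[32]; } pair_storage;
typedef struct { unsigned char bytes[64]; } quad_storage;
#define HAVE_MMA_TYPES 0
#endif

/* Size in bytes of the 256-bit vector pair storage.  */
size_t
pair_bytes (void)
{
  return sizeof (pair_storage);
}

/* Size in bytes of the 512-bit vector quad storage.  */
size_t
quad_bytes (void)
{
  return sizeof (quad_storage);
}

/* Nonzero when the compiler defined __MMA__.  */
int
have_mma_types (void)
{
  return HAVE_MMA_TYPES;
}

/* A quad is expected to be exactly two pairs wide.  */
void
check_mma_sizes (void)
{
  assert (quad_bytes () == 2 * pair_bytes ());
}
```

Guarding on `__MMA__` rather than on a CPU name keeps the check tied to the option itself, which is what `rs6000_target_modify_macros` reports.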
;; Matrix-Multiply Assist (MMA) patterns.
;; Copyright (C) 2020 Free Software Foundation, Inc.
;; Contributed by Peter Bergner <bergner@linux.ibm.com> and
;; Michael Meissner <meissner@linux.ibm.com>
;;
;; This file is part of GCC.
;;
;; GCC is free software; you can redistribute it and/or modify it
;; under the terms of the GNU General Public License as published
;; by the Free Software Foundation; either version 3, or (at your
;; option) any later version.
;;
;; GCC is distributed in the hope that it will be useful, but WITHOUT
;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
;; or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public
;; License for more details.
;;
;; You should have received a copy of the GNU General Public License
;; along with GCC; see the file COPYING3. If not see
;; <http://www.gnu.org/licenses/>.
;; The MMA patterns use the multi-register PXImode and POImode partial
;; integer modes to implement the target specific __vector_quad and
;; __vector_pair types that the MMA built-in functions reference.
;; To use these modes, we must define XImode and OImode move patterns
;; so the independent parts of the compiler can use our large partial
;; integer modes. However, if we enable the XImode and OImode move
;; patterns, then the compiler will attempt to use them and this can
;; cause byte swapping issues on little-endian systems. We don't need
;; the XImode and OImode move patterns for actual code generation,
;; therefore, we define the XImode and OImode move patterns, but we
;; disable their use with a "false" condition flag.
;; Define a disabled OImode move pattern, so we can use POImode.
(define_expand "movoi"
[(set (match_operand:OI 0 "nonimmediate_operand")
(match_operand:OI 1 "input_operand"))]
"0"
{
gcc_unreachable ();
})
;; Vector pair support. POImode can only live in VSRs.
(define_expand "movpoi"
[(set (match_operand:POI 0 "nonimmediate_operand")
(match_operand:POI 1 "input_operand"))]
"TARGET_MMA"
{
rs6000_emit_move (operands[0], operands[1], POImode);
DONE;
})
(define_insn_and_split "*movpoi"
[(set (match_operand:POI 0 "nonimmediate_operand" "=wa,m,wa")
(match_operand:POI 1 "input_operand" "m,wa,wa"))]
"TARGET_MMA
&& (gpc_reg_operand (operands[0], POImode)
|| gpc_reg_operand (operands[1], POImode))"
"@
lxvp%X1 %x0,%1
stxvp%X0 %x1,%0
#"
"&& reload_completed
&& (!MEM_P (operands[0]) && !MEM_P (operands[1]))"
[(const_int 0)]
{
rs6000_split_multireg_move (operands[0], operands[1]);
DONE;
}
[(set_attr "type" "vecload,vecstore,veclogical")
(set_attr "length" "*,*,8")])
;; Define a disabled XImode move pattern, so we can use PXImode.
(define_expand "movxi"
[(set (match_operand:XI 0 "nonimmediate_operand")
(match_operand:XI 1 "input_operand"))]
"0"
{
gcc_unreachable ();
})
;; Vector quad support. PXImode can only live in FPRs.
(define_expand "movpxi"
[(set (match_operand:PXI 0 "nonimmediate_operand")
(match_operand:PXI 1 "input_operand"))]
"TARGET_MMA"
{
rs6000_emit_move (operands[0], operands[1], PXImode);
DONE;
})
(define_insn_and_split "*movpxi"
[(set (match_operand:PXI 0 "nonimmediate_operand" "=d,m,d")
(match_operand:PXI 1 "input_operand" "m,d,d"))]
"TARGET_MMA
&& (gpc_reg_operand (operands[0], PXImode)
|| gpc_reg_operand (operands[1], PXImode))"
"#"
"&& reload_completed"
[(const_int 0)]
{
rs6000_split_multireg_move (operands[0], operands[1]);
DONE;
}
[(set_attr "type" "vecload,vecstore,veclogical")
(set_attr "length" "8,8,16")
(set_attr "max_prefixed_insns" "2,2,*")])
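
The register-to-register alternatives above are split after reload into one move per 16-byte register, which is consistent with the "length" values of 8 for POImode and 16 for PXImode given 4-byte instructions. The following is a toy model of that splitting only, not the actual `rs6000_split_multireg_move`; the function name `sketch_split_multireg_copy` and the byte-array representation of registers are assumptions, and the real routine also has to deal with overlapping registers and endianness, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One VSX register holds 128 bits.  */
#define SKETCH_VSR_BYTES 16

/* Copy TOTAL_BYTES from SRC to DST one register-sized chunk at a time
   and return the number of single-register moves that were needed.
   Hypothetical helper: source and destination must not overlap.  */
int
sketch_split_multireg_copy (unsigned char *dst, const unsigned char *src,
			    size_t total_bytes)
{
  int moves = 0;

  /* Multi-register modes are a whole number of registers wide.  */
  assert (total_bytes % SKETCH_VSR_BYTES == 0);

  for (size_t off = 0; off < total_bytes; off += SKETCH_VSR_BYTES)
    {
      memcpy (dst + off, src + off, SKETCH_VSR_BYTES);
      moves++;
    }
  return moves;
}
```

Called with 32 bytes (a vector pair) the helper performs two moves, and with 64 bytes (a vector quad) it performs four.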
@@ -591,6 +591,10 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags,
PROCESSOR_CELL) (e.g. -mcpu=cell). */
if ((bu_mask & RS6000_BTM_CELL) != 0)
rs6000_define_or_undefine_macro (define_p, "__PPU__");
/* Tell the user if we support the MMA instructions. */
if ((flags & OPTION_MASK_MMA) != 0)
rs6000_define_or_undefine_macro (define_p, "__MMA__");
}
void
@@ -11930,6 +11930,24 @@ rs6000_init_builtins (void)
else
ieee128_float_type_node = ibm128_float_type_node = long_double_type_node;
/* Vector paired and vector quad support. */
if (TARGET_MMA)
{
tree oi_uns_type = make_unsigned_type (256);
vector_pair_type_node = build_distinct_type_copy (oi_uns_type);
SET_TYPE_MODE (vector_pair_type_node, POImode);
layout_type (vector_pair_type_node);
lang_hooks.types.register_builtin_type (vector_pair_type_node,
"__vector_pair");
tree xi_uns_type = make_unsigned_type (512);
vector_quad_type_node = build_distinct_type_copy (xi_uns_type);
SET_TYPE_MODE (vector_quad_type_node, PXImode);
layout_type (vector_quad_type_node);
lang_hooks.types.register_builtin_type (vector_quad_type_node,
"__vector_quad");
}
/* Initialize the modes for builtin_function_type, mapping a machine mode to
tree type node. */
builtin_mode_to_type[QImode][0] = integer_type_node;
@@ -11959,6 +11977,8 @@ rs6000_init_builtins (void)
builtin_mode_to_type[V8HImode][1] = unsigned_V8HI_type_node;
builtin_mode_to_type[V16QImode][0] = V16QI_type_node;
builtin_mode_to_type[V16QImode][1] = unsigned_V16QI_type_node;
builtin_mode_to_type[POImode][1] = vector_pair_type_node;
builtin_mode_to_type[PXImode][1] = vector_quad_type_node;
tdecl = add_builtin_type ("__bool char", bool_char_type_node);
TYPE_NAME (bool_char_type_node) = tdecl;
@@ -76,7 +76,8 @@
| OPTION_MASK_P9_VECTOR)
/* Flags that need to be turned off if -mno-future. */
#define OTHER_FUTURE_MASKS (OPTION_MASK_PCREL \
#define OTHER_FUTURE_MASKS (OPTION_MASK_MMA \
| OPTION_MASK_PCREL \
| OPTION_MASK_PREFIXED)
/* Support for a future processor's features. */
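
The comment on `OTHER_FUTURE_MASKS` says these flags are turned off under `-mno-future`, so adding `OPTION_MASK_MMA` to the group means MMA is cleared along with PC-relative and prefixed addressing. A minimal sketch of that bit manipulation follows. Every `SKETCH_*` name and bit position is hypothetical (the real `OPTION_MASK_*` values are generated from `rs6000.opt`), and the helper is an illustration rather than the code path GCC actually uses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit assignments, chosen only for this sketch.  */
#define SKETCH_MASK_FUTURE   (UINT64_C (1) << 0)
#define SKETCH_MASK_MMA      (UINT64_C (1) << 1)
#define SKETCH_MASK_PCREL    (UINT64_C (1) << 2)
#define SKETCH_MASK_PREFIXED (UINT64_C (1) << 3)
#define SKETCH_MASK_VSX      (UINT64_C (1) << 4)

/* Same grouping as the updated OTHER_FUTURE_MASKS.  */
#define SKETCH_OTHER_FUTURE_MASKS (SKETCH_MASK_MMA	\
				   | SKETCH_MASK_PCREL	\
				   | SKETCH_MASK_PREFIXED)

/* Model of -mno-future: clear the future bit and everything that
   depends on it, leaving unrelated flags alone.  */
uint64_t
sketch_disable_future (uint64_t flags)
{
  uint64_t result
    = flags & ~(SKETCH_MASK_FUTURE | SKETCH_OTHER_FUTURE_MASKS);

  /* None of the dependent features may survive.  */
  assert ((result & SKETCH_OTHER_FUTURE_MASKS) == 0);
  return result;
}
```

Keeping the dependent options in one mask means a newly added feature such as MMA only has to be listed once to be disabled consistently.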
@@ -132,6 +133,7 @@
| OPTION_MASK_HTM \
| OPTION_MASK_ISEL \
| OPTION_MASK_MFCRF \
| OPTION_MASK_MMA \
| OPTION_MASK_MODULO \
| OPTION_MASK_MULHW \
| OPTION_MASK_NO_UPDATE \
@@ -82,3 +82,13 @@ VECTOR_MODE (INT, SI, 2); /* V2SI */
for quad memory atomic operations to force getting an even/odd register
combination. */
PARTIAL_INT_MODE (TI, 128, PTI);
/* Define, but don't use the larger integer modes. We need an integer mode
defined that is the same size as the vector pair and vector quad modes. */
INT_MODE (OI, 32);
INT_MODE (XI, 64);
/* Modes used by __vector_pair and __vector_quad. */
PARTIAL_INT_MODE (OI, 256, POI); /* __vector_pair. */
PARTIAL_INT_MODE (XI, 512, PXI); /* __vector_quad. */
@@ -522,6 +522,7 @@ extern int rs6000_vector_align[];
#define MASK_HTM OPTION_MASK_HTM
#define MASK_ISEL OPTION_MASK_ISEL
#define MASK_MFCRF OPTION_MASK_MFCRF
#define MASK_MMA OPTION_MASK_MMA
#define MASK_MULHW OPTION_MASK_MULHW
#define MASK_MULTIPLE OPTION_MASK_MULTIPLE
#define MASK_NO_UPDATE OPTION_MASK_NO_UPDATE
@@ -776,7 +777,7 @@ extern unsigned rs6000_pointer_size;
#define FUNCTION_BOUNDARY 32
/* No data type wants to be aligned rounder than this. */
#define BIGGEST_ALIGNMENT 128
#define BIGGEST_ALIGNMENT (TARGET_MMA ? 512 : 128)
/* Alignment of field after `int : 0' in a structure. */
#define EMPTY_FIELD_BOUNDARY 32
@@ -1035,16 +1036,17 @@ enum data_align { align_abi, align_opt, align_both };
((MODE) == V4SFmode \
|| (MODE) == V2DFmode) \
/* Note KFmode and possibly TFmode (i.e. IEEE 128-bit floating point) are not
really a vector, but we want to treat it as a vector for moves, and
such. */
/* Modes that are not vectors, but require vector alignment. Treat these like
vectors in terms of loads and stores. */
#define VECTOR_ALIGNMENT_P(MODE) \
(FLOAT128_VECTOR_P (MODE) || (MODE) == POImode || (MODE) == PXImode)
#define ALTIVEC_VECTOR_MODE(MODE) \
((MODE) == V16QImode \
|| (MODE) == V8HImode \
|| (MODE) == V4SFmode \
|| (MODE) == V4SImode \
|| FLOAT128_VECTOR_P (MODE))
|| VECTOR_ALIGNMENT_P (MODE))
#define ALTIVEC_OR_VSX_VECTOR_MODE(MODE) \
(ALTIVEC_VECTOR_MODE (MODE) || VSX_VECTOR_MODE (MODE) \
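
The new `VECTOR_ALIGNMENT_P` helper groups modes that are not vectors but should be loaded and stored like them, and `ALTIVEC_VECTOR_MODE` now defers to it. The sketch below models that relationship outside of GCC. The `sketch_mode` enum, the `sketch_*` functions, and the simplification that only `KFmode` satisfies the `FLOAT128_VECTOR_P` stand-in are all assumptions made for illustration; the real macros operate on `machine_mode` values and depend on target flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of machine modes, for illustration only.  */
enum sketch_mode
{
  SK_SImode,
  SK_V4SImode,
  SK_KFmode,
  SK_POImode,
  SK_PXImode
};

/* Stand-in for FLOAT128_VECTOR_P.  Assumption: only KFmode qualifies
   in this toy model.  */
static bool
sketch_float128_vector_p (enum sketch_mode mode)
{
  return mode == SK_KFmode;
}

/* Same shape as VECTOR_ALIGNMENT_P: non-vector modes that still need
   vector alignment.  */
bool
sketch_vector_alignment_p (enum sketch_mode mode)
{
  return (sketch_float128_vector_p (mode)
	  || mode == SK_POImode
	  || mode == SK_PXImode);
}

/* Same shape as ALTIVEC_VECTOR_MODE, reduced to one true vector mode.  */
bool
sketch_altivec_vector_mode (enum sketch_mode mode)
{
  return mode == SK_V4SImode || sketch_vector_alignment_p (mode);
}

/* Small self-check showing which modes land in which group.  */
void
sketch_vector_alignment_self_check (void)
{
  assert (!sketch_vector_alignment_p (SK_V4SImode));
  assert (sketch_vector_alignment_p (SK_POImode));
  assert (sketch_altivec_vector_mode (SK_PXImode));
}
```

Routing the pair and quad modes through one predicate is what lets the many call sites listed in the ChangeLog switch from `FLOAT128_VECTOR_P` to `VECTOR_ALIGNMENT_P` without enumerating the new modes individually.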
@@ -2304,6 +2306,8 @@ extern int frame_pointer_needed;
#define RS6000_BTM_POWERPC64 MASK_POWERPC64 /* 64-bit registers. */
#define RS6000_BTM_FLOAT128 MASK_FLOAT128_KEYWORD /* IEEE 128-bit float. */
#define RS6000_BTM_FLOAT128_HW MASK_FLOAT128_HW /* IEEE 128-bit float h/w. */
#define RS6000_BTM_MMA MASK_MMA /* ISA 3.1 MMA. */
#define RS6000_BTM_FUTURE MASK_FUTURE
#define RS6000_BTM_COMMON (RS6000_BTM_ALTIVEC \
| RS6000_BTM_VSX \
@@ -2324,7 +2328,9 @@ extern int frame_pointer_needed;
| RS6000_BTM_LDBL128 \
| RS6000_BTM_POWERPC64 \
| RS6000_BTM_FLOAT128 \
| RS6000_BTM_FLOAT128_HW)
| RS6000_BTM_FLOAT128_HW \
| RS6000_BTM_MMA \
| RS6000_BTM_FUTURE)
/* Define builtin enum index. */
@@ -2433,6 +2439,8 @@ enum rs6000_builtin_type_index
RS6000_BTI_ieee128_float, /* ieee 128-bit floating point */
RS6000_BTI_ibm128_float, /* IBM 128-bit floating point */
RS6000_BTI_const_str, /* pointer to const char * */
RS6000_BTI_vector_pair, /* unsigned 256-bit types (vector pair). */
RS6000_BTI_vector_quad, /* unsigned 512-bit types (vector quad). */
RS6000_BTI_MAX
};
@@ -2485,6 +2493,8 @@ enum rs6000_builtin_type_index
#define ieee128_float_type_node (rs6000_builtin_types[RS6000_BTI_ieee128_float])
#define ibm128_float_type_node (rs6000_builtin_types[RS6000_BTI_ibm128_float])
#define const_str_type_node (rs6000_builtin_types[RS6000_BTI_const_str])
#define vector_pair_type_node (rs6000_builtin_types[RS6000_BTI_vector_pair])
#define vector_quad_type_node (rs6000_builtin_types[RS6000_BTI_vector_quad])
extern GTY(()) tree rs6000_builtin_types[RS6000_BTI_MAX];
extern GTY(()) tree rs6000_builtin_decls[RS6000_BUILTIN_COUNT];
@@ -757,7 +757,8 @@
;; Reload iterator for creating the function to allocate a base register to
;; supplement addressing modes.
(define_mode_iterator RELOAD [V16QI V8HI V4SI V2DI V4SF V2DF V1TI
SF SD SI DF DD DI TI PTI KF IF TF])
SF SD SI DF DD DI TI PTI KF IF TF
POI PXI])
;; Iterate over smin, smax
(define_code_iterator fp_minmax [smin smax])
@@ -14736,6 +14737,7 @@
(include "vector.md")
(include "vsx.md")
(include "altivec.md")
(include "mma.md")
(include "dfp.md")
(include "crypto.md")
(include "htm.md")
@@ -578,3 +578,7 @@ Generate (do not generate) prefixed memory instructions.
mpcrel
Target Report Mask(PCREL) Var(rs6000_isa_flags)
Generate (do not generate) pc-relative memory addressing.
mmma
Target Report Mask(MMA) Var(rs6000_isa_flags)
Generate (do not generate) MMA instructions.
@@ -83,6 +83,7 @@ MD_INCLUDES = $(srcdir)/config/rs6000/rs64.md \
$(srcdir)/config/rs6000/vector.md \
$(srcdir)/config/rs6000/vsx.md \
$(srcdir)/config/rs6000/altivec.md \
$(srcdir)/config/rs6000/mma.md \
$(srcdir)/config/rs6000/crypto.md \
$(srcdir)/config/rs6000/htm.md \
$(srcdir)/config/rs6000/dfp.md
@@ -1198,7 +1198,7 @@ See RS/6000 and PowerPC Options.
-mgnu-attribute -mno-gnu-attribute @gol
-mstack-protector-guard=@var{guard} -mstack-protector-guard-reg=@var{reg} @gol
-mstack-protector-guard-offset=@var{offset} -mprefixed -mno-prefixed @gol
-mpcrel -mno-pcrel}
-mpcrel -mno-pcrel -mmma -mno-mma}
@emph{RX Options}
@gccoptlist{-m64bit-doubles -m32bit-doubles -fpu -nofpu@gol
@@ -25622,7 +25622,8 @@ following options:
-mpowerpc-gpopt -mpowerpc-gfxopt @gol
-mmulhw -mdlmzb -mmfpgpr -mvsx @gol
-mcrypto -mhtm -mpower8-fusion -mpower8-vector @gol
-mquad-memory -mquad-memory-atomic -mfloat128 -mfloat128-hardware}
-mquad-memory -mquad-memory-atomic -mfloat128 @gol
-mfloat128-hardware -mprefixed -mpcrel -mmma}
The particular options set for any particular CPU varies between
compiler versions, depending on what setting seems to produce optimal
@@ -26618,6 +26619,13 @@ addressing (@option{-mprefixed}) options are enabled.
@opindex mno-prefixed
Generate (do not generate) addressing modes using prefixed load and
store instructions when the option @option{-mcpu=future} is used.
@item -mmma
@itemx -mno-mma
@opindex mmma
@opindex mno-mma
Generate (do not generate) the MMA instructions when the option
@option{-mcpu=future} is used.
@end table
@node RX Options