Commits · e464fc903506b75bef90374ab520b52df317a00e · lvzhengyang / riscv-gcc-1

03 Feb, 2020 6 commits

[OpenACC] bump version for 2.6 plus libgomp.texi update · e464fc90

2020-02-03  Julian Brown  <julian@codesourcery.com>
            Tobias Burnus  <tobias@codesourcery.com>

	gcc/c-family/
	* c-cppbuiltin.c (c_cpp_builtins): Update _OPENACC define to 201711.

	gcc/
	* doc/invoke.texi: Update mention of OpenACC version to 2.6.

	gcc/fortran/
	* cpp.c (cpp_define_builtins): Update _OPENACC define to 201711.
	* intrinsic.texi: Update mentions of OpenACC version to 2.6.
	* gfortran.texi: Likewise. Remove experimental disclaimer for OpenACC.
	* invoke.texi: Remove experimental disclaimer for OpenACC.

	gcc/testsuite/
	* c-c++-common/cpp/openacc-define-3.c: Update expected value for
	_OPENACC define.
	* gfortran.dg/openacc-define-3.f90: Likewise.

	libgomp/
	* libgomp.texi (OpenACC Runtime Library Routines): Document *_async
	and *_finalize variants; document acc_attach and acc_detach; update
	references from OpenACC 2.0 to 2.6.
	* openacc.f90 (openacc_version): Update to 201711.
	* openacc_lib.h (openacc_version): Update to 201711.
	* testsuite/libgomp.oacc-fortran/openacc_version-1.f: Update expected
	openacc_version to 201711.
	* testsuite/libgomp.oacc-fortran/openacc_version-2.f90: Likewise.
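
As a rough illustration of what the _OPENACC bump means for user code, the sketch below tests the macro at compile time. The helper names are hypothetical, and the value is only visible when the translation unit is compiled with OpenACC enabled (e.g. -fopenacc); otherwise the macro is undefined and the helper reports 0.

```c
/* Hypothetical helper: report the OpenACC version date advertised by
   the compiler, or 0 when OpenACC support is not enabled.  */
long openacc_version_macro(void)
{
#ifdef _OPENACC
    return _OPENACC;
#else
    return 0L;
#endif
}

/* Hypothetical helper: non-zero when the compiler advertises at least
   the 201711 value that this commit associates with OpenACC 2.6.  */
int advertises_openacc_2_6(void)
{
    return openacc_version_macro() >= 201711L;
}
```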

committed Feb 03, 2020


[OpenMP] Add missing parameters to omp_lib documentation (PR fortran/93541) · 7c8e1f92
```
	PR fortran/93541
	* intrinsic.texi (OpenMP Modules OMP_LIB and OMP_LIB_KINDS):
	Add undocumented parameters from omp_lib.f90.in.
```
Tobias Burnus committed Feb 03, 2020

[Fortran] Fix to strict associate check (PR93427) · ae86ede8

        PR fortran/93427
        * resolve.c (resolve_assoc_var): Remove too strict check.
        * gfortran.dg/associate_51.f90: Update test case.

        PR fortran/93427
        * gfortran.dg/associate_52.f90: New.

committed Feb 03, 2020


s390x: Fix popcounthi2_z196 expander [PR93533] · f626ae54

The following testcase started to ICE when .POPCOUNT matching was added
to match.pd; we had __builtin_popcount*, but nothing used the
popcounthi2 expander before.

The problem is that the popcounthi2_z196 expander doesn't emit valid RTL:
error: unrecognizable insn:
(insn 138 137 139 27 (set (reg:SI 190)
        (ashift:SI (reg:HI 95 [ _105 ])
            (const_int 8 [0x8]))) -1
     (nil))
during RTL pass: vregs
The following patch is an attempt to fix that.  I've also tried to
simplify it slightly: it makes no sense to me to perform
(x + (x << 8)) >> 8 when we need to either zero extend or mask the result
at the end anyway, to keep bits above HImode from affecting it, when we
can instead do
(x + (x >> 8)) & 0xff (or zero extension).
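
The equivalence of the two forms can be checked with a small portable model. This is only a sketch: bytewise_popcount16 is a hypothetical stand-in that assumes, as the ChangeLog wording below suggests, that the popcnt result holds one bit count per byte; it is not the s390 expander itself.

```c
#include <stdint.h>

/* Hypothetical model of a per-byte population count: each byte of the
   result holds the number of set bits in the matching input byte.  */
static uint16_t bytewise_popcount16(uint16_t x)
{
    uint16_t result = 0;
    for (int byte = 0; byte < 2; byte++) {
        uint8_t b = (uint8_t)(x >> (8 * byte));
        uint16_t count = 0;
        while (b) {
            count += b & 1u;
            b >>= 1;
        }
        result |= (uint16_t)(count << (8 * byte));
    }
    return result;
}

/* Older form: add the low byte into the high byte, shift down, then
   mask off whatever lies above the low byte.  */
unsigned popcount16_shift_left(uint16_t x)
{
    uint32_t p = bytewise_popcount16(x);
    return ((p + (p << 8)) >> 8) & 0xff;
}

/* Simplified form: add the high byte into the low byte and mask.  */
unsigned popcount16_shift_right(uint16_t x)
{
    uint32_t p = bytewise_popcount16(x);
    return (p + (p >> 8)) & 0xff;
}
```

Both functions need the final mask, which is the point made above: the left-shift variant does strictly more work for the same result.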

2020-02-03  Jakub Jelinek  <jakub@redhat.com>

	PR target/93533
	* config/s390/s390.md (popcounthi2_z196): Fix up expander to emit
	valid RTL to sum up the lowest and second lowest bytes of the popcnt
	result.

	* gcc.c-torture/compile/pr93533.c: New test.
	* gcc.target/s390/pr93533.c: New test.

committed Feb 03, 2020


coroutines: Bind label_decl of original function to actor function · c3ccce5b

gcc/cp
    * coroutines.cc (transform_await_wrapper): Set actor function as
    new context of label_decl.
    (build_actor_fn): Fill new field of await_xform_data.

gcc/testsuite
    * g++.dg/coroutines/co-await-04-control-flow.C: Add label.

committed Feb 03, 2020


Daily bump. · 75201e82
GCC Administrator committed Feb 03, 2020


02 Feb, 2020 4 commits

c++: Fix ICE on invalid alignas in a template [PR93530] · b817be03

This fixes an ICE taking place in cp_default_conversion because we got
a SCOPE_REF that doesn't have a type and so checking
INTEGRAL_OR_UNSCOPED_ENUMERATION_TYPE_P (TREE_TYPE (exp)) will crash.
This has happened since Joseph's recent change in decl_attributes whereby
we don't skip C++11 attributes on types.

[dcl.align] is clear that alignas applied to a function is ill-formed.
That should be fixed, and we have PR90847 for that.  But I think a more
appropriate fix at this stage would be the following: in a template we
want to splice dependent attributes and save them for later, and by
doing so avoid this crash.

	PR c++/93530 - ICE on invalid alignas in a template.
	* decl.c (grokdeclarator): Call cplus_decl_attributes instead of
	decl_attributes.

	* g++.dg/cpp0x/alignas18.C: New test.

committed Feb 02, 2020


testsuite,Darwin,PPC: Adjust darwin-abi-12.c for common section use. · 26a591f2

This test explicitly tests for code generation that expects a
common section.

gcc/testsuite/ChangeLog:

2020-02-02  Iain Sandoe  <iain@sandoe.co.uk>

* gcc.target/powerpc/darwin-abi-12.c: Add '-fcommon' to the
options.
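
For context, the kind of declaration that -fcommon affects is a tentative definition, sketched below. The names are hypothetical and this is not the content of darwin-abi-12.c. With -fcommon such a variable may be placed in a common section; GCC 10 changed the default to -fno-common, which is presumably why the option now has to be passed explicitly.

```c
/* Tentative definition: no initializer and no storage-class specifier.
   With -fcommon the compiler may emit this as a common symbol; with
   -fno-common it is placed in the BSS section of the object file.  */
int shared_counter;

/* Either way the object is zero-initialized at program start.  */
int read_shared_counter(void)
{
    return shared_counter;
}
```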

committed Feb 02, 2020


One more fix for PR 91333 - suboptimal register allocation for inline asm · 897a7308

2020-02-02  Vladimir Makarov  <vmakarov@redhat.com>

	PR rtl-optimization/91333
	* ira-color.c (struct allocno_color_data): Add member
	hard_reg_prefs.
	(init_allocno_threads): Set the member up.
	(bucket_allocno_compare_func): Add compare hard reg
	prefs.

2020-02-02  Vladimir Makarov  <vmakarov@redhat.com>

	PR rtl-optimization/91333
	* gcc.target/i386/pr91333.c: Add vmovsd to regexp.  Set up count
	to 3.

committed Feb 02, 2020


Daily bump. · 0303907e
GCC Administrator committed Feb 02, 2020


01 Feb, 2020 5 commits

fortran: Fix up TYPE_ARG_TYPES of procs with scalar VALUE optional args [PR92305] · add31061

The following patch fixes
-FAIL: libgomp.fortran/use_device_addr-1.f90   -O0  execution test
-FAIL: libgomp.fortran/use_device_addr-2.f90   -O0  execution test
which have been FAILing for several months on powerpc64le-linux.
The problem is in the Fortran FE, which adds the artificial arguments
for scalar VALUE OPTIONAL dummy args only to DECL_ARGUMENTS where the
current function can see them, but not to TYPE_ARG_TYPES; if those functions
aren't varargs, this confuses calls.c to pass the remaining arguments
(which aren't named (== not covered by TYPE_ARG_TYPES) and aren't varargs
either) in a different spot from what the callee (which has proper
DECL_ARGUMENTS for all args) expects.  For the artificial length arguments
for character dummy args we already put them in both DECL_ARGUMENTS and
TYPE_ARG_TYPES.

2020-02-01  Jakub Jelinek  <jakub@redhat.com>

	PR fortran/92305
	* trans-types.c (gfc_get_function_type): Also push boolean_type_node
	types for non-character scalar VALUE optional dummy arguments.
	* trans-decl.c (create_function_arglist): Skip those in
	hidden_typelist.  Formatting fix.

committed Feb 01, 2020


nios2: Support for GOT-relative DW_EH_PE_datarel encoding. · 2d33dcfe

On nios2-linux-gnu, there has been a long-standing bug in C++ exception
handling that sometimes resulted in link errors like

../nios2-linux-gnu/bin/ld: FDE encoding in /tmp/cccfpQ2l.o(.eh_frame) prevents .eh_frame_hdr table being created

when building some shared libraries or PIE executables. The root of
the problem is that GCC was incorrectly emitting an absolute encoding
in EH tables for PIC. This patch changes it to use either
DW_EH_PE_indirect (for global) or DW_EH_PE_datarel (for local), and
fixes libgcc so it can find the address of the GOT as the base address
for DW_EH_PE_datarel.

Complicating matters somewhat, GAS was missing support for
%gotoff(symbol) relocation syntax. I have just pushed a fix for that,
but I've added a configure check to test for presence of the binutils
support and fall back to the current absolute encoding (which works
most of the time) if it is not available. Once the fix makes it into
an official binutils release it might be appropriate to make this
error out instead.
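
The selection logic described above can be sketched as follows. This is a simplified model, not the actual ASM_PREFERRED_EH_DATA_FORMAT macro: the function name and its parameters are hypothetical, only the base/indirection bits are modelled (any value-format bits the real macro may OR in are left out), and the non-PIC case is assumed to keep the absolute encoding.

```c
/* DWARF exception-handling pointer-encoding bits.  */
#define DW_EH_PE_absptr   0x00
#define DW_EH_PE_datarel  0x30
#define DW_EH_PE_indirect 0x80

/* Hypothetical helper modelling the choice described in the text:
   - non-PIC code, or an assembler without %gotoff support, keeps the
     absolute encoding (the fallback);
   - PIC code uses an indirect reference for global symbols and a
     GOT-relative (datarel) one for local symbols.  */
int preferred_eh_encoding(int pic, int have_gotoff_reloc, int global)
{
    if (!pic || !have_gotoff_reloc)
        return DW_EH_PE_absptr;
    return global ? DW_EH_PE_indirect : DW_EH_PE_datarel;
}
```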

Since this is a wrong-code bug and affects only the nios2 target, I think
this is appropriate for Stage 4. I regression-tested on both
nios2-linux-gnu and nios2-elf, with and without the binutils support
present, before committing this.

2020-01-31 Sandra Loosemore <sandra@codesourcery.com>

gcc/
* configure.ac [nios2-*-*]: Check HAVE_AS_NIOS2_GOTOFF_RELOCATION.
* config.in: Regenerated.
* configure: Regenerated.
* config/nios2/nios2.h (ASM_PREFERRED_EH_DATA_FORMAT): Fix handling
for PIC when HAVE_AS_NIOS2_GOTOFF_RELOCATION.
(ASM_MAYBE_OUTPUT_ENCODED_ADDR_RTX): New.

gcc/testsuite/
* g++.target/nios2/hello-pie.C: New.
* g++.target/nios2/nios2.exp: New.

libgcc/
* config.host [nios2-*-linux*] (tmake_file, tm_file): Adjust.
* config/nios2-elf-lib.h: New.
* unwind-dw2-fde-dip.c (_Unwind_IteratePhdrCallback): Use existing
code for finding GOT base for nios2.

committed Jan 31, 2020


Fixes after recent configure changes relating to static libraries · 20fa702b

This commit:

  commit e7c26e04 (tjteru/master)
  Date:   Wed Jan 22 14:54:26 2020 +0000

      gcc: Add new configure options to allow static libraries to be selected

contains a couple of issues.  First, I failed to regenerate all of the
configure files that needed regenerating.  Second, there was a mistake
in lib-link.m4: one of the conditions didn't use pure sh syntax.
I wrote this:

  if x$lib_type = xauto || x$lib_type = xshared; then

when I should have written this:

  if test "x$lib_type" = "xauto" || test "x$lib_type" = "xshared"; then

These issues were raised on the mailing list in these messages:

  https://gcc.gnu.org/ml/gcc-patches/2020-01/msg01827.html
  https://gcc.gnu.org/ml/gcc-patches/2020-01/msg01921.html

config/ChangeLog:

	* lib-link.m4 (AC_LIB_LINKFLAGS_BODY): Update shell syntax.

gcc/ChangeLog:

	* configure: Regenerate.

intl/ChangeLog:

	* configure: Regenerate.

libcpp/ChangeLog:

	* configure: Regenerate.

libstdc++-v3/ChangeLog:

	* configure: Regenerate.

committed Feb 01, 2020


Daily bump. · d1a80303
GCC Administrator committed Feb 01, 2020


c++: Fix sizeof VLA lambda capture. · 00a49cd8

sizeof a VLA type is not a constant in C or the GNU C++ extension, so we
need to capture the VLA even in unevaluated context.  For PR60855 we stopped
looking through a previous capture, but we also need to capture the first
time the variable is mentioned.
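
The C behaviour that motivates this can be seen in a minimal sketch; the function name is hypothetical. Because the array bound is only known at run time, sizeof has to read it, which is why the C++ extension needs the VLA captured even though sizeof is normally an unevaluated context.

```c
#include <stddef.h>

/* Hypothetical helper: N must be positive.  The size of VLA depends on
   the run-time value of N, so sizeof is evaluated at run time here
   rather than folded to a constant.  */
size_t vla_size_in_bytes(int n)
{
    int vla[n];
    return sizeof vla;
}
```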

	PR c++/86216
	* semantics.c (process_outer_var_ref): Capture VLAs even in
	unevaluated context.

committed Jan 31, 2020


31 Jan, 2020 25 commits

c++: Reduce memory consumption for arrays of non-aggregate type. · e98ebda0

The remaining low-hanging fruit for improvement on memory consumption in the
14179 testcase was the duplication of the CONSTRUCTOR for the array by
reshape_init.  This patch changes reshape_init to reuse a single constructor
for an array of non-aggregate type such as the one in the testcase.

	PR c++/14179
	* decl.c (reshape_init_array_1): Reuse a single CONSTRUCTOR with
	non-aggregate elements.
	(reshape_init_array): Add first_initializer_p parm.
	(reshape_init_r): Change first_initializer_p from bool to tree.
	(reshape_init): Pass init to it.

committed Jan 31, 2020


c++: Reduce memory consumption for large static arrays. · d2b9548f

PR14179 and the C counterpart PR12245 are about memory consumption of very
large file-scope arrays.  Recently, location wrappers increased memory
consumption significantly: in an array of integer constants, each one will
have a location wrapper, which added up to over 500MB in the 14179
testcase.  For this kind of testcase tracking these locations isn't worth
the cost, so this patch turns the wrappers off after 256 elements; any array
that size or larger isn't likely to be interested in the location of
individual integer constants.

	PR c++/14179
	* parser.c (cp_parser_initializer_list): Suppress location wrappers
	after 256 elements.

committed Jan 31, 2020


analyzer: fix ICE with 'const void *' (PR 93457) · 67751724

gcc/analyzer/ChangeLog:
	PR analyzer/93457
	* region-model.cc (make_region_for_type): Use VOID_TYPE_P rather
	than checking against void_type_node.

gcc/testsuite/ChangeLog:
	PR analyzer/93457
	* gcc.dg/analyzer/pr93457.c: New test.

committed Jan 31, 2020


analyzer: fix ICE handling void-type (PR 93373) · 09bea584

gcc/analyzer/ChangeLog:
	PR analyzer/93373
	* region-model.cc (ASSERT_COMPAT_TYPES): Convert to...
	(assert_compat_types): ...this, and bail when either type is NULL,
	or when VOID_TYPE_P (dst_type).
	(region_model::get_lvalue): Update for above conversion.
	(region_model::get_rvalue): Likewise.

gcc/testsuite/ChangeLog:
	PR analyzer/93373
	* gcc.dg/analyzer/torture/pr93373.c: New test.

committed Jan 31, 2020


Fix for PR 91333 - suboptimal register allocation for inline asm · 2a07345c

    2020-01-31  Vladimir Makarov  <vmakarov@redhat.com>

            PR rtl-optimization/91333
            * ira-color.c (bucket_allocno_compare_func): Move conflict hard
            reg preferences comparison up.

    2020-01-31  Vladimir Makarov  <vmakarov@redhat.com>

            PR rtl-optimization/91333
            * gcc.target/i386/pr91333.c: New.

committed Jan 31, 2020


analyzer: fix ICE getting void return value (PR 93379) · f1c807e8

PR analyzer/93379 reports an ICE within
region_model::update_for_return_superedge when writing the
returned svalue_id to the lhs of the call_stmt.

The root cause is that this analyzer code assumed that for any call
with a non-NULL gimple_call_lhs, the called fndecl would have non-void
return type, and thus that a non-null svalue_id would be returned from
region_model::pop_frame.  This isn't the case e.g. for a call with
conflicting types where the callee returns void but the caller assumes
int.

This patch fixes the ICE by moving the check for null result so that
it also guards setting the lhs.

gcc/analyzer/ChangeLog:
	PR analyzer/93379
	* region-model.cc (region_model::update_for_return_superedge):
	Move check for null result so that it also guards setting the
	lhs.

gcc/testsuite/ChangeLog:
	PR analyzer/93379
	* gcc.dg/analyzer/torture/pr93379-2.c: New test.
	* gcc.dg/analyzer/torture/pr93379.c: New test.

committed Jan 31, 2020


analyzer: fix ICE with pointers between stack frames (PR 93438) · 455f58ec

PR analyzer/93438 reports an ICE when merging two region_models
in which an older stack frame has a local pointing to a local in
a more recent stack frame.

  stack
    older frame
      int *: "ow" --+
                    |
    newer frame     |
      int: "pk" <---+

The root cause is that the state-merging code assumes that all frame
regions in the merged model have already been created.
stack_region::can_merge_p iterates through the frames, creating
and populating each merged frame in turn, so when it attempts to
populate the older frame, it attempts to reference the newer frame in
the merged model, which doesn't exist yet.

This patch reworks stack_region::can_merge_p to use a two-pass approach
in which all frames in the merged model are created first, and then
are all populated, fixing the bug.
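
The two-pass idea can be illustrated with a much simplified model. The sketch below is not the analyzer's code: struct frame, struct stack_model and copy_stack_two_pass are hypothetical, and the "merge" is reduced to rebuilding one stack into another. It shows why creating every frame before populating any of them lets an older frame refer to a newer one.

```c
#include <stddef.h>

#define MAX_FRAMES 8

/* Hypothetical, simplified frame: a single local that may point at
   another frame of the same stack (NULL if it points nowhere).  */
struct frame {
    int depth;
    struct frame *points_to;
};

struct stack_model {
    size_t num_frames;
    struct frame frames[MAX_FRAMES];
};

/* Rebuild SRC into DST in two passes.  Returns 0 on success and -1 if
   SRC claims more frames than the model can hold.  */
int copy_stack_two_pass(const struct stack_model *src,
                        struct stack_model *dst)
{
    if (src->num_frames > MAX_FRAMES)
        return -1;

    /* Pass 1: create every frame, so that every possible pointer
       target already exists in DST.  */
    dst->num_frames = src->num_frames;
    for (size_t i = 0; i < src->num_frames; i++) {
        dst->frames[i].depth = src->frames[i].depth;
        dst->frames[i].points_to = NULL;
    }

    /* Pass 2: populate the frames, translating each pointer into the
       corresponding frame of DST.  An older frame (low index) can now
       safely refer to a newer frame (high index).  */
    for (size_t i = 0; i < src->num_frames; i++) {
        const struct frame *target = src->frames[i].points_to;
        if (target != NULL) {
            size_t idx = (size_t)(target - src->frames);
            dst->frames[i].points_to = &dst->frames[idx];
        }
    }
    return 0;
}
```

A single-pass version that populated frame 0 before frame 1 existed would have nothing valid to point at, which corresponds to the failure mode described above.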

gcc/analyzer/ChangeLog:
	PR analyzer/93438
	* region-model.cc (stack_region::can_merge_p): Split into a two
	pass approach, creating all stack regions first, then populating
	them.
	(selftest::test_state_merging): Add test coverage for (a) the case
	of self-merging a model in which a local in an older stack frame
	points to a local in a more recent stack frame (which previously
	would ICE), and (b) the case of self-merging a model in which a
	local points to a global (which previously worked OK).

gcc/testsuite/ChangeLog:
	PR analyzer/93438
	* gcc.dg/analyzer/torture/pr93438.c: New test.
	* gcc.dg/analyzer/torture/pr93438-2.c: New test.

committed Jan 31, 2020


testsuite: Fix up pr91838.C test [PR91838] · 5910b145

The test FAILs on i686-linux with:
FAIL: g++.dg/pr91838.C   (test for excess errors)
Excess errors:
/home/jakub/src/gcc/gcc/testsuite/g++.dg/pr91838.C:7:8: warning: MMX vector return without MMX enabled changes the ABI [-Wpsabi]
/home/jakub/src/gcc/gcc/testsuite/g++.dg/pr91838.C:7:3: warning: MMX vector argument without MMX enabled changes the ABI [-Wpsabi]
and on x86_64-linux with -m32 testing it fails to match the
expected pattern (or both with e.g. -m32/-mno-mmx/-mno-sse testing).
The test is also in the wrong directory and uses a non-standard way of
specifying that it requires C++11 or later.

2020-01-31  Jakub Jelinek  <jakub@redhat.com>

	PR rtl-optimization/91838
	* g++.dg/pr91838.C: Moved to ...
	* g++.dg/opt/pr91838.C: ... here.  Require c++11 target instead of
	dg-skip-if for c++98.  Pass -Wno-psabi -w to avoid psabi style
	warnings on vector arg passing or return.  Add -masm=att on i?86/x86_64.
	Only check for pxor %xmm0, %xmm0 on lp64 i?86/x86_64.

committed Jan 31, 2020


aarch64: Add Armv8.6 SVE bfloat16 support · 896dff99

This patch adds support for the SVE intrinsics that map to Armv8.6
bfloat16 instructions.  This means that svcvtnt is now a base SVE
function for one type suffix combination; the others are still
SVE2-specific.

This relies on a binutils fix:

    https://sourceware.org/ml/binutils/2020-01/msg00450.html

so anyone testing older binutils 2.34 or binutils master sources will
need to upgrade to get clean test results.  (At the time of writing,
no released version of binutils has this bug.)

2020-01-31  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* config/aarch64/aarch64.h (TARGET_SVE_BF16): New macro.
	* config/aarch64/aarch64-sve-builtins-sve2.h (svcvtnt): Move to
	aarch64-sve-builtins-base.h.
	* config/aarch64/aarch64-sve-builtins-sve2.cc (svcvtnt): Move to
	aarch64-sve-builtins-base.cc.
	* config/aarch64/aarch64-sve-builtins-base.h (svbfdot, svbfdot_lane)
	(svbfmlalb, svbfmlalb_lane, svbfmlalt, svbfmlalt_lane, svbfmmla)
	(svcvtnt): Declare.
	* config/aarch64/aarch64-sve-builtins-base.cc (svbfdot, svbfdot_lane)
	(svbfmlalb, svbfmlalb_lane, svbfmlalt, svbfmlalt_lane, svbfmmla)
	(svcvtnt): New functions.
	* config/aarch64/aarch64-sve-builtins-base.def (svbfdot, svbfdot_lane)
	(svbfmlalb, svbfmlalb_lane, svbfmlalt, svbfmlalt_lane, svbfmmla)
	(svcvtnt): New functions.
	(svcvt): Add a form that converts f32 to bf16.
	* config/aarch64/aarch64-sve-builtins-shapes.h (ternary_bfloat)
	(ternary_bfloat_lane, ternary_bfloat_lanex2, ternary_bfloat_opt_n):
	Declare.
	* config/aarch64/aarch64-sve-builtins-shapes.cc (parse_element_type):
	Treat B as bfloat16_t.
	(ternary_bfloat_lane_base): New class.
	(ternary_bfloat_def): Likewise.
	(ternary_bfloat): New shape.
	(ternary_bfloat_lane_def): New class.
	(ternary_bfloat_lane): New shape.
	(ternary_bfloat_lanex2_def): New class.
	(ternary_bfloat_lanex2): New shape.
	(ternary_bfloat_opt_n_def): New class.
	(ternary_bfloat_opt_n): New shape.
	* config/aarch64/aarch64-sve-builtins.cc (TYPES_cvt_bfloat): New macro.
	* config/aarch64/aarch64-sve.md (@aarch64_sve_<sve_fp_op>vnx4sf)
	(@aarch64_sve_<sve_fp_op>_lanevnx4sf): New patterns.
	(@aarch64_sve_<optab>_trunc<VNx4SF_ONLY:mode><VNx8BF_ONLY:mode>)
	(@cond_<optab>_trunc<VNx4SF_ONLY:mode><VNx8BF_ONLY:mode>): Likewise.
	(*cond_<optab>_trunc<VNx4SF_ONLY:mode><VNx8BF_ONLY:mode>): Likewise.
	(@aarch64_sve_cvtnt<VNx8BF_ONLY:mode>): Likewise.
	* config/aarch64/aarch64-sve2.md (@aarch64_sve2_cvtnt<mode>): Key
	the pattern off the narrow mode instead of the wider one.
	* config/aarch64/iterators.md (VNx8BF_ONLY): New mode iterator.
	(UNSPEC_BFMLALB, UNSPEC_BFMLALT, UNSPEC_BFMMLA): New unspecs.
	(sve_fp_op): Handle them.
	(SVE_BFLOAT_TERNARY_LONG): New int iterator.
	(SVE_BFLOAT_TERNARY_LONG_LANE): Likewise.

gcc/testsuite/
	* lib/target-supports.exp (check_effective_target_aarch64_asm_bf16_ok):
	New proc.
	* gcc.target/aarch64/sve/acle/asm/bfdot_f32.c: New test.
	* gcc.target/aarch64/sve/acle/asm/bfdot_lane_f32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/bfmlalb_f32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/bfmlalb_lane_f32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/bfmlalt_f32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/bfmlalt_lane_f32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/bfmmla_f32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/cvt_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/cvtnt_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/general-c/ternary_bfloat16_1.c: Likewise.
	* gcc.target/aarch64/sve/acle/general-c/ternary_bfloat16_lane_1.c:
	Likewise.
	* gcc.target/aarch64/sve/acle/general-c/ternary_bfloat16_lanex2_1.c:
	Likewise.
	* gcc.target/aarch64/sve/acle/general-c/ternary_bfloat16_opt_n_1.c:
	Likewise.

committed Jan 31, 2020


aarch64: Add svbfloat16_t support to arm_sve.h · 02fcd8ac

This patch adds support for the bfloat16-related vectors to
arm_sve.h.  It also adds support for functions that just treat
bfloat16_t as a bag of 16 bits; these functions are available
for bf16 whenever they're available for other 16-bit types.

Previously "all_data" was used for both data movement and for arithmetic
that happened to be defined for all data types.  Adding bf16 means we
need to distinguish between the two cases.
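
The distinction between "bag of 16 bits" operations and arithmetic can be illustrated in portable C. This is a sketch with hypothetical names, not the arm_sve.h API; it assumes float is IEEE binary32 and relies on bfloat16 having the same layout as the upper 16 bits of a binary32 value.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical storage type: a bfloat16 value held as raw bits.  */
typedef uint16_t bf16_bits;

/* Conversion needs to know the format: keep the top 16 bits of the
   binary32 encoding (truncation, no rounding).  */
bf16_bits bf16_from_float_truncate(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return (bf16_bits)(bits >> 16);
}

float bf16_to_float(bf16_bits value)
{
    uint32_t bits = (uint32_t)value << 16;
    float result;
    memcpy(&result, &bits, sizeof result);
    return result;
}

/* Data movement does not: reversing a vector works the same for any
   16-bit element type, whatever its bits mean.  */
void bf16_reverse(bf16_bits *elems, size_t count)
{
    for (size_t i = 0; i < count / 2; i++) {
        bf16_bits tmp = elems[i];
        elems[i] = elems[count - 1 - i];
        elems[count - 1 - i] = tmp;
    }
}
```

Operations in the style of bf16_reverse are the ones that can be shared with the other 16-bit types, whereas anything arithmetic has to interpret the bits and so cannot simply be enabled for bf16.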

The patch also reorders the mode definitions in aarch64-modes.def,
which means we no longer need separate VECTOR_MODE entries for BF
vectors.

2020-01-31  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* config/aarch64/arm_sve.h: Include arm_bf16.h.
	* config/aarch64/aarch64-modes.def (BF): Move definition before
	VECTOR_MODES.  Remove separate VECTOR_MODES for V4BF and V8BF.
	(SVE_MODES): Handle BF modes.
	* config/aarch64/aarch64.c (aarch64_classify_vector_mode): Handle
	BF modes.
	(aarch64_full_sve_mode): Likewise.
	* config/aarch64/iterators.md (SVE_STRUCT): Add VNx16BF, VNx24BF
	and VNx32BF.
	(SVE_FULL, SVE_FULL_HSD, SVE_ALL): Add VNx8BF.
	(Vetype, Vesize, Vctype, VEL, Vel, VEL_INT, V128, v128, vwcore)
	(V_INT_EQUIV, v_int_equiv, V_FP_EQUIV, v_fp_equiv, vector_count)
	(insn_length, VSINGLE, vsingle, VPRED, vpred, VDOUBLE): Handle the
	new SVE BF modes.
	* config/aarch64/aarch64-sve-builtins.h (TYPE_bfloat): New
	type_class_index.
	* config/aarch64/aarch64-sve-builtins.cc (TYPES_all_arith): New macro.
	(TYPES_all_data): Add bf16.
	(TYPES_reinterpret1, TYPES_reinterpret): Likewise.
	(register_tuple_type): Increase buffer size.
	* config/aarch64/aarch64-sve-builtins.def (svbfloat16_t): New type.
	(bf16): New type suffix.
	* config/aarch64/aarch64-sve-builtins-base.def (svabd, svadd, svaddv)
	(svcmpeq, svcmpge, svcmpgt, svcmple, svcmplt, svcmpne, svmad, svmax)
	(svmaxv, svmin, svminv, svmla, svmls, svmsb, svmul, svsub, svsubr):
	Change type from all_data to all_arith.
	* config/aarch64/aarch64-sve-builtins-sve2.def (svaddp, svmaxp)
	(svminp): Likewise.

gcc/testsuite/
	* g++.target/aarch64/sve/acle/general-c++/mangle_1.C: Test mangling
	of svbfloat16_t.
	* g++.target/aarch64/sve/acle/general-c++/mangle_2.C: Likewise for
	__SVBfloat16_t.
	* gcc.target/aarch64/sve/acle/asm/clasta_bf16.c: New test.
	* gcc.target/aarch64/sve/acle/asm/clastb_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/cnt_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/create2_1.c (create_bf16): Likewise.
	* gcc.target/aarch64/sve/acle/asm/create3_1.c (create_bf16): Likewise.
	* gcc.target/aarch64/sve/acle/asm/create4_1.c (create_bf16): Likewise.
	* gcc.target/aarch64/sve/acle/asm/dup_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/dup_lane_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/dupq_lane_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ext_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/get2_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/get3_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/get4_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/insr_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lasta_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lastb_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1ro_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1rq_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld2_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld3_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld4_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnt1_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/len_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/reinterpret_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/reinterpret_f16.c
	(reinterpret_f16_bf16_tied1, reinterpret_f16_bf16_untied): Likewise.
	* gcc.target/aarch64/sve/acle/asm/reinterpret_f32.c
	(reinterpret_f32_bf16_tied1, reinterpret_f32_bf16_untied): Likewise.
	* gcc.target/aarch64/sve/acle/asm/reinterpret_f64.c
	(reinterpret_f64_bf16_tied1, reinterpret_f64_bf16_untied): Likewise.
	* gcc.target/aarch64/sve/acle/asm/reinterpret_s16.c
	(reinterpret_s16_bf16_tied1, reinterpret_s16_bf16_untied): Likewise.
	* gcc.target/aarch64/sve/acle/asm/reinterpret_s32.c
	(reinterpret_s32_bf16_tied1, reinterpret_s32_bf16_untied): Likewise.
	* gcc.target/aarch64/sve/acle/asm/reinterpret_s64.c
	(reinterpret_s64_bf16_tied1, reinterpret_s64_bf16_untied): Likewise.
	* gcc.target/aarch64/sve/acle/asm/reinterpret_s8.c
	(reinterpret_s8_bf16_tied1, reinterpret_s8_bf16_untied): Likewise.
	* gcc.target/aarch64/sve/acle/asm/reinterpret_u16.c
	(reinterpret_u16_bf16_tied1, reinterpret_u16_bf16_untied): Likewise.
	* gcc.target/aarch64/sve/acle/asm/reinterpret_u32.c
	(reinterpret_u32_bf16_tied1, reinterpret_u32_bf16_untied): Likewise.
	* gcc.target/aarch64/sve/acle/asm/reinterpret_u64.c
	(reinterpret_u64_bf16_tied1, reinterpret_u64_bf16_untied): Likewise.
	* gcc.target/aarch64/sve/acle/asm/reinterpret_u8.c
	(reinterpret_u8_bf16_tied1, reinterpret_u8_bf16_untied): Likewise.
	* gcc.target/aarch64/sve/acle/asm/rev_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/sel_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/set2_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/set3_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/set4_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/splice_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/st1_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/st2_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/st3_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/st4_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/stnt1_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/tbl_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/trn1_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/trn1q_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/trn2_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/trn2q_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/undef2_1.c (bfloat16_t): Likewise.
	* gcc.target/aarch64/sve/acle/asm/undef3_1.c (bfloat16_t): Likewise.
	* gcc.target/aarch64/sve/acle/asm/undef4_1.c (bfloat16_t): Likewise.
	* gcc.target/aarch64/sve/acle/asm/undef_1.c (bfloat16_t): Likewise.
	* gcc.target/aarch64/sve/acle/asm/uzp1_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/uzp1q_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/uzp2_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/uzp2q_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/zip1_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/zip1q_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/zip2_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/zip2q_bf16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/annotate_1.c (ret_bf16, ret_bf16x2)
	(ret_bf16x3, ret_bf16x4): Likewise.
	* gcc.target/aarch64/sve/pcs/annotate_2.c (fn_bf16, fn_bf16x2)
	(fn_bf16x3, fn_bf16x4): Likewise.
	* gcc.target/aarch64/sve/pcs/annotate_3.c (fn_bf16, fn_bf16x2)
	(fn_bf16x3, fn_bf16x4): Likewise.
	* gcc.target/aarch64/sve/pcs/annotate_4.c (fn_bf16, fn_bf16x2)
	(fn_bf16x3, fn_bf16x4): Likewise.
	* gcc.target/aarch64/sve/pcs/annotate_5.c (fn_bf16, fn_bf16x2)
	(fn_bf16x3, fn_bf16x4): Likewise.
	* gcc.target/aarch64/sve/pcs/annotate_6.c (fn_bf16, fn_bf16x2)
	(fn_bf16x3, fn_bf16x4): Likewise.
	* gcc.target/aarch64/sve/pcs/annotate_7.c (fn_bf16, fn_bf16x2)
	(fn_bf16x3, fn_bf16x4): Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_be_bf16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_le_bf16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_be_bf16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_le_bf16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/gnu_vectors_1.c (bfloat16x16_t): New
	typedef.
	(bfloat16_callee, bfloat16_caller): New tests.
	* gcc.target/aarch64/sve/pcs/gnu_vectors_2.c (bfloat16x16_t): New
	typedef.
	(bfloat16_callee, bfloat16_caller): New tests.
	* gcc.target/aarch64/sve/pcs/return_4.c (CALLER_BF16): New macro.
	(callee_bf16, caller_bf16): New tests.
	* gcc.target/aarch64/sve/pcs/return_4_128.c (CALLER_BF16): New macro.
	(callee_bf16, caller_bf16): New tests.
	* gcc.target/aarch64/sve/pcs/return_4_256.c (CALLER_BF16): New macro.
	(callee_bf16, caller_bf16): New tests.
	* gcc.target/aarch64/sve/pcs/return_4_512.c (CALLER_BF16): New macro.
	(callee_bf16, caller_bf16): New tests.
	* gcc.target/aarch64/sve/pcs/return_4_1024.c (CALLER_BF16): New macro.
	(callee_bf16, caller_bf16): New tests.
	* gcc.target/aarch64/sve/pcs/return_4_2048.c (CALLER_BF16): New macro.
	(callee_bf16, caller_bf16): New tests.
	* gcc.target/aarch64/sve/pcs/return_5.c (CALLER_BF16): New macro.
	(callee_bf16, caller_bf16): New tests.
	* gcc.target/aarch64/sve/pcs/return_5_128.c (CALLER_BF16): New macro.
	(callee_bf16, caller_bf16): New tests.
	* gcc.target/aarch64/sve/pcs/return_5_256.c (CALLER_BF16): New macro.
	(callee_bf16, caller_bf16): New tests.
	* gcc.target/aarch64/sve/pcs/return_5_512.c (CALLER_BF16): New macro.
	(callee_bf16, caller_bf16): New tests.
	* gcc.target/aarch64/sve/pcs/return_5_1024.c (CALLER_BF16): New macro.
	(callee_bf16, caller_bf16): New tests.
	* gcc.target/aarch64/sve/pcs/return_5_2048.c (CALLER_BF16): New macro.
	(callee_bf16, caller_bf16): New tests.
	* gcc.target/aarch64/sve/pcs/return_6.c (bfloat16_t): New typedef.
	(callee_bf16, caller_bf16): New tests.
	* gcc.target/aarch64/sve/pcs/return_6_128.c (bfloat16_t): New typedef.
	(callee_bf16, caller_bf16): New tests.
	* gcc.target/aarch64/sve/pcs/return_6_256.c (bfloat16_t): New typedef.
	(callee_bf16, caller_bf16): New tests.
	* gcc.target/aarch64/sve/pcs/return_6_512.c (bfloat16_t): New typedef.
	(callee_bf16, caller_bf16): New tests.
	* gcc.target/aarch64/sve/pcs/return_6_1024.c (bfloat16_t): New typedef.
	(callee_bf16, caller_bf16): New tests.
	* gcc.target/aarch64/sve/pcs/return_6_2048.c (bfloat16_t): New typedef.
	(callee_bf16, caller_bf16): New tests.
	* gcc.target/aarch64/sve/pcs/return_7.c (callee_bf16): Likewise
	(caller_bf16): Likewise.
	* gcc.target/aarch64/sve/pcs/return_8.c (callee_bf16): Likewise
	(caller_bf16): Likewise.
	* gcc.target/aarch64/sve/pcs/return_9.c (callee_bf16): Likewise
	(caller_bf16): Likewise.
	* gcc.target/aarch64/sve2/acle/asm/tbl2_bf16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/tbx_bf16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/whilerw_bf16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/whilewr_bf16.c: Likewise.

committed Jan 31, 2020


aarch64: Add Armv8.6 SVE matrix multiply support · 36696774

This mostly follows existing practice.  Perhaps the only noteworthy
thing is that svmmla is split across three extensions (i8mm, f32mm
and f64mm), any of which can be enabled independently.  The easiest
way of coping with this seemed to be to add a fourth svmmla entry
for base SVE, but with no type suffixes.  This means that the
overloaded function is always available for C, but never successfully
resolves without the appropriate target feature.

2020-01-31  Dennis Zhang  <dennis.zhang@arm.com>
	    Matthew Malcomson  <matthew.malcomson@arm.com>
	    Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* doc/invoke.texi (f32mm): Document new AArch64 -march= extension.
	* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
	__ARM_FEATURE_SVE_MATMUL_INT8, __ARM_FEATURE_SVE_MATMUL_FP32 and
	__ARM_FEATURE_SVE_MATMUL_FP64 as appropriate.  Don't define
	__ARM_FEATURE_MATMUL_FP64.
	* config/aarch64/aarch64-option-extensions.def (fp, simd, fp16)
	(sve): Add AARCH64_FL_F32MM to the list of extensions that should
	be disabled at the same time.
	(f32mm): New extension.
	* config/aarch64/aarch64.h (AARCH64_FL_F32MM): New macro.
	(AARCH64_FL_F64MM): Bump to the next bit up.
	(AARCH64_ISA_F32MM, TARGET_SVE_I8MM, TARGET_F32MM, TARGET_SVE_F32MM)
	(TARGET_SVE_F64MM): New macros.
	* config/aarch64/iterators.md (SVE_MATMULF): New mode iterator.
	(UNSPEC_FMMLA, UNSPEC_SMATMUL, UNSPEC_UMATMUL, UNSPEC_USMATMUL)
	(UNSPEC_TRN1Q, UNSPEC_TRN2Q, UNSPEC_UZP1Q, UNSPEC_UZP2Q, UNSPEC_ZIP1Q)
	(UNSPEC_ZIP2Q): New unspecs.
	(DOTPROD_US_ONLY, PERMUTEQ, MATMUL, FMMLA): New int iterators.
	(optab, sur, perm_insn): Handle the new unspecs.
	(sve_fp_op): Handle UNSPEC_FMMLA.  Resort.
	* config/aarch64/aarch64-sve.md (@aarch64_sve_ld1ro<mode>): Use
	TARGET_SVE_F64MM instead of separate tests.
	(@aarch64_<DOTPROD_US_ONLY:sur>dot_prod<vsi2qi>): New pattern.
	(@aarch64_<DOTPROD_US_ONLY:sur>dot_prod_lane<vsi2qi>): Likewise.
	(@aarch64_sve_add_<MATMUL:optab><vsi2qi>): Likewise.
	(@aarch64_sve_<FMMLA:sve_fp_op><mode>): Likewise.
	(@aarch64_sve_<PERMUTEQ:optab><mode>): Likewise.
	* config/aarch64/aarch64-sve-builtins.cc (TYPES_s_float): New macro.
	(TYPES_s_float_hsd_integer, TYPES_s_float_sd_integer): Use it.
	(TYPES_s_signed): New macro.
	(TYPES_s_integer): Use it.
	(TYPES_d_float): New macro.
	(TYPES_d_data): Use it.
	* config/aarch64/aarch64-sve-builtins-shapes.h (mmla): Declare.
	(ternary_intq_uintq_lane, ternary_intq_uintq_opt_n, ternary_uintq_intq)
	(ternary_uintq_intq_lane, ternary_uintq_intq_opt_n): Likewise.
	* config/aarch64/aarch64-sve-builtins-shapes.cc (mmla_def): New class.
	(svmmla): New shape.
	(ternary_resize2_opt_n_base): Add TYPE_CLASS2 and TYPE_CLASS3
	template parameters.
	(ternary_resize2_lane_base): Likewise.
	(ternary_resize2_base): New class.
	(ternary_qq_lane_base): Likewise.
	(ternary_intq_uintq_lane_def): Likewise.
	(ternary_intq_uintq_lane): New shape.
	(ternary_intq_uintq_opt_n_def): New class.
	(ternary_intq_uintq_opt_n): New shape.
	(ternary_qq_lane_def): Inherit from ternary_qq_lane_base.
	(ternary_uintq_intq_def): New class.
	(ternary_uintq_intq): New shape.
	(ternary_uintq_intq_lane_def): New class.
	(ternary_uintq_intq_lane): New shape.
	(ternary_uintq_intq_opt_n_def): New class.
	(ternary_uintq_intq_opt_n): New shape.
	* config/aarch64/aarch64-sve-builtins-base.h (svmmla, svsudot)
	(svsudot_lane, svtrn1q, svtrn2q, svusdot, svusdot_lane, svusmmla)
	(svuzp1q, svuzp2q, svzip1q, svzip2q): Declare.
	* config/aarch64/aarch64-sve-builtins-base.cc (svdot_lane_impl):
	Generalize to...
	(svdotprod_lane_impl): ...this new class.
	(svmmla_impl, svusdot_impl): New classes.
	(svdot_lane): Update to use svdotprod_lane_impl.
	(svmmla, svsudot, svsudot_lane, svtrn1q, svtrn2q, svusdot)
	(svusdot_lane, svusmmla, svuzp1q, svuzp2q, svzip1q, svzip2q): New
	functions.
	* config/aarch64/aarch64-sve-builtins-base.def (svmmla): New base
	function, with no types defined.
	(svmmla, svusmmla, svsudot, svsudot_lane, svusdot, svusdot_lane): New
	AARCH64_FL_I8MM functions.
	(svmmla): New AARCH64_FL_F32MM function.
	(svld1ro): Depend only on AARCH64_FL_F64MM, not on AARCH64_FL_V8_6.
	(svmmla, svtrn1q, svtrn2q, svuzp1q, svuzp2q, svzip1q, svzip2q): New
	AARCH64_FL_F64MM functions.
	(REQUIRED_EXTENSIONS):

gcc/testsuite/
	* lib/target-supports.exp (check_effective_target_aarch64_asm_i8mm_ok)
	(check_effective_target_aarch64_asm_f32mm_ok): New target selectors.
	* gcc.target/aarch64/pragma_cpp_predefs_2.c: Test handling of
	__ARM_FEATURE_SVE_MATMUL_INT8, __ARM_FEATURE_SVE_MATMUL_FP32 and
	__ARM_FEATURE_SVE_MATMUL_FP64.
	* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h (TEST_TRIPLE_Z)
	(TEST_TRIPLE_Z_REV2, TEST_TRIPLE_Z_REV, TEST_TRIPLE_LANE_REG)
	(TEST_TRIPLE_ZX): New macros.
	* gcc.target/aarch64/sve/acle/asm/ld1ro_f16.c: Remove +sve and
	rely on +f64mm to enable it.
	* gcc.target/aarch64/sve/acle/asm/ld1ro_f32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1ro_f64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1ro_s16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1ro_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1ro_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1ro_s8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1ro_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1ro_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1ro_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1ro_u8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mmla_f32.c: New test.
	* gcc.target/aarch64/sve/acle/asm/mmla_f64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mmla_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mmla_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/sudot_lane_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/sudot_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/trn1q_f16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/trn1q_f32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/trn1q_f64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/trn1q_s16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/trn1q_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/trn1q_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/trn1q_s8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/trn1q_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/trn1q_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/trn1q_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/trn1q_u8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/trn2q_f16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/trn2q_f32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/trn2q_f64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/trn2q_s16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/trn2q_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/trn2q_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/trn2q_s8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/trn2q_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/trn2q_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/trn2q_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/trn2q_u8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/usdot_lane_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/usdot_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/usmmla_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/uzp1q_f16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/uzp1q_f32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/uzp1q_f64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/uzp1q_s16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/uzp1q_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/uzp1q_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/uzp1q_s8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/uzp1q_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/uzp1q_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/uzp1q_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/uzp1q_u8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/uzp2q_f16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/uzp2q_f32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/uzp2q_f64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/uzp2q_s16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/uzp2q_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/uzp2q_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/uzp2q_s8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/uzp2q_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/uzp2q_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/uzp2q_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/uzp2q_u8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/zip1q_f16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/zip1q_f32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/zip1q_f64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/zip1q_s16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/zip1q_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/zip1q_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/zip1q_s8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/zip1q_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/zip1q_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/zip1q_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/zip1q_u8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/zip2q_f16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/zip2q_f32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/zip2q_f64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/zip2q_s16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/zip2q_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/zip2q_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/zip2q_s8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/zip2q_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/zip2q_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/zip2q_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/zip2q_u8.c: Likewise.
	* gcc.target/aarch64/sve/acle/general-c/mmla_1.c: Likewise.
	* gcc.target/aarch64/sve/acle/general-c/mmla_2.c: Likewise.
	* gcc.target/aarch64/sve/acle/general-c/mmla_3.c: Likewise.
	* gcc.target/aarch64/sve/acle/general-c/mmla_4.c: Likewise.
	* gcc.target/aarch64/sve/acle/general-c/mmla_5.c: Likewise.
	* gcc.target/aarch64/sve/acle/general-c/mmla_6.c: Likewise.
	* gcc.target/aarch64/sve/acle/general-c/mmla_7.c: Likewise.
	* gcc.target/aarch64/sve/acle/general-c/ternary_intq_uintq_lane_1.c:
	Likewise.
	* gcc.target/aarch64/sve/acle/general-c/ternary_intq_uintq_opt_n_1.c:
	Likewise.
	* gcc.target/aarch64/sve/acle/general-c/ternary_uintq_intq_1.c:
	Likewise.
	* gcc.target/aarch64/sve/acle/general-c/ternary_uintq_intq_lane_1.c:
	Likewise.
	* gcc.target/aarch64/sve/acle/general-c/ternary_uintq_intq_opt_n_1.c:
	Likewise.

committed Jan 31, 2020


aarch64: Fix SVE PCS failures for BE & ILP32 · 2171a920

This patch should (finally!) give clean test results for
aarch64-sve-pcs.exp for all {be,le}{lp64,ilp32} combinations.

The *_128.c tests require aarch64_little_endian because they test for
fixed-length 128-bit code, whereas -msve-vector-bits=128 still generates
VLA code for big-endian.

Some tests require lp64 because they match (64-bit) pointer loads and
stores.  Others require it because ilp32 adds extra zero extensions.
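
As a rough illustration of the ABI difference behind the lp64 requirement, the sketch below checks the pointer width and the ABI macros that GCC predefines. The helper names `pointers_are_64_bit` and `abi_name` are hypothetical; the tests themselves express the requirement through DejaGnu target selectors, not through code like this.

```cpp
#include <climits>
#include <cstring>

// True when pointers are 64 bits wide, as under the LP64 ABI.  Under
// ILP32 a pointer is 32 bits wide, so assembly patterns that expect
// 64-bit pointer loads and stores no longer match.
constexpr bool pointers_are_64_bit() {
  return sizeof(void *) * CHAR_BIT == 64;
}

// GCC predefines __LP64__ on LP64 targets and __ILP32__ on ILP32 ones.
const char *abi_name() {
#if defined(__LP64__)
  return "lp64";
#elif defined(__ILP32__)
  return "ilp32";
#else
  return "other";
#endif
}
```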

We still have a non-trivial amount of coverage for -mbig-endian -mabi=ilp32:

 # of expected passes            663
 # of unsupported tests          59

2020-01-31  Richard Sandiford  <richard.sandiford@arm.com>

gcc/testsuite/
	* gcc.target/aarch64/sve/pcs/args_1.c: Require lp64 for
	check-function-bodies tests.
	* gcc.target/aarch64/sve/pcs/args_2.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_3.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_4.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_1.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_1_256.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_1_512.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_1_1024.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_1_2048.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_2.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_3.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_4.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_4_256.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_4_512.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_4_1024.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_4_2048.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_5.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_5_256.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_5_512.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_5_1024.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_5_2048.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_6.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_6_256.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_6_512.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_6_1024.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_6_2048.c: Likewise.
	* gcc.target/aarch64/sve/pcs/saves_2_be_nowrap.c: Likewise.
	* gcc.target/aarch64/sve/pcs/saves_2_be_wrap.c: Likewise.
	* gcc.target/aarch64/sve/pcs/saves_2_le_nowrap.c: Likewise.
	* gcc.target/aarch64/sve/pcs/saves_2_le_wrap.c: Likewise.
	* gcc.target/aarch64/sve/pcs/saves_3.c: Likewise.
	* gcc.target/aarch64/sve/pcs/saves_4_be.c: Likewise.
	* gcc.target/aarch64/sve/pcs/saves_4_le.c: Likewise.
	* gcc.target/aarch64/sve/pcs/varargs_1.c: Likewise.
	* gcc.target/aarch64/sve/pcs/varargs_2_f16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/varargs_2_f32.c: Likewise.
	* gcc.target/aarch64/sve/pcs/varargs_2_f64.c: Likewise.
	* gcc.target/aarch64/sve/pcs/varargs_2_s16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/varargs_2_s32.c: Likewise.
	* gcc.target/aarch64/sve/pcs/varargs_2_s64.c: Likewise.
	* gcc.target/aarch64/sve/pcs/varargs_2_s8.c: Likewise.
	* gcc.target/aarch64/sve/pcs/varargs_2_u16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/varargs_2_u32.c: Likewise.
	* gcc.target/aarch64/sve/pcs/varargs_2_u64.c: Likewise.
	* gcc.target/aarch64/sve/pcs/varargs_2_u8.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_be_f16.c: Require lp64.
	* gcc.target/aarch64/sve/pcs/args_5_be_f32.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_be_f64.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_be_s16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_be_s32.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_be_s64.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_be_s8.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_be_u16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_be_u32.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_be_u64.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_be_u8.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_le_f16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_le_f32.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_le_f64.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_le_s16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_le_s32.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_le_s64.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_le_s8.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_le_u16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_le_u32.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_le_u64.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_5_le_u8.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_be_f16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_be_f32.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_be_f64.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_be_s16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_be_s32.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_be_s64.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_be_s8.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_be_u16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_be_u32.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_be_u64.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_be_u8.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_le_f16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_le_f32.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_le_f64.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_le_s16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_le_s32.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_le_s64.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_le_s8.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_le_u16.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_le_u32.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_le_u64.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_6_le_u8.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_7.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_8.c: Likewise.
	* gcc.target/aarch64/sve/pcs/args_9.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_4_128.c: Require lp64 and
	aarch64_little_endian for check-function-bodies tests.
	* gcc.target/aarch64/sve/pcs/return_5_128.c: Likewise.
	* gcc.target/aarch64/sve/pcs/stack_clash_2_128.c: Likewise.
	* gcc.target/aarch64/sve/pcs/return_1_128.c: Likewise.  Remove
	target selector from dg-compile.
	* gcc.target/aarch64/sve/pcs/return_6_128.c: Likewise.

committed Jan 31, 2020


libstdc++: Always return a sentinel<I> from __gnu_test::test_range::end() · 6e5a1963

It seems that in practice std::sentinel_for<I, I> is always true, and so the
test_range container doesn't help us detect bugs in ranges code in which we
wrongly assume that a sentinel can be manipulated like an iterator.  Make the
test_range range more strict by having end() unconditionally return a
sentinel<I>, and adjust some tests accordingly.
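
The idea can be sketched without the testsuite headers. The types below (`int_iterator`, `int_sentinel`, `strict_range`) are hypothetical stand-ins, not the actual `__gnu_test` classes; they only show why a distinct sentinel type catches code that treats end() as an iterator.

```cpp
#include <cstddef>

// Hypothetical minimal iterator over a contiguous int array.
struct int_iterator {
  const int *ptr;
  int operator*() const { return *ptr; }
  int_iterator &operator++() { ++ptr; return *this; }
};

// A sentinel marks the end of the range but is deliberately not an
// iterator: it cannot be dereferenced, incremented or decremented.
struct int_sentinel {
  const int *end;
};

// The only operation linking the two types is comparison.
inline bool operator!=(const int_iterator &it, const int_sentinel &s) {
  return it.ptr != s.end;
}

// A range whose end() never returns the iterator type.
struct strict_range {
  const int *first;
  const int *last;
  int_iterator begin() const { return int_iterator{first}; }
  int_sentinel end() const { return int_sentinel{last}; }
};

// Counts elements using only the operations a sentinel supports.
// Writing `--r.end()` or `*r.end()` here would fail to compile against
// strict_range, which is how the stricter range exposes wrong assumptions.
template <typename Range>
std::ptrdiff_t count_elements(const Range &r) {
  std::ptrdiff_t n = 0;
  for (auto it = r.begin(); it != r.end(); ++it)
    ++n;
  return n;
}
```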

libstdc++-v3/ChangeLog:

	* testsuite/24_iterators/range_operations/distance.cc: Do not assume
	test_range::end() returns the same type as test_range::begin().
	* testsuite/24_iterators/range_operations/next.cc: Likewise.
	* testsuite/24_iterators/range_operations/prev.cc: Likewise.
	* testsuite/util/testsuite_iterators.h (__gnu_test::test_range::end):
	Always return a sentinel<I>.

committed Jan 31, 2020


Fix conditional add LRA failure for amdgcn · b9270938

Fix ICE in testcase gfortran.dg/assumed_rank_bounds_3.f90.

2020-01-31  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* config/gcn/gcn-valu.md (addv64di3_exec): Allow one '0' in each
	alternative only.

committed Jan 31, 2020


Fix TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL handling. · 828573a5

The only reason for TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL on AMD targets
is instruction size, as advised in e.g. the Software Optimization Guide
for the AMD Family 15h Processors [1], section 7.1.2, which says:

--quote--
7.1.2 Reduce Instruction Size

Optimization

Reduce the size of instructions when possible.

Rationale

Using smaller instruction sizes improves instruction fetch throughput.
Specific examples include the following:

* In SIMD code, use the single-precision (PS) form of instructions
instead of the double-precision (PD) form. For example, for register
to register moves, MOVAPS achieves the same result as MOVAPD, but uses
one less byte to encode the instruction and has no prefix byte. Other
examples in which single-precision forms can be substituted for
double-precision forms include MOVUPS, MOVNTPS, XORPS, ORPS, ANDPS,
and SHUFPS.
...
--/quote--

Please note that this optimization applies only to non-AVX forms, as
demonstrated by:

   0:   0f 28 c8                movaps %xmm0,%xmm1
   3:   66 0f 28 c8             movapd %xmm0,%xmm1
   7:   c5 f8 28 d1             vmovaps %xmm1,%xmm2
   b:   c5 f9 28 d1             vmovapd %xmm1,%xmm2

Also note that MOVDQA is missing from the above optimization.  It is
harmful to substitute MOVDQA with MOVAPS, as it can (and does)
introduce a +1 cycle forwarding penalty between the FLT (FPA/FPM) and
INT (VALU) FP clusters.

[1] https://www.amd.com/system/files/TechDocs/47414_15h_sw_opt_guide.pdf

committed Jan 31, 2020


[amdgcn] Scale number of threads/workers with VGPR usage · 5a28e272

2020-01-31  Kwok Cheung Yeung  <kcy@codesourcery.com>

	gcc/
	* config/gcn/mkoffload.c (process_asm): Add sgpr_count and vgpr_count
	to definition of hsa_kernel_description.  Parse assembly to find SGPR
	and VGPR count of kernel and store in hsa_kernel_description.

	libgomp/
	* plugin/plugin-gcn.c (struct hsa_kernel_description): Add sgpr_count
	and vgpr_count fields.
	(struct kernel_info): Add a field for a hsa_kernel_description.
	(run_kernel): Reduce the number of threads/workers if the requested
	number would require too many VGPRs.
	(init_basic_kernel_info): Initialize description field with
	the hsa_kernel_description entry for the kernel.
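
A rough sketch of the kind of scaling described for run_kernel. The hardware figures below (256 VGPRs per SIMD unit, allocation in blocks of 4 registers, 4 SIMD units per compute unit) and the helper name `clamp_workers` are assumptions made for illustration; they are not taken from this commit, and the plugin's actual calculation may differ.

```cpp
// Assumed hardware limits, for illustration only.
constexpr int kVgprsPerSimd = 256;  // assumed VGPR budget per SIMD unit
constexpr int kVgprGranule = 4;     // assumed allocation granularity
constexpr int kSimdsPerCu = 4;      // assumed SIMD units per compute unit

// Reduce the requested number of workers when the kernel's VGPR usage
// would not let that many run at once.  A vgpr_count of zero or less is
// treated as "unknown" and leaves the request unchanged.
int clamp_workers(int requested, int vgpr_count) {
  if (vgpr_count <= 0)
    return requested;
  // Round the register count up to the allocation granularity.
  int granulated =
      (vgpr_count + kVgprGranule - 1) / kVgprGranule * kVgprGranule;
  // Number of workers whose registers fit across all SIMD units.
  int max_workers = (kVgprsPerSimd / granulated) * kSimdsPerCu;
  return requested > max_workers ? max_workers : requested;
}
```

Under these assumptions a kernel using 100 VGPRs fits two workers per SIMD unit, so a request for 16 workers would be reduced to 8.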

committed Jan 31, 2020


[Fortran] Disable front-end optimization for OpenACC atomic (PR93462) · 6a97d9ea

        PR fortran/93462
        * frontend-passes.c (gfc_code_walker): For EXEC_OACC_ATOMIC, set
        in_omp_atomic to true to prevent front-end optimization.

        PR fortran/93462
        * gfortran.dg/goacc/atomic-1.f90: New.

committed Jan 31, 2020


middle-end: Fix logical shift truncation (PR rtl-optimization/91838) · e60b1e23

This fixes a fall-out from a patch I had submitted two years ago which started
allowing simplify-rtx to fold logical right shifts by offsets a followed by b
into >> (a + b).

However, this can generate inefficient code when the resulting shift count
ends up being the same as the size of the shift mode.  A shift by that amount
is undefined behavior on most platforms.

This patch changes the code to truncate the result to 0 if the shift amount
goes out of range.  Before my older patch this used to happen in combine when
it saw the two shifts.  However, since we now combine them here, combine never
gets a chance to truncate them.

The issue mostly affects GCC 8 and 9 since on 10 the back-end knows how to deal
with this shift constant but it's better to do the right thing in simplify-rtx.

Note that this doesn't take care of the arithmetic shift, where you could replace
the constant with MODE_BITS (mode) - 1, but that's not a regression so punting it.
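
A scalar sketch of the logical-shift case, assuming a 32-bit mode. The function names are hypothetical and this is plain C++ on integers, not the RTL simplification itself; it only shows why the folded shift has to become 0 once the combined count reaches the width.

```cpp
#include <cstdint>

// Reference behaviour: two separate logical right shifts.
// Requires a < 32 and b < 32 so that each shift is well defined.
std::uint32_t two_step_lshr(std::uint32_t x, unsigned a, unsigned b) {
  return (x >> a) >> b;
}

// Folded form: a single shift by a + b.  Shifting a 32-bit value by 32
// or more is undefined in C++, so the out-of-range case is handled
// explicitly.  Every bit has been shifted out by then, so the result
// is 0, matching the two-step version.
std::uint32_t fold_lshr(std::uint32_t x, unsigned a, unsigned b) {
  unsigned total = a + b;
  if (total >= 32)
    return 0;
  return x >> total;
}
```

For example, with a = b = 16 the combined count equals the width, and both functions return 0 for any x.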

gcc/ChangeLog:

	PR rtl-optimization/91838
	* simplify-rtx.c (simplify_binary_operation_1): Update LSHIFTRT case
	to truncate if allowed or reject combination.

gcc/testsuite/ChangeLog:

	PR rtl-optimization/91838
	* g++.dg/pr91838.C: New test.

committed Jan 31, 2020


Fix fast-math-pr55281.c ICE · c63ae7f0

2020-01-31  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* tree-ssa-loop-ivopts.c (get_iv): Use sizetype for zero-step.
	(find_inv_vars_cb): Likewise.

committed Jan 31, 2020


calls.c: refactor special_function_p for use by analyzer (v2) · 182ce042

This patch refactors some code in special_function_p that checks for
the function being sane to match by name, splitting it out into a new
maybe_special_function_p, and using it in two places in the analyzer.

gcc/analyzer/ChangeLog:
	* analyzer.cc (is_named_call_p): Replace tests for fndecl being
	extern at file scope and having a non-NULL DECL_NAME with a call
	to maybe_special_function_p.
	* function-set.cc (function_set::contains_decl_p): Add call to
	maybe_special_function_p.

gcc/ChangeLog:
	* calls.c (special_function_p): Split out the check for DECL_NAME
	being non-NULL and fndecl being extern at file scope into a
	new maybe_special_function_p and call it.  Drop check for fndecl
	being non-NULL that was after a usage of DECL_NAME (fndecl).
	* tree.h (maybe_special_function_p): New inline function.

committed Jan 31, 2020


analyzer: further fixes for comparisons between uncomparable types (PR 93450) · 45eb3e49

gcc/analyzer/ChangeLog:
	PR analyzer/93450
	* constraint-manager.cc
	(constraint_manager::get_or_add_equiv_class): Only compare constants
	if their types are compatible.
	* region-model.cc (constant_svalue::eval_condition): Replace check
	for identical types with call to types_compatible_p.

committed Jan 31, 2020


Zero-initialise masked load destinations · 95607c12

Fixes an execution failure in testcase gfortran.dg/assumed_rank_1.f90.

2020-01-30  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* config/gcn/gcn-valu.md (gather<mode>_exec): Move contents ...
	(mask_gather_load<mode>): ... here, and zero-initialize the
	destination.
	(maskload<mode>di): Zero-initialize the destination.
	* config/gcn/gcn.c:
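
A scalar model of the behaviour being fixed, assuming a fixed lane count. The name `masked_load` is hypothetical and this is not the GCN machine description; it only illustrates that lanes whose mask bit is clear end up with a defined value of zero instead of whatever the destination held before.

```cpp
#include <array>
#include <cstddef>

// Load src[i] into lane i only where mask[i] is set.  The destination is
// zero-initialised first, so inactive lanes read back as 0.
template <std::size_t N>
std::array<int, N> masked_load(const int *src,
                               const std::array<bool, N> &mask) {
  std::array<int, N> dest{};  // value-initialised: every lane starts at 0
  for (std::size_t i = 0; i < N; ++i)
    if (mask[i])
      dest[i] = src[i];
  return dest;
}
```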

committed Jan 31, 2020


analyzer: add extrinsic_state::dump · 42f36563

gcc/analyzer/ChangeLog:
	* program-state.cc (extrinsic_state::dump_to_pp): New.
	(extrinsic_state::dump_to_file): New.
	(extrinsic_state::dump): New.
	* program-state.h (extrinsic_state::dump_to_pp): New decl.
	(extrinsic_state::dump_to_file): New decl.
	(extrinsic_state::dump): New decl.
	* sm.cc: Include "pretty-print.h".
	(state_machine::dump_to_pp): New.
	* sm.h (state_machine::dump_to_pp): New decl.

committed Jan 30, 2020


analyzer: make extrinsic_state field private · ebe9174e

gcc/analyzer/ChangeLog:
	* diagnostic-manager.cc (for_each_state_change): Use
	extrinsic_state::get_num_checkers rather than accessing m_checkers
	directly.
	* program-state.cc (program_state::program_state): Likewise.
	* program-state.h (extrinsic_state::m_checkers): Make private.

committed Jan 30, 2020


Daily bump. · bba54d62
GCC Administrator committed Jan 31, 2020
