Commit 88a4654d by Cesar Philippidis Committed by Tom de Vries

[libgomp, nvptx] Add error with recompilation hint for launch failure

Currently, when a kernel is launched with too many workers, it results in a CUDA
launch failure.  This is triggered, for instance, by parallel-loop-1.c at -O0 on
a Quadro M1200.

This patch detects this situation, and errors out with a hint on how to fix it.
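
As a rough illustration of the check described above, the following standalone sketch mirrors the arithmetic the patch performs in nvptx_exec.  The enum and both helper names are hypothetical stand-ins, not part of libgomp; the plugin itself indexes dims[] with GOMP_DIM_WORKER and GOMP_DIM_VECTOR and reads the limit from targ_fn->max_threads_per_block.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the plugin's dims[] indices; libgomp uses
   GOMP_DIM_GANG, GOMP_DIM_WORKER and GOMP_DIM_VECTOR.  */
enum { DIM_GANG, DIM_WORKER, DIM_VECTOR, DIM_MAX };

/* Return true if a block of workers * vector-length threads fits within
   the kernel's per-block thread limit.  */
static bool
launch_fits (const int dims[DIM_MAX], int max_threads_per_block)
{
  return dims[DIM_WORKER] * dims[DIM_VECTOR] <= max_threads_per_block;
}

/* Worker count to suggest in the hint: the limit divided by the vector
   length, using integer division as the patch does.  */
static int
suggest_workers (const int dims[DIM_MAX], int max_threads_per_block)
{
  return max_threads_per_block / dims[DIM_VECTOR];
}
```

With illustrative values of 32 workers, a vector length of 32 and a per-block limit of 512 threads, launch_fits returns false and suggest_workers returns 16.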

Built and regression-tested on x86_64 with nvptx accelerator.

2018-07-26  Cesar Philippidis  <cesar@codesourcery.com>
	    Tom de Vries  <tdevries@suse.de>

	* plugin/plugin-nvptx.c (nvptx_exec): Error if the hardware doesn't have
	sufficient resources to launch a kernel, and give a hint on how to fix
	it.

Co-Authored-By: Tom de Vries <tdevries@suse.de>

From-SVN: r262997
parent 0c6c2f5f
2018-07-26 Cesar Philippidis <cesar@codesourcery.com>
Tom de Vries <tdevries@suse.de>
* plugin/plugin-nvptx.c (nvptx_exec): Error if the hardware doesn't have
sufficient resources to launch a kernel, and give a hint on how to fix
it.
2018-07-26 Cesar Philippidis <cesar@codesourcery.com>
Tom de Vries <tdevries@suse.de>
* plugin/plugin-nvptx.c (struct ptx_device): Add warp_size,
max_threads_per_block and max_threads_per_multiprocessor fields.
(nvptx_open_device): Initialize new fields.
......
......@@ -1204,6 +1204,21 @@ nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
        dims[i] = default_dims[i];
    }
  /* Check if the accelerator has sufficient hardware resources to
     launch the offloaded kernel.  */
  if (dims[GOMP_DIM_WORKER] * dims[GOMP_DIM_VECTOR]
      > targ_fn->max_threads_per_block)
    {
      int suggest_workers
        = targ_fn->max_threads_per_block / dims[GOMP_DIM_VECTOR];
      GOMP_PLUGIN_fatal ("The Nvidia accelerator has insufficient resources to"
                         " launch '%s' with num_workers = %d; recompile the"
                         " program with 'num_workers = %d' on that offloaded"
                         " region or '-fopenacc-dim=:%d'",
                         targ_fn->launch->fn, dims[GOMP_DIM_WORKER],
                         suggest_workers, suggest_workers);
    }
  /* This reserves a chunk of a pre-allocated page of memory mapped on both
     the host and the device.  HP is a host pointer to the new chunk, and DP is
     the corresponding device pointer.  */
......
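
To show how the hint in the error message might be applied, here is a minimal sketch of an OpenACC region with an explicit worker count.  The function name add_one and the value 16 are hypothetical; in practice the value would be whatever number the error message reports.  Note that the message prints 'num_workers = N', whereas the OpenACC clause is spelled num_workers(N).

```c
#include <stddef.h>

/* Hypothetical offloaded kernel: adds 1.0f to each element of A.
   The num_workers value (16) is illustrative only; use the number
   printed in the error message.  */
static void
add_one (float *a, size_t n)
{
#pragma acc parallel loop num_workers(16) copy(a[0:n])
  for (size_t i = 0; i < n; i++)
    a[i] += 1.0f;
}
```

Alternatively, leaving the source unchanged and recompiling with -fopenacc-dim=:16 sets the default worker dimension for parallel regions that do not specify one; the option takes a colon-separated gang:worker:vector triple, so the leading colon leaves the gang count unset.  When compiled without -fopenacc, the pragma is ignored and the loop runs on the host.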