Add workgroup size attribute to AMDGPU functions in codegen (#4342)
Without an explicit workgroup size, LLVM may use too many registers for kernels that are launched with many threads, which resulted in "invalid ISA" errors. This change sets the maximum workgroup size to the maximum threads per block reported by the device API. A possible follow-up is to let configurations that use fewer threads at runtime use more registers.
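As a rough illustration of the kind of attribute involved, the sketch below builds the "min,max" value for LLVM's "amdgpu-flat-work-group-size" function attribute. The helper name, the lower bound of 1, and the example limit of 1024 are assumptions for illustration and are not taken from the actual change.

```cpp
#include <cassert>
#include <string>

// Name of the LLVM function attribute that the AMDGPU backend reads to learn
// the range of flat workgroup sizes a kernel may be launched with.
const char *const kFlatWorkGroupSizeAttr = "amdgpu-flat-work-group-size";

// Hypothetical helper (not the code from this change): formats the attribute
// value as "min,max". The minimum of 1 is an assumption; the maximum would be
// the threads-per-block limit queried from the device API.
std::string flat_work_group_size_value(int max_threads_per_block) {
    assert(max_threads_per_block >= 1);  // a workgroup has at least one thread
    return "1," + std::to_string(max_threads_per_block);
}

// With LLVM available, the pair could be attached to a kernel roughly like:
//   fn->addFnAttr(kFlatWorkGroupSizeAttr, flat_work_group_size_value(1024));
// where fn is an llvm::Function* and 1024 stands in for the device limit.
```

Keeping the value in a small helper makes it easy to later pass a smaller maximum for launches that are known to use fewer threads, which is the follow-up mentioned above.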