src · 2630ffcbc52973aaf86fd6b7000a6f2f30d5f25c · wenyuanbo / tic

[CodeGen][CUDA] Improve CUDA vectorizer (#4736) · 2630ffcb

- Fix issues that blocked the fp16 vectorizer, so that correct CUDA
  packing and unpacking code is now emitted. Enable more unit tests.

- Do not emit code that reads the first lane from an undefined variable, such as:

  int _3;
  _3 = _3 & ~(0x000000ff << 0) | ...

  and emit the following code instead:

  _3 = (((0x000000ff & (_1 >> 0))+(0x000000ff & (_2 >> 0))) << 0);

  Note that nvcc 10.2 is forgiving and emits the same code for both cases.
  A warning appears in test_codegen_cuda.py.

Signed-off-by: Wei Pan <weip@nvidia.com>

committed Jan 17, 2020
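
The second item in the commit message can be illustrated with a small host-side sketch in plain C++. This is not the TVM code generator itself: the helper names `get_lane` and `add_int8x4` are hypothetical, and the extra `& 0xff` on each lane sum is an assumption added here to keep carries out of neighbouring lanes. It shows the idea of initializing the packed result with a plain assignment for lane 0, so that the accumulator is never read before it has been written, and only then using read-modify-write for the remaining lanes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: extract 8-bit lane `lane` (0..3) from a packed value.
static inline uint32_t get_lane(uint32_t v, int lane) {
  return 0x000000ffu & (v >> (lane * 8));
}

// Hypothetical helper: lane-wise add of two packed 8-bit x 4 values.
static inline uint32_t add_int8x4(uint32_t a, uint32_t b) {
  // Lane 0: plain assignment, no read of an uninitialized accumulator.
  uint32_t r = ((get_lane(a, 0) + get_lane(b, 0)) & 0xffu) << 0;
  // Lanes 1..3: clear the lane in r, then insert the new lane value.
  for (int lane = 1; lane < 4; ++lane) {
    uint32_t sum = (get_lane(a, lane) + get_lane(b, lane)) & 0xffu;
    r = (r & ~(0x000000ffu << (lane * 8))) | (sum << (lane * 8));
  }
  return r;
}

// Small usage example: each 8-bit lane is added independently.
static inline void check_example() {
  assert(add_int8x4(0x01020304u, 0x10203040u) == 0x11223344u);
}
```

Writing lane 0 as an assignment avoids relying on the compiler to tolerate a read of an uninitialized variable, which is the situation the commit message describes.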


Name
api
arith
autotvm
codegen
contrib/hybrid
ir
lang
node
pass
relay
runtime
support
target
top
README.md
