- Fixes issues to enable fp16 vectorizer. Now correct packing and unpacking CUDA code will be emitted. Enabled more unit tests.
- Do not emit code to read the first lane from an undef variable:

      int _3;
      _3 = _3 & ~(0x000000ff << 0) | ...

  and emit the following code instead:

      _3 = (((0x000000ff & (_1 >> 0))+(0x000000ff & (_2 >> 0))) << 0);

  Note that nvcc 10.2 is forgiving and emits the same code for both cases. A warning appears in test_codegen_cuda.py.

Signed-off-by: Wei Pan <weip@nvidia.com>
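
The lane-packing change described in the commit message can be illustrated with a small host-side sketch in plain C. This is not the code TVM generates. The helper names `get_lane` and `add_packed_i8x4` are hypothetical, and the example assumes four 8-bit lanes packed into one 32-bit word, which is what the `0x000000ff` masks in the message suggest. The first lane is written by plain assignment, so the destination is never read before it holds a value. The remaining lanes use the clear-and-merge pattern.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: extract 8-bit lane `lane` (0..3) from a packed word. */
static uint32_t get_lane(uint32_t packed, int lane) {
    assert(lane >= 0 && lane < 4);
    return 0x000000ffu & (packed >> (lane * 8));
}

/* Hypothetical helper: lane-wise add of two packed 4 x 8-bit vectors.
 * Each lane wraps around independently (assumption of this sketch). */
static uint32_t add_packed_i8x4(uint32_t a, uint32_t b) {
    uint32_t result;

    /* First lane: plain assignment. `result` is not read here, so no
     * uninitialized value is ever used. */
    result = ((get_lane(a, 0) + get_lane(b, 0)) & 0x000000ffu) << 0;

    /* Remaining lanes: clear the target byte, then merge the new value. */
    for (int lane = 1; lane < 4; ++lane) {
        uint32_t shift = (uint32_t)lane * 8u;
        uint32_t sum = (get_lane(a, lane) + get_lane(b, lane)) & 0x000000ffu;
        result = (result & ~(0x000000ffu << shift)) | (sum << shift);
    }
    return result;
}
```

Unlike the emitted line quoted in the commit message, this sketch masks each sum to 8 bits before shifting. That is a choice made here to keep every lane self-contained, not something the commit states.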
Name | Last commit | Last update |
---|---|---|
api | | |
arith | | |
autotvm | | |
codegen | | |
contrib/hybrid | | |
ir | | |
lang | | |
node | | |
pass | | |
relay | | |
runtime | | |
support | | |
target | | |
top | | |
README.md | | |