Name | Last commit | Last update |
---|---|---|
.. | ||
micro/standalone | ||
c_backend_api.h | ||
c_runtime_api.h | ||
container.h | ||
data_type.h | ||
device_api.h | ||
memory.h | ||
module.h | ||
ndarray.h | ||
object.h | ||
packed_func.h | ||
registry.h | ||
serializer.h | ||
threading_backend.h | ||
vm.h |

- Fixes issues to enable fp16 vectorizer. Now correct packing and unpacking CUDA code will be emitted. Enabled more unit tests.
- Do not emit code to read the first lane from an undef variable

      int _3;
      _3 = _3 & ~(0x000000ff << 0) | ...

  and emit the following code instead:

      _3 = (((0x000000ff & (_1 >> 0))+(0x000000ff & (_2 >> 0))) << 0);

  Note that nvcc 10.2 is forgiving and emits the same code for both cases. A warning appears in test_codegen_cuda.py.

Signed-off-by: Wei Pan <weip@nvidia.com>
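
The lane-0 change described in the commit message can be illustrated with a small host-side sketch in plain C. This is not the code generator's actual output: the helper names `get_lane` and `add_int8x4` are hypothetical, and the sketch assumes four 8-bit lanes packed into one 32-bit word. Unlike the emitted snippet quoted above, it also masks each sum to 8 bits so that a carry cannot spill into the next lane.

```c
#include <stdint.h>

/* Hypothetical helper: extract the 8-bit lane `lane` (0..3) from a packed word. */
static uint32_t get_lane(uint32_t packed, int lane) {
    return 0xffu & (packed >> (lane * 8));
}

/* Hypothetical lane-wise add of two packed 8-bit x 4 values. */
static uint32_t add_int8x4(uint32_t a, uint32_t b) {
    uint32_t r;

    /* Lane 0: assign directly. The result variable is never read before
       it is written, which is the point of the fix in the commit message. */
    r = ((get_lane(a, 0) + get_lane(b, 0)) & 0xffu) << 0;

    /* Lanes 1..3: r is now initialized, so clearing the target byte and
       OR-ing in the new value is well defined. */
    for (int lane = 1; lane < 4; ++lane) {
        uint32_t sum = (get_lane(a, lane) + get_lane(b, lane)) & 0xffu;
        r = (r & ~(0xffu << (lane * 8))) | (sum << (lane * 8));
    }
    return r;
}
```

For example, `add_int8x4(0x01020304u, 0x10203040u)` yields `0x11223344u`. Reading `r` before the lane-0 assignment would use an indeterminate value, which is what the original read-modify-write pattern did for the first lane.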