codegen_cuda.h
3.96 KB
[CodeGen][CUDA] Fix bugs (#5209) · 316ce055
- Support vectorized casts
- It is incorrect to extract elements from int8x4 with
    0x000000ff & (x >> i * 8)
  as this value is of type int in C/C++. If this expression is used for sign extension, the sign bit will be wrong. Simply use C-style casts instead and sign bits will just work.

Signed-off-by: Wei Pan <weip@nvidia.com>
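The sign-bit problem described in the commit message can be illustrated with a small host-side sketch. It assumes an int8x4 value is four signed 8-bit lanes packed into one 32-bit int. The helper names `extract_by_mask` and `extract_by_cast` are hypothetical and are not taken from codegen_cuda.h. The sketch only contrasts the two expressions and does not show the code the generator actually emits.

```cpp
#include <cstdint>

// Hypothetical helper: extract lane i (0..3) of a packed int8x4 by masking.
// The expression has type int and its value is always in [0, 255], so a
// lane whose top bit is set comes out as a positive number.
constexpr int extract_by_mask(int x, int i) {
  return 0x000000ff & (x >> (i * 8));
}

// Hypothetical helper: extract lane i with a C-style cast to a signed 8-bit
// type. Widening the result back to int sign-extends it, so the lane keeps
// its sign. (Before C++20 the narrowing conversion is implementation-defined;
// mainstream compilers wrap modulo 2^8.)
constexpr int extract_by_cast(int x, int i) {
  return (int)(std::int8_t)(x >> (i * 8));
}

// Packed lanes, lowest byte first: 1, -1, -128, 127.
constexpr int kPacked = 0x7F80FF01;
```

With `kPacked`, lane 1 holds the byte 0xFF: the masked form yields 255, while the cast form yields -1. The two forms agree only for lanes whose top bit is clear, such as lanes 0 and 3 here.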
Wei Pan committed