test_topi_tensor.py
[TOPI][CUDA] Enable vectorization on fp16 type (#4867) · 7013fc9a
- This allows better utilization of the memory bandwidth.
- Note that not all cases are vectorized for the fp16 datatype. For instance, when the size is not a multiple of 1024, the inner loop may be an expression that cannot be vectorized. In that case, a small inner loop is still beneficial for latency hiding.

Signed-off-by: Wei Pan <weip@nvidia.com>
wpan11nv committed
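The divisibility condition described in the commit message can be illustrated with a small standalone sketch. This is not TVM's actual implementation; `choose_vector_width` is a hypothetical helper showing the core idea: two fp16 lanes pack into a single 32-bit (`half2`) load, so the vector width must evenly divide the loop extent or the code falls back to scalar accesses.

```python
def choose_vector_width(extent, dtype, max_bits=32):
    """Illustrative helper (not TVM's real scheduling code): pick the
    largest vector width whose total bit-width fits in max_bits and
    which evenly divides the loop extent."""
    bits = {"float16": 16, "float32": 32}[dtype]
    width = max_bits // bits  # fp16 -> 2 lanes per 32-bit access
    # Halve the width until it divides the extent evenly; width 1 is scalar.
    while width > 1 and extent % width != 0:
        width //= 2
    return width

# An fp16 extent that is a multiple of 2 vectorizes into half2 loads.
print(choose_vector_width(1024, "float16"))  # -> 2
# An odd extent cannot be split evenly, so scalar code is emitted.
print(choose_vector_width(1023, "float16"))  # -> 1
```

This mirrors the note above: when the extent does not divide cleanly, vectorization is skipped, but keeping the inner loop small still helps hide memory latency.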