topi · e7be8bf43de4c1b19ea68134812ea7b0cd8e361f · wenyuanbo / tic

[CodeGen][CUDA] Fix issues in cuda codegen (#4876) · d50ba721

- Do not emit __shared__ etc. as part of type for casting

- Fix fp16 reduction kernels with compiler errors:

  "no operator "+" matches these operands, volatile half + volatile half"

  This patch inserts casts to remove the volatile type qualifier
  following volatile loads (fp16 only). Ideally, the CUDA fp16 library
  headers would add volatile member functions.

- Update have_fp16 to include compute 6.1 GPUs, which do support fp16,
  although their fp16 throughput is low. Updated tests.

Signed-off-by: Wei Pan <weip@nvidia.com>
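
The fp16 reduction fix described in the second bullet can be pictured with a small sketch. The helper below is hypothetical: the name `emit_load`, its parameters, and the buffer name are not taken from the patch, and it is written in Python for brevity rather than in the code generator's own language. It only illustrates the idea of wrapping an fp16 value loaded from a volatile buffer in a cast, so that the operands handed to `+` are plain `half` values.

```python
def emit_load(value: str, dtype: str, is_volatile: bool) -> str:
    """Return CUDA source text for a loaded value.

    Hypothetical helper: only fp16 ("half") loads from volatile buffers
    are wrapped in a cast; everything else is emitted unchanged.
    """
    if dtype == "half" and is_volatile:
        return f"({dtype})({value})"
    return value


# One step of a reduction over a volatile buffer (names are illustrative).
lhs = emit_load("red_buf[tid]", "half", is_volatile=True)
rhs = emit_load("red_buf[tid + 16]", "half", is_volatile=True)
statement = f"red_buf[tid] = {lhs} + {rhs};"
print(statement)
```

Under these assumptions the store on the left-hand side still goes through the volatile buffer; only the loaded operands are cast, which matches the "fp16 only" scope stated in the commit message.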

committed Feb 15, 2020
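
The capability check mentioned in the third bullet might look roughly like the sketch below. This is not the repository's `have_fp16`: the function name `supports_fp16` and the "major.minor" string format are assumptions made here, and the rule it encodes (fp16 arithmetic is available from compute capability 5.3 upward, which includes 6.1) combines the bullet with NVIDIA's general guidance on half-precision support.

```python
def supports_fp16(compute_version: str) -> bool:
    """Hypothetical check: True if the compute capability has fp16 arithmetic.

    Assumes a "major.minor" string such as "6.1". Throughput is not
    considered, so 6.1 counts as supported even though its fp16 rate is low.
    """
    major, minor = (int(part) for part in compute_version.split("."))
    return (major, minor) >= (5, 3)


for version in ("5.2", "5.3", "6.0", "6.1", "7.0"):
    print(version, supports_fp16(version))
```

Comparing `(major, minor)` tuples keeps the rule in one place; a test suite can use the same predicate to decide whether fp16 cases should be skipped on a given GPU.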


Name
include/topi
python
recipe
src
tests/python
README.md

README.md