src · d50ba721eb5f7c0dbeceeaa78335d6f4c8cf2973 · wenyuanbo / tic

[CodeGen][CUDA] Fix issues in cuda codegen (#4876) · d50ba721

- Do not emit __shared__ and similar qualifiers as part of the type when
  casting

- Fix fp16 reduction kernels that failed with the compiler error:

  no operator "+" matches these operands: volatile half + volatile half

  This patch inserts casts that remove the volatile type qualifier
  after volatile loads (fp16 only). CUDA fp16 library headers should add
  volatile member functions.

- Update have_fp16 to include compute 6.1 GPUs, which do support fp16,
  although their fp16 throughput is low. Updated tests.

Signed-off-by: Wei Pan <weip@nvidia.com>

committed Feb 15, 2020
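
The fp16 fix in the commit message above can be illustrated with a minimal host-side C++ sketch. It is not the generated CUDA code and not the patch itself: `Half` is a hypothetical stand-in for CUDA's `half`, and `reduce_pair` is an invented helper. The only assumption carried over is that `operator+` is declared for non-volatile operands, so a value loaded through a volatile pointer has to be cast to the plain type before the addition.

```cpp
// Hypothetical stand-in for CUDA's `half` (assumption, not the real type):
// a class type whose operator+ only accepts non-volatile operands.
struct Half {
  float value;
  Half() : value(0.0f) {}
  explicit Half(float v) : value(v) {}
  Half(const Half&) = default;
  // Lets a plain Half be constructed from a volatile one; this is what
  // makes the cast in reduce_pair() below well-formed.
  Half(const volatile Half& other) : value(other.value) {}
};

// Declared for non-volatile operands only. A `volatile Half` lvalue cannot
// bind to `const Half&`, so `volatile Half + volatile Half` finds no match.
Half operator+(const Half& a, const Half& b) {
  return Half(a.value + b.value);
}

// Hypothetical reduction step over a volatile buffer: the loads stay
// volatile, but each loaded value is cast to plain Half before the add.
Half reduce_pair(volatile Half* buf) {
  // return buf[0] + buf[1];  // does not compile: no matching operator+
  return (Half)(buf[0]) + (Half)(buf[1]);
}
```

The cast keeps the memory access volatile while letting the arithmetic use the ordinary operator, which is why the commit message restricts it to fp16: built-in types such as `float` already support arithmetic on volatile operands.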


Name
api
arith
autotvm
contrib/hybrid
driver
ir
node
printer
relay
runtime
support
target
te
tir
README.md
