src/target · 0b2d11a5745779ec139a05e8ece73c93fa6d7db8 · wenyuanbo / tic

[CodeGen][CUDA] Fix issues in cuda codegen (#4876) · d50ba721

- Do not emit __shared__ or other storage-scope qualifiers as part of the type in casts

- Fix fp16 reduction kernels that failed to compile with the error:

  no operator "+" matches these operands: volatile half + volatile half

  This patch inserts casts that remove the volatile type qualifier after
  each volatile load (fp16 only), so the loaded value is a plain half.
  A cleaner fix would be for the CUDA fp16 library headers to provide
  volatile member functions.

- Update have_fp16 to include compute capability 6.1 GPUs, which do
  support fp16 arithmetic, although at low throughput. Update the tests
  accordingly.
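The fp16 reduction fix can be illustrated with a host-side C++ analogue. This is a sketch under stated assumptions: Half is a hypothetical stand-in for CUDA's half (it stores a float, not a real binary16 value), and SumVolatile is a hypothetical name, not part of TVM or the CUDA headers. Because operator+ takes only non-volatile references, adding two elements of a volatile buffer directly fails to compile, mirroring the reported error; casting each volatile load to Half copies the value out and drops the qualifier.

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's half, for host-side illustration only.
// It stores a float rather than a real IEEE binary16 bit pattern.
struct Half {
  float value;

  Half() = default;
  explicit Half(float v) : value(v) {}
  Half(const Half&) = default;
  Half& operator=(const Half&) = default;
  // Copying out of a volatile object performs a volatile read and yields a
  // plain (non-volatile) Half; this is what makes the inserted cast valid.
  Half(const volatile Half& other) : value(other.value) {}
};

// Takes non-volatile operands only (as the reported error implies for the
// CUDA fp16 operator+), so it cannot bind directly to a volatile Half.
Half operator+(const Half& a, const Half& b) { return Half(a.value + b.value); }

// Hypothetical reduction over a volatile buffer, standing in for the
// volatile scratch array that the fp16 reduction kernels read from.
Half SumVolatile(const volatile Half* buf, int n) {
  assert(n >= 0);
  Half acc(0.0f);
  for (int i = 0; i < n; ++i) {
    // acc + buf[i] would not compile: buf[i] is a const volatile Half
    // lvalue, and const Half& cannot bind to it. The explicit cast copies
    // the element out and drops the volatile qualifier.
    acc = acc + Half(buf[i]);
  }
  return acc;
}
```

Summing a volatile buffer holding 1, 2, 3 and 4 yields a Half whose value is 10. The cast is placed at the load rather than on the operator, which is why the patch only has to touch the emitted load expression.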
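The revised have_fp16 rule can be restated as a small C++ predicate. HaveFp16 and its (major, minor) signature are hypothetical; the real check is not shown in this listing. The sketch assumes fp16 arithmetic is available on compute capability 5.3 and on every 6.x and newer device, an assumption based on the arithmetic-instruction throughput table in the CUDA C++ Programming Guide. Under this rule 6.1 now returns true.

```cpp
#include <cassert>

// Hypothetical restatement of the have_fp16 check after this change.
// Returns true when a GPU of compute capability major.minor is assumed
// to support native fp16 arithmetic.
bool HaveFp16(int major, int minor) {
  assert(major >= 0 && minor >= 0);
  // 5.3 is the earliest compute capability with fp16 arithmetic.
  if (major == 5 && minor == 3) return true;
  // 6.x and later, now including 6.1 despite its low fp16 throughput.
  if (major >= 6) return true;
  return false;
}
```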
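The fix for casts rests on a CUDA rule: __shared__ is a variable memory space specifier, so it belongs in a variable declaration, not in a type written inside a cast. The sketch below shows that split with hypothetical string-printing helpers; StorageSpecifier, PrintDecl and PrintVolatileCast are illustrative names, not the actual CodeGenCUDA methods.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: maps a storage scope name to the CUDA specifier
// that is written in front of a declaration.
std::string StorageSpecifier(const std::string& scope) {
  if (scope == "shared") return "__shared__ ";
  return "";  // other scopes: no specifier in this sketch
}

// Declaration: the storage specifier is emitted here,
// e.g. "__shared__ half red_buf[32];".
std::string PrintDecl(const std::string& scope, const std::string& elem_type,
                      const std::string& name, int extent) {
  assert(extent > 0);
  return StorageSpecifier(scope) + elem_type + " " + name + "[" +
         std::to_string(extent) + "];";
}

// Cast: only the element type and cv-qualifiers are printed; the storage
// scope is deliberately left out, e.g. "((volatile half*)red_buf)".
std::string PrintVolatileCast(const std::string& elem_type,
                              const std::string& name) {
  return "((volatile " + elem_type + "*)" + name + ")";
}
```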

Signed-off-by: Wei Pan <weip@nvidia.com>

committed Feb 15, 2020


Contents of src/target:

datatype
llvm
opt
source
spirv
stackvm
build_common.h
codegen.cc
generic_func.cc
intrin_rule.cc
intrin_rule.h
target.cc
target_info.cc