- Do not emit __shared__ etc. as part of the type when casting.
- Fix fp16 reduction kernels that fail with the compiler error "no operator "+" matches these operands, volatile half + volatile half". This patch inserts casts that remove the volatile type qualifier following volatile loads (fp16 only). The CUDA fp16 library headers should add volatile member functions.
- Update have_fp16 to include compute 6.1 GPUs, which do support fp16, although their fp16 throughput is low. Updated tests.

Signed-off-by: Wei Pan <weip@nvidia.com>
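
The volatile fix described in the commit message can be illustrated with a minimal host-side C++ sketch. It is not the generated CUDA code: `Half`, `HalfRaw`, and `reduce_sum` are hypothetical stand-ins invented for illustration, and a `float` field replaces the real 16-bit storage. The sketch assumes a class type whose `operator+` accepts only non-volatile operands, so a value loaded from a volatile buffer has to be cast to the plain type before it is added.

```cpp
#include <cassert>

// Hypothetical raw storage type (assumption, not the CUDA header definition).
struct HalfRaw {
  float value;  // a float keeps the sketch simple; real fp16 uses 16 bits
};

// Hypothetical stand-in for a half-precision class type.
struct Half {
  float value;
  Half() : value(0.0f) {}
  Half(const HalfRaw& r) : value(r.value) {}
  // Volatile-qualified conversion: lets a volatile Half be read into a
  // plain (non-volatile) raw value.
  operator HalfRaw() const volatile {
    HalfRaw r;
    r.value = value;
    return r;
  }
};

// Addition is declared for non-volatile operands only.
inline Half operator+(const Half& a, const Half& b) {
  HalfRaw r;
  r.value = a.value + b.value;
  return Half(r);
}

// Sum n elements of a volatile buffer.
inline Half reduce_sum(volatile Half* buf, int n) {
  assert(n >= 0);
  Half acc;
  for (int i = 0; i < n; ++i) {
    // acc = acc + buf[i];       // rejected: buf[i] is volatile-qualified
    acc = acc + (Half)(buf[i]);  // cast away volatile after the load
  }
  return acc;
}
```

In this sketch only the load from `buf` is volatile; the loaded value is an ordinary `Half`, so the existing non-volatile `operator+` applies without changes to the type itself.
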
Name | Last commit | Last update |
---|---|---|
literal | | |
codegen_aocl.cc | | |
codegen_c.cc | | |
codegen_c.h | | |
codegen_c_host.cc | | |
codegen_c_host.h | | |
codegen_cuda.cc | | |
codegen_cuda.h | | |
codegen_metal.cc | | |
codegen_metal.h | | |
codegen_opencl.cc | | |
codegen_opencl.h | | |
codegen_opengl.cc | | |
codegen_opengl.h | | |
codegen_source_base.cc | | |
codegen_source_base.h | | |
codegen_vhls.cc | | |
codegen_vhls.h | | |
intrin_rule_aocl.cc | | |
intrin_rule_cuda.cc | | |
intrin_rule_metal.cc | | |
intrin_rule_opencl.cc | | |
intrin_rule_opengl.cc | | |
intrin_rule_vhls.cc | | |
source_module.cc | | |