| Name | Last commit | Last update |
|---|---|---|
| .. | ||
| datatype | ||
| llvm | ||
| opt | ||
| source | ||
| spirv | ||
| stackvm | ||
| build_common.h | ||
| codegen.cc | ||
| generic_func.cc | ||
| intrin_rule.cc | ||
| intrin_rule.h | ||
| target.cc | ||
| target_info.cc | ||

- Do not emit `__shared__` etc. as part of the type when casting.
- Fix fp16 reduction kernels that fail with the compiler error "no operator "+" matches these operands", where the operands are `volatile half + volatile half`. This patch inserts casts to remove the volatile type qualifier following volatile loads (fp16 only). CUDA fp16 library headers should add volatile member functions.
- Update have_fp16 to include compute 6.1 GPUs, which do support fp16, although their fp16 throughput is low. Updated tests.

Signed-off-by: Wei Pan <weip@nvidia.com>
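
The have_fp16 change in the commit message can be illustrated with a minimal sketch. This is not the repository's code: the function name `supports_fp16` and its string argument are hypothetical, and the threshold of compute capability 5.3 is an assumption taken from CUDA's general rule that half-precision arithmetic is available from 5.3 upward. The point it shows is that 6.1 passes the check even though its fp16 throughput is low.

```python
def supports_fp16(compute_version: str) -> bool:
    """Return True if a GPU with this compute capability has fp16 arithmetic.

    Hypothetical helper for illustration only; the 5.3 threshold is an
    assumption based on CUDA's documented half-precision support.
    """
    major_str, minor_str = compute_version.split(".")
    version = (int(major_str), int(minor_str))
    # Tuple comparison orders by major version first, then minor version.
    return version >= (5, 3)


if __name__ == "__main__":
    for cv in ("5.2", "5.3", "6.1", "7.0"):
        print(cv, supports_fp16(cv))
```

A check like this only reports capability, not performance, so callers that care about speed on 6.1 devices would need a separate throughput heuristic.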