Commits · 9cb9a51f37eaa9c7692f15f8c5ae52fa70394209 · wenyuanbo / tic

22 Mar, 2020 1 commit

[CodeGen][CUDA] Vectorization for intrinsics (#5101) · 05b0f7e0

- This allows emitting vectorized loads/stores
  for CUDA math intrinsics.

- A few intrinsics should be lowered as CUDAMath ones, not CUDAFastMath ones.

- Fixed the code block indentation.

committed 4 years ago


20 Mar, 2020 1 commit

[TIR][TARGET] Refactor Target codegen to use IRModule and PrimFunc. (#5107) · 841725cc

As part of the unified IR refactor.
This PR refactors the target codegen to use IRModule containing tir::PrimFuncs.

In order to break the refactor into several steps without breaking the codebase,
we built a conversion pass to convert Array<LoweredFunc> into IRModule.

The follow-up refactors will gradually move the passes covered by IRModule up
until we cover all the passes. Then we can remove the additional redundant
concepts such as LoweredFunc.

committed 4 years ago


18 Mar, 2020 1 commit
- [CODEGEN][OPENCL] Explicitly cast min/max operands (#5090) · c3b89b76
```
* [CODEGEN][OPENCL] Explicitly cast min/max operands

* retrigger CI
```
  MORITA Kazutaka committed 4 years ago
11 Mar, 2020 3 commits

[Intrin] Adding a few missing math intrin (#5011) · a6cb4b8d
```
* [intrin] exp2

* [intrin] exp10

* [intrin] log2/10

* [intrins] exp10

* [test] math intrin
```
Bing Xu committed 4 years ago

[CodeGen][CUDA] Enhance CUDA codegen for SelectNode (#4983) · afa84171

- This patch allows CUDA backend to emit correct code for
  selects with vector conditions, which may be produced
  by floordiv op lowering etc.

- This already works for the LLVM backend, as the LLVM select instruction
  supports vector conditions.
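
The semantics of a select with a vector condition can be modelled lane by lane. The sketch below is only a Python illustration of that behaviour, not the CUDA code the backend emits; `vector_select` is a hypothetical helper name that does not appear in the patch.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def vector_select(
    cond: Sequence[bool], true_vals: Sequence[T], false_vals: Sequence[T]
) -> List[T]:
    """Lane-wise select: lane i takes true_vals[i] if cond[i], else false_vals[i].

    Hypothetical model of a select whose condition is itself a vector,
    as opposed to a single scalar condition applied to every lane.
    """
    if not (len(cond) == len(true_vals) == len(false_vals)):
        raise ValueError("all operands must have the same number of lanes")
    return [t if c else f for c, t, f in zip(cond, true_vals, false_vals)]


# Each lane is chosen independently by its own condition bit.
lanes = vector_select([True, False, True, False], [1, 2, 3, 4], [10, 20, 30, 40])
```

A scalar-condition select would pick one whole operand; the vector form has to decide per lane, which is why a backend without a native vector select has to expand it element by element.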

Signed-off-by: Wei Pan <weip@nvidia.com>

committed 4 years ago


[topi][relay] new PR to re-add tan to TVM (#5025) · 45ee7b5f

* Add relay operation relay.op.tan.

* Update tan implementation in TVM.

* Update tests.

* Add shape function for tan.

* Add missing main test to python/frontend/tensorflow/test_forward.

* Revert, back to sin/cos.

* Revert "Revert, back to sin/cos."

This reverts commit 4da5b503b921585ba9d80944b29136142b575c40.

* Fix implementation of tan in cuda. Do not support tan for float16.

Simplify topi/tests/python/test_topi_math. Add testing for tan with float32 and float64.

Finally implement tan as sin/cos in llvm.
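
The sin/cos fallback mentioned above rests on the identity tan(x) = sin(x) / cos(x). A minimal numeric check of that identity in plain Python follows; `tan_via_sin_cos` is a hypothetical name used for illustration and is not the LLVM lowering rule from the patch.

```python
import math


def tan_via_sin_cos(x: float) -> float:
    """Compute tan(x) as sin(x) / cos(x) (illustrative helper only)."""
    return math.sin(x) / math.cos(x)


# Compare against the library tan at a few points away from the poles.
for x in (0.0, 0.5, 1.0, -1.2):
    assert math.isclose(tan_via_sin_cos(x), math.tan(x), rel_tol=1e-12, abs_tol=1e-12)
```

Near odd multiples of pi/2 the divisor approaches zero, so the two forms can differ more in floating point there.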

committed 4 years ago


10 Mar, 2020 1 commit
- Revert "[topi][relay] add operation tan to TVM (#4938)" (#5017) · c0bc1882
```
This reverts commit d992468d.
```
  Yao Wang committed 4 years ago
06 Mar, 2020 1 commit

[topi][relay] add operation tan to TVM (#4938) · d992468d

* Add relay operation relay.op.tan.

* Update tan implementation in TVM.

* Update tests.

* Add shape function for tan.

* Add missing main test to python/frontend/tensorflow/test_forward.

* Revert, back to sin/cos.

* Revert "Revert, back to sin/cos."

This reverts commit 4da5b503b921585ba9d80944b29136142b575c40.

* Fix implementation of tan in cuda. Do not support tan for float16.

Simplify topi/tests/python/test_topi_math. Add testing for tan with float32 and float64.

Try again to implement tan as sin/cos in llvm.

committed 4 years ago


21 Feb, 2020 1 commit

[CODEGEN] Support cuda tensorcore subbyte int data type in auto tensorcore (#4546) · f23ac969

* support cuda tensorcore subbyte int data type in auto tensorcore

* add license

* pass cpplint

* fix code review comments

* merge the int4/int1 codegen tutorial into the existing auto tensorcore tutorial

* using master's new API

* disable tuning when cuda is not enabled

* address cr comment

* do not run the tuning

* fix test failure

* fix cpplint error

* fix bool type reduction bug

* 1. fix an index bug 2. fix returned bytes value of int1/int4/uint4

* fix typo

committed 4 years ago


16 Feb, 2020 1 commit

[CodeGen][CUDA] Fix issues in cuda codegen (#4876) · d50ba721

- Do not emit __shared__ etc. as part of type for casting

- Fix fp16 reduction kernels with compiler errors:

  "no operator "+" matches these operands, volatile half + volatile half"

  This patch inserts casts to remove volatile type qualifier following
  volatile loads (fp16 only). CUDA fp16 library headers should add
  volatile member functions.

- Update have_fp16 to include compute 6.1 GPUs, which do support fp16,
  although their fp16 throughput is low. Updated tests.
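
A capability check that includes compute 6.1 might look roughly like the sketch below. `supports_fp16` is a hypothetical name, not TVM's `have_fp16` itself, and the 5.3 lower bound is an assumption taken from NVIDIA's documented minimum for half-precision arithmetic; the commit message only states that 6.1 is now included.

```python
def supports_fp16(compute_version: str) -> bool:
    """Return True if the given compute capability is assumed to support fp16.

    Hypothetical sketch: treats every capability from 5.3 upward as
    fp16-capable (assumption), which includes 6.1 as the commit describes.
    """
    major, minor = (int(part) for part in compute_version.split("."))
    # Tuple comparison orders by major first, then minor.
    return (major, minor) >= (5, 3)


# 6.1 is accepted even though its fp16 throughput is low.
results = {cv: supports_fp16(cv) for cv in ("5.2", "5.3", "6.0", "6.1", "7.0")}
```

The check answers only whether fp16 code can run, not whether it is fast on that device.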

Signed-off-by: Wei Pan <weip@nvidia.com>

committed 4 years ago


07 Feb, 2020 1 commit

[REFACTOR][PY][API-Change] Polish tvm.runtime, tvm.runtime.module API update (#4837) · e0122c0e

* [REFACTOR][PY-API] Polish tvm.runtime, tvm.runtime.module API update

This PR updates the tvm.runtime to use the new FFI style.

- Remove top-level tvm.module to avoid confusion between runtime.Module and IRModule
- API changes with respect to runtime.Module
  - tvm.module.load -> tvm.runtime.load_module
  - tvm.module.enabled -> tvm.runtime.enabled
  - tvm.module.system_lib -> tvm.runtime.system_lib
- Remove dep on api_internal from runtime.

* Update module.load in the latest API
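
For callers migrating to the renamed API, the three renames listed above can be applied mechanically. The following is a rough sketch of such a rewrite; `RENAMES` and `migrate_source` are hypothetical names introduced here, and the script only edits text without importing TVM.

```python
import re

# Old -> new names, exactly as listed in the commit message.
RENAMES = {
    "tvm.module.load": "tvm.runtime.load_module",
    "tvm.module.enabled": "tvm.runtime.enabled",
    "tvm.module.system_lib": "tvm.runtime.system_lib",
}


def migrate_source(text: str) -> str:
    """Rewrite old tvm.module.* calls in a source string to tvm.runtime.* names."""
    for old, new in RENAMES.items():
        # Match the full dotted name only: not as a suffix of a longer
        # dotted path, and not as a prefix of a longer identifier.
        pattern = r"(?<![\w.])" + re.escape(old) + r"\b"
        text = re.sub(pattern, new, text)
    return text


before = 'lib = tvm.module.load("deploy.so")'
after = migrate_source(before)
```

A purely textual rewrite like this will not catch aliased imports such as `from tvm import module`, so the result should still be reviewed by hand.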

committed 4 years ago


04 Feb, 2020 1 commit
- [LINT] Fix -Wextra (#4804) · 6f7d6fa4
```
* [LINT] Fix -Wextra

* Fix virtual-dtor
```
  Tianqi Chen committed 4 years ago
19 Jan, 2020 1 commit

[REFACTOR][CODEGEN] codegen->target, build_module->driver (#4742) · 33b0831c

This PR moves the codegen related code into the target folder,
as they are target specific functionalities.

We also adopt the term "compiler driver", as used in common compiler
infrastructure such as Rust, GHC and Clang.
As a result, build_module is moved into the driver folder.

committed 5 years ago
