Commits · fd6d7837ed05661e81358045adb902772a4f82c3 · wenyuanbo / tic

16 Feb, 2020 1 commit

[CodeGen][CUDA] Fix issues in cuda codegen (#4876) · d50ba721

- Do not emit __shared__ etc. as part of type for casting

- Fix fp16 reduction kernels with compiler errors:

  "no operator "+" matches these operands, volatile half + volatile half"

  This patch inserts casts to remove volatile type qualifier following
  volatile loads (fp16 only). CUDA fp16 library headers should add
  volatile member functions.

- Update have_fp16 to include compute 6.1 GPUs, which do support fp16,
  although their fp16 throughput is low. Updated tests.

Signed-off-by: Wei Pan <weip@nvidia.com>

committed 4 years ago
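
The `have_fp16` change in the commit above can be pictured with a small standalone check. This is only a sketch: the function name `supports_fp16` and the string input format are hypothetical, and the 5.3 lower bound is an assumption taken from the "cuda arch < sm53" fallback mentioned in #4268 further down this log. It is not the code from the patch itself.

```python
def supports_fp16(compute_version: str) -> bool:
    """Return True if a CUDA compute capability is treated as fp16-capable.

    Hypothetical helper for illustration only. Assumes fp16 arithmetic is
    available from compute capability 5.3 upward, which includes 6.1 GPUs
    even though their fp16 throughput is low.
    """
    major, minor = (int(part) for part in compute_version.split("."))
    # Tuple comparison orders by major first, then minor.
    return (major, minor) >= (5, 3)
```

With this rule, `supports_fp16("6.1")` is true, so fp16 tests would no longer be skipped on those GPUs, while `supports_fp16("5.2")` stays false.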


04 Feb, 2020 1 commit
- [LINT] Fix -Wextra (#4804) · 6f7d6fa4
```
* [LINT] Fix -Wextra

* Fix virtual-dtor
```
  Tianqi Chen committed 5 years ago
19 Jan, 2020 2 commits

[REFACTOR][CODEGEN] codegen->target, build_module->driver (#4742) · 33b0831c

This PR moves the codegen related code into the target folder,
as they are target specific functionalities.

We also adopt the term "compiler driver", which is used in common compiler
infrastructure such as Rust, GHC and clang.
As a result, build_module is moved into the driver folder.

committed 5 years ago


[REFACTOR] Establish tir (#4740) · cf59b206

TIR is the new namespace for low-level IR
for tensor-level optimizations and loop transformations.

This PR establishes the namespace and files.

- lowered_func.h,buffer.h,data_layout.h -> tir/buffer.h,tir/data_layout.h,tir/lowered_func.h
- ir.h -> tir/expr.h, tir/stmt.h
- ir_functor_ext.h -> tir/expr_functor.h, tir/stmt_functor.h

committed 5 years ago


18 Jan, 2020 1 commit

[CodeGen][CUDA] Improve CUDA vectorizer (#4736) · 2630ffcb

- Fixes issues to enable fp16 vectorizer. Now correct packing and
  unpacking CUDA code will be emitted. Enabled more unit tests.

- Do not emit code to read the first lane from an undef variable

  int _3;
  _3 = _3 & ~(0x000000ff << 0) | ...

  and emit the following code instead:

  _3 = (((0x000000ff & (_1 >> 0))+(0x000000ff & (_2 >> 0))) << 0);

  Note that nvcc 10.2 is forgiving and emits the same code for both cases.
  A warning appears in test_codegen_cuda.py.

Signed-off-by: Wei Pan <weip@nvidia.com>

committed 5 years ago
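
The lane handling described in the commit above can be emulated on plain integers. The sketch below is an illustration, not the generated CUDA: the function name `add_int8x4` is hypothetical, and it assumes four 8-bit lanes packed into one 32-bit word. The first lane initialises the result directly, as in the corrected output, and only later lanes use the read-modify-write form.

```python
LANE_MASK = 0x000000FF  # selects one 8-bit lane


def add_int8x4(a: int, b: int) -> int:
    """Lane-wise add of two 32-bit words that each hold four 8-bit lanes."""
    result = 0
    for lane in range(4):
        shift = 8 * lane
        # Extract the lane from both operands, add, and move it into place.
        lane_sum = ((LANE_MASK & (a >> shift)) + (LANE_MASK & (b >> shift))) << shift
        if lane == 0:
            # First lane: assign directly, never read an uninitialised result.
            result = lane_sum
        else:
            # Later lanes: clear the slot (and any carry from the lane below),
            # then merge the new lane in.
            result = (result & ~(LANE_MASK << shift)) | lane_sum
    # Python integers are unbounded, so truncate to 32 bits as a C int would.
    return result & 0xFFFFFFFF
```

For example, `add_int8x4(0x01020304, 0x10203040)` gives `0x11223344`, and a lane that overflows wraps around without disturbing its neighbours.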


17 Jan, 2020 1 commit

[REFACTOR] Get rid of packed_func_ext. (#4735) · 2f8a01f7

Move the conversion extensions to the specific class definitions
so that we no longer need to include packed_func_ext.

committed 5 years ago


15 Jan, 2020 1 commit

[REFACTOR] Move support related code to include/tvm/support (#4716) · 49d31443

* [REFACTOR] Move support related code to include/tvm/support

- tvm/logging.h -> tvm/support/logging.h
- remove tvm/base.h, move with into tvm/support/with.h

* src/common -> src/support

committed 5 years ago


09 Jan, 2020 1 commit

[REFACTOR][IR] tvm::Expr -> PrimExpr(Primitive Expr) (#4669) · d6a23cf5

* [REFACTOR][IR] tvm::Expr -> PrimExpr(Primitive Expr)

As part of unified IR, we will need to unify relay::Expr
and the current tvm::Expr under the same base type.

From the technical point of view, tvm::Expr is a "primitive"
expression that only contains POD types and handles and does
not do life-cycle management.

This PR renames Expr->PrimExpr to clarify that.
We will send a subsequent PR to introduce the base expr class.

* Remove legacy VarExpr and ExprHash/Equal

committed 5 years ago


08 Jan, 2020 1 commit

[REFACTOR][IR] Add Node suffix to low-level IR nodes (#4649) · f4c5f93b

* [REFACTOR][IR] Variable -> VarNode

* [REFACTOR][IR] Add/Sub/Mul/Div -> AddNode/SubNode etc.

* [REFACTOR][IR] Min/Max/FloorDiv/FloorMod -> MinNode/MaxNode etc.

* [REFACTOR][IR] EQ/NE/LT/LE/GT/GE/Select -> EQNode/NENode etc.

* [REFACTOR][IR] Add Node suffix to Select/Call/Load/Ramp/Shuffle/Let

* [REFACTOR][IR] Add node suffix to IntImm/UIntImm/FloatImm/StringImm

* [REFACTOR][IR] Add Node suffix to Any, AttrStmt, AssertStmt

* [REFACTOR][IR] Add Node suffix to Store/Provide/Allocate/Free

* [REFACTOR][IR] Add Node suffix to ProducerConsumer

* Fix lint

* style updates, test fixes

committed 5 years ago


22 Dec, 2019 1 commit

[REFACTOR][DTYPE] Isolate dtype to runtime (#4560) · 7fa8aab5

dtype.h -> runtime/data_type.h

Changes:
- Rename all old references of tvm::Type to DataType
- ExprNode.type -> ExprNode.dtype
- Expr.type() -> Expr.dtype()
- Change Expr related functions to expr_operator.
  - DataType::min() -> min_value(DataType)
  - DataType::max() -> max_value(DataType)
- Move type constructor Int, UInt, Float, Handle, Bool into DataType.
  - Int(bits) -> DataType::Int(bits)
  - UInt(bits) -> DataType::UInt(bits)

committed 5 years ago


24 Nov, 2019 1 commit

[LINT] Remove unnecessary copyright message for files with ASF header (#4409) · c8772288

* [LINT] Improve the check tool to handle ASF copyright message.

* [LINT] Remove unnecessary copyright message as per ASF requirement.

* Fix codegen hybrid

* [LINT] Broaden license checks to include html, xml

* [LINT] Fix rest of the files

* Fix notice

* [LINT] Improve check file type error message

committed 5 years ago


14 Nov, 2019 1 commit
- [Codegen] remove fp16 function override for cuda (#4331) · cf83d50c
```
* add volatile override back

* [codegen] remove fp16 function override for cuda
```
  Yizhi Liu committed 5 years ago
10 Nov, 2019 1 commit
- [Codegen][cuda-fp16] fallback to fp32 simulation when cuda arch < sm53 (#4268) · 801cf0e8
  Yizhi Liu committed 5 years ago
  
31 Oct, 2019 1 commit
- [CUDA] Fix fp16 intrin, disable bad fp16 vecadd test for now (#4239) · ebfcd28c
  Tianqi Chen committed 5 years ago
  
25 Oct, 2019 1 commit
- [hotfix] missing include headers (#4204) · 7732873e
  Zhi committed 5 years ago
  
24 Oct, 2019 1 commit

TensorCore Support using Intrinsic (#4136) · 324a9607

* add tensor core support

* avoid memory bank conflict

* fix thread sync & better performance

* better performance

* add schedule test for conv2d

* extend into BatchMatMul

* support config fragment shape and layout using intrinsic

* add TensorCore tutorial

* add int support and fix lint

* address comment

* add 32*16*8 TensorCore test

* fix wmma include logic

committed 5 years ago


11 Oct, 2019 1 commit

[codegen] Add multiple operands and function support when using fp16 compilation (#4056) · ce72e9b5

* overload half operators for cuda codegen

* add float16 te test_op_level1

* fix test_op_level1.py

* fix lint

* disable fp16 test if gpu does not support

* disable fp16 test if gpu does not support

* bypass float16 test if gpu does not support float16

committed 5 years ago


13 Sep, 2019 1 commit

Fix CUDA int8x4 vectorize (#3928) · 195973c0

* Fix int8x4 vectorize

* Fix gpu shared/local memory accumulate

* Add test_shared_memory for int8x4

* Adjust test format

* Fix cpplint

committed 5 years ago


01 Aug, 2019 1 commit
- Add shuffle support to TVM (#3633) · a279dd0e
  Jian Weng committed 5 years ago
  
06 Jul, 2019 1 commit
- [ARITH] Refactor: Remove un-necessary usage of ComputeExpr (#3503) · 59448fed
  Tianqi Chen committed 5 years ago
  
17 May, 2019 1 commit
- [CODEGEN][CUDA][OPENCL] Handle INF and NAN (#3194) · 24fe04f8
  lixiaoquan committed 5 years ago
  
08 Apr, 2019 1 commit

[HEADER] Add Header to Comply with ASF Release Policy (#2982) · cffb4fba

* [HEADER] ASF header dir=include

* [HEADER] ASF Header dir=src

* [HEADER] ASF Header -dir=python

* [HEADER] ASF header dir=topi

* [HEADER] ASF Header dir=nnvm

* [HEADER] ASF Header -dir=tutorials

* [HEADER] ASF Header dir=tests

* [HEADER] ASF Header -dir=docker

* fix whitespace

* [HEADER] ASF Header -dir=jvm

* [HEADER] ASF Header -dir=web

* [HEADER] ASF Header --dir=apps

* [HEADER] ASF Header --dir=vta

* [HEADER] ASF Header -dir=go

* temp

* [HEADER] ASF Header --dir=rust

* [HEADER] Add ASF Header --dir=cmake

* [HEADER] ASF Header --dir=docs

* [HEADER] Header for Jenkinsfile

* [HEADER] ASF Header to toml and md

* [HEADER] ASF Header to gradle

* Finalize rat cleanup

* Fix permission

* Fix java test

* temporary remove nnvm onnx test

committed 5 years ago


24 Oct, 2018 1 commit
- Fix int8x4 broadcast value codegen in cuda (#1959) · 155e955f
  Wuwei Lin committed 6 years ago
  
07 Oct, 2018 1 commit
- Enable bool type as storage type (#1853) · f1d815cc
  Tianqi Chen committed 6 years ago
  
01 Oct, 2018 1 commit
- [IR] eager constant folding in operator overloading (#1789) · 32af4d28
  Tianqi Chen committed 6 years ago
  
23 Aug, 2018 1 commit
- Remove leading "./" from include paths (#1640) · b95b5958
  MORITA Kazutaka committed 6 years ago
  
09 Aug, 2018 1 commit
- Use int for int8x4 due to performance overhead of char4 (#1569) · 41d4dd6e
```
* Use int for int8x4 due to performance overhead of char4

* Add a comment about using int

* Remove invalid test
```
  Wuwei Lin committed 6 years ago
01 Aug, 2018 1 commit
- [TVM][CUDA] NVIDIA GPU Int8 Support (#1503) · 146ebc5e
  Tatsuya Nishiyama committed 6 years ago
  
20 Jul, 2018 1 commit
- [CUDA] FP16 support (#1413) · 30409045
  Tatsuya Nishiyama committed 6 years ago
  
11 Jun, 2018 1 commit
- [BUILD] Switch to CMake only Infra (#1254) · 2d3031ee
  Tianqi Chen committed 6 years ago
  
24 Dec, 2017 1 commit
- [CODEGEN] update codegen for vector operation (#711) · 5d37be62
```
* [CODEGEN] update codegen for vector operation

* update comment, fix for metal
```
  Lianmin Zheng committed 7 years ago
11 Dec, 2017 2 commits
- Fix long for windows in cuda (#700) · 154959b1
```
* Use long long for platforms where long is 32 bits (like windows).

* Make sure scalar chars are signed.

* Re-add NOLINT marker.
```
  abergeron committed 7 years ago
- [CODEGEN] add fp16 and fp64 enable pragma for opencl (#697) · 3dfb8459
```
* [CODEGEN] add fp16 and fp64 enable pragma for opencl

* fix style
```
  Lianmin Zheng committed 7 years ago
30 Nov, 2017 1 commit
- [CUDA] Enable int64 (#683) · cf81f9f9
```
* [CUDA] Enable int64

* [PYTHON] Fix rpc tutorial with opencl

* OK

* update
```
  Tianqi Chen committed 7 years ago
06 Jul, 2017 1 commit
- [CODEGEN/PASS] add restricted, alignment option (#221) · 0a19b16a
```
* [CODEGEN/PASS] add restricted, alignment option

* fix lint

* Fix the alloca
```
  Tianqi Chen committed 7 years ago
03 Jul, 2017 1 commit
- [CODEGEN] Concise typecast for threadIdx (#208) · b0e41b9a
  Tianqi Chen committed 7 years ago
  
02 Jun, 2017 1 commit
- [PASS] Refactor build config, allow implicit unroll pragma (#167) · 46b4a914
  Tianqi Chen committed 7 years ago
  
02 May, 2017 1 commit
- [CODEGEN/RUNTIME] Metal support, runtime improvement. (#111) · 706f9b6f
```
* [CODEGEN/RUNTIME] Metal support, runtime improvement.

* Fix case when no device is available
```
  Tianqi Chen committed 7 years ago
22 Apr, 2017 1 commit
- [LANG/CODEGEN] Intrinsics and Extern Math (#101) · d17b10f0
```
* [LANG/CODEGEN] Intrinsics and Extern Math

* fix lint
```
  Tianqi Chen committed 7 years ago
15 Apr, 2017 1 commit
- [PERF] Persistent kernel (#87) · 4d280905
```
* [PERF] Persistent kernel

* fix doc
```
  Tianqi Chen committed 7 years ago