- 02 Aug, 2019 2 commits
* [AutoTVM] Fix hang/crash issues on feature extraction
* Update xgboost_cost_model.py
* fix lint
Lianmin Zheng committed
* [Relay][Quantization] Support floating-point scale
* [Relay][Quantization] KL-divergence calibration on dataset (sketched below)
* Fix unhandled LeftShift case in QuantizeRealize
* Fix lint
* drop QBias
* fix lint
* address comments
* address comments
* Update comments
* address comments
* lint
* kQIdentity = 0
Wuwei Lin committed
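For readers unfamiliar with KL-divergence calibration, below is a rough numpy sketch of the idea (an illustration of the general technique, not this patch's implementation): sweep clipping thresholds over a histogram of observed activations, pick the threshold whose coarsely re-bucketed histogram best matches the reference distribution, and derive the quantization scale from it.

```python
import numpy as np

def kl_divergence(p, q):
    # KL(P || Q) over the bins where both distributions have mass.
    p, q = p / p.sum(), q / q.sum()
    mask = (p > 0) & (q > 0)
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def pick_scale(samples, num_bins=2048, num_quantized_bins=255):
    hist, edges = np.histogram(np.abs(samples), bins=num_bins)
    best_kl, best_threshold = np.inf, edges[-1]
    for i in range(num_quantized_bins, num_bins + 1):
        # Reference: clip at edges[i], folding the tail into the last bin.
        ref = hist[:i].astype("float64")
        ref[-1] += hist[i:].sum()
        # Candidate: the clipped histogram re-bucketed to the coarse int8
        # resolution, then expanded back so the two are comparable.
        chunks = np.array_split(hist[:i].astype("float64"), num_quantized_bins)
        cand = np.concatenate([np.full(len(c), c.sum() / len(c)) for c in chunks])
        kl = kl_divergence(ref, cand)
        if kl < best_kl:
            best_kl, best_threshold = kl, edges[i]
    return best_threshold / 127.0  # int8 scale for the chosen threshold

print(pick_scale(np.random.randn(100000).astype("float32")))
```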
- 01 Aug, 2019 4 commits
* [Relay][VM] Support execution on devices
* Reduce Copy calls
* Cleanup
* Lint
* CR comments
* Merge test into test_vm.py
Wei Chen committed
Jian Weng committed
The patch adds support for the TensorFlow operators log1p, cos, and sin (a short illustration follows this entry).
TensorFlow log1p is described at https://www.tensorflow.org/api_docs/python/tf/math/log1p
TensorFlow cos is described at https://www.tensorflow.org/api_docs/python/tf/math/cos
TensorFlow sin is described at https://www.tensorflow.org/api_docs/python/tf/math/sin
alexgl-github committed
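For reference, the three TensorFlow ops the patch maps into Relay behave as follows (a plain TensorFlow illustration, not part of the patch):

```python
import numpy as np
import tensorflow as tf

x = tf.constant(np.linspace(0.1, 1.0, 4), dtype=tf.float32)
print(tf.math.log1p(x))  # elementwise log(1 + x), accurate for small x
print(tf.math.cos(x))    # elementwise cosine
print(tf.math.sin(x))    # elementwise sine
```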
* add fatal
  lint
  lint
  lint
  do make completeness check an error
  lint
  remove fatal
* fix test
* reset parser file
* remove unneeded import
* Update python/tvm/relay/adt.py
  Co-Authored-By: Steven S. Lyubomirsky <slyubomirsky@gmail.com>
* Update include/tvm/relay/adt.h
  Co-Authored-By: Steven S. Lyubomirsky <slyubomirsky@gmail.com>
* Eliminate trailing whitespace (my fault)
雾雨魔理沙 committed
- 31 Jul, 2019 1 commit
* relay vm serialization
* fix lint
* load params, fix stream
* lint
* fix typo
Zhi committed
- 30 Jul, 2019 2 commits
...and add rocm module_save to the tests.
Thomas Viehmann committed
Thomas Viehmann committed
- 25 Jul, 2019 4 commits
Lianmin Zheng committed
* uTVM interfaces (#14)
* some minor interface changes
* implemented HostLowLevelDevice
* added MicroDeviceAPI
* implemented micro_common and added Python interfaces
* current status, semi implemented micro session
* added micro_common implementation and python interfaces (#18)
* added micro_common implementation and python interfaces (#18)
* current status, semi implemented
* host test working
* updated interfaces for MicroSession arguments allocation
* make somewhat lint compatible
* fix based on comments
* added rounding macro
* fix minor bug
* improvements based on comments
* Clean up `binutil.py` and make Python-3-compatible
* Change argument allocation design
* Address feedback and lint errors
* Improve binutil tests
* Simplify allocator (per @tqchen's suggestions)
* Doc/style fixes
* farts
* mcgee
* rodata section werks (and so does `test_runtime_micro_workspace.py`)
* simple graph runtime werk
* TEMP
* ResNet works, yo
* First round of cleanup
* More cleanup
* runs a dyson over the code
* Another pass
* Fix `make lint` issues
* ready to pr... probably
* final
* Undo change
* Fix rebase resolution
* Minor fixes
* Undo changes to C codegen tests
* Add `obj_path` in `create_micro_lib`
* TEMP
* Address feedback
* Add missing TODO
* Partially address feedback
* Fix headers
* Switch to enum class for `SectionKind`
* Add missing ASF header
* Fix lint
* Fix lint again
* Fix lint
* Kill lint warnings
* Address feedback
* Change Python interface to MicroTVM
  All interaction with the device is now through `Session` objects, which are used through Python's `with` blocks.
* Reorder LowLevelDevice interface
* Store shared ptr to session in all alloced objects
* Move helper functions out of `tvm.micro`
* Switch static char arr to vector
* Improve general infra and code quality
  Does not yet address all of tqchen's feedback
* Forgot a rename
* Fix lint
* Add ASF header
* Fix lint
* Partially address MarisaKirisame's feedback
* Lint
* Expose `MicroSession` as a node to Python
* Revert to using `Session` constructor
* Fix compiler error
* (Maybe) fix CI error
* Debugging
* Remove
* Quell lint
* Switch to stack-based session contexts
* Make uTVM less intrusive to host codegen
  And use SSA for operands of generated ternary operators
* Inline UTVMArgs into UTVMTask struct
* Remove `HostLowLevelDevice` header
* Remove `BaseAddr` class
* Address feedback
* Add "utvm" prefix to global vars in runtime
* Fix lint
* Fix CI
* Fix `test_binutil.py`
* Fix submodules
* Remove ResNet tests
* Make `test_binutil.py` work with nose
* Fix CI
* I swear this actually fixes the binutil tests
* lint
* lint
* Add fcompile-compatible cross-compile func
* Add docs for uTVM runtime files
* Move pointer patching into `MicroSession`
* Fix lint
* First attempt at unifying cross-compile APIs
* Fix lint
* Rename `cross_compile` back to `cc`
* Address feedback
* Remove commented code
* Lint
* Figure out failing function
* Remove debugging code
* Change "micro_dev" target to "micro"
* Add checks in tests for whether uTVM is enabled
* Add TODO for 32-bit support
* Rename more "micro_dev" to "micro"
* Undo rename
  We already have `tvm.micro` as a namespace. Can't have it as a method as well.
* Fix failing CI
  Thanks to @tqchen for finding this bug. Emitting ternary operators for `min` and `max` causes concurrency bugs in CUDA, so we're moving the ternary op emissions from `CodeGenC` to `CodeGenCHost`.
* Address feedback
* Fix lint
Logan Weber committed
Philip Hyunsu Cho committed
Jian Weng committed
- 24 Jul, 2019 2 commits
- 23 Jul, 2019 4 commits
Several groups, internally and externally, are interested in replacing standard dense layers with block-sparse matrix multiplication layers. The motivations are generally: higher performance (due to the reduction in FLOPs, memory bandwidth, and cache footprint) and enabling larger models (e.g. fitting more layers in a given memory budget). Some public work along these lines:

* https://openai.com/blog/block-sparse-gpu-kernels/
* https://openai.com/blog/sparse-transformer/
* https://arxiv.org/abs/1802.08435
* https://arxiv.org/abs/1711.02782

Various groups have been able to successfully train models with reasonable levels of sparsity (90%+) with marginal accuracy changes, which suggests substantial speedups are possible (as this implies a >10x reduction in FLOPs). It is fairly straightforward to realize these theoretical speedups; see e.g. the TVM benchmarks for Intel CPUs in https://gist.github.com/ajtulloch/e65f90487bceb8848128e8db582fe902, the CUDA results in https://github.com/openai/blocksparse, etc.:

* https://github.com/openai/blocksparse (CUDA)
* https://software.intel.com/en-us/mkl-developer-reference-c-mkl-bsrmm (MKL BSRMM)
* https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.sparse.bsr_matrix.html (scipy BSR representation)

This is extracted from a patch we've been using internally. There are various extensions possible (int8/fp16/bf16, CUDA/other GPU architectures), but this is a reasonable starting point. It needs more thorough unit test coverage, however.

We follow the conventions established by scipy.sparse.bsr_matrix and other libraries; see the unit tests for details, and the illustrative BSR sketch after this entry. For folks interested in experimenting with scheduling/AutoTVM etc., https://gist.github.com/ajtulloch/e65f90487bceb8848128e8db582fe902 is a useful starting point.
Andrew Tulloch committed
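As a pointer for the conventions mentioned above, here is a minimal scipy illustration of the BSR layout (illustrative only, not part of the patch):

```python
import numpy as np
import scipy.sparse

# A 4x4 matrix tiled into 2x2 blocks; two of the four blocks are all zero.
dense = np.array([[1, 2, 0, 0],
                  [3, 4, 0, 0],
                  [0, 0, 0, 0],
                  [0, 0, 5, 6]], dtype="float32")

# BSR stores only the nonzero blocks plus per-block-row metadata, which is
# what block-sparse matmul kernels exploit to skip whole blocks of FLOPs.
bsr = scipy.sparse.bsr_matrix(dense, blocksize=(2, 2))
print(bsr.data.shape)  # (2, 2, 2): two stored 2x2 blocks
print(bsr.indices)     # [0 1]: block-column index of each stored block
print(bsr.indptr)      # [0 1 2]: stored-block offsets for each block row
```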
= Motivation

It's useful to expose the tvm::reinterpret functionality to Relay/TOPI users, as this allows them to build (fused) operators that leverage the bitwise reinterpretation of an operand. An example is approximate transcendental functions, which can be implemented along the lines of:

```python
def C(x):
    return relay.expr.const(x, "float32")


def approx_exp(x):
    x = relay.minimum(relay.maximum(x, C(-88.0)), C(88.0))
    x = C(127.0) + x * C(1.44269504)
    xf = relay.floor(x)
    i = relay.cast(xf, "int32")
    x = x - xf
    Y = C(0.99992522) + x * (C(0.69583354) + x * (C(0.22606716) + x * C(0.078024523)))
    exponent = relay.left_shift(i, relay.expr.const(23, "int32"))
    exponent = relay.reinterpret(exponent, "float32")
    return exponent * Y


def approx_sigmoid(x):
    # <2.0e-5 absolute error over [-5, 5]
    y = approx_exp(x)
    return y / (y + C(1.0))


def approx_tanh(x):
    # <4.0e-5 absolute error over [-5, 5]
    x = x * C(2.0)
    y = approx_exp(x)
    return (y - C(1.0)) / (y + C(1.0))
```

See the unit tests for implementations of these approximate transcendentals; a plain-numpy sanity check of the bit trick is sketched after this entry.
Andrew Tulloch committed
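The following is a plain-numpy sanity check of the exponent-bit trick `approx_exp` relies on, using the same constants as above (`np.left_shift(...).view("float32")` stands in for `relay.reinterpret`; this is an illustration, not part of the patch):

```python
import numpy as np

def approx_exp_np(x):
    # Same arithmetic as the Relay approx_exp above, expressed in numpy.
    x = np.clip(x.astype("float32"), -88.0, 88.0)
    x = np.float32(127.0) + x * np.float32(1.44269504)  # biased exponent
    xf = np.floor(x)
    i = xf.astype("int32")
    x = x - xf
    Y = (np.float32(0.99992522) + x * (np.float32(0.69583354)
         + x * (np.float32(0.22606716) + x * np.float32(0.078024523))))
    # Shifting the integer part into the float32 exponent field and
    # reinterpreting the bits computes 2**(xf - 127) exactly.
    exponent = np.left_shift(i, 23).view("float32")
    return exponent * Y

xs = np.linspace(-5.0, 5.0, 101).astype("float32")
sigmoid = approx_exp_np(xs) / (approx_exp_np(xs) + np.float32(1.0))
print(np.abs(sigmoid - 1.0 / (1.0 + np.exp(-xs))).max())  # ~2e-5 or less
```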
雾雨魔理沙 committed
In cases where we have multiple models or thread pools active, spinning on `sched_yield()` may not be desirable, as it prevents the OS from effectively scheduling other threads. Thus, allow users to conditionally disable this behaviour via an environment variable, `TVM_THREAD_POOL_SPIN_COUNT`, similar to existing environment flags for the thread pool such as `TVM_BIND_THREADS`. This substantially improves tail latencies in some of our multi-tenant workloads in practice; a Python usage sketch follows this entry.

Unit tests have been added. On my laptop, running:

```
TVM_THREAD_POOL_SPIN_COUNT=0 ./build/threading_backend_test;
TVM_THREAD_POOL_SPIN_COUNT=1 ./build/threading_backend_test;
./build/threading_backend_test;
```

gives https://gist.github.com/ajtulloch/1805ca6cbaa27f5d442d23f9d0021ce6 (i.e. 97ms -> <1ms after this change).
Andrew Tulloch committed
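For embedders driving TVM from Python, a minimal sketch of opting out of spinning (this assumes, consistent with the commit's description, that the variable is read when the thread pool is initialized, so it must be set beforehand):

```python
import os

# "0" disables the sched_yield() spin loop entirely; set it before TVM
# spins up its thread pool (here: before the first tvm import).
os.environ["TVM_THREAD_POOL_SPIN_COUNT"] = "0"

import tvm  # noqa: E402
```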
- 21 Jul, 2019 1 commit
Tianqi Chen committed
- 19 Jul, 2019 3 commits
- 18 Jul, 2019 4 commits
雾雨魔理沙 committed
Tianqi Chen committed
Andrew Tulloch committed
Apply suggestions from code review

Co-Authored-By: Wei Chen <ipondering.weic@gmail.com>
bulanova-huawei committed
- 17 Jul, 2019 3 commits
* [Relay][VM] Fix debug statement
* Change debug statement
Wei Chen committed
* Fix build error
* comments
Yinghai Lu committed
Haichen Shen committed
- 16 Jul, 2019 1 commit
* tmp
* Port vm and object to python
* clean up
* update vm build module
* update
* x
* tweak
* cleanup
* update
* fix rebase
* Rename to VMCompiler
* fix
Haichen Shen committed
- 15 Jul, 2019 1 commit
* Enable set_input_zero_copy in GraphRuntime
* Fix LoadParams
* Fix
* lint
* Fix remote context issue
* Fix
* Remove LOG
* Remove unused variables
* Add tests
* works
* More test scenarios
* make it simpler
* Remove unnecessary changes
* Address comments
* More comments
* Address comments
* Fix build
Yinghai Lu committed
- 14 Jul, 2019 1 commit
* [TVM] Fix bound inference to avoid allocating too much
* [ARITH][BOUND] Pass analyzer to PropBoundToInputs
Sergei Grechanik committed
- 13 Jul, 2019 1 commit
* [ARITH][IR] Introduce FloorDiv/Mod (semantics illustrated below)
* Address review comments
* address review comments, fix div sub rule
Tianqi Chen committed
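For context on why FloorDiv/Mod are introduced alongside the existing truncated division, a plain-Python illustration of how the two semantics diverge on negative operands:

```python
# Python's // and % already use floor semantics; C-style division
# truncates toward zero, so the two disagree when signs differ.
a, b = -7, 2
print(a // b)              # floordiv: -4 (rounds toward -infinity)
print(int(a / b))          # truncdiv: -3 (rounds toward zero)
print(a % b)               # floormod:  1 (result has the divisor's sign)
print(a - b * int(a / b))  # truncmod: -1 (result has the dividend's sign)
```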
- 12 Jul, 2019 1 commit
* [Relay][Quantization] Fix issue introduced in #3135
* Recover StopFusion
* Fix fmultiref
* Fix lint
Wuwei Lin committed
- 11 Jul, 2019 2 commits
* [INFA][IR] Build and Evolve Low-level IR. Remove dep from HalideIR.
* Update include/tvm/node/ir_functor.h
  Co-Authored-By: Jared Roesch <roeschinc@gmail.com>
* Update include/tvm/node/ir_functor.h
  Co-Authored-By: Jared Roesch <roeschinc@gmail.com>
Tianqi Chen committed
hlu1 committed
- 10 Jul, 2019 3 commits
lint
update
address comment
comment out breaking test
雾雨魔理沙 committed
* Implement type checking for Any (see the sketch below)
  Remove code generation related changes
  Remove compile changes
  Remove more
  Remove unification hack
  Add some code back that was needed, and clean up test
  Refactor test cases
  WIP
  Implement TypeHint AST
  Add test case which should fail
  Remove unification changes, and fix bug with let rec
  Restore unification for shapes
  Improve error reporting while debugging
  All examples type check
  All examples type check
  WIP
  First version that works with hints, needs clean up
  Remove dead code
  Tweaks
  Remove type hint
  Remove unnecessary type hint stuff
  Remove more type hints
  Clean up
  Expose Any expression node
  Address CR
  Fix
  Fix solver
  Kill unnecessary code
  Fix PyLint
  Fix
  Relocate loops
  Fix license and test
  Lint again
  Lint again
  Fix loops
  Fix docstring
  Fix template error
  Fix compiler issue
  Fix compile err
  Remove more runtime changes
  Restore buffer
  Fix segfault
  Fix
  Fix arange
* Address feedback
* Fix typo
* Fix arange
* Fix op level3
* Fix issue with Python wrapper
Jared Roesch committed
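A small sketch of the `Any` dimension whose type checking this change implements (assuming a TVM build with Relay; the usage below is illustrative, not from the patch):

```python
import tvm
from tvm import relay

# A tensor whose first dimension is unknown until runtime.
x = relay.var("x", shape=(relay.Any(), 3), dtype="float32")
f = relay.Function([x], relay.nn.relu(x))

# InferType accepts the dynamic dimension instead of rejecting it.
mod = relay.transform.InferType()(tvm.IRModule.from_expr(f))
print(mod)
```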
Tianqi Chen committed