- 25 Jul, 2019 3 commits
-
Yong Wu committed
-
Jian Weng committed
-
* [TOPI] Average Pool2D Bug. Issue - https://github.com/dmlc/tvm/issues/3581
* Add uint16 test.
Animesh Jain committed
-
- 24 Jul, 2019 6 commits
-
Logan Weber committed
-
Tianqi Chen committed
-
Tianqi Chen committed
-
quickfix
雾雨魔理沙 committed -
Wuwei Lin committed
-
* small bug fix for DataTypeObject
* retrigger ci
Zhi committed
-
- 23 Jul, 2019 7 commits
-
Various groups, internally and externally, are interested in replacing standard dense layers with block-sparse matrix multiplication layers. The motivations are generally: higher performance (due to the reduction in FLOPs and in memory bandwidth/cache footprint) and enabling larger models (e.g. fitting more layers in a given memory budget). Some public work along these lines:

* https://openai.com/blog/block-sparse-gpu-kernels/
* https://openai.com/blog/sparse-transformer/
* https://arxiv.org/abs/1802.08435
* https://arxiv.org/abs/1711.02782

Various groups have been able to successfully train models with reasonable levels of sparsity (90%+) with marginal accuracy changes, which suggests substantial speedups are possible (as this implies a >10x reduction in FLOPs). It is fairly straightforward to realize these theoretical speedups; see e.g. the TVM benchmarks for Intel CPUs in https://gist.github.com/ajtulloch/e65f90487bceb8848128e8db582fe902 and the CUDA results in https://github.com/openai/blocksparse. Related implementations and references:

* https://github.com/openai/blocksparse (CUDA)
* https://software.intel.com/en-us/mkl-developer-reference-c-mkl-bsrmm (MKL BSRMM)
* https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.sparse.bsr_matrix.html (SciPy BSR representation)

This is extracted from a patch we have been using internally. There are various extensions possible (int8/fp16/bf16, CUDA/other GPU architectures), but this is a reasonable starting point. It needs more thorough unit test coverage, however. We follow the conventions established by scipy.sparse.bsr_matrix and other libraries; see the unit tests for details. For folks interested in experimenting with scheduling/AutoTVM etc., https://gist.github.com/ajtulloch/e65f90487bceb8848128e8db582fe902 is a useful starting point.
Andrew Tulloch committed -
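The entry above follows the conventions of scipy.sparse.bsr_matrix; as a reference point, here is a minimal SciPy sketch of the block-sparse (BSR) layout it refers to. This is plain SciPy, not the new TOPI operator, and the 2x2 block size and shapes are arbitrary choices for illustration.

```python
import numpy as np
import scipy.sparse

# A 4x6 matrix stored in block-sparse (BSR) form with 2x2 blocks:
# only two of the six 2x2 blocks are non-zero.
dense = np.zeros((4, 6), dtype="float32")
dense[0:2, 2:4] = 1.0  # non-zero block at block-row 0, block-column 1
dense[2:4, 0:2] = 2.0  # non-zero block at block-row 1, block-column 0

bsr = scipy.sparse.bsr_matrix(dense, blocksize=(2, 2))
print(bsr.data.shape)  # (2, 2, 2): two stored blocks, each 2x2
print(bsr.indices)     # block-column index of each stored block
print(bsr.indptr)      # offsets into data/indices for each block-row
```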
= Motivation

It's useful to expose the tvm::reinterpret functionality to Relay/TOPI users, as this allows them to build (fused) operators leveraging the bitwise reinterpretation of an operator. An example is approximate transcendental functions, which can be implemented similar to:

```python
def C(x):
    return relay.expr.const(x, "float32")

def approx_exp(x):
    x = relay.minimum(relay.maximum(x, C(-88.0)), C(88.0))
    x = C(127.0) + x * C(1.44269504)
    xf = relay.floor(x)
    i = relay.cast(xf, "int32")
    x = x - xf
    Y = C(0.99992522) + x * (C(0.69583354) + x * (C(0.22606716) + x * C(0.078024523)))
    exponent = relay.left_shift(i, relay.expr.const(23, "int32"))
    exponent = relay.reinterpret(exponent, "float32")
    return exponent * Y

def approx_sigmoid(x):
    # <2.0e-5 absolute error over [-5, 5]
    y = approx_exp(x)
    return y / (y + C(1.0))

def approx_tanh(x):
    # <4.0e-5 absolute error over [-5, 5]
    x = x * C(2.0)
    y = approx_exp(x)
    return (y - C(1.0)) / (y + C(1.0))
```

See the unit tests for implementations of these approximate transcendentals.
Andrew Tulloch committed -
Luis Vega committed
-
* Update the Relay adding pass doc to reference the new pass infrastructure
* Correct pass name
  Co-Authored-By: Zhi <5145158+zhiics@users.noreply.github.com>
* Align header equals signs
Steven S. Lyubomirsky committed -
Animesh Jain committed
-
雾雨魔理沙 committed
-
In cases where we have multiple models or threadpools active, spinning around `sched_yield()` may not be desirable, as it prevents the OS from effectively scheduling other threads. Thus, allow users to conditionally disable this behaviour (via an environment variable `TVM_THREAD_POOL_SPIN_COUNT`, similar to existing environment flags for the thread pool such as `TVM_BIND_THREADS`, etc). This substantially improves tail latencies in some of our multi-tenant workloads in practice.

Unit tests have been added. On my laptop, running:

```
TVM_THREAD_POOL_SPIN_COUNT=0 ./build/threading_backend_test;
TVM_THREAD_POOL_SPIN_COUNT=1 ./build/threading_backend_test;
./build/threading_backend_test;
```

gives https://gist.github.com/ajtulloch/1805ca6cbaa27f5d442d23f9d0021ce6 (i.e. 97ms -> <1ms after this change)
Andrew Tulloch committed
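Since `TVM_THREAD_POOL_SPIN_COUNT` is read from the environment like the existing `TVM_BIND_THREADS` flag, a minimal sketch of disabling spinning from Python might look like the following; the assumption that the variable must be set before the runtime's thread pool is created is mine, not stated in the entry above.

```python
import os

# Assumption: the thread pool consults TVM_THREAD_POOL_SPIN_COUNT when it is
# first created, so configure the environment before importing TVM.
os.environ["TVM_THREAD_POOL_SPIN_COUNT"] = "0"  # disable spin-waiting on sched_yield()
os.environ["TVM_BIND_THREADS"] = "1"            # existing thread-pool flag mentioned above

import tvm  # noqa: E402  (imported after the environment is configured)
```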
-
- 22 Jul, 2019 3 commits
-
* [RFC] Initial support for Tflite operator SPLIT

  This patch adds initial support for the tflite operator split. However, I am not yet sure how to handle the axis parameter for the split operator and support it in the test infrastructure. Putting this up for an initial review and comment.

  The split operator in tflite, according to https://www.tensorflow.org/lite/guide/ops_compatibility, appears to take num_or_size_split as a 0D tensor. I also note that tflite.split is one of the few operators that returns multiple outputs, and thus the helper routines in the tests needed some massaging to make this work.

  @apivarov, could you please review this? Thanks, Ramana

* Fix the axis parameter; add more tests
* Address review comments
* Try out frozen_gene's suggestion
* Handle split of 1 element
* int32 is only supported in tflite 1.14, let's check that version here.
* Keep this at python3.5
* Add packaging as a python package to be installed
Ramana Radhakrishnan committed -
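For context on what SPLIT corresponds to on the Relay side, a rough, illustrative sketch using relay.split follows; the exact lowering performed by the TFLite frontend is not spelled out in the entry above, so treat the shapes and parameters here as assumptions.

```python
from tvm import relay

# Illustration only: split a (1, 6) tensor into 3 equal pieces along axis 1,
# roughly mirroring a TFLite SPLIT with num_splits=3.
x = relay.var("x", shape=(1, 6), dtype="float32")
parts = relay.split(x, indices_or_sections=3, axis=1)  # TupleWrapper of 3 tensors
func = relay.Function([x], parts.astuple())
print(func)
```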
Tianqi Chen committed
-
* updated runtime to support non-shared memory FPGAs for instruction and micro-op kernels
* adding driver-defined memcpy function to handle F1 cases
* refactor to include flush/invalidate in memcpy driver function
* update tsim driver
* bug fixes
* cleanup
* pre-allocate fpga readable buffers to improve perf
* fix
* remove instruction stream address rewrite pass for micro op kernels
* fix
* white spaces
* fix lint
* avoid signed/unsigned compilation warning
* avoid signed/unsigned compilation warning
* fix
* fix
* addressing comments
* whitespace
* moving flush/invalidate out of memmove
* cleanup
* fix
* cosmetic
* rename API
* comment fix
Thierry Moreau committed
-
- 21 Jul, 2019 2 commits
-
Tianqi Chen committed
-
Luis Vega committed
-
- 20 Jul, 2019 1 commit
-
Luis Vega committed
-
- 19 Jul, 2019 8 commits
-
* do
* fix test
雾雨魔理沙 committed -
Yong Wu committed
-
* Improve boundary nodes in graph tuner
* Limit output node number
* Fix test
* Improve warning.
* Fix test
Yao Wang committed -
Balint Cristian committed
-
Yizhi Liu committed
-
Ramana Radhakrishnan committed
-
Thierry Moreau committed
-
zacario-li committed
-
- 18 Jul, 2019 7 commits
-
雾雨魔理沙 committed
-
Tianqi Chen committed
-
Andrew Tulloch committed
-
* Support additional architectures beyond x86_64 in ubuntu_install_java

  While attempting to get a development environment going for TVM on my AArch64 desktop, I ran into some hardcoding of relevant architectures.
Ramana Radhakrishnan committed -
Logan Weber committed
-
Let's welcome Zhi as a new Apache TVM Committer!
Thierry Moreau committed -
Apply suggestions from code review

Co-Authored-By: Wei Chen <ipondering.weic@gmail.com>
bulanova-huawei committed
-
- 17 Jul, 2019 3 commits
-
* [docs] Add a tutorial for the pass manager
* address comment
* address more comments
* retrigger ci
* address steven's comments
* address comments
* retrigger ci
* Update docs/dev/relay_pass_infra.rst
  Co-Authored-By: Steven S. Lyubomirsky <slyubomirsky@gmail.com>
* Update docs/dev/relay_pass_infra.rst
  Co-Authored-By: Steven S. Lyubomirsky <slyubomirsky@gmail.com>
* Update docs/dev/relay_pass_infra.rst
  Co-Authored-By: Steven S. Lyubomirsky <slyubomirsky@gmail.com>
* Update docs/dev/relay_pass_infra.rst
  Co-Authored-By: Steven S. Lyubomirsky <slyubomirsky@gmail.com>
* Update docs/dev/relay_pass_infra.rst
  Co-Authored-By: Steven S. Lyubomirsky <slyubomirsky@gmail.com>
* Update docs/dev/relay_pass_infra.rst
  Co-Authored-By: Logan Weber <36520469+weberlo@users.noreply.github.com>
* Update docs/dev/relay_pass_infra.rst
  Co-Authored-By: Logan Weber <36520469+weberlo@users.noreply.github.com>
Zhi committed -
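As a small taste of the pass infrastructure that tutorial documents, here is a minimal sketch assuming the 2019-era relay.transform API; relay.Module, Sequential, and the specific pass names are assumptions about that API version, and some were renamed in later releases.

```python
from tvm import relay

# Build a tiny function containing a constant subexpression (1.0 + 2.0).
x = relay.var("x", shape=(2, 2), dtype="float32")
c = relay.add(relay.const(1.0), relay.const(2.0))
f = relay.Function([x], relay.multiply(x, c))

# Assumption: relay.Module.from_expr and relay.transform.Sequential as in TVM ~0.6.
mod = relay.Module.from_expr(f)
seq = relay.transform.Sequential([
    relay.transform.SimplifyInference(),
    relay.transform.FoldConstant(),  # folds 1.0 + 2.0 into a single constant
])
# Constant folding evaluates on the CPU, so an LLVM-enabled build is assumed.
mod = seq(mod)
print(mod)  # the folded constant 3.0 should appear in the printed module
```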
* [Relay][VM] Fix debug statement
* Change debug statement
Wei Chen committed -
Luis Vega committed
-