- 07 Sep, 2019 1 commit
* update lint
* lint fixed
* lint updated
* lint fixed
* lint fixed
* lint fixed
* updates
* add intel graphics as a package
* remove print info
* depthwise conv2d schedule added for intel graphics
* asdf
* fix lint
* fix lint
* fix ci
* add channels
Leyuan Wang committed
- 01 Sep, 2019 1 commit
* Added arm_cpu NHWC schedules.
* Fixed kernel shape legalization.
* Added bitserial ops to relay.
* Snapshot and more missing files.
* Added dense testing.
* Added tests
* Added ASF header to new files.
* cc lint
* Pylint change.
* pylint fixes.
* Change arm legalize test.
* Added assert check to arm legalize.
* Added better documentation, fixed some bad style
* Reverted arm conv2d nhwc changes.
Josh Fromm committed
- 23 Aug, 2019 1 commit
Animesh Jain committed
- 22 Aug, 2019 2 commits
* Add one-hot to Relay
* topi implementation
* Working
* add topi test
* Add TF test
* Fix check
* fix linting issues
* fix documentation
* Fix documentation
* Add support for on_value, off_value, axis, dtype
* Add full support for axis
* Fix compute and update test_forward
* Move on_value and off_value to inputs
* Add topi test
* Update tests
* Update docs
* Fix style
* re-enable tests
* Add one_hot to mxnet converter
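A hedged sketch of the new op in use (illustrative only; shapes and fill values are made up):

```python
from tvm import relay

# one_hot(indices, on_value, off_value, depth, axis, dtype) mirrors
# tf.one_hot: a new `depth`-sized axis holds on_value at each index
# position and off_value everywhere else.
indices = relay.var("indices", shape=(3,), dtype="int32")
out = relay.one_hot(indices, relay.const(1.0), relay.const(0.0),
                    depth=4, axis=-1, dtype="float32")
print(relay.Function([indices], out))
```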
Jon Soifer committed
Josh Fromm committed
- 21 Aug, 2019 1 commit
* Support cblas library in dense
* start to add support for generic batch_matmul compute
* Add x86 override for batch_matmul
* Fix linting
* reset file
* Fix typos
* dummy change to re-trigger CI
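For context, a minimal sketch of routing dense through cblas (hedged: assumes a TVM build with USE_BLAS enabled, and uses present-day API names, which have shifted since this commit):

```python
import tvm
from tvm import relay

# A dense layer the x86 backend can offload to cblas when the target
# carries "-libs=cblas" instead of using the TVM-generated kernel.
data = relay.var("data", shape=(16, 64), dtype="float32")
weight = relay.var("weight", shape=(32, 64), dtype="float32")
net = relay.Function([data, weight], relay.nn.dense(data, weight))

mod = tvm.IRModule.from_expr(net)
lib = relay.build(mod, target="llvm -libs=cblas")
```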
Jon Soifer committed
- 19 Aug, 2019 1 commit
Wuwei Lin committed
- 14 Aug, 2019 1 commit
Animesh Jain committed
- 13 Aug, 2019 1 commit
* Added relay and topi mirror_pad operator.
* Added mirror_padding to tensorflow frontend.
* Added mirrorpad testing in tensorflow frontend.
* Added space_to_depth in tf frontend.
* Added tests for spacetodepth.
* spacetodepth bug fix.
* Lint fix
* Added mirror pad python attrs.
* Pad code formatting.
* Syntax improvement
* Hopefully last lint fix
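A hedged sketch of the new Relay op (assuming relay.nn.mirror_pad takes a pad_width and a REFLECT/SYMMETRIC mode, matching the TF MirrorPad it was added for):

```python
from tvm import relay

# MirrorPad reflects values across the tensor edge rather than padding
# with a constant; mode is "REFLECT" or "SYMMETRIC", as in tf.pad.
data = relay.var("data", shape=(1, 3, 32, 32), dtype="float32")
out = relay.nn.mirror_pad(data, pad_width=((0, 0), (0, 0), (1, 1), (1, 1)),
                          mode="REFLECT")
print(relay.Function([data], out))
```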
Josh Fromm committed
- 07 Aug, 2019 1 commit
* Fix mxnet converter for hybrid block
* tweak
* fix rebase
* fix
* add test
Haichen Shen committed
- 06 Aug, 2019 2 commits
* Fix the tile_rx and tile_ry issue. Note that this patch depends on pull request #9 in tvm-distro.
mingwayzhang committed
* add build gcn tutorial
* add transpose operator for square sparse matrices
* remove extra files
* change loop tag
* comply with lint
* comply with lint -- line too long
* comply with lint
* lint check
* lint check
* lint check
* apply Marisa and Thierry's reviews
Yulun Yao committed
- 05 Aug, 2019 1 commit
* Update Softmax compute and CPU schedule
* Add C++ compute
* Fix schedule
* Update CUDA and OpenGL schedules
* Fix log_softmax
* Fix hls and opengl schedules
* Fix CUDA schedule
Jon Soifer committed
- 02 Aug, 2019 1 commit
* [TOPI] Memoize winograd matrix * lint * Fix name
Lianmin Zheng committed
- 01 Aug, 2019 2 commits
sf-wind committed
The patch adds support for the TensorFlow operators log1p, cos, and sin:
* log1p is described at https://www.tensorflow.org/api_docs/python/tf/math/log1p
* cos is described at https://www.tensorflow.org/api_docs/python/tf/math/cos
* sin is described at https://www.tensorflow.org/api_docs/python/tf/math/sin
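For reference, a hedged sketch of the kind of decomposition such a frontend conversion can use (illustrative; the actual converter lives in the TensorFlow frontend):

```python
from tvm import relay

# log1p(x) can be lowered as log(1 + x). The dedicated log1p is more
# accurate for |x| near zero, so this trades away a little precision.
x = relay.var("x", shape=(4,), dtype="float32")
log1p = relay.log(relay.add(x, relay.const(1.0)))
print(relay.Function([x], log1p))
```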
alexgl-github committed
- 31 Jul, 2019 1 commit
* [TOPI][CUDA] schedule for group_conv2d * Fix #flops
Wuwei Lin committed
- 30 Jul, 2019 2 commits
* Fixed topi bdist_wheel build to include libraries. * Removed unneeded imports
Josh Fromm committed
* Fix traverse_inline not inlining zero-input ops properly
* Add where to python and set tag to broadcast
* Fix inline
* test
* fix test target
* fix
Wuwei Lin committed
- 26 Jul, 2019 1 commit
* [TOPI][CUDA] Schedule for pool_grad
* Relay test
* Fix fused op
* doc
* Remove set scope local
Wuwei Lin committed
- 25 Jul, 2019 2 commits
Balint Cristian committed
* uTVM interfaces (#14)
* some minor interface changes
* implemented HostLowLevelDevice
* added MicroDeviceAPI
* implemented micro_common and added Python interfaces
* current status, semi implemented micro session
* added micro_common implementation and python interfaces (#18)
* added micro_common implementation and python interfaces (#18)
* current status, semi implemented
* host test working
* updated interfaces for MicroSession arguments allocation
* make somewhat lint compatible
* fix based on comments
* added rounding macro
* fix minor bug
* improvements based on comments
* Clean up `binutil.py` and make Python-3-compatible
* Change argument allocation design
* Address feedback and lint errors
* Improve binutil tests
* Simplify allocator (per @tqchen's suggestions)
* Doc/style fixes
* farts
* mcgee
* rodata section werks (and so does `test_runtime_micro_workspace.py`)
* simple graph runtime werk
* TEMP
* ResNet works, yo
* First round of cleanup
* More cleanup
* runs a dyson over the code
* Another pass
* Fix `make lint` issues
* ready to pr... probably
* final
* Undo change
* Fix rebase resolution
* Minor fixes
* Undo changes to C codegen tests
* Add `obj_path` in `create_micro_lib`
* TEMP
* Address feedback
* Add missing TODO
* Partially address feedback
* Fix headers
* Switch to enum class for `SectionKind`
* Add missing ASF header
* Fix lint
* Fix lint again
* Fix lint
* Kill lint warnings
* Address feedback
* Change Python interface to MicroTVM: all interaction with the device is now through `Session` objects, which are used through Python's `with` blocks.
* Reorder LowLevelDevice interface
* Store shared ptr to session in all alloced objects
* Move helper functions out of `tvm.micro`
* Switch static char arr to vector
* Improve general infra and code quality (does not yet address all of tqchen's feedback)
* Forgot a rename
* Fix lint
* Add ASF header
* Fix lint
* Partially address MarisaKirisame's feedback
* Lint
* Expose `MicroSession` as a node to Python
* Revert to using `Session` constructor
* Fix compiler error
* (Maybe) fix CI error
* Debugging
* Remove
* Quell lint
* Switch to stack-based session contexts
* Make uTVM less intrusive to host codegen, and use SSA for operands of generated ternary operators
* Inline UTVMArgs into UTVMTask struct
* Remove `HostLowLevelDevice` header
* Remove `BaseAddr` class
* Address feedback
* Add "utvm" prefix to global vars in runtime
* Fix lint
* Fix CI
* Fix `test_binutil.py`
* Fix submodules
* Remove ResNet tests
* Make `test_binutil.py` work with nose
* Fix CI
* I swear this actually fixes the binutil tests
* lint
* lint
* Add fcompile-compatible cross-compile func
* Add docs for uTVM runtime files
* Move pointer patching into `MicroSession`
* Fix lint
* First attempt at unifying cross-compile APIs
* Fix lint
* Rename `cross_compile` back to `cc`
* Address feedback
* Remove commented code
* Lint
* Figure out failing function
* Remove debugging code
* Change "micro_dev" target to "micro"
* Add checks in tests for whether uTVM is enabled
* Add TODO for 32-bit support
* Rename more "micro_dev" to "micro"
* Undo rename: we already have `tvm.micro` as a namespace; can't have it as a method as well.
* Fix failing CI: thanks to @tqchen for finding this bug. Emitting ternary operators for `min` and `max` causes concurrency bugs in CUDA, so we're moving the ternary op emissions from `CodeGenC` to `CodeGenCHost`.
* Address feedback
* Fix lint
Logan Weber committed
- 24 Jul, 2019 4 commits
Logan Weber committed
Tianqi Chen committed
Tianqi Chen committed
Wuwei Lin committed
- 23 Jul, 2019 3 commits
Various groups, internally and externally, are interested in replacing standard dense layers with block-sparse matrix multiplication layers. The motivations are generally higher performance (due to the reduction in FLOPs and in memory bandwidth/cache footprint) and enabling larger models (e.g. fitting more layers in a given memory budget). Some public work along these lines:

* https://openai.com/blog/block-sparse-gpu-kernels/
* https://openai.com/blog/sparse-transformer/
* https://arxiv.org/abs/1802.08435
* https://arxiv.org/abs/1711.02782

Various groups have been able to successfully train models with reasonable levels of sparsity (90%+) with marginal accuracy changes, which suggests substantial speedups are possible (as this implies a >10x reduction in FLOPs). It is fairly straightforward to realize these theoretical speedups; see e.g. the TVM benchmarks for Intel CPUs in https://gist.github.com/ajtulloch/e65f90487bceb8848128e8db582fe902 and the CUDA results in https://github.com/openai/blocksparse. Related kernels and representations:

* https://github.com/openai/blocksparse (CUDA)
* https://software.intel.com/en-us/mkl-developer-reference-c-mkl-bsrmm (MKL bsrmm)
* https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.sparse.bsr_matrix.html (SciPy BSR representation)

This is extracted from a patch we've been using internally. There are various extensions possible (int8/fp16/bf16, CUDA/other GPU architectures), but this is a reasonable starting point; it still needs more thorough unit-test coverage, however. We follow the conventions established by scipy.sparse.bsr_matrix and other libraries; see the unit tests for details. For folks interested in experimenting with scheduling/AutoTVM etc., the gist above is a useful starting point.
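As a standalone illustration of the BSR convention referenced above, a small NumPy/SciPy sketch (my own, hedged; the block size and sparsity level are arbitrary):

```python
import numpy as np
import scipy.sparse

# Dense 256x256 weight with ~90% of its 16x16 blocks zeroed out.
rng = np.random.RandomState(0)
W = rng.randn(256, 256).astype("float32")
block_mask = (rng.rand(16, 16) < 0.1).astype("float32")  # keep ~10% of blocks
W *= np.kron(block_mask, np.ones((16, 16), dtype="float32"))

# BSR stores only the surviving blocks, so the sparse matmul below does
# roughly 10x fewer FLOPs than the dense product it matches.
W_bsr = scipy.sparse.bsr_matrix(W, blocksize=(16, 16))
x = rng.randn(256, 8).astype("float32")
np.testing.assert_allclose(W_bsr.dot(x), W.dot(x), rtol=1e-4, atol=1e-5)
```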
Andrew Tulloch committed
= Motivation

It's useful to expose the tvm::reinterpret functionality to Relay/TOPI users, as this allows them to build (fused) operators leveraging the bitwise reinterpretation of an operator. An example is approximate transcendental functions, which can be implemented similar to:

```python
def C(x):
    return relay.expr.const(x, "float32")

def approx_exp(x):
    x = relay.minimum(relay.maximum(x, C(-88.0)), C(88.0))
    x = C(127.0) + x * C(1.44269504)
    xf = relay.floor(x)
    i = relay.cast(xf, "int32")
    x = x - xf
    Y = C(0.99992522) + x * (C(0.69583354) + x * (C(0.22606716) + x * C(0.078024523)))
    exponent = relay.left_shift(i, relay.expr.const(23, "int32"))
    exponent = relay.reinterpret(exponent, "float32")
    return exponent * Y

def approx_sigmoid(x):
    # <2.0e-5 absolute error over [-5, 5]
    y = approx_exp(x)
    return y / (y + C(1.0))

def approx_tanh(x):
    # <4.0e-5 absolute error over [-5, 5]
    x = x * C(2.0)
    y = approx_exp(x)
    return (y - C(1.0)) / (y + C(1.0))
```

See unit tests for implementations of these approximate transcendentals.
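A quick standalone check of the bit-level identity the snippet relies on (my own sketch in NumPy, not part of the patch):

```python
import numpy as np

# Sanity check of the exponent trick in approx_exp: shifting an integer e
# into the IEEE-754 float32 exponent field and reinterpreting the bits
# yields 2**(e - 127).
bits = np.array([130], dtype=np.int32) << 23   # 130 - 127 = 3
assert bits.view(np.float32)[0] == 8.0         # 2**3
```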
Andrew Tulloch committed
Animesh Jain committed
- 19 Jul, 2019 1 commit
Yong Wu committed
- 07 Jul, 2019 2 commits
Tianqi Chen committed
* initialize conv2d transpose scheduling on x86
* refine the scheduler a bit
* fix for lint
* address review comments; remove duplicate code
* fix lint
Yida Wang committed
- 03 Jul, 2019 1 commit
* Pre-allocate buffer for x86 roi_align * Fix typo
Yao Wang committed
- 28 Jun, 2019 2 commits
Thierry Moreau committed
* Add sequence_mask; use exactly the same arguments as mxnet
* fix
* fix lint
* fix lint
* add mxnet conversion + relay
* update
* update doc
* fix pylint
* fix doc
* address comment
* try to address comments
* try to enable shape check for valid_length
* fix
* try to fix
* fix bug
* try to fix
* address comment
* address comment
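A hedged sketch of the resulting op; per the message above, the signature follows mxnet's SequenceMask (shapes here are made up):

```python
from tvm import relay

# sequence_mask(data, valid_length, mask_value, axis): entries past each
# sequence's valid_length along `axis` are replaced with mask_value.
data = relay.var("data", shape=(10, 4, 32), dtype="float32")      # (seq, batch, feat)
valid_length = relay.var("valid_length", shape=(4,), dtype="int32")
out = relay.sequence_mask(data, valid_length, mask_value=0.0, axis=0)
print(relay.Function([data, valid_length], out))
```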
Xingjian Shi committed
- 12 Jun, 2019 1 commit
Leyuan Wang committed
- 11 Jun, 2019 1 commit
hlu1 committed
- 10 Jun, 2019 1 commit
* Support x86 dilation conv2d and improve multi-batch conv2d * Fix lint
Yao Wang committed
- 09 Jun, 2019 1 commit
* Improve non_max_suppression for CPU
* Improve get_valid_counts
* Minor change
* Skip some unnecessary computes
Yao Wang committed
- 07 Jun, 2019 1 commit
Alexander Pivovarov committed