- 05 Sep, 2019 3 commits
-
* save
* add test
* refactor
* fix indent
* save
* refactor
雾雨魔理沙 committed -
* adding support for graphpack over multiply op
* increasing resnet model coverage
* fix indentation
* lint
* moving recursion limit fix into graphpack pass
* moving recursionlimit to relay init
* pooling on NCHWnc format
* adding more models
* deploy_resnet_on_vta.py
* trailing line
* generalizing to vision models
* merge conflicts
* fix, apply quantization to VTA only
* improving comments
* trimming models that have runtime issues for the moment
* lint
* lint
* lint
Thierry Moreau committed -
Animesh Jain committed
-
- 04 Sep, 2019 1 commit
-
Rebasing. Empty commit. Clang-format styling.
Animesh Jain committed
-
- 02 Sep, 2019 1 commit
-
Animesh Jain committed
-
- 01 Sep, 2019 2 commits
-
* init shape func in interpreter and vm compiler
* Update interpreter
* fix
* lint
* lint
* fix
* remove hack
* update
* fix
* fix
* update
* address comments & update for shape_of
* fix lint
* update
* fix hybrid
* lint
* fix bug & add take shape func
* lint
* lint
* update
* fix flaky test
* add todo
Haichen Shen committed -
* Added arm_cpu NHWC schedules.
* Fixed kernel shape legalization.
* Added bitserial ops to relay.
* Snapshot and more missing files.
* Added dense testing.
* Added tests
* Added ASF header to new files.
* cc lint
* Pylint change.
* pylint fixes.
* Change arm legalize test.
* Added assert check to arm legalize.
* Added better documentation, fixed some bad style
* Reverted arm conv2d nhwc changes.
Josh Fromm committed
-
- 31 Aug, 2019 1 commit
-
Animesh Jain committed
-
- 30 Aug, 2019 2 commits
-
Animesh Jain committed
-
Animesh Jain committed
-
- 29 Aug, 2019 2 commits
-
* [Relay] Conv2d grad
* Fix test
* Fix first order gradient
Wuwei Lin committed -
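The first-order conv2d gradient above follows the usual identities: the input gradient is a full convolution of the output gradient with the weights, and the weight gradient is a cross-correlation of the input with the output gradient. A minimal 1-D NumPy sketch of those identities with a numerical check (illustrative names, not the TVM code):

```python
import numpy as np

# 1-D "valid" cross-correlation: y[i] = sum_k x[i + k] * w[k]
def corr1d(x, w):
    return np.correlate(x, w, mode="valid")

x = np.random.randn(8)
w = np.random.randn(3)
dy = np.random.randn(6)  # upstream gradient, same shape as corr1d(x, w)

# Analytic gradients:
#   dL/dw[k] = sum_i dy[i] * x[i + k]  -> cross-correlate x with dy
#   dL/dx[j] = sum_i dy[i] * w[j - i]  -> full convolution of dy with w
dw = np.correlate(x, dy, mode="valid")
dx = np.convolve(dy, w, mode="full")

# Finite-difference check for dL/dw[0], with loss L = y . dy
eps = 1e-6
w2 = w.copy(); w2[0] += eps
num = (corr1d(x, w2) - corr1d(x, w)) @ dy / eps
assert np.isclose(num, dw[0], atol=1e-4)
```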
* [TensorFlow] Fix limitation that depth_mult can only be 1 for DepthwiseConv2dNative
* Improve code readability
lixiaoquan committed
-
- 23 Aug, 2019 1 commit
-
Animesh Jain committed
-
- 22 Aug, 2019 2 commits
-
* Add one-hot to Relay
* topi implementation
* Working
* add topi test
* Add TF test
* Fix check
* fix linting issues
* fix documentation
* Fix documentation
* Add support for on_value, off_value, axis, dtype
* Add full support for axis
* Fix compute and update test_forward
* Move on_value and off_value to inputs
* Add topi test
* Update tests
* Update docs
* Fix style
* re-enable tests
* Add one_hot to mxnet converter
Jon Soifer committed -
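For reference, the one-hot semantics being added (TF-style on_value/off_value/axis/dtype attributes) can be written out in a few lines of NumPy; this is an illustrative sketch, not the topi compute:

```python
import numpy as np

def one_hot(indices, depth, on_value=1, off_value=0, axis=-1, dtype="float32"):
    # Start from off_value everywhere, then scatter on_value at the index positions.
    out = np.full(indices.shape + (depth,), off_value, dtype=dtype)
    idx = np.indices(indices.shape)
    out[(*idx, indices)] = on_value
    if axis != -1:
        out = np.moveaxis(out, -1, axis)  # place the new depth axis where requested
    return out

print(one_hot(np.array([0, 2]), depth=3))
# [[1. 0. 0.]
#  [0. 0. 1.]]
```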
Josh Fromm committed
-
- 21 Aug, 2019 1 commit
-
* [Relay][VM] VM debugger
* Report mean/min/max for op duration
* Typos
* Lint
* Lint
* Lint
* Support build debug VM in CMake
* Lint
* Enable VM debug in unit test
* Disable debug vm test until new docker image is built
* Add device sync code
* Fix qnn unit test
* Disable vm debug by default
* Rename files
* Rename classes
* Fix comment
* Fix comment
Wei Chen committed
-
- 16 Aug, 2019 2 commits
-
Wuwei Lin committed
-
* QNN quantize and dequantize operators.
* addressing review comments.
* addressing review comments.
* Adding new line at the end of the file.
* Adhering to styling guidelines.
* Adding name to contributors.
* Fixing lint issue.
* Fixing file name.
* Removing unnecessary code.
shoubhik committed
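The two operators implement the standard affine-quantization mapping Q = clip(round(x / scale) + zero_point) and its inverse. A NumPy sketch of that scheme (illustrative, not the exact QNN lowering):

```python
import numpy as np

def quantize(x, scale, zero_point, dtype=np.uint8):
    # Map float values onto the integer grid, then clip to the dtype's range.
    info = np.iinfo(dtype)
    q = np.round(x / scale) + zero_point
    return np.clip(q, info.min, info.max).astype(dtype)

def dequantize(q, scale, zero_point):
    # Inverse mapping back to float (lossy due to rounding/clipping).
    return (q.astype(np.int32) - zero_point) * scale

x = np.array([-0.5, 0.0, 1.0], dtype=np.float32)
q = quantize(x, scale=0.01, zero_point=128)        # uint8 representation
print(dequantize(q, scale=0.01, zero_point=128))   # approximately recovers x
```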
-
- 15 Aug, 2019 1 commit
-
* Refactor.
* update
* update
* update
* update
* update
* update
ziheng committed
-
- 13 Aug, 2019 2 commits
-
Zhi committed
-
* Added relay and topi mirror_pad operator.
* Added mirror_padding to tensorflow frontend.
* Added mirrorpad testing in tensorflow frontend.
* Added space_to_depth in tf frontend.
* Added tests for spacetodepth.
* spacetodepth bug fix.
* Lint fix
* Added mirror pad python attrs.
* Pad code formatting.
* Syntax improvement
* Hopefully last lint fix
Josh Fromm committed
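Both new ops have direct NumPy analogues that make the semantics concrete. The sketch below assumes NHWC layout; np.pad's 'reflect' mode corresponds to TF MirrorPad's 'REFLECT' mode (and 'symmetric' to 'SYMMETRIC'):

```python
import numpy as np

x = np.arange(4.0).reshape(1, 2, 2, 1)  # NHWC

# MirrorPad on the spatial dims (TF 'REFLECT' mode == np.pad 'reflect')
padded = np.pad(x, ((0, 0), (1, 1), (1, 1), (0, 0)), mode="reflect")

# space_to_depth with block size b: move each b x b spatial block into channels
def space_to_depth(x, b):
    n, h, w, c = x.shape
    x = x.reshape(n, h // b, b, w // b, b, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)          # (n, h', w', bh, bw, c)
    return x.reshape(n, h // b, w // b, b * b * c)

print(space_to_depth(x, 2).shape)  # (1, 1, 1, 4)
```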
-
- 12 Aug, 2019 1 commit
-
Neo Chien committed
-
- 11 Aug, 2019 1 commit
-
* aot
* save
* save
* fix test
* remove vta changes
* lint
雾雨魔理沙 committed
-
- 09 Aug, 2019 1 commit
-
* reproduce error
* fix
* lint
* lint
雾雨魔理沙 committed
-
- 08 Aug, 2019 1 commit
-
* [Relay] [Quantization] WIP - Common files for the quantization work.
* [Relay] [Quantization] WIP - Prototyping requantize op.
* Requantize operator implementation.

  Requantize converts one quantized tensor representation to another quantized representation. The PR has the following implementation features:
  - Requantize operator defined in qnn namespace - relay.qnn.requantize
  - Lowering of the requantize to existing Relay operators
  - Integer fixed-point implementation of requantize
    - Two rounding modes - FE_UPWARD (round towards infinity) and FE_AWAY_FROM_ZERO (std::round behavior)
  - Floating-point implementation as well, which can act as a reference or can be used on devices where FP32 computation is available
  - Unit test cases

  Relevant issue - https://github.com/dmlc/tvm/issues/2351

  Credit to TFLite and GemmLowp for providing reference implementations.
* Typo and lint fixes.
* Doc fix.
* Uncommenting the lint script (fixing mistake).
* Modifying the unit tests.
* Moving C++ files into src/relay/qnn
* Moving python files to python/tvm/relay/qnn. Some minor fixes.
* Moving the attrs.h inside the include directory.
* Pushing files that I forgot earlier. Changing util location.
* Incorporating comments. API change. Lint fixes.
* Modifying the GetFixedPointMultiplierShift API as per comments.
* Forgot the dialect change.
* Changing rewrite to qnn_lower.
* Renaming Quantize to Qnn for clarity.
* Remove use_int_domain.
* Incorporating review comments.
* Adding API doc for QNN dialect.
* Move the qnn_lower pass to transform namespace.
* Moving from expr to module. Adding namespace in C++.
* Minor sentence rewrites. Added qnn namespace.
* Added the API doc.
* Changing default out_dtype to int8. Adding a test with in/out_dtype as uint8.
* Style fixes. Better error messages.
* Adding documentation.
* More documentation fixes.
* Adding out dtype check for requantize.
* Adding corner case for FP32 to fixed point conversion.
* Adding extra line.
* Documentation fix.
* Adding static inline.
* Incorporating jackwish comment. Removed idtype from requantize lowering.
* Removing Quantize/Dequantize code. Restricting Requantize to (u)int8/int32.
* Style fixes.
* Fix the docs.
* Move to Legalize API.
Animesh Jain committed
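For intuition, the integer fixed-point path can be sketched as: express input_scale/output_scale as a 31-bit multiplier plus a shift (the GetFixedPointMultiplierShift API mentioned above), then requantize with an integer multiply, round, and clip. This is an illustrative NumPy sketch, not the actual lowering in src/relay/qnn:

```python
import math
import numpy as np

def get_fixed_point_multiplier_shift(real_mult):
    # real_mult == significand * 2**exponent, with significand in [0.5, 1).
    significand, exponent = math.frexp(real_mult)
    return int(round(significand * (1 << 31))), exponent

def requantize(q, in_scale, in_zp, out_scale, out_zp):
    mult, shift = get_fixed_point_multiplier_shift(in_scale / out_scale)
    right_shift = 31 - shift
    x = (q.astype(np.int64) - in_zp) * mult             # widen, then scale
    x = (x + (1 << (right_shift - 1))) >> right_shift   # round-half-up ("UPWARD")
    return np.clip(x + out_zp, -128, 127).astype(np.int8)

q = np.array([10, 20, 30], dtype=np.int32)
print(requantize(q, in_scale=0.5, in_zp=0, out_scale=1.0, out_zp=0))  # [ 5 10 15]
```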
-
- 07 Aug, 2019 1 commit
-
* Add LayerNorm op
* update
* fix
* Add mean_std and mean_variance
* add std and update doc
* add license
* x
* lint
* x
* fix
* fix doc
Haichen Shen committed
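For reference, layer normalization over an axis computes (x - mean) / sqrt(var + eps) * gamma + beta. A NumPy sketch (gamma/beta are the learned scale and offset; names are illustrative):

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5, axis=-1):
    # Normalize over `axis`, then apply the learned scale and offset.
    mean = x.mean(axis=axis, keepdims=True)
    var = x.var(axis=axis, keepdims=True)
    return (x - mean) / np.sqrt(var + eps) * gamma + beta

x = np.random.randn(2, 4).astype("float32")
y = layer_norm(x, gamma=np.ones(4, "float32"), beta=np.zeros(4, "float32"))
print(y.mean(axis=-1), y.var(axis=-1))  # ~0 and ~1 per row
```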
-
- 06 Aug, 2019 3 commits
-
* [Relay] Rewrite pass. This pass transforms an expression into another expression. It has many use cases:
  - Replace an expr with another expr that has faster performance.
  - For ASICs, we might want to modify the inputs to adapt to the HW support.
  - Alter op layout can work in conjunction with this pass.

  The supporting use case is the Intel i8 x i8 conv. Intel HW supports u8 x i8 conv in HW. Using this pass, we can replace an i8 x i8 conv with a sequence of operators where one of the operators is now a u8 x i8 conv. This will also help automatic quantization performance.
* Better API name.
* Removing the conv2d legalization for x86. Will send a separate PR.
* Test name changes.
* Registering one function to register FTVMLegalize.
* Better comments.
Animesh Jain committed -
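The Intel use case in the Legalize commit above rests on a simple identity: an i8 x i8 product can be computed on u8 x i8 hardware by shifting the data into the unsigned range and subtracting a constant correction term. A dense-matmul NumPy sketch of that arithmetic (the actual pass rewrites conv2d; this only illustrates the identity):

```python
import numpy as np

data_i8 = np.random.randint(-128, 128, (4, 16)).astype(np.int32)
weight_i8 = np.random.randint(-128, 128, (16, 8)).astype(np.int32)

# Shift the int8 data into uint8 range, compute in u8 x i8, then subtract
# the constant correction term 128 * column_sum(weight):
data_u8 = data_i8 + 128                      # now in [0, 255]
corrected = data_u8 @ weight_i8 - 128 * weight_i8.sum(axis=0)

assert np.array_equal(corrected, data_i8 @ weight_i8)
```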
* fix
* fix interpreter
Haichen Shen committed -
* add build gcn tutorial
* add transpose operator for square sparse matrices
* remove extra files
* change loop tag
* comply with lint
* comply with lint -- line too long
* comply with lint
* lint check
* lint check
* lint check
* apply Marisa and Thierry's reviews
Yulun Yao committed
-
- 05 Aug, 2019 1 commit
-
* save lint some lint lint add charrnn save save save remove debug remove debug remove space refactor save rewrite dce
* reset files
* join -> meet
* lint
* address review comment
* wordsmith
雾雨魔理沙 committed
-
- 03 Aug, 2019 1 commit
-
* Fix gather_nd in Relay
* Add test cases for gather_nd.
Huilin Qu committed
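Relay's gather_nd takes the coordinate tuple along the first axis of indices (unlike TF, which uses the last axis). A NumPy sketch of that convention, assuming in-range indices:

```python
import numpy as np

def gather_nd(data, indices):
    # indices has shape (M, ...); indices[:, y] is an M-dim coordinate into data.
    return data[tuple(indices)]

data = np.array([[0, 1], [2, 3]])
indices = np.array([[1, 0],      # row coordinates
                    [0, 1]])     # column coordinates
print(gather_nd(data, indices))  # [2 1] == [data[1, 0], data[0, 1]]
```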
-
- 02 Aug, 2019 2 commits
-
* fix
* lint
雾雨魔理沙 committed -
* [Relay][Quantization] Support floating-point scale
* [Relay][Quantization] KL-divergence calibration on dataset
* Fix unhandled LeftShift case in QuantizeRealize
* Fix lint
* drop QBias
* fix lint
* address comments
* address comments
* Update comments
* address comments
* lint
* kQIdentity = 0
Wuwei Lin committed
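KL-divergence calibration searches over clipping thresholds for the activation histogram and keeps the one whose clipped distribution stays closest to the original fp32 distribution. A much-simplified sketch of the core measure (toy data; the real pass also re-quantizes the clipped histogram to the integer levels before comparing):

```python
import numpy as np

def kl_divergence(p_hist, q_hist, eps=1e-10):
    # Normalize both histograms to probability distributions, then KL(P || Q).
    p = p_hist / p_hist.sum()
    q = q_hist / q_hist.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

acts = np.abs(np.random.randn(100_000))   # toy fp32 activations
ref, edges = np.histogram(acts, bins=2048)

# Score a few candidate clipping thresholds: clip, re-bin, compare.
for threshold in (1.0, 2.0, 4.0):
    clipped, _ = np.histogram(np.minimum(acts, threshold),
                              bins=2048, range=(0.0, edges[-1]))
    print(threshold, kl_divergence(ref.astype(float), clipped.astype(float)))
```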
-
- 01 Aug, 2019 2 commits
-
The patch adds support for the TensorFlow operators log1p, cos, and sin.
- log1p is described at https://www.tensorflow.org/api_docs/python/tf/math/log1p
- cos is described at https://www.tensorflow.org/api_docs/python/tf/math/cos
- sin is described at https://www.tensorflow.org/api_docs/python/tf/math/sin
alexgl-github committed -
* add fatal lint lint lint do make completeness check an error lint remove fatal
* fix test
* reset parser file
* remove unneeded import
* Update python/tvm/relay/adt.py (Co-Authored-By: Steven S. Lyubomirsky <slyubomirsky@gmail.com>)
* Update include/tvm/relay/adt.h (Co-Authored-By: Steven S. Lyubomirsky <slyubomirsky@gmail.com>)
* Eliminate trailing whitespace (my fault)
雾雨魔理沙 committed
-
- 31 Jul, 2019 1 commit
-
* relay vm serialization
* fix lint
* load params, fix stream
* lint
* fix typo
Zhi committed
-
- 25 Jul, 2019 1 commit
-
Lianmin Zheng committed
-
- 24 Jul, 2019 2 commits
- 23 Jul, 2019 1 commit
-
Various groups, internally and externally, are interested in replacing standard dense layers with block-sparse matrix multiplication layers. The motivations are generally: higher performance (due to the reduction in FLOPs, memory bandwidth/cache footprint) and enabling larger models (e.g. fitting more layers in a given memory budget). Some public work along these lines:

* https://openai.com/blog/block-sparse-gpu-kernels/
* https://openai.com/blog/sparse-transformer/
* https://arxiv.org/abs/1802.08435
* https://arxiv.org/abs/1711.02782

Various groups have been able to successfully train models with reasonable levels of sparsity (90%+) with marginal accuracy changes, which suggests substantial speedups are possible (as this implies a >10x reduction in FLOPs). It is fairly straightforward to realize these theoretical speedups; see e.g. TVM benchmarks for Intel CPUs in https://gist.github.com/ajtulloch/e65f90487bceb8848128e8db582fe902, the CUDA results in https://github.com/openai/blocksparse, etc.

* https://github.com/openai/blocksparse (CUDA)
* https://software.intel.com/en-us/mkl-developer-reference-c-mkl-bsrmm (MKL bsrmm)
* https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.sparse.bsr_matrix.html (SciPy BSR representation)

This is extracted from a patch we have been using internally. There are various extensions possible (int8/fp16/bf16, CUDA/other GPU architectures), but this is a reasonable starting point. This needs more thorough unit test coverage, however.

We follow the conventions established by scipy.sparse.bsr_matrix and other libraries; see the unit tests for details. For folks interested in experimenting with scheduling/AutoTVM etc., https://gist.github.com/ajtulloch/e65f90487bceb8848128e8db582fe902 is a useful starting point.
Andrew Tulloch committed
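Since the patch follows scipy.sparse.bsr_matrix's conventions, the storage format is easy to inspect directly: nonzero values are stored as dense (block_rows x block_cols) blocks, plus CSR-style indices/indptr arrays over block columns. A small illustrative example:

```python
import numpy as np
from scipy.sparse import bsr_matrix

# A 4x4 matrix stored as 2x2 blocks; only two of the four blocks are nonzero.
blocks = np.array([[[1, 2], [3, 4]],
                   [[5, 6], [7, 8]]])
indices = np.array([0, 1])    # block-column index of each stored block
indptr = np.array([0, 1, 2])  # block-row i owns blocks indptr[i]:indptr[i+1]

w = bsr_matrix((blocks, indices, indptr), shape=(4, 4))
x = np.ones(4)
print(w.dot(x))      # block-sparse matrix-vector product
print(w.toarray())   # dense view: blocks (0,0) and (1,1) are filled
```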
-