- 24 Jul, 2019 1 commit
-
-
* small bug fix for DataTypeObject * retrigger ci
Zhi committed
-
- 23 Jul, 2019 7 commits
-
-
internally and externally, interested in replacing standard dense layers with block-sparse matrix multiplication layers. The motivations are generally: higher performance (due to reduction in FLOPs, memory bandwidth/cache footprint), enabling larger models (e.g. fitting more layers in a given memory budget).

Some public work along these lines:
* https://openai.com/blog/block-sparse-gpu-kernels/
* https://openai.com/blog/sparse-transformer/
* https://arxiv.org/abs/1802.08435
* https://arxiv.org/abs/1711.02782

Various groups have been able to successfully train models with reasonable levels of sparsity (90%+) with marginal accuracy changes, which suggests substantial speedups are possible (as this implies a >10x reduction in FLOPs).

It is fairly straightforward to realize these theoretical speedups, see e.g. TVM benchmarks for Intel CPUs in https://gist.github.com/ajtulloch/e65f90487bceb8848128e8db582fe902, and CUDA results in https://github.com/openai/blocksparse, etc.
* https://github.com/openai/blocksparse (CUDA)
* https://software.intel.com/en-us/mkl-developer-reference-c-mkl-bsrmm (MKL BSRM)
* https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.sparse.bsr_matrix.html (SCIPY BSR representation)

This is extracted from a patch we've been using internally. There are various extensions possible (int8/fp16/bf16, CUDA/other GPU architectures), but this is a reasonable starting point. This needs more thorough unit test coverage, however.

We follow the conventions established by scipy.sparse.bsr_matrix and other libraries, see the unit tests for details.

For folks interested in experimenting with scheduling/AutoTVM etc, https://gist.github.com/ajtulloch/e65f90487bceb8848128e8db582fe902 is a useful starting point.
Andrew Tulloch committed -
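For readers unfamiliar with the BSR layout that the block-sparse commit above refers to, the following is a minimal pure-Python sketch of the `data` / `indices` / `indptr` convention used by `scipy.sparse.bsr_matrix`. It assumes the layer computes `Y = X W^T` with the weight `W` stored in BSR form, as a dense layer conventionally does. The function name `bsr_dense` and the small example matrix are illustrative assumptions, not the operator added by the patch.

```python
def bsr_dense(x, data, indices, indptr, n_out):
    """Compute y = x @ W.T where W (n_out x n_in) is stored in BSR form.

    data[k]    : k-th stored block, a list of R rows with C values each
    indices[k] : block-column index of data[k]
    indptr[i]  : offset of block-row i's first block within data/indices
    (bsr_dense is a hypothetical helper for illustration only.)
    """
    R = len(data[0])       # block height
    C = len(data[0][0])    # block width
    y = [[0.0] * n_out for _ in x]
    for m, row in enumerate(x):
        for bi in range(len(indptr) - 1):                # block-row of W
            for k in range(indptr[bi], indptr[bi + 1]):  # stored blocks only
                bj = indices[k]                          # block-column of W
                for r in range(R):
                    acc = 0.0
                    for c in range(C):
                        acc += data[k][r][c] * row[bj * C + c]
                    y[m][bi * R + r] += acc
    return y


# W is 4x4 with 2x2 blocks; only two of its four blocks are stored:
# W = [[1, 2, 0, 0],
#      [3, 4, 0, 0],
#      [0, 0, 5, 6],
#      [0, 0, 7, 8]]
data = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
indices = [0, 1]
indptr = [0, 1, 2]

x = [[1.0, 1.0, 1.0, 1.0]]
y = bsr_dense(x, data, indices, indptr, n_out=4)
```

The inner loops touch only the stored blocks, which is where the FLOP reduction mentioned in the commit message comes from: all-zero blocks are never visited.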
= Motivation

It's useful to expose the tvm::reinterpret functionality to Relay/TOPI users, as this allows them to build (fused) operators leveraging the bitwise reinterpretation of an operator. An example is approximate transcendental functions, which can be implemented similar to:

```.py
from tvm import relay


def C(x):
    return relay.expr.const(x, "float32")


def approx_exp(x):
    # Clamp the input so the biased exponent below stays in range.
    x = relay.minimum(relay.maximum(x, C(-88.0)), C(88.0))
    # 127 + x * log2(e): integer part is the exponent, the rest the fraction.
    x = C(127.0) + x * C(1.44269504)
    xf = relay.floor(x)
    i = relay.cast(xf, "int32")
    x = x - xf
    # Polynomial approximation of 2**x on the fractional part.
    Y = C(0.99992522) + x * (C(0.69583354) + x * (C(0.22606716) + x * C(0.078024523)))
    # Shift the integer part into the float32 exponent field and reinterpret.
    exponent = relay.left_shift(i, relay.expr.const(23, "int32"))
    exponent = relay.reinterpret(exponent, "float32")
    return exponent * Y


def approx_sigmoid(x):
    # <2.0e-5 absolute error over [-5, 5]
    y = approx_exp(x)
    return y / (y + C(1.0))


def approx_tanh(x):
    # <4.0e-5 absolute error over [-5, 5]
    x = x * C(2.0)
    y = approx_exp(x)
    return (y - C(1.0)) / (y + C(1.0))
```

See unit tests for implementations of these approximate transcendentals.
Andrew Tulloch committed -
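The bit trick behind the `approx_exp` example in the reinterpret commit above can be checked without TVM. The sketch below redoes the same arithmetic on Python scalars, using the standard `struct` module as a stand-in for `relay.reinterpret`. The helper name `reinterpret_i32_as_f32` is an illustrative assumption, and the intermediate arithmetic runs in float64 rather than float32, so the error figures will not match the Relay version exactly.

```python
import math
import struct


def reinterpret_i32_as_f32(i):
    """Return the float32 whose bit pattern equals the int32 value i.

    Hypothetical stand-in for relay.reinterpret(..., "float32").
    """
    return struct.unpack("<f", struct.pack("<i", i))[0]


def approx_exp(x):
    x = min(max(x, -88.0), 88.0)
    x = 127.0 + x * 1.44269504   # biased exponent: 127 + x * log2(e)
    xf = math.floor(x)
    i = int(xf)
    x = x - xf                   # fractional part in [0, 1)
    # Cubic approximation of 2**x on [0, 1), coefficients from the commit.
    y = 0.99992522 + x * (0.69583354 + x * (0.22606716 + x * 0.078024523))
    # Placing i in the exponent field yields the float 2**(i - 127).
    return reinterpret_i32_as_f32(i << 23) * y


# Largest relative error against math.exp on a grid over [-5, 5].
worst = max(
    abs(approx_exp(t / 100.0) - math.exp(t / 100.0)) / math.exp(t / 100.0)
    for t in range(-500, 501)
)
```

Shifting the integer part left by 23 bits moves it past the float32 mantissa into the exponent field, so the reinterpreted value is an exact power of two and the polynomial only has to cover one octave.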
Luis Vega committed
-
* Update the Relay adding pass doc to reference the new pass infrastructure * Correct pass name Co-Authored-By: Zhi <5145158+zhiics@users.noreply.github.com> * Align header equals signs
Steven S. Lyubomirsky committed -
Animesh Jain committed
-
雾雨魔理沙 committed
-
In cases where we have multiple models or threadpools active, spinning around `sched_yield()` may not be desirable, as it prevents the OS from effectively scheduling other threads.

Thus, allow users to conditionally disable this behaviour (via an environment variable `TVM_THREAD_POOL_SPIN_COUNT`, similar to existing environment flags for the thread pool such as `TVM_BIND_THREADS`, etc).

This substantially improves tail latencies in some of our multi-tenant workloads in practice.

Unit tests have been added - on my laptop, running:

```
TVM_THREAD_POOL_SPIN_COUNT=0 ./build/threading_backend_test;
TVM_THREAD_POOL_SPIN_COUNT=1 ./build/threading_backend_test;
./build/threading_backend_test;
```

gives https://gist.github.com/ajtulloch/1805ca6cbaa27f5d442d23f9d0021ce6 (i.e. 97ms -> <1ms after this change)
Andrew Tulloch committed
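The trade-off described in the thread-pool commit above can be sketched as a spin-then-block wait loop whose spin budget comes from the environment. This is a Python illustration of the general pattern, not TVM's C++ implementation: `wait_for`, `spin_count_from_env`, and the default of 1000 are assumptions made for the example, and `time.sleep(0)` stands in for `sched_yield()`.

```python
import os
import threading
import time


def spin_count_from_env(default=1000):
    """Read the spin budget; `default` is a placeholder, not TVM's value."""
    return int(os.environ.get("TVM_THREAD_POOL_SPIN_COUNT", default))


def wait_for(event, spin_count):
    """Poll `event` up to spin_count times, then block until it is set."""
    for _ in range(spin_count):
        if event.is_set():
            return "spun"
        time.sleep(0)  # give up the time slice, standing in for sched_yield()
    event.wait()       # sleep in the OS until another thread calls set()
    return "blocked"


ready = threading.Event()
threading.Timer(0.05, ready.set).start()  # work arrives after 50 ms
outcome = wait_for(ready, spin_count=0)   # 0 disables spinning entirely
```

With a spin count of 0 the waiting thread goes straight to the blocking wait, so it consumes no CPU while idle and leaves the cores free for other pools, at the cost of a wake-up when work arrives.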
-
- 22 Jul, 2019 3 commits
-
-
* [RFC] Initial support for Tflite operator SPLIT

  This patch adds initial support for the tflite operator split. However I am not yet sure how to handle the axis parameter for the split operator and support it in the test infrastructure. Putting this up for an initial review and comment.

  The split operator in tflite according to https://www.tensorflow.org/lite/guide/ops_compatibility appears to take num_or_size_split as a 0D tensor.

  I also note that tflite.split is one of the few operators that returns multiple outputs and thus the helper routines in the tests needed some massaging to make this work.

  @apivarov, could you please review this?

  Thanks, Ramana
* Fix the axis parameter. Add more tests
* Address review comments
* Try out frozen_gene's suggestion
* Handle split of 1 element
* int32 is only supported in tflite 1.14, let's check that version here.
* Keep this at python3.5
* Add packaging as a python package to be installed
Ramana Radhakrishnan committed -
Tianqi Chen committed
-
* updated runtime to support non-shared memory FPGAs for instruction and micro-op kernels * adding driver-defined memcpy function to handle F1 cases * refactor to include flush/invalidate in memcpy driver function * update tsim driver * bug fixes * cleanup * pre-allocate fpga readable buffers to improve perf * fix * remove instruction stream address rewrite pass for micro op kernels * fix: * white spaces * fix lint * avoid signed/unsigned compilation warning * avoid signed/unsigned compilation warning * fix * fix * addressing comments * whitespace * moving flush/invalidate out of memmove * cleanup * fix * cosmetic * rename API * comment fix
Thierry Moreau committed
-
- 21 Jul, 2019 2 commits
-
-
Tianqi Chen committed
-
Luis Vega committed
-
- 20 Jul, 2019 1 commit
-
-
Luis Vega committed
-
- 19 Jul, 2019 8 commits
-
-
* do * fix test
雾雨魔理沙 committed -
Yong Wu committed
-
* Improve boundary nodes in graph tuner * Limit output node number * Fix test * Improve warning. * Fix test
Yao Wang committed -
Balint Cristian committed
-
Yizhi Liu committed
-
Ramana Radhakrishnan committed
-
Thierry Moreau committed
-
zacario-li committed
-
- 18 Jul, 2019 7 commits
-
-
雾雨魔理沙 committed
-
Tianqi Chen committed
-
Andrew Tulloch committed
-
* Support additional architectures beyond x86_64 in ubuntu_install_java While attempting to get a development environment going for TVM on my AArch64 desktop I ran into some hardcoding of relevant architectures.
Ramana Radhakrishnan committed -
Logan Weber committed
-
Let's welcome Zhi as a new Apache TVM Committer!
Thierry Moreau committed -
Apply suggestions from code review Co-Authored-By: Wei Chen <ipondering.weic@gmail.com>
bulanova-huawei committed
-
- 17 Jul, 2019 6 commits
-
-
* [docs] Add a tutorial for the pass manager * address comment * address more comments * retrigger ci * address steven's comments * address comments * retrigger ci * Update docs/dev/relay_pass_infra.rst Co-Authored-By: Steven S. Lyubomirsky <slyubomirsky@gmail.com> * Update docs/dev/relay_pass_infra.rst Co-Authored-By: Steven S. Lyubomirsky <slyubomirsky@gmail.com> * Update docs/dev/relay_pass_infra.rst Co-Authored-By: Steven S. Lyubomirsky <slyubomirsky@gmail.com> * Update docs/dev/relay_pass_infra.rst Co-Authored-By: Steven S. Lyubomirsky <slyubomirsky@gmail.com> * Update docs/dev/relay_pass_infra.rst Co-Authored-By: Steven S. Lyubomirsky <slyubomirsky@gmail.com> * Update docs/dev/relay_pass_infra.rst Co-Authored-By: Logan Weber <36520469+weberlo@users.noreply.github.com> * Update docs/dev/relay_pass_infra.rst Co-Authored-By: Logan Weber <36520469+weberlo@users.noreply.github.com>
Zhi committed -
* [Relay][VM]Fix debug statement * Change debug statement
Wei Chen committed -
Luis Vega committed
-
* Fix build error * comments
Yinghai Lu committed -
Joshua Z. Zhang committed
-
Haichen Shen committed
-
- 16 Jul, 2019 2 commits
-
-
zhengdi committed
-
* tmp * Port vm and object to python * clean up * update vm build module * update * x * tweak * cleanup * update * fix rebase * Rename to VMCompiler * fix
Haichen Shen committed
-
- 15 Jul, 2019 1 commit
-
-
* Enable set_input_zero_copy in GraphRuntime * Fix LoadParams * Fix * lint * Fix remote context issue * Fix * Remove LOG * Remove unused variables * Add tests * works * More test scenarios * make it simpler * Remove unnecessary changes * Address comments * More comments * Address comments * Fix build
Yinghai Lu committed
-
- 14 Jul, 2019 1 commit
-
-
* [TVM] Fix bound inference to avoid allocating too much * [ARITH][BOUND] Pass analyzer to PropBoundToInputs
Sergei Grechanik committed
-
- 13 Jul, 2019 1 commit
-
-
* [ARITH][IR] Introduce FloorDiv/Mod * Address review comments * address review comments, fix div sub rule
Tianqi Chen committed
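As background for the FloorDiv/Mod commit above: floor division rounds the quotient toward negative infinity, whereas truncated division (the C/C++ integer semantics) rounds toward zero, so the two differ whenever the operands have opposite signs. A small sketch in plain Python follows; the helper names are illustrative and are not TVM's API.

```python
def truncdiv(a, b):
    """Quotient rounded toward zero, as C/C++ integer division does."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def truncmod(a, b):
    """Remainder paired with truncdiv; takes the sign of the dividend."""
    return a - b * truncdiv(a, b)


def floordiv(a, b):
    """Quotient rounded toward negative infinity (Python's // operator)."""
    return a // b


def floormod(a, b):
    """Remainder paired with floordiv; takes the sign of the divisor."""
    return a % b


# The two conventions agree for 7 / 2 and disagree for -7 / 2.
same = (truncdiv(7, 2), truncmod(7, 2)) == (floordiv(7, 2), floormod(7, 2))
trunc_pair = (truncdiv(-7, 2), truncmod(-7, 2))
floor_pair = (floordiv(-7, 2), floormod(-7, 2))
```

Both pairs satisfy `a == b * q + r`; what changes is the rounding direction of `q` and therefore the sign of `r`. With a positive divisor the floor remainder is never negative, which is convenient when reasoning about index bounds.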
-