- 07 Sep, 2019 1 commit
* update lint
* lint fixed
* lint updated
* lint fixed
* lint fixed
* lint fixed
* updates
* add intel graphics as a package
* remove print info
* depthwise conv2d schedule added for intel graphics
* asdf
* fix lint
* fix lint
* fix ci
* add channels
Leyuan Wang committed
- 01 Sep, 2019 1 commit
* Added arm_cpu NHWC schedules.
* Fixed kernel shape legalization.
* Added bitserial ops to relay.
* Snapshot and more missing files.
* Added dense testing.
* Added tests
* Added ASF header to new files.
* cc lint
* Pylint change.
* pylint fixes.
* Change arm legalize test.
* Added assert check to arm legalize.
* Added better documentation, fixed some bad style
* Reverted arm conv2d nhwc changes.
Josh Fromm committed
- 23 Aug, 2019 1 commit
Animesh Jain committed
- 22 Aug, 2019 2 commits
* Add one-hot to Relay
* topi implementation
* Working
* add topi test
* Add TF test
* Fix check
* fix linting issues
* fix documentation
* Fix documentation
* Add support for on_value, off_value, axis, dtype
* Add full support for axis
* Fix compute and update test_forward
* Move on_value and off_value to inputs
* Add topi test
* Update tests
* Update docs
* Fix style
* re-enable tests
* Add one_hot to mxnet converter
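A hedged sketch of the new op in use (illustrative only; shapes and fill values are made up):

```python
from tvm import relay

# one_hot(indices, on_value, off_value, depth, axis, dtype) mirrors
# tf.one_hot: a new `depth`-sized axis holds on_value at each index
# position and off_value everywhere else.
indices = relay.var("indices", shape=(3,), dtype="int32")
out = relay.one_hot(indices, relay.const(1.0), relay.const(0.0),
                    depth=4, axis=-1, dtype="float32")
print(relay.Function([indices], out))
```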
Jon Soifer committed
Josh Fromm committed
- 21 Aug, 2019 1 commit
* Support cblas library in dense
* start to add support for generic batch_matmul compute
* Add x86 override for batch_matmul
* Fix linting
* reset file
* Fix typos
* dummy change to re-trigger CI
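For context, a minimal sketch of routing dense through cblas (hedged: assumes a TVM build with USE_BLAS enabled, and uses present-day API names, which have shifted since this commit):

```python
import tvm
from tvm import relay

# A dense layer the x86 backend can offload to cblas when the target
# carries "-libs=cblas" instead of using the TVM-generated kernel.
data = relay.var("data", shape=(16, 64), dtype="float32")
weight = relay.var("weight", shape=(32, 64), dtype="float32")
net = relay.Function([data, weight], relay.nn.dense(data, weight))

mod = tvm.IRModule.from_expr(net)
lib = relay.build(mod, target="llvm -libs=cblas")
```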
Jon Soifer committed
- 19 Aug, 2019 1 commit
Wuwei Lin committed
- 14 Aug, 2019 1 commit
Animesh Jain committed
- 13 Aug, 2019 1 commit
* Added relay and topi mirror_pad operator.
* Added mirror_padding to tensorflow frontend.
* Added mirrorpad testing in tensorflow frontend.
* Added space_to_depth in tf frontend.
* Added tests for spacetodepth.
* spacetodepth bug fix.
* Lint fix
* Added mirror pad python attrs.
* Pad code formatting.
* Syntax improvement
* Hopefully last lint fix
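A hedged sketch of the new Relay op (assuming relay.nn.mirror_pad takes a pad_width and a REFLECT/SYMMETRIC mode, matching the TF MirrorPad it was added for):

```python
from tvm import relay

# MirrorPad reflects values across the tensor edge rather than padding
# with a constant; mode is "REFLECT" or "SYMMETRIC", as in tf.pad.
data = relay.var("data", shape=(1, 3, 32, 32), dtype="float32")
out = relay.nn.mirror_pad(data, pad_width=((0, 0), (0, 0), (1, 1), (1, 1)),
                          mode="REFLECT")
print(relay.Function([data], out))
```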
Josh Fromm committed
- 07 Aug, 2019 1 commit
* Fix mxnet converter for hybrid block
* tweak
* fix rebase
* fix
* add test
Haichen Shen committed
- 06 Aug, 2019 2 commits
* Fix the tile_rx and tile_ry issue. Note that this patch depends on pull request #9 in tvm-distro.
mingwayzhang committed
* add build gcn tutorial
* add transpose operator for square sparse matrices
* remove extra files
* change loop tag
* comply with lint
* comply with lint -- line too long
* comply with lint
* lint check
* lint check
* lint check
* apply Marisa and Thierry's reviews
Yulun Yao committed
- 05 Aug, 2019 1 commit
* Update Softmax compute and CPU schedule
* Add C++ compute
* Fix schedule
* Update CUDA and OpenGL schedules
* Fix log_softmax
* Fix hls and opengl schedules
* Fix CUDA schedule
Jon Soifer committed
- 02 Aug, 2019 1 commit
* [TOPI] Memoize winograd matrix * lint * Fix name
Lianmin Zheng committed
- 01 Aug, 2019 2 commits
sf-wind committed
The patch adds support for the TensorFlow operators log1p, cos, and sin:
* log1p is described at https://www.tensorflow.org/api_docs/python/tf/math/log1p
* cos is described at https://www.tensorflow.org/api_docs/python/tf/math/cos
* sin is described at https://www.tensorflow.org/api_docs/python/tf/math/sin
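For reference, a hedged sketch of the kind of decomposition such a frontend conversion can use (illustrative; the actual converter lives in the TensorFlow frontend):

```python
from tvm import relay

# log1p(x) can be lowered as log(1 + x). The dedicated log1p is more
# accurate for |x| near zero, so this trades away a little precision.
x = relay.var("x", shape=(4,), dtype="float32")
log1p = relay.log(relay.add(x, relay.const(1.0)))
print(relay.Function([x], log1p))
```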
alexgl-github committed
- 31 Jul, 2019 1 commit
* [TOPI][CUDA] schedule for group_conv2d * Fix #flops
Wuwei Lin committed
- 30 Jul, 2019 2 commits
* Fixed topi bdist_wheel build to include libraries. * Removed unneeded imports
Josh Fromm committed
* Fix traverse_inline not inlining zero-input ops properly
* Add where to python and set tag to broadcast
* Fix inline
* test
* fix test target
* fix
Wuwei Lin committed
- 26 Jul, 2019 1 commit
* [TOPI][CUDA] Schedule for pool_grad
* Relay test
* Fix fused op
* doc
* Remove set scope local
Wuwei Lin committed
- 25 Jul, 2019 2 commits
Balint Cristian committed
* uTVM interfaces (#14)
* some minor interface changes
* implemented HostLowLevelDevice
* added MicroDeviceAPI
* implemented micro_common and added Python interfaces
* current status, semi implemented micro session
* added micro_common implementation and python interfaces (#18)
* added micro_common implementation and python interfaces (#18)
* current status, semi implemented
* host test working
* updated interfaces for MicroSession arguments allocation
* make somewhat lint compatible
* fix based on comments
* added rounding macro
* fix minor bug
* improvements based on comments
* Clean up `binutil.py` and make Python-3-compatible
* Change argument allocation design
* Address feedback and lint errors
* Improve binutil tests
* Simplify allocator (per @tqchen's suggestions)
* Doc/style fixes
* farts
* mcgee
* rodata section werks (and so does `test_runtime_micro_workspace.py`)
* simple graph runtime werk
* TEMP
* ResNet works, yo
* First round of cleanup
* More cleanup
* runs a dyson over the code
* Another pass
* Fix `make lint` issues
* ready to pr... probably
* final
* Undo change
* Fix rebase resolution
* Minor fixes
* Undo changes to C codegen tests
* Add `obj_path` in `create_micro_lib`
* TEMP
* Address feedback
* Add missing TODO
* Partially address feedback
* Fix headers
* Switch to enum class for `SectionKind`
* Add missing ASF header
* Fix lint
* Fix lint again
* Fix lint
* Kill lint warnings
* Address feedback
* Change Python interface to MicroTVM: all interaction with the device is now through `Session` objects, which are used through Python's `with` blocks.
* Reorder LowLevelDevice interface
* Store shared ptr to session in all alloced objects
* Move helper functions out of `tvm.micro`
* Switch static char arr to vector
* Improve general infra and code quality (does not yet address all of tqchen's feedback)
* Forgot a rename
* Fix lint
* Add ASF header
* Fix lint
* Partially address MarisaKirisame's feedback
* Lint
* Expose `MicroSession` as a node to Python
* Revert to using `Session` constructor
* Fix compiler error
* (Maybe) fix CI error
* Debugging
* Remove
* Quell lint
* Switch to stack-based session contexts
* Make uTVM less intrusive to host codegen, and use SSA for operands of generated ternary operators
* Inline UTVMArgs into UTVMTask struct
* Remove `HostLowLevelDevice` header
* Remove `BaseAddr` class
* Address feedback
* Add "utvm" prefix to global vars in runtime
* Fix lint
* Fix CI
* Fix `test_binutil.py`
* Fix submodules
* Remove ResNet tests
* Make `test_binutil.py` work with nose
* Fix CI
* I swear this actually fixes the binutil tests
* lint
* lint
* Add fcompile-compatible cross-compile func
* Add docs for uTVM runtime files
* Move pointer patching into `MicroSession`
* Fix lint
* First attempt at unifying cross-compile APIs
* Fix lint
* Rename `cross_compile` back to `cc`
* Address feedback
* Remove commented code
* Lint
* Figure out failing function
* Remove debugging code
* Change "micro_dev" target to "micro"
* Add checks in tests for whether uTVM is enabled
* Add TODO for 32-bit support
* Rename more "micro_dev" to "micro"
* Undo rename: we already have `tvm.micro` as a namespace; can't have it as a method as well.
* Fix failing CI: thanks to @tqchen for finding this bug. Emitting ternary operators for `min` and `max` causes concurrency bugs in CUDA, so we're moving the ternary op emissions from `CodeGenC` to `CodeGenCHost`.
* Address feedback
* Fix lint
Logan Weber committed
- 24 Jul, 2019 4 commits
Logan Weber committed
Tianqi Chen committed
Tianqi Chen committed
Wuwei Lin committed
- 23 Jul, 2019 3 commits
Various groups, internally and externally, are interested in replacing standard dense layers with block-sparse matrix multiplication layers. The motivations are generally higher performance (due to the reduction in FLOPs and in memory bandwidth/cache footprint) and enabling larger models (e.g. fitting more layers in a given memory budget). Some public work along these lines:

* https://openai.com/blog/block-sparse-gpu-kernels/
* https://openai.com/blog/sparse-transformer/
* https://arxiv.org/abs/1802.08435
* https://arxiv.org/abs/1711.02782

Various groups have been able to successfully train models with reasonable levels of sparsity (90%+) with marginal accuracy changes, which suggests substantial speedups are possible (as this implies a >10x reduction in FLOPs). It is fairly straightforward to realize these theoretical speedups; see e.g. the TVM benchmarks for Intel CPUs in https://gist.github.com/ajtulloch/e65f90487bceb8848128e8db582fe902 and the CUDA results in https://github.com/openai/blocksparse. Related kernels and representations:

* https://github.com/openai/blocksparse (CUDA)
* https://software.intel.com/en-us/mkl-developer-reference-c-mkl-bsrmm (MKL bsrmm)
* https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.sparse.bsr_matrix.html (SciPy BSR representation)

This is extracted from a patch we've been using internally. There are various extensions possible (int8/fp16/bf16, CUDA/other GPU architectures), but this is a reasonable starting point; it still needs more thorough unit-test coverage, however. We follow the conventions established by scipy.sparse.bsr_matrix and other libraries; see the unit tests for details. For folks interested in experimenting with scheduling/AutoTVM etc., the gist above is a useful starting point.
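As a standalone illustration of the BSR convention referenced above, a small NumPy/SciPy sketch (my own, hedged; the block size and sparsity level are arbitrary):

```python
import numpy as np
import scipy.sparse

# Dense 256x256 weight with ~90% of its 16x16 blocks zeroed out.
rng = np.random.RandomState(0)
W = rng.randn(256, 256).astype("float32")
block_mask = (rng.rand(16, 16) < 0.1).astype("float32")  # keep ~10% of blocks
W *= np.kron(block_mask, np.ones((16, 16), dtype="float32"))

# BSR stores only the surviving blocks, so the sparse matmul below does
# roughly 10x fewer FLOPs than the dense product it matches.
W_bsr = scipy.sparse.bsr_matrix(W, blocksize=(16, 16))
x = rng.randn(256, 8).astype("float32")
np.testing.assert_allclose(W_bsr.dot(x), W.dot(x), rtol=1e-4, atol=1e-5)
```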
Andrew Tulloch committed
= Motivation

It's useful to expose the tvm::reinterpret functionality to Relay/TOPI users, as this allows them to build (fused) operators leveraging the bitwise reinterpretation of an operator. An example is approximate transcendental functions, which can be implemented similar to:

```python
def C(x):
    return relay.expr.const(x, "float32")

def approx_exp(x):
    x = relay.minimum(relay.maximum(x, C(-88.0)), C(88.0))
    x = C(127.0) + x * C(1.44269504)
    xf = relay.floor(x)
    i = relay.cast(xf, "int32")
    x = x - xf
    Y = C(0.99992522) + x * (C(0.69583354) + x * (C(0.22606716) + x * C(0.078024523)))
    exponent = relay.left_shift(i, relay.expr.const(23, "int32"))
    exponent = relay.reinterpret(exponent, "float32")
    return exponent * Y

def approx_sigmoid(x):
    # <2.0e-5 absolute error over [-5, 5]
    y = approx_exp(x)
    return y / (y + C(1.0))

def approx_tanh(x):
    # <4.0e-5 absolute error over [-5, 5]
    x = x * C(2.0)
    y = approx_exp(x)
    return (y - C(1.0)) / (y + C(1.0))
```

See unit tests for implementations of these approximate transcendentals.
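A quick standalone check of the bit-level identity the snippet relies on (my own sketch in NumPy, not part of the patch):

```python
import numpy as np

# Sanity check of the exponent trick in approx_exp: shifting an integer e
# into the IEEE-754 float32 exponent field and reinterpreting the bits
# yields 2**(e - 127).
bits = np.array([130], dtype=np.int32) << 23   # 130 - 127 = 3
assert bits.view(np.float32)[0] == 8.0         # 2**3
```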
Andrew Tulloch committed
Animesh Jain committed
- 19 Jul, 2019 1 commit
Yong Wu committed
- 07 Jul, 2019 2 commits
Tianqi Chen committed
* initialize conv2d transpose scheduling on x86
* refine the scheduler a bit
* fix for lint
* address review comments; remove duplicate code
* fix lint
Yida Wang committed
- 03 Jul, 2019 1 commit
* Pre-allocate buffer for x86 roi_align * Fix typo
Yao Wang committed
- 28 Jun, 2019 2 commits
Thierry Moreau committed
* Add sequence_mask; use exactly the same arguments as mxnet
* fix
* fix lint
* fix lint
* add mxnet conversion + relay
* update
* update doc
* fix pylint
* fix doc
* address comment
* try to address comments
* try to enable shape check for valid_length
* fix
* try to fix
* fix bug
* try to fix
* address comment
* address comment
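A hedged sketch of the resulting op; per the message above, the signature follows mxnet's SequenceMask (shapes here are made up):

```python
from tvm import relay

# sequence_mask(data, valid_length, mask_value, axis): entries past each
# sequence's valid_length along `axis` are replaced with mask_value.
data = relay.var("data", shape=(10, 4, 32), dtype="float32")      # (seq, batch, feat)
valid_length = relay.var("valid_length", shape=(4,), dtype="int32")
out = relay.sequence_mask(data, valid_length, mask_value=0.0, axis=0)
print(relay.Function([data, valid_length], out))
```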
Xingjian Shi committed
- 12 Jun, 2019 1 commit
Leyuan Wang committed
- 11 Jun, 2019 1 commit
hlu1 committed
- 10 Jun, 2019 1 commit
* Support x86 dilation conv2d and improve multi-batch conv2d * Fix lint
Yao Wang committed
- 09 Jun, 2019 1 commit
* Improve non_max_suppression for CPU
* Improve get_valid_counts
* Minor change
* Skip some unnecessary computes
Yao Wang committed
- 07 Jun, 2019 1 commit
Alexander Pivovarov committed