- 22 Sep, 2019 1 commit
-
-
* add expr `isnan` * move to intrinsic * doc & add to topi * fix error from ci
Huang, Guangtai committed
-
- 20 Sep, 2019 1 commit
-
-
MXNet pad is described at:
https://mxnet.incubator.apache.org/api/python/symbol/symbol.html#mxnet.symbol.pad

Add support for parameter 'None' in MXNet slice operator.
MXNet 'slice' is described at:
https://mxnet.incubator.apache.org/api/python/symbol/symbol.html#mxnet.symbol.slice

Add support for MXNet cos, sin, arctan.
MXNet 'cos' is described at:
https://mxnet.incubator.apache.org/api/python/symbol/symbol.html#mxnet.symbol.cos
MXNet 'sin' is described at:
https://mxnet.incubator.apache.org/api/python/symbol/symbol.html#mxnet.symbol.sin
MXNet 'arctan' is described at:
https://mxnet.incubator.apache.org/api/python/symbol/symbol.html#mxnet.symbol.arctan

Add support for MXNet 1D Convolution and 1D Deconvolution.
MXNet Convolution is described at:
https://mxnet.incubator.apache.org/api/python/symbol/symbol.html#mxnet.symbol.Convolution
MXNet Deconvolution is described at:
https://mxnet.incubator.apache.org/api/python/symbol/symbol.html#mxnet.symbol.Deconvolution
Alex Gladkov committed
-
- 19 Sep, 2019 1 commit
-
-
* add proper scheduling for dense on CUDA * add fallback config and fix unit test * fix corner cases * refactoring * fix bias and add testcase * let fusion happen
Cody Hao Yu committed
-
- 16 Sep, 2019 1 commit
-
-
* [TOPI] operator support: logical_and, logical_or, logical_not * [TOPI] fix test cases for operator support: logical_and, logical_or, logical_not * [TOPI] fix test cases for operator support: logical_not
Neo Chien committed
-
- 09 Sep, 2019 1 commit
-
-
* add more ops * stop vectorization for erf * x * cleanup * fix * add whitelist for vectorizable intrin * add tf converter * fix dense * fix * add missing intrin * fix mxnet frontend * fix nvptx
Haichen Shen committed
-
- 08 Sep, 2019 1 commit
-
-
雾雨魔理沙 committed
-
- 01 Sep, 2019 1 commit
-
-
* init shape func in interpreter and vm compiler * Update interpreter * fix * lint * lint * fix * remove hack * update * fix * fix * update * address comments & update for shape_of * fix lint * update * fix hybrid * lint * fix bug & add take shape func * lint * lint * update * fix flaky test * add todo
Haichen Shen committed
-
- 22 Aug, 2019 2 commits
-
-
* Add one-hot to Relay * topi implementation * Working * add topi test * Add TF test * Fix check * fix linting issues * fix documentation * Fix documentation * Add support for on_value, off_value, axis, dtype * Add full support for axis * Fix compute and update test_forward * Move on_value and off_value to inputs * Add topi test * Update tests * Update docs * Fix style * re-enable tests * Add one_hot to mxnet converter
Jon Soifer committed -
Josh Fromm committed
-
- 06 Aug, 2019 1 commit
-
-
* add build gcn tutorial * add transpose operator for square sparse matrices * remove extra files * change loop tag * comply with lint * comply with lint -- line too long * comply with lint * lint check * lint check * lint check * apply marisa and thierry's reviews
Yulun Yao committed
-
- 01 Aug, 2019 1 commit
-
-
The patch adds support for Tensorflow operators log1p and cos

Tensorflow log1p is described at:
https://www.tensorflow.org/api_docs/python/tf/math/log1p
Tensorflow cos is described at:
https://www.tensorflow.org/api_docs/python/tf/math/cos
Tensorflow sin is described at:
https://www.tensorflow.org/api_docs/python/tf/math/sin
alexgl-github committed
-
- 31 Jul, 2019 1 commit
-
-
* [TOPI][CUDA] schedule for group_conv2d * Fix #flops
Wuwei Lin committed
-
- 30 Jul, 2019 1 commit
-
-
* Fix traverse_inline not inlining zero input op properly * Add where to python and set tag to broadcast * Fix inline * test * fix test target * fix
Wuwei Lin committed
-
- 28 Jul, 2019 1 commit
-
-
Balint Cristian committed
-
- 26 Jul, 2019 1 commit
-
-
* [TOPI][CUDA] Schedule for pool_grad * Relay test * Fix fused op * doc * Remove set scope local
Wuwei Lin committed
-
- 25 Jul, 2019 1 commit
-
-
Balint Cristian committed
-
- 24 Jul, 2019 1 commit
-
-
Wuwei Lin committed
-
- 23 Jul, 2019 3 commits
-
-
internally and externally, interested in replacing standard dense layers with block-sparse matrix multiplication layers. The motivations are generally: higher performance (due to reduction in FLOPs, memory bandwidth/cache footprint), enabling larger models (e.g. fitting more layers in a given memory budget).

Some public work along these lines:

* https://openai.com/blog/block-sparse-gpu-kernels/
* https://openai.com/blog/sparse-transformer/
* https://arxiv.org/abs/1802.08435
* https://arxiv.org/abs/1711.02782

Various groups have been able to successfully train models with reasonable levels of sparsity (90%+) with marginal accuracy changes, which suggests substantial speedups are possible (as this implies a >10x reduction in FLOPs).

It is fairly straightforward to realize these theoretical speedups, see e.g. TVM benchmarks for Intel CPUs in https://gist.github.com/ajtulloch/e65f90487bceb8848128e8db582fe902, and CUDA results in https://github.com/openai/blocksparse, etc.

* https://github.com/openai/blocksparse (CUDA)
* https://software.intel.com/en-us/mkl-developer-reference-c-mkl-bsrmm (MKL BSRMM)
* https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.sparse.bsr_matrix.html (SciPy BSR representation)

This is extracted from a patch we've been using internally. There are various extensions possible (int8/fp16/bf16, CUDA/other GPU architectures), but this is a reasonable starting point. However, this needs more thorough unit test coverage.

We follow the conventions established by scipy.sparse.bsr_matrix and other libraries, see the unit tests for details.

For folks interested in experimenting with scheduling/AutoTVM etc, https://gist.github.com/ajtulloch/e65f90487bceb8848128e8db582fe902 is a useful starting point.
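
As an illustration of the scipy.sparse.bsr_matrix conventions mentioned in this message, here is a minimal pure-Python sketch of a block-sparse dense layer. It is not the code from this patch: the function name `bsr_dense` is hypothetical, and it assumes the layer computes `x @ W.T` with the weight `W` stored as the BSR triple `data` (one dense block per stored entry), `indices` (block-column index of each block) and `indptr` (where each block-row starts in `data`).

```python
def bsr_dense(x, data, indices, indptr, block_shape):
    """Compute y = x @ W.T for a weight W stored in BSR form (hypothetical helper).

    x:       list of M rows, each of length K
    data:    list of dense blocks, each a bs_r x bs_c nested list
    indices: block-column index of each block in `data`
    indptr:  block-row b owns the blocks data[indptr[b]:indptr[b + 1]]
    """
    bs_r, bs_c = block_shape
    num_block_rows = len(indptr) - 1
    y = [[0.0] * (num_block_rows * bs_r) for _ in x]
    for block_row in range(num_block_rows):
        # Only the stored (non-zero) blocks are visited, which is where
        # the FLOP reduction comes from.
        for b in range(indptr[block_row], indptr[block_row + 1]):
            block_col = indices[b]
            block = data[b]
            for m, x_row in enumerate(x):
                for r in range(bs_r):
                    acc = 0.0
                    for c in range(bs_c):
                        acc += x_row[block_col * bs_c + c] * block[r][c]
                    y[m][block_row * bs_r + r] += acc
    return y


# A 4x4 weight with 2x2 blocks; only the two diagonal blocks are stored.
# Dense equivalent:
#   [[1, 2, 0, 0],
#    [3, 4, 0, 0],
#    [0, 0, 5, 6],
#    [0, 0, 7, 8]]
data = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
indices = [0, 1]
indptr = [0, 1, 2]

result = bsr_dense([[1.0, 1.0, 1.0, 1.0]], data, indices, indptr, (2, 2))
```

With an all-ones input row, each output is the sum of the corresponding row of the dense weight, so half of the multiply-adds a dense layer would perform are skipped in this example.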
Andrew Tulloch committed -
= Motivation

It's useful to expose the tvm::reinterpret functionality to Relay/TOPI users, as this allows them to build (fused) operators that rely on bitwise reinterpretation of values. An example is approximate transcendental functions, which can be implemented similar to:

```.py
from tvm import relay


def C(x):
    # Wrap a Python scalar as a float32 Relay constant.
    return relay.expr.const(x, "float32")


def approx_exp(x):
    # Clamp the input so the biased exponent stays within float32 range.
    x = relay.minimum(relay.maximum(x, C(-88.0)), C(88.0))
    # Scale by log2(e) and add the float32 exponent bias (127).
    x = C(127.0) + x * C(1.44269504)
    xf = relay.floor(x)
    i = relay.cast(xf, "int32")
    x = x - xf
    # Cubic polynomial approximation of 2**x on the fractional part.
    Y = C(0.99992522) + x * (C(0.69583354) + x * (C(0.22606716) + x * C(0.078024523)))
    # Shift the integer part into the exponent bits, then reinterpret as float32.
    exponent = relay.left_shift(i, relay.expr.const(23, "int32"))
    exponent = relay.reinterpret(exponent, "float32")
    return exponent * Y


def approx_sigmoid(x):
    # <2.0e-5 absolute error over [-5, 5]
    y = approx_exp(x)
    return y / (y + C(1.0))


def approx_tanh(x):
    # <4.0e-5 absolute error over [-5, 5]
    x = x * C(2.0)
    y = approx_exp(x)
    return (y - C(1.0)) / (y + C(1.0))
```

See unit tests for implementations of these approximate transcendentals.
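
To make the bit trick in this message concrete outside of Relay, the following is a scalar sketch using only the Python standard library. It is an illustration, not part of the patch: the helper names `reinterpret_int32_as_float32` and `approx_exp_scalar` are hypothetical, and the arithmetic runs in Python's double precision rather than float32, so the error figures quoted in the message do not carry over exactly.

```python
import math
import struct


def reinterpret_int32_as_float32(bits):
    """Return the float32 whose IEEE-754 bit pattern equals the int32 `bits`."""
    return struct.unpack("<f", struct.pack("<i", bits))[0]


def approx_exp_scalar(x):
    """Scalar version of the approximate exp described in the message."""
    # Clamp so the biased exponent stays in [0, 253].
    x = min(max(x, -88.0), 88.0)
    # exp(x) == 2 ** (x * log2(e)); add the float32 exponent bias of 127.
    x = 127.0 + x * 1.44269504
    whole = math.floor(x)
    frac = x - whole
    # Cubic polynomial approximating 2 ** frac for frac in [0, 1).
    y = 0.99992522 + frac * (0.69583354 + frac * (0.22606716 + frac * 0.078024523))
    # Placing `whole` in bits 23..30 yields the float32 value 2 ** (whole - 127).
    return reinterpret_int32_as_float32(int(whole) << 23) * y


# Largest relative error against math.exp on a grid over [-5, 5].
max_rel_err = max(
    abs(approx_exp_scalar(k / 10.0) - math.exp(k / 10.0)) / math.exp(k / 10.0)
    for k in range(-50, 51)
)
```

The reinterpretation is what makes the power of two free: shifting the integer part into the exponent field builds the scale factor without calling any transcendental function.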
Andrew Tulloch committed -
Animesh Jain committed
-
- 19 Jul, 2019 1 commit
-
-
Yong Wu committed
-
- 03 Jul, 2019 1 commit
-
-
* Pre-allocate buffer for x86 roi_align * Fix typo
Yao Wang committed
-
- 28 Jun, 2019 1 commit
-
-
* Add sequence_mask use exactly the same arguments as mxnet fix * fix lint * fix lint * add mxnet conversion + relay * update * update doc * fix pylint * fix doc * address comment * try to address comments * try to enable shape check for valid_length * fix * try to fix * fix bug * try to fix * address comment * address comment
Xingjian Shi committed
-
- 14 Jun, 2019 1 commit
-
-
* fix flaky test * fix flaky quantize pass
Haichen Shen committed
-
- 11 Jun, 2019 1 commit
-
-
hlu1 committed
-
- 10 Jun, 2019 1 commit
-
-
* Support x86 dilation conv2d and improve multi-batch conv2d * Fix lint
Yao Wang committed
-
- 09 Jun, 2019 1 commit
-
-
* Improve non_max_suppression for CPU * Improve get_valid_counts * Minor change * Skip some unnecessary computes
Yao Wang committed
-
- 06 Jun, 2019 2 commits
- 05 Jun, 2019 1 commit
-
-
hlu1 committed
-
- 04 Jun, 2019 1 commit
-
-
* init impl for topk * Fix cpu for topk * init cuda impl for topk * Add cuda for topk * fix * Add doc * update doc * lint * lint * lint * x * fix warning * [Relay] Add TopK in tf converter * Add frontend converter * fix
Haichen Shen committed
-
- 28 May, 2019 1 commit
-
-
masahi committed
-
- 22 May, 2019 1 commit
-
-
* Support the 1x1 int8 conv with NHWC layout and weight packing fix linter * fix the memoize issue * fix the failed nhwc test * add the schedule for pack to unbreak other tests * skip avx512 compile * Unify the data_layout and kernel_layout relation * add asf header * fix the comment * retrigger the build/test
llyfacebook committed
-
- 20 May, 2019 2 commits
-
-
* [Relay][TOPI] operator All * Update tests/python/frontend/tensorflow/test_forward.py Co-Authored-By: yongwww <55wuyong@163.com> * fix comments * change to level 4
Yong Wu committed -
Haichen Shen committed
-
- 17 May, 2019 1 commit
-
-
hlu1 committed
-
- 09 May, 2019 1 commit
-
-
* Add topi adaptive_pool * Use adaptive_pool to compute global_pool * Add relay adaptive pool2d * Fix lint * Fix typo * Minor change * Change support level to 10 * Add contrib * Remove global pool schedule * Add contrib module * Fix lint * Update doc * Update doc
Yao Wang committed
-
- 08 May, 2019 1 commit
-
-
* deconv tests * deconv bug fixed for certain cases tests added
Leyuan Wang committed
-
- 29 Apr, 2019 1 commit
-
-
* ssd gluoncv gpu op updated * ssd gluoncv gpu op updated * tutorials and tests modified * tutorials and tests modified * fix lint * fix lint * address comment * multibox bug fixed * space line added * use less threads per block * use less threads per block * less threads per block for get valid count * less threads per block for get valid count * merge with master * Revert "less threads per block for get valid count" This reverts commit 08896cfccc34b0b2a1646d01d01ea4cad73941c4. * Revert "less threads per block for get valid count" This reverts commit 08896cfccc34b0b2a1646d01d01ea4cad73941c4. * typo fixed * elem length made to a variable * fix lint error * fix lint error * lint fixed * bug fixed * bug fixed * lint fixed * error fixed * error fixed * test ci * test ci * separate argsort to be an independent op * separate argsort to be an independent op * fix lint * fix lint * remove unsupported models * typo fixed * argsort added to relay * solve conflicts with master * fix lint * fix lint * test push * Revert "test push" This reverts commit 6db00883fab6cc06bddf564c926bb27c874397d8. * fix lint error * fix more lint * cpu test_sort updated * debug ci * nms fixed * expose argsort to relay frontend * test ci * fix lint * sort register error fixed * fix nnvm * nms type fixed * adaptive pooling added to relay * Revert "adaptive pooling added to relay" This reverts commit 1119f1f2c055753e0cc5611627597749134c5c8c. * fix lint * expose argsort op * fix lint * fix lint * fix lint * sort test updated * sort bug fixed * nnvm error fixed * fix argsort default data type returned to be float instead of int * fix lint * fix lint * test fixed * fix valid count * fix titanx bug * tutorial add both targets * titanx error fixed * try to fix CI old gpu error * try to solve CI GPU error * get_valid_count added * reverse get_valid_count * get valid count optimized * address comments * fix ci error * remove unnecessary block sync * add back one sync * address comments * address more comments * more comments * move sort to be independent algorithm * typo fixed * more typos * comments addressed * doc updated * fix pylint * address final comments * apache license added
Leyuan Wang committed
-
- 28 Apr, 2019 1 commit
-
-
Wuwei Lin committed
-