internally and externally, interested in replacing standard dense layers with block-sparse matrix multiplication layers. The motivations are generally higher performance (due to reductions in FLOPs and in memory bandwidth/cache footprint) and enabling larger models (e.g. fitting more layers in a given memory budget).

Some public work along these lines:

* https://openai.com/blog/block-sparse-gpu-kernels/
* https://openai.com/blog/sparse-transformer/
* https://arxiv.org/abs/1802.08435
* https://arxiv.org/abs/1711.02782

Various groups have been able to successfully train models with reasonable levels of sparsity (90%+) with marginal accuracy changes, which suggests substantial speedups are possible (as this implies a 10x or greater reduction in FLOPs). It is fairly straightforward to realize these theoretical speedups; see e.g. the TVM benchmarks for Intel CPUs in https://gist.github.com/ajtulloch/e65f90487bceb8848128e8db582fe902, the CUDA results in https://github.com/openai/blocksparse, etc.

* https://github.com/openai/blocksparse (CUDA)
* https://software.intel.com/en-us/mkl-developer-reference-c-mkl-bsrmm (MKL BSRMM)
* https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.sparse.bsr_matrix.html (SciPy BSR representation)

This is extracted from a patch we've been using internally. There are various extensions possible (int8/fp16/bf16, CUDA/other GPU architectures), but this is a reasonable starting point. It still needs more thorough unit test coverage, however.

We follow the conventions established by scipy.sparse.bsr_matrix and other libraries; see the unit tests for details.

For folks interested in experimenting with scheduling/AutoTVM etc., https://gist.github.com/ajtulloch/e65f90487bceb8848128e8db582fe902 is a useful starting point.
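To make the scipy.sparse.bsr_matrix convention mentioned above concrete, here is a minimal pure-Python sketch of the BSR layout and of a dense-times-block-sparse product. The three arrays are named after the `data`, `indices` and `indptr` attributes of scipy.sparse.bsr_matrix. The function names, the example weights, and the choice to compute `x @ W.T` are illustrative assumptions and are not taken from the patch. It is a reference loop nest for understanding the format, not an optimized kernel.

```python
# Illustrative sketch only: the helper names and the y = x @ W.T convention
# are assumptions for this example, not the patch's API.

def bsr_to_dense(data, indices, indptr, shape):
    """Expand a BSR matrix into nested lists (rows of a dense matrix)."""
    n_rows, n_cols = shape
    bs_r, bs_c = len(data[0]), len(data[0][0])
    dense = [[0.0] * n_cols for _ in range(n_rows)]
    for block_row in range(len(indptr) - 1):
        # indptr[block_row]:indptr[block_row + 1] selects this block row's blocks
        for k in range(indptr[block_row], indptr[block_row + 1]):
            block_col = indices[k]  # block-column index of stored block k
            for r in range(bs_r):
                for c in range(bs_c):
                    dense[block_row * bs_r + r][block_col * bs_c + c] = data[k][r][c]
    return dense


def dense_times_bsr_transposed(x, data, indices, indptr, shape):
    """Compute y = x @ W.T for W in BSR form, visiting only stored blocks."""
    n_rows, _ = shape
    bs_r, bs_c = len(data[0]), len(data[0][0])
    y = [[0.0] * n_rows for _ in x]
    for m, x_row in enumerate(x):
        for block_row in range(len(indptr) - 1):
            for k in range(indptr[block_row], indptr[block_row + 1]):
                col0 = indices[k] * bs_c  # first dense column covered by block k
                for r in range(bs_r):
                    acc = 0.0
                    for c in range(bs_c):
                        acc += data[k][r][c] * x_row[col0 + c]
                    y[m][block_row * bs_r + r] += acc
    return y


def macs_per_input_row(data):
    """Multiply-accumulates per input row: one per stored weight element."""
    return len(data) * len(data[0]) * len(data[0][0])


# A 4x4 weight matrix with 2x2 blocks; 3 of the 4 possible blocks are stored.
W_SHAPE = (4, 4)
W_DATA = [[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 10], [11, 12]]]
W_INDICES = [1, 0, 1]   # block row 0 holds block column 1; block row 1 holds 0 and 1
W_INDPTR = [0, 1, 3]

X = [[1, 0, 2, 1]]
Y = dense_times_bsr_transposed(X, W_DATA, W_INDICES, W_INDPTR, W_SHAPE)
```

The work done is proportional to the number of stored blocks: here 12 multiply-accumulates per input row instead of the 16 a dense 4x4 layer needs. At 90% block sparsity the same ratio gives the roughly 10x FLOP reduction discussed above.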
Name |
---|
common.py |
test_topi_basic.py |
test_topi_batch_matmul.py |
test_topi_bitserial_conv2d.py |
test_topi_bitserial_conv2d_rasp.py |
test_topi_bitserial_dense.py |
test_topi_bnn.py |
test_topi_broadcast.py |
test_topi_clip.py |
test_topi_conv2d_NCHWc.py |
test_topi_conv2d_hwcn.py |
test_topi_conv2d_int8.py |
test_topi_conv2d_nchw.py |
test_topi_conv2d_nhwc.py |
test_topi_conv2d_nhwc_pack_int8.py |
test_topi_conv2d_transpose_nchw.py |
test_topi_conv2d_winograd.py |
test_topi_deformable_conv2d.py |
test_topi_dense.py |
test_topi_depthwise_conv2d.py |
test_topi_depthwise_conv2d_back_input.py |
test_topi_depthwise_conv2d_back_weight.py |
test_topi_dilate.py |
test_topi_group_conv2d.py |
test_topi_group_conv2d_NCHWc_int8.py |
test_topi_l2norm.py |
test_topi_lrn.py |
test_topi_math.py |
test_topi_matmul.py |
test_topi_pooling.py |
test_topi_reduce.py |
test_topi_relu.py |
test_topi_reorg.py |
test_topi_resize.py |
test_topi_softmax.py |
test_topi_sort.py |
test_topi_sparse.py |
test_topi_tensor.py |
test_topi_transform.py |
test_topi_upsampling.py |
test_topi_util.py |
test_topi_vision.py |