internally and externally, interested in replacing standard dense layers with block-sparse matrix multiplication layers. The motivations are generally:

* higher performance (due to reductions in FLOPs and in memory bandwidth/cache footprint)
* enabling larger models (e.g. fitting more layers in a given memory budget)

Some public work along these lines:

* https://openai.com/blog/block-sparse-gpu-kernels/
* https://openai.com/blog/sparse-transformer/
* https://arxiv.org/abs/1802.08435
* https://arxiv.org/abs/1711.02782

Various groups have successfully trained models at high levels of sparsity (90%+) with marginal accuracy changes, which suggests substantial speedups are possible (90% sparsity implies roughly a 10x reduction in FLOPs).

It is fairly straightforward to realize these theoretical speedups; see e.g. the TVM benchmarks for Intel CPUs in https://gist.github.com/ajtulloch/e65f90487bceb8848128e8db582fe902 and the CUDA results in https://github.com/openai/blocksparse.

* https://github.com/openai/blocksparse (CUDA)
* https://software.intel.com/en-us/mkl-developer-reference-c-mkl-bsrmm (MKL BSRMM)
* https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.sparse.bsr_matrix.html (SciPy BSR representation)

This is extracted from a patch we've been using internally. Various extensions are possible (int8/fp16/bf16, CUDA/other GPU architectures), but this is a reasonable starting point. It needs more thorough unit test coverage, however.

We follow the conventions established by scipy.sparse.bsr_matrix and other libraries; see the unit tests for details.

For folks interested in experimenting with scheduling/AutoTVM etc., https://gist.github.com/ajtulloch/e65f90487bceb8848128e8db582fe902 is a useful starting point.
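For readers unfamiliar with the BSR convention mentioned above, here is a minimal pure-Python sketch of the `data` / `indices` / `indptr` layout used by scipy.sparse.bsr_matrix, together with a reference block-sparse multiply. The helper names `dense_to_bsr` and `bsr_dense` are hypothetical and are not part of this patch, and the sketch assumes the layer computes `y = x @ W.T` (as a standard dense layer would) with `W` stored in BSR form. It illustrates the data layout only, not the scheduled kernel.

```python
# Illustrative only: helper names are hypothetical, not this patch's API.
# Layout follows scipy.sparse.bsr_matrix: data[p] is a dense block,
# indices[p] is its block-column, and indptr[i]:indptr[i+1] spans block-row i.

def dense_to_bsr(w, bs_r, bs_c):
    """Convert a dense matrix (list of rows) to BSR, keeping nonzero blocks."""
    n_rows, n_cols = len(w), len(w[0])
    assert n_rows % bs_r == 0 and n_cols % bs_c == 0
    data, indices, indptr = [], [], [0]
    for bi in range(n_rows // bs_r):
        for bj in range(n_cols // bs_c):
            block = [row[bj * bs_c:(bj + 1) * bs_c]
                     for row in w[bi * bs_r:(bi + 1) * bs_r]]
            if any(v != 0 for r in block for v in r):
                data.append(block)
                indices.append(bj)
        indptr.append(len(indices))
    return data, indices, indptr


def bsr_dense(x, data, indices, indptr, bs_r, bs_c):
    """Reference y = x @ W.T, with dense x (M x K) and BSR W (N x K).

    Assumption: the transposed-weight convention of a dense layer.
    Only stored blocks are visited, so work scales with block density.
    """
    n_block_rows = len(indptr) - 1
    y = [[0.0] * (n_block_rows * bs_r) for _ in x]
    for m, x_row in enumerate(x):
        for bi in range(n_block_rows):
            for p in range(indptr[bi], indptr[bi + 1]):
                col0 = indices[p] * bs_c
                for r in range(bs_r):
                    acc = 0.0
                    for c in range(bs_c):
                        acc += data[p][r][c] * x_row[col0 + c]
                    y[m][bi * bs_r + r] += acc
    return y


# A 4x4 weight matrix with 2x2 blocks; two of the four blocks are all zero.
W = [[1, 2, 0, 0],
     [3, 4, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 5, 6]]
X = [[1, 1, 1, 1],
     [1, 0, 2, 0]]

data, indices, indptr = dense_to_bsr(W, 2, 2)
Y = bsr_dense(X, data, indices, indptr, 2, 2)

# Fraction of weight entries actually stored (and multiplied).
density = len(data) * 2 * 2 / (len(W) * len(W[0]))
```

In this toy case half of the blocks are stored, so the inner loops do half the multiply-adds of the dense product; at 90% block sparsity the same loop structure would visit about a tenth of the blocks.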