Several groups, internally and externally, are interested in replacing standard dense layers with block-sparse matrix multiplication layers. The motivations are generally higher performance (due to the reduction in FLOPs and in memory bandwidth/cache footprint) and enabling larger models (e.g. fitting more layers in a given memory budget). Some public work along these lines:

* https://openai.com/blog/block-sparse-gpu-kernels/
* https://openai.com/blog/sparse-transformer/
* https://arxiv.org/abs/1802.08435
* https://arxiv.org/abs/1711.02782

Various groups have successfully trained models with reasonable levels of sparsity (90%+) with marginal accuracy changes, which suggests substantial speedups are possible, as this implies a >10x reduction in FLOPs. It is fairly straightforward to realize these theoretical speedups; see e.g. the TVM benchmarks for Intel CPUs in https://gist.github.com/ajtulloch/e65f90487bceb8848128e8db582fe902, and the CUDA results in https://github.com/openai/blocksparse:

* https://github.com/openai/blocksparse (CUDA)
* https://software.intel.com/en-us/mkl-developer-reference-c-mkl-bsrmm (MKL BSRMM)
* https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.sparse.bsr_matrix.html (SciPy BSR representation)

This is extracted from a patch we've been using internally. There are various extensions possible (int8/fp16/bf16, CUDA/other GPU architectures), but this is a reasonable starting point. It still needs more thorough unit-test coverage, however.

We follow the conventions established by scipy.sparse.bsr_matrix and other libraries; see the unit tests for details.

For folks interested in experimenting with scheduling/AutoTVM etc., https://gist.github.com/ajtulloch/e65f90487bceb8848128e8db582fe902 is a useful starting point.
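To make the BSR convention concrete, here is a minimal sketch using scipy.sparse.bsr_matrix directly (the matrix contents are purely illustrative, not taken from the patch): a block-sparse matrix is stored as a `data` array of dense nonzero blocks, an `indices` array giving the block-column of each stored block, and an `indptr` array delimiting which blocks belong to each block-row.

```python
import numpy as np
import scipy.sparse

# Dense 4x6 matrix with two nonzero 2x2 blocks.
dense = np.zeros((4, 6), dtype="float32")
dense[0:2, 0:2] = 1.0  # block-row 0, block-column 0
dense[2:4, 4:6] = 2.0  # block-row 1, block-column 2

bsr = scipy.sparse.bsr_matrix(dense, blocksize=(2, 2))

print(bsr.data.shape)  # (2, 2, 2): two stored 2x2 blocks
print(bsr.indices)     # [0 2]: block-column of each stored block
print(bsr.indptr)      # [0 1 2]: block-row i owns blocks indptr[i]:indptr[i+1]
```

A kernel consuming this layout iterates block-rows via `indptr`, so the work performed is proportional to the number of nonzero blocks rather than the dense FLOP count.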