- 20 Jan, 2020 1 commit
-
-
* [REFACTOR][TYPE] Finish moving all types to IR. - Move definition of Ref and TensorType to ir - Move type_functor.h to public header. - Rename RefType -> RelayRefType for clarity. * Add atol
Tianqi Chen committed
-
- 19 Jan, 2020 1 commit
-
-
TIR is the new namespace for low-level IR for tensor-level optimizations and loop transformations. This PR establishes the namespace and files.
  - lowered_func.h, buffer.h, data_layout.h -> tir/buffer.h, tir/data_layout.h, tir/lowered_func.h
  - ir.h -> tir/expr.h, tir/stmt.h
  - ir_functor_ext.h -> tir/expr_functor.h, tir/stmt_functor.h
Tianqi Chen committed
-
- 14 Jan, 2020 1 commit
-
-
- Use consistent constructor style to construct objects. - Move env_func to ir as it is mainly used to construct IRs. - Make docs consistent.
Tianqi Chen committed
-
- 04 Jan, 2020 1 commit
-
-
TVM_REGISTER_API is an alias of TVM_REGISTER_GLOBAL. In the spirit of simplifying redirections, this PR removes the original TVM_REGISTER_API macro and uses TVM_REGISTER_GLOBAL directly. This type of refactor also makes it easier for IDE navigation tools such as FFI navigator to provide a better code reading experience. Move EnvFunc's definition to node.
Tianqi Chen committed
-
- 31 Dec, 2019 1 commit
-
-
* [REFACTOR][OBJECT] Consolidate NodePtr/Ref/Hash/Equal and macros to Object. Historically, we have had classes like NodePtr/Ref/HashEqual. After the unified object protocol, these names are just aliases of their Object counterparts. Moreover, helper macros for defining these objects are scattered all over the place. This PR consolidates the terminology into the corresponding names in the Object system so we have a clean and consistent API moving forward.
* Update include/tvm/attrs.h Co-Authored-By: Wei Chen <ipondering.weic@gmail.com>
* fix compilation Co-authored-by: Wei Chen <ipondering.weic@gmail.com>
Tianqi Chen committed
-
- 26 Dec, 2019 1 commit
-
-
Animesh Jain committed
-
- 24 Nov, 2019 1 commit
-
-
* [LINT] Improve the check tool to handle ASF copyright message. * [LINT] Remove unnecessary copyright message as per ASF requirement. * Fix codegen hybrid * [LINT] Broaden license checks to include html, xml * [LINT] Fix rest of the files * Fix notice * [LINT] Improve check file type error message
Tianqi Chen committed
-
- 21 Oct, 2019 1 commit
-
-
* [REFACTOR][NODE][RUNTIME] Move Node to the new Object protocol. This PR removes the original node system and makes Node a subclass of Object. This is a major refactor towards a better unified runtime object system. List of changes in the refactor:
  - We now hide the data_ field; use Downcast explicitly to get a sub-class object.
  - Removed the node system FFI in python.
  - Removed the node C API; instead use PackedFunc for list and get attrs.
  - Change relay::Op::set_attr_type_key(attr_key_name) to relay::Op::set_attr_type<AttrType>().
    - This change was necessary because of the new Object registration mechanism.
    - Subsequent changes to the op registrations.
    - The change revealed a few previous problems that are now fixed.
  - Patched up a few missing node type registrations.
    - Now we will raise an error if we register an object that is not registered.
  - The original node.h and container.h are kept in the same location.
  - Calling convention: kObjectHandle now equals the old kNodeHandle; kNodeHandle is removed.
  - IRFunctor now dispatches on ObjectRef.
  - Update to the new type checking API: is_type and derived_from are replaced by IsInstance.
  - Removed the .hash member function; instead use C++ convention hasher functors.
* Address review comments
Tianqi Chen committed
-
- 06 Aug, 2019 1 commit
-
-
* add build gcn tutorial * add transpose operator for square sparse matrices * remove extra files * change loop tag * comply with lint * comply with lint -- line too long * comply with lint * lint check * lint check * lint check * apply Marisa's and Thierry's reviews
Yulun Yao committed
-
- 23 Jul, 2019 1 commit
-
-
internally and externally, interested in replacing standard dense layers with block-sparse matrix multiplication layers. The motivations are generally: higher performance (due to reduction in FLOPs, memory bandwidth/cache footprint), enabling larger models (e.g. fitting more layers in a given memory budget).

Some public work along these lines:
* https://openai.com/blog/block-sparse-gpu-kernels/
* https://openai.com/blog/sparse-transformer/
* https://arxiv.org/abs/1802.08435
* https://arxiv.org/abs/1711.02782

Various groups have been able to successfully train models with reasonable levels of sparsity (90%+) with marginal accuracy changes, which suggests substantial speedups are possible (as this implies a >10x reduction in FLOPs). It is fairly straightforward to realize these theoretical speedups; see e.g. TVM benchmarks for Intel CPUs in https://gist.github.com/ajtulloch/e65f90487bceb8848128e8db582fe902, and CUDA results in https://github.com/openai/blocksparse, etc.
* https://github.com/openai/blocksparse (CUDA)
* https://software.intel.com/en-us/mkl-developer-reference-c-mkl-bsrmm (MKL BSRM)
* https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.sparse.bsr_matrix.html (SCIPY BSR representation)

This is extracted from a patch we've been using internally. There are various extensions possible (int8/fp16/bf16, CUDA/other GPU architectures), but this is a reasonable starting point. However, this needs more thorough unit test coverage. We follow the conventions established by scipy.sparse.bsr_matrix and other libraries; see the unit tests for details. For folks interested in experimenting with scheduling/AutoTVM etc., https://gist.github.com/ajtulloch/e65f90487bceb8848128e8db582fe902 is a useful starting point.
Andrew Tulloch committed
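The block-sparse commit message above refers to the scipy.sparse.bsr_matrix conventions without spelling them out. The following is a minimal pure-Python sketch of that layout (`data`, `indices`, `indptr`) and of a matrix-vector product that only visits stored blocks. It is not the TVM implementation from the commit: the helper names `dense_to_bsr` and `bsr_matvec`, the 2x2 block size, and the example matrix are all assumptions made for illustration.

```python
def dense_to_bsr(dense, bs_r, bs_c):
    """Convert a dense matrix (list of rows) to BSR arrays.

    Layout mirrors scipy.sparse.bsr_matrix: `data` holds the nonzero blocks,
    `indices` holds the block-column index of each stored block, and
    indptr[i]:indptr[i + 1] is the range of blocks in block-row i.
    (Hypothetical helper, for illustration only.)
    """
    n_rows, n_cols = len(dense), len(dense[0])
    assert n_rows % bs_r == 0 and n_cols % bs_c == 0
    data, indices, indptr = [], [], [0]
    for bi in range(n_rows // bs_r):
        for bj in range(n_cols // bs_c):
            block = [row[bj * bs_c:(bj + 1) * bs_c]
                     for row in dense[bi * bs_r:(bi + 1) * bs_r]]
            # Keep a block only if it contains at least one nonzero value.
            if any(v != 0 for block_row in block for v in block_row):
                data.append(block)
                indices.append(bj)
        indptr.append(len(data))
    return data, indices, indptr


def bsr_matvec(data, indices, indptr, x, bs_r, bs_c):
    """Compute y = W @ x, iterating only over the stored blocks of W."""
    n_block_rows = len(indptr) - 1
    y = [0.0] * (n_block_rows * bs_r)
    for bi in range(n_block_rows):
        for k in range(indptr[bi], indptr[bi + 1]):
            bj = indices[k]
            block = data[k]
            for r in range(bs_r):
                for c in range(bs_c):
                    y[bi * bs_r + r] += block[r][c] * x[bj * bs_c + c]
    return y


# Example: a 4x4 matrix in which two of the four 2x2 blocks are entirely zero.
W = [
    [1, 2, 0, 0],
    [3, 4, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 5, 6],
]
x = [1, 1, 1, 1]

data, indices, indptr = dense_to_bsr(W, 2, 2)
y = bsr_matvec(data, indices, indptr, x, 2, 2)
```

Here only two blocks are stored, so the inner loops do half the multiply-adds of a dense product. The same ratio is what drives the FLOP reduction the commit message cites for highly sparse weights.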
-