- 27 Oct, 2019 1 commit
-
-
Tianqi Chen committed
-
- 24 Oct, 2019 2 commits
-
-
* [NODE][REFACTOR] Refactor reflection system in node. - Removed the old Node, Node is now just an alias of runtime::Object - Introduce ReflectionVTable, a new columnar dispatcher to support reflection - This allows us to remove vtable from most node objects - The VisitAttrs are registered via TVM_RESGITER_NODE_TYPE, they are no longer virtual. - Consolidated serialization and reflection features into node. * Explicit type qualification when calling destructor. * Fix SPIRV, more comments
Tianqi Chen committed -
* add tensor core support * avoid memory bank conflict * fix thread sync & better performance * better performance * add schedule test for conv2d * extend into BatchMatMul * support config fragment shape and layout using intrinsic * add TensorCore tutorial * add int support and fix lint * address comment * add 32*16*8 TensorCore test * fix wmma include logic
Siyuan Feng committed
-
- 23 Oct, 2019 1 commit
-
-
* [rpc] use callback func to do send & recv. don't get fd from sock as it is deprecated in java * fix java build * fix min/max macro define in windows * keep the old rpc setup for py * add doc for CallbackChannel
Yizhi Liu committed
-
- 22 Oct, 2019 1 commit
-
-
Zhi committed
-
- 21 Oct, 2019 1 commit
-
-
* [REFACTOR][NODE][RUNTIME] Move Node to the new Object protocol. This PR removes the original node system, and make node as a subclass of Object. This is a major refactor towards a better unified runtime object system. List of changes in the refactor: - We now hide data_ field, use Downcast explicitly to get a sub-class object. - Removed the node system FFI in python. - Removed the node C API, instead use PackedFunc for list and get attrs. - Change relay::Op::set_attr_type_key(attr_key_name) to relay::Op::set_attr_type<AttrType>(). - This change was necessary because of the new Object registration mechanism. - Subsequent changes to the op registrations - The change revealed a few previous problems that is now fixed. - Patched up a few missing node type registration. - Now we will raise an error if we register object that is not registered. - The original node.h and container.h are kept in the same location. - Calling convention: kObjectHandle now equals the old kNodeHandle, kNodeHandle is removed. - IRFunctor now dispatches on ObjectRef. - Update to the new type checking API: is_type, derived_from are replaced by IsInstance. - Removed .hash member function, instead use C++ convention hasher functors. * Address review comments
Tianqi Chen committed
-
- 20 Oct, 2019 2 commits
-
-
Haichen Shen committed
-
We think it will reduce the confusion with the meaning. https://discuss.tvm.ai/t/discuss-consider-rename-vm-datatype/4339
Wei Chen committed
-
- 18 Oct, 2019 1 commit
-
-
* [Relay][Frontend][TF] Add tensor array ops * rename * delete test * Move utility function * Refactor * fix tensor array ops * fix test * fix rebase * Fix serializer bug * Improve tf convert name lookup to use prelude api * Fix lint * Fix test
Wei Chen committed
-
- 17 Oct, 2019 1 commit
-
-
* [relay][vm] Separate VM runtime with executable * Address comments * move ctx back to vm * make only vm related fields and methods protected * integrate seriliaztion/deserialization to executable * create stream
Zhi committed
-
- 16 Oct, 2019 1 commit
-
-
* [RUNTIME] Refactor object python FFI to new protocol. This is a pre-req to bring the Node system under object protocol. Most of the code reflects the current code in the Node system. - Use new instead of init so subclass can define their own constructors - Allow register via name, besides type idnex - Introduce necessary runtime C API functions - Refactored Tensor and Datatype to directly use constructor. * address review comments
Tianqi Chen committed
-
- 15 Oct, 2019 1 commit
-
-
* [RUNTIME] Introduce new object protocol. This PR introduces a new object protocol to unify the node and object. We also updated the existing runtime::vm code to make use of the new system. Update to the node will be done in a follow up PR. Other changes: - Remove object related code in json serializer as that code logic was not complete and we have a separate serializer for VM, can revisit later. * address review comment * Fix the child slot logic
Tianqi Chen committed
-
- 10 Oct, 2019 1 commit
-
-
* [Relay][VM] Fix constant folding issue in VM compiler 1. allow pass params when compile a module 2. enhance profiler robustness * remove dead code * fix lint * add get_params * fix test * don't pass params back * remove get_params * docs * move compile function to api * compile clashes with builtin name * fix compilation error * remove dead code
Wei Chen committed
-
- 08 Oct, 2019 1 commit
-
-
* Fix VM invoke with set_params * add test * tweak
Haichen Shen committed
-
- 17 Sep, 2019 1 commit
-
-
Use a hash map keyed on the descriptor set to avoid bad asymptotic behaviour.
Andrew Tulloch committed
-
- 13 Sep, 2019 1 commit
-
-
Andrew Tulloch committed
-
- 12 Sep, 2019 1 commit
-
-
This is an alternative implementation of a subset of the TVM runtime API (and graph runtime) that focuses entirely on reducing code size, at the expense of functionality (no tvm.extern(..) calls via PackedFunc, CPU only, etc). It might be worth incrementally expanding the surface area if there's interest. The motivation for this work was seeing what the minimal useful subset of the TVM runtime is. This is relevant for e.g. super code-size constrained applications in e.g. embedded/mobile. The current runtime is more like O(100KiB) or so, so this might be compelling for some users. The smaller surface area for auditing might make this relevant for https://github.com/dmlc/tvm/issues/3159, or the usecases I was thinking about in https://github.com/dmlc/tvm/issues/2523#issuecomment-459165815 re: the Rust runtime. The symbols in the tvm::minimalruntime space (i.e. excluding std:: and picojson::) are about 5KiB, so I think there's a bunch of room here (i.e. we could replace picojson:: with [`jsmn`](https://zserge.com/jsmn.html) or something, and we could replace more of the `std::unordered_map` usage, etc with custom primitives as well (similar to the `DynArray`).
Andrew Tulloch committed
-
- 03 Sep, 2019 2 commits
-
-
This reverts commit 224cc243.
Tianqi Chen committed -
As GraphRuntime does not provide control-flow logics, we have to split our model to two parts. While we need to share parameters between them to save memory usage. Solution: 1) add "lazy_init_input" in graph's attributes "attrs": { ... ... "lazy_init_input": [ "list_str", [ "p0" ] ] } 2) allow un-allocated NDArray entry in SetupStorage 3) utilize "set_input_zero_copy" function to set parameters
Yong Sun committed
-
- 02 Sep, 2019 1 commit
-
-
Logan Weber committed
-
- 01 Sep, 2019 1 commit
-
-
* init shape func in interpreter and vm compiler * Update interpreter * fix * lint * lint * fix * remove hack * update * fix * fix * update * address comments & update for shape_of * fix lint * update * fix hybrid * lint * fix bug & add take shape func * lint * lint * update * fix flaky test * add todo
Haichen Shen committed
-
- 29 Aug, 2019 1 commit
-
-
hlu1 committed
-
- 21 Aug, 2019 1 commit
-
-
* [Relay][VM]VM debugger * Report mean/min/max for op duration * Typos * Lint * Lint * Lint * Support build debug VM in CMake * Lint * Enable VM debug in unit test * Disable debug vm test until new docker image is built * Add device sync code * Fix qnn unit test * Disable vm debug by default * Rename files * Rename classes * Fix comment * Fix comment
Wei Chen committed
-
- 01 Aug, 2019 1 commit
-
-
* [Relay][VM] Support execution on devices * Reduce Copy calls * Cleanup * Lint * CR comments * Merge test into test_vm.py
Wei Chen committed
-
- 31 Jul, 2019 1 commit
-
-
* relay vm serialization * fix lint * load params, fix stream * lint * fix typo
Zhi committed
-
- 30 Jul, 2019 2 commits
-
-
...and add rocm module_save to the tests.
Thomas Viehmann committed -
Thomas Viehmann committed
-
- 25 Jul, 2019 2 commits
-
-
* uTVM interfaces (#14) * some minor interface changes * implemented HostLowLevelDevice * added MicroDeviceAPI * implemented micro_common and added Python interfaces * current status, semi implemented micro session * added micro_common implementation and python interfaces (#18) * added micro_common implementation and python interfaces (#18) * current status, semi implemented * host test working * updated interfaces for MicroSession arguments allocation * make somewhat lint compatible * fix based on comments * added rounding macro * fix minor bug * improvements based on comments * Clean up `binutil.py` and make Python-3-compatible * Change argument allocation design * Address feedback and lint errors * Improve binutil tests * Simplify allocator (per @tqchen's suggestions) * Doc/style fixes * farts * mcgee * rodata section werks (and so does `test_runtime_micro_workspace.py`) * simple graph runtime werk * TEMP * ResNet works, yo * First round of cleanup * More cleanup * runs a dyson over the code * Another pass * Fix `make lint` issues * ready to pr... probably * final * Undo change * Fix rebase resolution * Minor fixes * Undo changes to C codegen tests * Add `obj_path` in `create_micro_lib` * TEMP * Address feedback * Add missing TODO * Partially address feedback * Fix headers * Switch to enum class for `SectionKind` * Add missing ASF header * Fix lint * Fix lint again * Fix lint * Kill lint warnings * Address feedback * Change Python interface to MicroTVM All interaction with the device is now through `Session` objects, which are used through Python's `with` blocks. * Reorder LowLevelDevice interface * Store shared ptr to session in all alloced objects * Move helper functions out of `tvm.micro` * Switch static char arr to vector * Improve general infra and code quality Does not yet address all of tqchen's feedback * Forgot a rename * Fix lint * Add ASF header * Fix lint * Partially address MarisaKirisame's feedback * Lint * Expose `MicroSession` as a node to Python * Revert to using `Session` constructor * Fix compiler error * (Maybe) fix CI error * Debugging * Remove * Quell lint * Switch to stack-based session contexts * Make uTVM less intrusive to host codegen And use SSA for operands of generated ternary operators * Inline UTVMArgs into UTVMTask struct * Remove `HostLowLevelDevice` header * Remove `BaseAddr` class * Address feedback * Add "utvm" prefix to global vars in runtime * Fix lint * Fix CI * Fix `test_binutil.py` * Fix submodules * Remove ResNet tests * Make `test_binutil.py` work with nose * Fix CI * I swear this actually fixes the binutil tests * lint * lint * Add fcompile-compatible cross-compile func * Add docs for uTVM runtime files * Move pointer patching into `MicroSession` * Fix lint * First attempt at unifying cross-compile APIs * Fix lint * Rename `cross_compile` back to `cc` * Address feedback * Remove commented code * Lint * Figure out failing function * Remove debugging code * Change "micro_dev" target to "micro" * Add checks in tests for whether uTVM is enabled * Add TODO for 32-bit support * Rename more "micro_dev" to "micro" * Undo rename We already have `tvm.micro` as a namespace. Can't have it as a method as well. * Fix failing CI Thanks to @tqchen for finding this bug. Emitting ternary operators for `min` and `max` causes concurrency bugs in CUDA, so we're moving the ternary op emissions from `CodeGenC` to `CodeGenCHost`. * Address feedback * Fix lint
Logan Weber committed -
Philip Hyunsu Cho committed
-
- 23 Jul, 2019 1 commit
-
-
In cases where we have multiple models or threadpools active, spinning around `sched_yield()` may not be desirable, as it prevents the OS from effectively scheduling other threads. Thus, allow users to conditionally disable this behaviour (via an environment variable `TVM_THREAD_POOL_SPIN_COUNT`, similar to existing environment flags for the thread pool such as `TVM_BIND_THREADS`, etc). This substantially improves tail latencies in some of our multi-tenant workloads in practice. Unit tests have been added - on my laptop, running: ``` TVM_THREAD_POOL_SPIN_COUNT=0 ./build/threading_backend_test; TVM_THREAD_POOL_SPIN_COUNT=1 ./build/threading_backend_test; ./build/threading_backend_test; ``` gives https://gist.github.com/ajtulloch/1805ca6cbaa27f5d442d23f9d0021ce6 (i.e. 97ms -> <1ms after this change)
Andrew Tulloch committed
-
- 16 Jul, 2019 1 commit
-
-
* tmp * Port vm and object to python * clean up * update vm build module * update * x * tweak * cleanup * update * fix rebase * Rename to VMCompiler * fix
Haichen Shen committed
-
- 15 Jul, 2019 1 commit
-
-
* Enable set_input_zero_copy in GraphRuntime * Fix LoadParams * Fix * lint * Fix remote context issue * Fix * Remove LOG * Remove unused variables * Add tests * works * More test scenarios * make it simpler * Remove unnecessary changes * Address comments * More comments * Address comments * Fix build
Yinghai Lu committed
-
- 11 Jul, 2019 1 commit
-
-
hlu1 committed
-
- 10 Jul, 2019 1 commit
-
-
* Implement type checking for Any Remove code generation related changes Remove compile changes Remove more Remove unification hack Add some code back that was needed, and clean up test Refactor test cases WIP Implement TypeHint AST Add test case which should fail Remove unification changes, and fix bug with let rec Restore unification for shapes Improve error reporting while debugging All examples type check All examples type check WIP First version that works with hints, needs clean up Remove dead code Tweaks Remove type hint Remove unecessary type hint stuff Remove more type hints Clean up Expose Any expression node Address CR Fix Fix solver Kill unecessary code Fix PyLint Fix Relocate loops Fix license and test Lint again Lint again Fix loops Fix docstring Fix template error Fix compiler issue Fix compile err Remove more runtime changes Restore buffer Fix segfault Fix Fix arange * Address feedback * Fix typo * Fix arange * Fix op level3 * Fix issue with Python wrapper
Jared Roesch committed
-
- 09 Jul, 2019 2 commits
-
-
* [Relay][VM]Compiling pattern matching * Fix lint * Remove debug code * Move TreeNode definition * merge ifi and selecti, todo: remove them * fix lint * remove ifi and selecti * rename GetTagi to GetTag * fix dltype * fix more dltype * Generalize If and select, and rename to Ifi and Selecti * Fix lint * Rename Ifi to If * Change register default to match value * Remove bad specialization for Move * Stop use Select * Remove Select * TreeNode refactor * Change entry_func name * Remove Cmp due to rebase issue
Wei Chen committed -
* Added bool to float conversion support to spirv ir builder. * Added unittest for vulkan bool conversion. * Typo fix.
Josh Fromm committed
-
- 27 Jun, 2019 1 commit
-
-
Li committed
-
- 25 Jun, 2019 1 commit
-
-
Summary: In multi-threaded applications where we have multiple inferences on the same model in parallel (consider e.g. a TTS system handling multiple requests), it can be useful to share the parameters of a model amongst these multiple instances. This improves the cache utilization behaviour of the system, as multiple cores can use the same set of weights instead of evicting the identical copies of weights in a shared cache. As the underlying `NDArray` instances in `data_entry_` implement a ref-counted based sharing system, this is a simple modification of the `GraphRuntime::LoadParams` logic to instead copy parameters from an existing GraphRuntime instance. This is a little ugly in that we need both the pre-existing GraphRuntime instance, as well as the 'serialized' params (since we need to know the set of names we should copy), but without imposing additional assumptions (i.e. storing the set of param names in GraphRuntime, and enforcing that shared param names are identical to the parameters set in the preceding `LoadParams` call), this seems unavoidable. Test Plan: Unit test added.
Andrew Tulloch committed
-
- 14 Jun, 2019 1 commit
-
-
* Update vm print & add AllocTensor instruction * patch * fix invoke packed * update cmake * tweak move * update invoke_closure * lint * add doc * tweak
Haichen Shen committed
-
- 23 May, 2019 1 commit
-
-
hlu1 committed
-