1. 18 Nov, 2019 1 commit
  2. 16 Nov, 2019 1 commit
  3. 15 Nov, 2019 3 commits
  4. 11 Nov, 2019 1 commit
  5. 10 Nov, 2019 1 commit
  6. 01 Nov, 2019 1 commit
  7. 27 Oct, 2019 1 commit
  8. 24 Oct, 2019 2 commits
    • [NODE][REFACTOR] Refactor reflection system in node. (#4189) · 78ca6fc8
      * [NODE][REFACTOR] Refactor reflection system in node.
      
      - Removed the old Node; Node is now just an alias of runtime::Object
      - Introduce ReflectionVTable, a new columnar dispatcher to support reflection
        - This allows us to remove the vtable from most node objects
        - The VisitAttrs functions are registered via TVM_REGISTER_NODE_TYPE,
          so they are no longer virtual.
      - Consolidated serialization and reflection features into node.
      
      * Explicit type qualification when calling destructor.
      
      * Fix SPIRV, more comments
      Tianqi Chen committed
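      A minimal C++ sketch of the idea behind a columnar reflection dispatcher, assuming nothing about TVM's actual ReflectionVTable API: each object stores only a type index, and a global table of function pointers indexed by that type index forwards to an ordinary, non-virtual VisitAttrs. `ReflectionTable`, `AttrVisitor`, `NodeRegistrar` and `IntImmNode` are hypothetical names; `NodeRegistrar` stands in for what a registration macro could expand to.

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <string>
      #include <type_traits>
      #include <vector>

      // Hypothetical visitor that records the names of the attributes it is shown.
      struct AttrVisitor {
        std::vector<std::string> names;
        void Visit(const char* key, int* value) {
          (void)value;
          names.push_back(key);
        }
      };

      // Object header holds only a type index: no virtual functions, hence no vtable.
      struct Object {
        uint32_t type_index = 0;
      };

      // Columnar dispatcher: one column (vector) of function pointers, indexed by type index.
      class ReflectionTable {
       public:
        using FVisitAttrs = void (*)(Object*, AttrVisitor*);

        static ReflectionTable& Global() {
          static ReflectionTable inst;
          return inst;
        }

        void Register(uint32_t tindex, FVisitAttrs f) {
          if (tindex >= fvisit_attrs_.size()) fvisit_attrs_.resize(tindex + 1, nullptr);
          fvisit_attrs_[tindex] = f;
        }

        void VisitAttrs(Object* obj, AttrVisitor* v) const {
          assert(obj->type_index < fvisit_attrs_.size());
          assert(fvisit_attrs_[obj->type_index] != nullptr);
          fvisit_attrs_[obj->type_index](obj, v);
        }

       private:
        std::vector<FVisitAttrs> fvisit_attrs_;
      };

      // A node type whose VisitAttrs is a plain (non-virtual) member function.
      struct IntImmNode : Object {
        static constexpr uint32_t kTypeIndex = 1;
        int value = 0;
        IntImmNode() { type_index = kTypeIndex; }
        void VisitAttrs(AttrVisitor* v) { v->Visit("value", &value); }
      };
      static_assert(!std::is_polymorphic<IntImmNode>::value, "no vtable expected");

      // Static registrar: hypothetical stand-in for a registration macro's expansion.
      template <typename T>
      struct NodeRegistrar {
        NodeRegistrar() {
          ReflectionTable::Global().Register(T::kTypeIndex, [](Object* self, AttrVisitor* v) {
            static_cast<T*>(self)->VisitAttrs(v);
          });
        }
      };
      static NodeRegistrar<IntImmNode> int_imm_registrar;
      ```

      The trade-off illustrated here is one indirect call through a shared table instead of a per-object vtable pointer, so node objects stay smaller.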
    • TensorCore Support using Intrinsic (#4136) · 324a9607
      * add tensor core support
      
      * avoid memory bank conflict
      
      * fix thread sync & better performance
      
      * better performance
      
      * add schedule test for conv2d
      
      * extend into BatchMatMul
      
      * support config fragment shape and layout using intrinsic
      
      * add TensorCore tutorial
      
      * add int support and fix lint
      
      * address comment
      
      * add 32*16*8 TensorCore test
      
      * fix wmma include logic
      Siyuan Feng committed
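      To make the "fragment shape" idea concrete, here is a CPU-only C++ sketch (not CUDA, and not TVM's intrinsic API) that computes C += A * B one (m x n x k) fragment at a time, which is the granularity at which a TensorCore kernel issues each matrix-multiply-accumulate. `FragmentShape`, `TiledMatMul` and `NaiveMatMul` are hypothetical names, and the sketch assumes every matrix dimension is a multiple of the chosen fragment shape.

      ```cpp
      #include <cassert>
      #include <vector>

      // Fragment shape (m, n, k); configurable rather than fixed.
      struct FragmentShape {
        int m, n, k;
      };

      // Reference: C += A * B with row-major A (M x K), B (K x N), C (M x N).
      void NaiveMatMul(const std::vector<float>& A, const std::vector<float>& B,
                       std::vector<float>& C, int M, int N, int K) {
        for (int i = 0; i < M; ++i)
          for (int j = 0; j < N; ++j)
            for (int k = 0; k < K; ++k) C[i * N + j] += A[i * K + k] * B[k * N + j];
      }

      // Same result, iterating over fragments: each (i0, j0, k0) step is one
      // multiply-accumulate of an (m x k) tile of A with a (k x n) tile of B.
      void TiledMatMul(const std::vector<float>& A, const std::vector<float>& B,
                       std::vector<float>& C, int M, int N, int K, FragmentShape f) {
        assert(M % f.m == 0 && N % f.n == 0 && K % f.k == 0);
        for (int i0 = 0; i0 < M; i0 += f.m)
          for (int j0 = 0; j0 < N; j0 += f.n)
            for (int k0 = 0; k0 < K; k0 += f.k)
              for (int i = i0; i < i0 + f.m; ++i)
                for (int j = j0; j < j0 + f.n; ++j) {
                  float acc = 0.0f;
                  for (int k = k0; k < k0 + f.k; ++k) acc += A[i * K + k] * B[k * N + j];
                  C[i * N + j] += acc;
                }
      }
      ```

      Different fragment shapes (for example 16x16x16 versus 32x8x16) change only the loop tiling, not the result, which is why the shape can be exposed as a configuration knob.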
  9. 23 Oct, 2019 1 commit
  10. 22 Oct, 2019 1 commit
  11. 21 Oct, 2019 1 commit
    • [REFACTOR][NODE][RUNTIME] Move Node to the new Object protocol. (#4161) · 7895adb2
      * [REFACTOR][NODE][RUNTIME] Move Node to the new Object protocol.
      
      This PR removes the original node system and makes Node a subclass of Object.
      This is a major refactor towards a better unified runtime object system.
      
      List of changes in the refactor:
      
      - We now hide the data_ field; use Downcast explicitly to get a subclass object.
      - Removed the node system FFI in python.
      - Removed the node C API; PackedFunc is used instead to list and get attrs.
      - Change relay::Op::set_attr_type_key(attr_key_name) to relay::Op::set_attr_type<AttrType>().
        - This change was necessary because of the new Object registration mechanism.
        - Op registrations were updated accordingly.
        - The change revealed a few previous problems, which are now fixed.
      - Patched up a few missing node type registrations.
        - We now raise an error if an object whose type has not been registered is used.
      - The original node.h and container.h are kept in the same location.
      - Calling convention: kObjectHandle now equals the old kNodeHandle, kNodeHandle is removed.
      - IRFunctor now dispatches on ObjectRef.
      - Update to the new type checking API: is_type, derived_from are replaced by IsInstance.
      - Removed the .hash member function; use C++-convention hasher functors instead.
      
      * Address review comments
      Tianqi Chen committed
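      A simplified C++ sketch of three items from the list above, under stated assumptions: the handle keeps its data pointer private, type checks compare a stored type index (in the spirit of `IsInstance`), and hashing is done by external functor structs that plug into `std::unordered_map` instead of a `.hash()` member. All names here (`ObjectRef`, `IntNode`, `Downcast`, `ObjectHash`, `ObjectEqual`) are illustrative; `std::shared_ptr` replaces TVM's own reference counting, and this `Downcast` returns a raw node pointer rather than a reference type.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <cstdint>
      #include <functional>
      #include <memory>
      #include <unordered_map>
      #include <utility>

      struct Object {
        uint32_t type_index;
        explicit Object(uint32_t t) : type_index(t) {}
      };

      struct IntNode : Object {
        static constexpr uint32_t kTypeIndex = 1;
        int value;
        explicit IntNode(int v) : Object(kTypeIndex), value(v) {}
      };

      // Handle whose data pointer is private; callers never touch it directly.
      class ObjectRef {
       public:
        explicit ObjectRef(std::shared_ptr<Object> p) : data_(std::move(p)) {}
        const Object* get() const { return data_.get(); }
        // Type check by comparing type indices (this sketch ignores subclass hierarchies).
        template <typename T>
        bool IsInstance() const { return data_ != nullptr && data_->type_index == T::kTypeIndex; }

       private:
        std::shared_ptr<Object> data_;
      };

      // Checked downcast: asserts on a type mismatch instead of silently reinterpreting.
      template <typename T>
      const T* Downcast(const ObjectRef& ref) {
        assert(ref.IsInstance<T>());
        return static_cast<const T*>(ref.get());
      }

      // Hasher and equality functors in the standard-library style (pointer identity).
      struct ObjectHash {
        std::size_t operator()(const ObjectRef& ref) const {
          return std::hash<const Object*>()(ref.get());
        }
      };
      struct ObjectEqual {
        bool operator()(const ObjectRef& a, const ObjectRef& b) const { return a.get() == b.get(); }
      };

      // Example use: a map keyed by object identity.
      using ObjectCounter = std::unordered_map<ObjectRef, int, ObjectHash, ObjectEqual>;
      ```

      Keeping hashing in separate functors means the object type itself needs no extra member, and the same object can be hashed differently by different containers.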
  12. 20 Oct, 2019 2 commits
  13. 18 Oct, 2019 1 commit
  14. 17 Oct, 2019 1 commit
  15. 16 Oct, 2019 1 commit
    • [RUNTIME] Refactor object python FFI to new protocol. (#4128) · 02c1e117
      * [RUNTIME] Refactor object python FFI to new protocol.
      
      This is a pre-req to bring the Node system under object protocol.
      Most of the code reflects the current code in the Node system.
      
      - Use `__new__` instead of `__init__` so subclasses can define their own constructors
      - Allow registering via name, besides type index
      - Introduce necessary runtime C API functions
      - Refactored Tensor and Datatype to directly use constructors.
      
      * address review comments
      Tianqi Chen committed
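      The "register via name, besides type index" item can be illustrated with a small two-way registry. This is a hypothetical C++ sketch, not TVM's runtime C API: `TypeRegistry` and the type keys used with it are made-up names, and indices are simply handed out in registration order.

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <string>
      #include <unordered_map>
      #include <vector>

      // Two-way mapping between type keys (names) and runtime type indices.
      class TypeRegistry {
       public:
        // Returns the index for `key`, allocating the next free index on first use.
        uint32_t GetOrAllocIndex(const std::string& key) {
          auto it = key2index_.find(key);
          if (it != key2index_.end()) return it->second;
          uint32_t index = static_cast<uint32_t>(index2key_.size());
          key2index_.emplace(key, index);
          index2key_.push_back(key);
          return index;
        }

        // Reverse lookup, e.g. for a frontend that receives only an index.
        const std::string& KeyOf(uint32_t index) const {
          assert(index < index2key_.size());
          return index2key_[index];
        }

        bool Contains(const std::string& key) const { return key2index_.count(key) != 0; }

       private:
        std::unordered_map<std::string, uint32_t> key2index_;
        std::vector<std::string> index2key_;
      };
      ```

      A frontend can then register a class against a stable name and resolve the numeric index lazily, instead of having to know the index up front.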
  16. 15 Oct, 2019 1 commit
    • [RFC][RUNTIME] Introduce new object protocol. (#4115) · a0bd3786
      * [RUNTIME] Introduce new object protocol.
      
      This PR introduces a new object protocol to unify the node and object.
      We also updated the existing runtime::vm code to make use of the new system.
      
      Updates to the node system will be done in a follow-up PR.
      
      Other changes:
      
      - Remove object-related code in the JSON serializer, as that logic was incomplete
        and we have a separate serializer for the VM; we can revisit this later.
      
      * address review comment
      
      * Fix the child slot logic
      Tianqi Chen committed
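      The core of an object protocol like the one described here is usually a small object header plus an intrusive reference count. The following is a minimal C++ sketch under that assumption, not the code from the PR: `Object`, `ObjectPtr`, `MakeObject` and `IntBox` are illustrative names, and the per-object `deleter` function pointer is one way to free the concrete type without a virtual destructor.

      ```cpp
      #include <atomic>
      #include <cassert>
      #include <cstdint>
      #include <utility>

      // Object header: type index, intrusive reference count, and a type-aware deleter.
      struct Object {
        uint32_t type_index = 0;
        std::atomic<int32_t> ref_counter{0};
        void (*deleter)(Object*) = nullptr;  // frees the concrete type; no virtual destructor needed
      };

      // Smart pointer that keeps the count inside the object instead of in a control block.
      template <typename T>
      class ObjectPtr {
       public:
        ObjectPtr() = default;
        explicit ObjectPtr(T* p) : ptr_(p) { IncRef(); }
        ObjectPtr(const ObjectPtr& other) : ptr_(other.ptr_) { IncRef(); }
        ObjectPtr(ObjectPtr&& other) noexcept : ptr_(other.ptr_) { other.ptr_ = nullptr; }
        ObjectPtr& operator=(ObjectPtr other) noexcept {  // copy-and-swap
          std::swap(ptr_, other.ptr_);
          return *this;
        }
        ~ObjectPtr() { DecRef(); }

        T* operator->() const { return ptr_; }
        int32_t use_count() const { return ptr_ ? ptr_->ref_counter.load() : 0; }

       private:
        void IncRef() {
          if (ptr_) ptr_->ref_counter.fetch_add(1, std::memory_order_relaxed);
        }
        void DecRef() {
          // Whoever drops the last reference runs the deleter.
          if (ptr_ && ptr_->ref_counter.fetch_sub(1, std::memory_order_acq_rel) == 1) ptr_->deleter(ptr_);
        }
        T* ptr_ = nullptr;
      };

      // Allocates a T and installs a deleter that knows its concrete type.
      template <typename T, typename... Args>
      ObjectPtr<T> MakeObject(Args&&... args) {
        T* p = new T(std::forward<Args>(args)...);
        p->deleter = [](Object* o) { delete static_cast<T*>(o); };
        return ObjectPtr<T>(p);
      }

      // Example payload type that counts destructions so the deleter path is observable.
      struct IntBox : Object {
        static int destroyed;
        int value;
        explicit IntBox(int v) : value(v) {}
        ~IntBox() { ++destroyed; }
      };
      int IntBox::destroyed = 0;
      ```

      Because the count lives in the object, a raw `Object*` crossing an FFI boundary can be turned back into an owning handle without a separate control block.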
  17. 10 Oct, 2019 1 commit
    • [Relay][VM] Fix constant folding issue in VM compiler (#4077) · fc2713e5
      * [Relay][VM] Fix constant folding issue in VM compiler
      
      1. allow passing params when compiling a module
      2. enhance profiler robustness
      
      * remove dead code
      
      * fix lint
      
      * add get_params
      
      * fix test
      
      * don't pass params back
      
      * remove get_params
      
      * docs
      
      * move compile function to api
      
      * `compile` clashes with a builtin name
      
      * fix compilation error
      
      * remove dead code
      Wei Chen committed
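      Why passing params at compile time matters for constant folding can be shown with a toy expression IR: an expression that reads a parameter cannot be folded while the parameter is still a free variable, but once its known value is bound in as a constant, the whole expression folds. This C++ sketch only illustrates that reasoning; `Expr`, `BindParams` and `FoldConstant` are hypothetical and unrelated to Relay's actual data structures.

      ```cpp
      #include <cassert>
      #include <map>
      #include <memory>
      #include <string>
      #include <utility>

      // Toy IR: a constant, a named parameter, or the sum of two sub-expressions.
      struct Expr {
        enum Kind { kConst, kVar, kAdd };
        Kind kind = kConst;
        double value = 0.0;              // valid when kind == kConst
        std::string name;                // valid when kind == kVar
        std::shared_ptr<Expr> lhs, rhs;  // valid when kind == kAdd
      };
      using ExprPtr = std::shared_ptr<Expr>;

      ExprPtr Const(double v) {
        auto e = std::make_shared<Expr>();
        e->value = v;
        return e;
      }
      ExprPtr Var(const std::string& name) {
        auto e = std::make_shared<Expr>();
        e->kind = Expr::kVar;
        e->name = name;
        return e;
      }
      ExprPtr Add(ExprPtr a, ExprPtr b) {
        auto e = std::make_shared<Expr>();
        e->kind = Expr::kAdd;
        e->lhs = std::move(a);
        e->rhs = std::move(b);
        return e;
      }

      // Replaces parameters whose values are known with constants.
      ExprPtr BindParams(const ExprPtr& e, const std::map<std::string, double>& params) {
        if (e->kind == Expr::kVar) {
          auto it = params.find(e->name);
          return it == params.end() ? e : Const(it->second);
        }
        if (e->kind == Expr::kAdd) return Add(BindParams(e->lhs, params), BindParams(e->rhs, params));
        return e;
      }

      // Folds an addition when both operands are (or fold to) constants.
      ExprPtr FoldConstant(const ExprPtr& e) {
        if (e->kind != Expr::kAdd) return e;
        ExprPtr a = FoldConstant(e->lhs);
        ExprPtr b = FoldConstant(e->rhs);
        if (a->kind == Expr::kConst && b->kind == Expr::kConst) return Const(a->value + b->value);
        return Add(a, b);
      }
      ```

      For `w + (1 + 2)`, folding alone only reduces the inner sum to 3; binding `w = 4` first lets the whole expression fold to 7.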
  18. 08 Oct, 2019 1 commit
  19. 17 Sep, 2019 1 commit
  20. 13 Sep, 2019 1 commit
  21. 12 Sep, 2019 1 commit
    • [RFC] [Contrib] Minimal runtime (~12kb .text on ARMv7/x86) for subset of TVM models (#3567) · 1de52bb0
      This is an alternative implementation of a subset of the TVM runtime API (and
      graph runtime) that focuses entirely on reducing code size, at the expense of
      functionality (no tvm.extern(..) calls via PackedFunc, CPU only, etc). It might
      be worth incrementally expanding the surface area if there's interest.
      
      The motivation for this work was seeing what the minimal useful subset of the
      TVM runtime is. This is relevant for, e.g., extremely code-size-constrained
      applications in embedded/mobile. The current runtime is more like O(100KiB)
      or so, so this might be compelling for some users.
      
      The smaller surface area for auditing might make this relevant for
      https://github.com/dmlc/tvm/issues/3159, or the use cases I was thinking about in
      https://github.com/dmlc/tvm/issues/2523#issuecomment-459165815 re: the Rust
      runtime.
      
      The symbols in the tvm::minimalruntime space (i.e. excluding std:: and
      picojson::) are about 5KiB, so I think there's a bunch of room here (e.g. we
      could replace picojson:: with [`jsmn`](https://zserge.com/jsmn.html) or
      something, and we could replace more of the `std::unordered_map` usage, etc. with
      custom primitives as well, similar to the `DynArray`).
      Andrew Tulloch committed
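      The message mentions replacing standard containers with custom primitives such as `DynArray` to save code size. The C++ sketch below shows what such a primitive could look like under that goal (one heap buffer and a size, no growth policy, no allocator parameter, no copies); it is a hypothetical illustration, not the `DynArray` from the PR, whose interface may differ.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <memory>
      #include <utility>

      // Minimal owning array: one heap buffer plus a size. No capacity/growth policy,
      // no allocator parameter, and copying is disabled to keep the code small.
      template <typename T>
      class DynArray {
       public:
        DynArray() = default;
        explicit DynArray(std::size_t n) : data_(new T[n]()), size_(n) {}
        DynArray(const DynArray&) = delete;
        DynArray& operator=(const DynArray&) = delete;

        // Reallocates to exactly n value-initialized slots, moving over the common prefix.
        void resize(std::size_t n) {
          std::unique_ptr<T[]> fresh(new T[n]());
          for (std::size_t i = 0; i < n && i < size_; ++i) fresh[i] = std::move(data_[i]);
          data_ = std::move(fresh);
          size_ = n;
        }

        std::size_t size() const { return size_; }
        T& operator[](std::size_t i) {
          assert(i < size_);
          return data_[i];
        }
        const T& operator[](std::size_t i) const {
          assert(i < size_);
          return data_[i];
        }
        T* begin() { return data_.get(); }
        T* end() { return data_.get() + size_; }

       private:
        std::unique_ptr<T[]> data_;
        std::size_t size_ = 0;
      };
      ```

      Each feature left out (amortized growth, allocators, copy semantics) is template code the compiler no longer has to instantiate, which is where the `.text` savings would come from.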
  22. 03 Sep, 2019 2 commits
  23. 02 Sep, 2019 1 commit
  24. 01 Sep, 2019 1 commit
  25. 29 Aug, 2019 1 commit
  26. 21 Aug, 2019 1 commit
    • [Relay][VM]VM Profiler (#3727) · 95f12e31
      * [Relay][VM]VM debugger
      
      * Report mean/min/max for op duration
      
      * Typos
      
      * Lint
      
      * Lint
      
      * Lint
      
      * Support build debug VM in CMake
      
      * Lint
      
      * Enable VM debug in unit test
      
      * Disable debug vm test until new docker image is built
      
      * Add device sync code
      
      * Fix qnn unit test
      
      * Disable vm debug by default
      
      * Rename files
      
      * Rename classes
      
      * Fix comment
      
      * Fix comment
      Wei Chen committed
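      A per-op mean/min/max report boils down to timing each op invocation and folding the durations into running statistics. The C++ sketch below assumes a host-side clock (`std::chrono::steady_clock`) and made-up names (`OpStats`, `OpProfiler`); it is not the VM profiler's actual interface. The comment at the timing point notes one plausible reason a profiler needs device sync code: on an asynchronous device, reading the clock without synchronizing would only time the launch.

      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <chrono>
      #include <cstddef>
      #include <limits>
      #include <map>
      #include <string>

      // Running duration statistics for one op, in microseconds.
      struct OpStats {
        std::size_t count = 0;
        double total_us = 0.0;
        double min_us = std::numeric_limits<double>::max();
        double max_us = 0.0;

        void Add(double us) {
          ++count;
          total_us += us;
          min_us = std::min(min_us, us);
          max_us = std::max(max_us, us);
        }
        double Mean() const { return count == 0 ? 0.0 : total_us / static_cast<double>(count); }
      };

      // Times op invocations and aggregates them by op name.
      class OpProfiler {
       public:
        template <typename F>
        void Run(const std::string& op_name, F&& fn) {
          auto start = std::chrono::steady_clock::now();
          fn();
          // For asynchronous devices, a real profiler must synchronize the device
          // here; otherwise it only measures the kernel launch.
          auto end = std::chrono::steady_clock::now();
          Record(op_name, std::chrono::duration<double, std::micro>(end - start).count());
        }

        void Record(const std::string& op_name, double us) { stats_[op_name].Add(us); }
        const OpStats& Get(const std::string& op_name) const { return stats_.at(op_name); }

       private:
        std::map<std::string, OpStats> stats_;
      };
      ```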
  27. 01 Aug, 2019 1 commit
  28. 31 Jul, 2019 1 commit
  29. 30 Jul, 2019 2 commits
  30. 25 Jul, 2019 2 commits
    • Implementation of uTVM (#3227) · ef909df1
      * uTVM interfaces (#14)
      
      * some minor interface changes
      
      * implemented HostLowLevelDevice
      
      * added MicroDeviceAPI
      
      * implemented micro_common and added Python interfaces
      
      * current status, semi implemented micro session
      
      * added micro_common implementation and python interfaces (#18)
      
      * added micro_common implementation and python interfaces (#18)
      
      * current status, semi implemented
      
      * host test working
      
      * updated interfaces for MicroSession arguments allocation
      
      * make somewhat lint compatible
      
      * fix based on comments
      
      * added rounding macro
      
      * fix minor bug
      
      * improvements based on comments
      
      * Clean up `binutil.py` and make Python-3-compatible
      
      * Change argument allocation design
      
      * Address feedback and lint errors
      
      * Improve binutil tests
      
      * Simplify allocator (per @tqchen's suggestions)
      
      * Doc/style fixes
      
      * farts
      
      * mcgee
      
      * rodata section werks
      
      (and so does `test_runtime_micro_workspace.py`)
      
      * simple graph runtime werk
      
      * TEMP
      
      * ResNet works, yo
      
      * First round of cleanup
      
      * More cleanup
      
      * runs a dyson over the code
      
      * Another pass
      
      * Fix `make lint` issues
      
      * ready to pr... probably
      
      * final
      
      * Undo change
      
      * Fix rebase resolution
      
      * Minor fixes
      
      * Undo changes to C codegen tests
      
      * Add `obj_path` in `create_micro_lib`
      
      * TEMP
      
      * Address feedback
      
      * Add missing TODO
      
      * Partially address feedback
      
      * Fix headers
      
      * Switch to enum class for `SectionKind`
      
      * Add missing ASF header
      
      * Fix lint
      
      * Fix lint again
      
      * Fix lint
      
      * Kill lint warnings
      
      * Address feedback
      
      * Change Python interface to MicroTVM
      
      All interaction with the device is now through `Session` objects, which
      are used through Python's `with` blocks.
      
      * Reorder LowLevelDevice interface
      
      * Store shared ptr to session in all alloced objects
      
      * Move helper functions out of `tvm.micro`
      
      * Switch static char arr to vector
      
      * Improve general infra and code quality
      
      Does not yet address all of tqchen's feedback
      
      * Forgot a rename
      
      * Fix lint
      
      * Add ASF header
      
      * Fix lint
      
      * Partially address MarisaKirisame's feedback
      
      * Lint
      
      * Expose `MicroSession` as a node to Python
      
      * Revert to using `Session` constructor
      
      * Fix compiler error
      
      * (Maybe) fix CI error
      
      * Debugging
      
      * Remove
      
      * Quell lint
      
      * Switch to stack-based session contexts
      
      * Make uTVM less intrusive to host codegen
      
      And use SSA for operands of generated ternary operators
      
      * Inline UTVMArgs into UTVMTask struct
      
      * Remove `HostLowLevelDevice` header
      
      * Remove `BaseAddr` class
      
      * Address feedback
      
      * Add "utvm" prefix to global vars in runtime
      
      * Fix lint
      
      * Fix CI
      
      * Fix `test_binutil.py`
      
      * Fix submodules
      
      * Remove ResNet tests
      
      * Make `test_binutil.py` work with nose
      
      * Fix CI
      
      * I swear this actually fixes the binutil tests
      
      * lint
      
      * lint
      
      * Add fcompile-compatible cross-compile func
      
      * Add docs for uTVM runtime files
      
      * Move pointer patching into `MicroSession`
      
      * Fix lint
      
      * First attempt at unifying cross-compile APIs
      
      * Fix lint
      
      * Rename `cross_compile` back to `cc`
      
      * Address feedback
      
      * Remove commented code
      
      * Lint
      
      * Figure out failing function
      
      * Remove debugging code
      
      * Change "micro_dev" target to "micro"
      
      * Add checks in tests for whether uTVM is enabled
      
      * Add TODO for 32-bit support
      
      * Rename more "micro_dev" to "micro"
      
      * Undo rename
      
      We already have `tvm.micro` as a namespace.  Can't have it as a method
      as well.
      
      * Fix failing CI
      
      Thanks to @tqchen for finding this bug.  Emitting ternary operators for
      `min` and `max` causes concurrency bugs in CUDA, so we're moving the
      ternary op emissions from `CodeGenC` to `CodeGenCHost`.
      
      * Address feedback
      
      * Fix lint
      Logan Weber committed
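      The move to `Session` objects used through Python `with` blocks, and then to stack-based session contexts, follows a common pattern: entering a scope pushes a session, leaving it pops, and the current session is the top of the stack. A C++ analogue of that pattern uses RAII. This is a hypothetical sketch; `Session`, `SessionStack` and `SessionScope` are illustrative names, not uTVM's classes, and a real session would hold a device connection rather than a name.

      ```cpp
      #include <cassert>
      #include <string>
      #include <vector>

      // Stand-in for a device session; a real one would own the device connection.
      struct Session {
        std::string name;
      };

      // Per-thread stack of active sessions; the innermost one is "current".
      class SessionStack {
       public:
        static std::vector<Session*>& Stack() {
          static thread_local std::vector<Session*> stack;
          return stack;
        }
        static Session* Current() { return Stack().empty() ? nullptr : Stack().back(); }
      };

      // RAII scope: constructing it is like entering a Python `with` block,
      // and its destruction at scope exit is like leaving the block.
      class SessionScope {
       public:
        explicit SessionScope(Session* s) { SessionStack::Stack().push_back(s); }
        ~SessionScope() { SessionStack::Stack().pop_back(); }
        SessionScope(const SessionScope&) = delete;
        SessionScope& operator=(const SessionScope&) = delete;
      };
      ```

      A stack (rather than a single global slot) lets sessions nest: when an inner scope ends, the outer session becomes current again automatically.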
  31. 23 Jul, 2019 1 commit
    • [Runtime] [ThreadPool] Make SpscTaskQueue::Pop(..) spin_count configurable (#3577) · 9b1c2e08
      In cases where we have multiple models or threadpools active, spinning around
      `sched_yield()` may not be desirable, as it prevents the OS from effectively
      scheduling other threads.
      
      Thus, allow users to conditionally disable this behaviour (via an environment
      variable `TVM_THREAD_POOL_SPIN_COUNT`, similar to existing environment flags for
      the thread pool such as `TVM_BIND_THREADS`, etc).
      
      This substantially improves tail latencies in some of our multi-tenant
      workloads in practice.
      
      Unit tests have been added. On my laptop, running:
      
      ```
      TVM_THREAD_POOL_SPIN_COUNT=0 ./build/threading_backend_test;
      TVM_THREAD_POOL_SPIN_COUNT=1 ./build/threading_backend_test;
      ./build/threading_backend_test;
      ```
      
      gives https://gist.github.com/ajtulloch/1805ca6cbaa27f5d442d23f9d0021ce6 (i.e.
      97ms -> <1ms after this change)
      Andrew Tulloch committed
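      The spin-then-block behaviour described above can be sketched in C++ as follows. This is a simplified illustration under explicit assumptions: the queue is mutex-based rather than the lock-free `SpscTaskQueue`, `std::this_thread::yield()` stands in for `sched_yield()` (libstdc++ on Linux typically implements the former with the latter), and `GetSpinCount`, `TaskQueue` and any fallback value a caller passes are hypothetical. A spin count of 0 makes `Pop` block on the condition variable immediately, which is the opt-out the environment variable provides.

      ```cpp
      #include <cassert>
      #include <condition_variable>
      #include <cstdlib>
      #include <deque>
      #include <mutex>
      #include <thread>

      // Reads a spin count from the environment; `fallback` is used when the variable is unset.
      inline int GetSpinCount(const char* env_name, int fallback) {
        const char* val = std::getenv(env_name);
        return val != nullptr ? std::atoi(val) : fallback;
      }

      // Simplified single-producer/single-consumer task queue (mutex-based, not lock-free).
      class TaskQueue {
       public:
        void Push(int task) {
          {
            std::lock_guard<std::mutex> lock(mu_);
            tasks_.push_back(task);
          }
          cv_.notify_one();
        }

        // Polls up to `spin_count` times, yielding the CPU between polls, then blocks.
        // With spin_count == 0 the consumer sleeps on the condition variable right away.
        int Pop(int spin_count) {
          for (int i = 0; i < spin_count; ++i) {
            {
              std::lock_guard<std::mutex> lock(mu_);
              if (!tasks_.empty()) return TakeFront();
            }
            std::this_thread::yield();
          }
          std::unique_lock<std::mutex> lock(mu_);
          cv_.wait(lock, [this] { return !tasks_.empty(); });
          return TakeFront();
        }

       private:
        int TakeFront() {  // caller must hold mu_
          int task = tasks_.front();
          tasks_.pop_front();
          return task;
        }

        std::mutex mu_;
        std::condition_variable cv_;
        std::deque<int> tasks_;
      };
      ```

      A consumer would call something like `queue.Pop(GetSpinCount("TVM_THREAD_POOL_SPIN_COUNT", fallback))`. Spinning trades CPU time for lower wake-up latency when tasks arrive quickly; blocking frees the core for other tenants, which matches the tail-latency motivation above.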
  32. 16 Jul, 2019 1 commit
  33. 15 Jul, 2019 1 commit