Commits · d745d93551154945609c955ac062bb1cdbcf0b7a · wenyuanbo / tic

18 Nov, 2019 1 commit
- reminding message for TVM_REGISTER_NODE_TYPE (#4365) · 7efb72e6
  Yizhi Liu committed 5 years ago
  
16 Nov, 2019 1 commit
- proper device query through rocm api (#4305) · 022b285d
  Peter Yeh committed 5 years ago
  
15 Nov, 2019 3 commits
- [RUNTIME] Add device query for AMD GcnArch (#4341) · 0235d283
```
* add gcnArch query

* kGcnArch query for cuda is a no-op
```
  Peter Yeh committed 5 years ago
- [Contrib] Add MKL DNN option (#4323) · 72821b20
```
* [Contrib] Add MKL DNN

* update

* update
```
  Haichen Shen committed 5 years ago
- Enable hipModuleGetGlobal() (#4321) · 5b9f459d
  Peter Yeh committed 5 years ago
  
11 Nov, 2019 1 commit

[RUNTIME][REFACTOR] Use object protocol to support runtime::Module (#4289) · f823c577

Previously runtime::Module was supported using shared_ptr.
This PR refactors the codebase to use the Object protocol.

It will open doors to allow easier interoperation between
Object containers and module in the future.

committed 5 years ago

10 Nov, 2019 1 commit
- [RUNTIME] Support C++ RPC (#4281) · d2fc0252
  Zhao Wu committed 5 years ago
  
01 Nov, 2019 1 commit
- Implement explicit IR representation of memory allocation (#3560) · 2083513f
  Jared Roesch committed 5 years ago
  
27 Oct, 2019 1 commit
- [RUNTIME] Separate runtime related contrib into runtime/contrib (#4207) · dcc6af53
  Tianqi Chen committed 5 years ago
  
24 Oct, 2019 2 commits

[NODE][REFACTOR] Refactor reflection system in node. (#4189) · 78ca6fc8

* [NODE][REFACTOR] Refactor reflection system in node.

- Removed the old Node, Node is now just an alias of runtime::Object
- Introduce ReflectionVTable, a new columnar dispatcher to support reflection
  - This allows us to remove vtable from most node objects
  - The VisitAttrs functions are registered via TVM_REGISTER_NODE_TYPE
    and are no longer virtual.
- Consolidated serialization and reflection features into node.

* Explicit type qualification when calling destructor.

* Fix SPIRV, more comments

committed 5 years ago

TensorCore Support using Intrinsic (#4136) · 324a9607

* add tensor core support

* avoid memory bank conflict

* fix thread sync & better performance

* better performance

* add schedule test for conv2d

* extend into BatchMatMul

* support config fragment shape and layout using intrinsic

* add TensorCore tutorial

* add int support and fix lint

* address comment

* add 32*16*8 TensorCore test

* fix wmma include logic

committed 5 years ago

23 Oct, 2019 1 commit

[rpc] use callback func to do send & recv (#4147) · 5408d3a3

* [rpc] use callback func to do send & recv. don't get fd from sock as it is deprecated in java

* fix java build

* fix min/max macro define in windows

* keep the old rpc setup for py

* add doc for CallbackChannel
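
The CallbackChannel idea can be illustrated with a small Python sketch: rather than pulling a file descriptor out of a socket, the channel is handed a send callable and a recv callable and loops until a whole message has moved. The class name, the method names `send_all`/`recv_exact`, and the socket-like callback contract (send returns the number of bytes written, recv returns at most the requested bytes and empty bytes on close) are assumptions for illustration, not TVM's actual API.

```python
from typing import Callable


class CallbackChannel:
    """Hypothetical channel that delegates all I/O to user-supplied callbacks."""

    def __init__(self, fsend: Callable[[bytes], int], frecv: Callable[[int], bytes]):
        self._fsend = fsend  # returns the number of bytes actually written
        self._frecv = frecv  # returns at most `size` bytes, b"" on close

    def send_all(self, data: bytes) -> None:
        # Keep calling the send callback until every byte has been written.
        view = memoryview(data)
        while view:
            n = self._fsend(bytes(view))
            if n <= 0:
                raise ConnectionError("send callback wrote no data")
            view = view[n:]

    def recv_exact(self, size: int) -> bytes:
        # Keep calling the recv callback until exactly `size` bytes arrive.
        chunks = []
        remaining = size
        while remaining > 0:
            chunk = self._frecv(remaining)
            if not chunk:
                raise ConnectionError("recv callback returned no data")
            chunks.append(chunk)
            remaining -= len(chunk)
        return b"".join(chunks)


# Usage: an in-memory loopback whose callbacks move at most 3 bytes per call.
buffer = bytearray()


def fake_send(data: bytes) -> int:
    chunk = data[:3]
    buffer.extend(chunk)
    return len(chunk)


def fake_recv(size: int) -> bytes:
    n = min(size, 3, len(buffer))
    chunk = bytes(buffer[:n])
    del buffer[:n]
    return chunk


chan = CallbackChannel(fake_send, fake_recv)
chan.send_all(b"hello rpc")
print(chan.recv_exact(9))  # b'hello rpc'
```

The short-write and short-read loops are the part a transport-agnostic channel has to own once it no longer relies on blocking socket calls.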

committed 5 years ago

22 Oct, 2019 1 commit
- [relay][vm] Reuse allocated device memory (#4170) · 5a177070
  Zhi committed 5 years ago
  
21 Oct, 2019 1 commit

[REFACTOR][NODE][RUNTIME] Move Node to the new Object protocol. (#4161) · 7895adb2

* [REFACTOR][NODE][RUNTIME] Move Node to the new Object protocol.

This PR removes the original node system and makes node a subclass of Object.
This is a major refactor towards a better unified runtime object system.

List of changes in the refactor:

- We now hide data_ field, use Downcast explicitly to get a sub-class object.
- Removed the node system FFI in python.
- Removed the node C API, instead use PackedFunc for list and get attrs.
- Change relay::Op::set_attr_type_key(attr_key_name) to relay::Op::set_attr_type<AttrType>().
  - This change was necessary because of the new Object registration mechanism.
  - Subsequent changes to the op registrations
  - The change revealed a few previous problems that are now fixed.
- Patched up a few missing node type registration.
  - Now we will raise an error if we register object that is not registered.
- The original node.h and container.h are kept in the same location.
- Calling convention: kObjectHandle now equals the old kNodeHandle, kNodeHandle is removed.
- IRFunctor now dispatches on ObjectRef.
- Update to the new type checking API: is_type, derived_from are replaced by IsInstance.
- Removed .hash member function, instead use C++ convention hasher functors.

* Address review comments
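
The move from `is_type`/`derived_from` to `IsInstance` plus an explicit `Downcast` can be pictured with a hypothetical Python analogue: every object class carries an integer type index registered under a type key, `is_instance` checks ancestry by index, and `downcast` is a checked conversion that fails loudly. All names here are illustrative; the sketch simply walks a parent chain and does not model how the C++ runtime actually lays out type indices.

```python
class TypeRegistry:
    """Hypothetical registry mapping type keys to integer type indices."""

    def __init__(self):
        self._index = {}   # type key -> type index
        self._parent = {}  # type index -> parent index (None for the root)

    def register(self, key, parent_key=None):
        idx = len(self._index)
        self._index[key] = idx
        self._parent[idx] = None if parent_key is None else self._index[parent_key]
        return idx

    def derived_from(self, child, ancestor):
        # Walk upward from child until ancestor is found or parents run out.
        cur = child
        while cur is not None:
            if cur == ancestor:
                return True
            cur = self._parent[cur]
        return False


REGISTRY = TypeRegistry()


class Object:
    type_key = "Object"
    type_index = REGISTRY.register("Object")

    def is_instance(self, cls):
        # Analogue of IsInstance: compares registered type indices.
        return REGISTRY.derived_from(self.type_index, cls.type_index)


def downcast(obj, cls):
    # Analogue of Downcast: a checked conversion that raises on mismatch.
    if not obj.is_instance(cls):
        raise TypeError(f"cannot downcast {obj.type_key} to {cls.type_key}")
    return obj


class ExprNode(Object):
    type_key = "Expr"
    type_index = REGISTRY.register("Expr", "Object")


class AddNode(ExprNode):
    type_key = "Add"
    type_index = REGISTRY.register("Add", "Expr")


class StmtNode(Object):
    type_key = "Stmt"
    type_index = REGISTRY.register("Stmt", "Object")


node = AddNode()
print(node.is_instance(ExprNode), node.is_instance(StmtNode))  # True False
expr = downcast(node, ExprNode)  # passes; downcast(node, StmtNode) would raise
```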

committed 5 years ago

20 Oct, 2019 2 commits
- [Runtime] Enable option to use OpenMP thread pool (#4089) · 97ea31c8
  Haichen Shen committed 5 years ago
  
- [Refactor] Rename Datatype to ADT (#4156) · 32aad56c
```
We think it will reduce the confusion with the meaning.

https://discuss.tvm.ai/t/discuss-consider-rename-vm-datatype/4339
```
  Wei Chen committed 5 years ago
18 Oct, 2019 1 commit

[Relay][Frontend][TF] Add tensor array ops (#3798) · 36a96773

* [Relay][Frontend][TF] Add tensor array ops

* rename

* delete test

* Move utility function

* Refactor

* fix tensor array ops

* fix test

* fix rebase

* Fix serializer bug

* Improve tf convert name lookup to use prelude api

* Fix lint

* Fix test

committed 5 years ago

17 Oct, 2019 1 commit

[relay][vm] Separate VM runtime with executable (#4100) · 4052de6d

* [relay][vm] Separate VM runtime with executable

* Address comments

* move ctx back to vm

* make only vm related fields and methods protected

* integrate serialization/deserialization to executable

* create stream

committed 5 years ago

16 Oct, 2019 1 commit

[RUNTIME] Refactor object python FFI to new protocol. (#4128) · 02c1e117

* [RUNTIME] Refactor object python FFI to new protocol.

This is a pre-req to bring the Node system under object protocol.
Most of the code reflects the current code in the Node system.

- Use new instead of init so subclass can define their own constructors
- Allow register via name, besides type index
- Introduce necessary runtime C API functions
- Refactored Tensor and Datatype to directly use constructor.

* address review comments

committed 5 years ago

15 Oct, 2019 1 commit

[RFC][RUNTIME] Introduce new object protocol. (#4115) · a0bd3786

* [RUNTIME] Introduce new object protocol.

This PR introduces a new object protocol to unify the node and object.
We also updated the existing runtime::vm code to make use of the new system.

Update to the node will be done in a follow up PR.

Other changes:

- Remove object related code in json serializer as that code logic was not complete
  and we have a separate serializer for VM, can revisit later.

* address review  comment

* Fix the child slot logic

committed 5 years ago

10 Oct, 2019 1 commit

[Relay][VM] Fix constant folding issue in VM compiler (#4077) · fc2713e5

* [Relay][VM] Fix constant folding issue in VM compiler

1. allow passing params when compiling a module
2. enhance profiler robustness

* remove dead code

* fix lint

* add get_params

* fix test

* don't pass params back

* remove get_params

* docs

* move compile function to api

* compile clashes with builtin name

* fix compilation error

* remove dead code

committed 5 years ago

08 Oct, 2019 1 commit
- [Fix][VM] Fix VM invoke with set_params (#4079) · b5bcdbb0
```
* Fix VM invoke with set_params

* add test

* tweak
```
  Haichen Shen committed 5 years ago
17 Sep, 2019 1 commit
- [Vulkan] Minor optimization for deferred token lookups. (#3960) · 1fe17d14
```
Use a hash map keyed on the descriptor set to avoid bad asymptotic behaviour.
```
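
A sketch of the data-structure change, assuming pending (deferred) tokens are grouped by the descriptor set they touch: a dict keyed on the descriptor set finds the relevant tokens in average O(1) time instead of scanning every pending token. `DeferredToken` and `DeferredTokenTable` are hypothetical names, not the Vulkan runtime's types.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DeferredToken:
    """Hypothetical record of deferred work bound to one descriptor set."""
    descriptor_set: int
    buffer: str


class DeferredTokenTable:
    def __init__(self):
        # descriptor set -> tokens recorded for it, in insertion order
        self._by_set = defaultdict(list)

    def add(self, token):
        self._by_set[token.descriptor_set].append(token)

    def lookup(self, descriptor_set):
        # Average O(1) bucket lookup instead of an O(n) scan over all tokens.
        # .get() avoids inserting empty buckets for unknown keys.
        return list(self._by_set.get(descriptor_set, []))


table = DeferredTokenTable()
table.add(DeferredToken(descriptor_set=1, buffer="A"))
table.add(DeferredToken(descriptor_set=2, buffer="B"))
table.add(DeferredToken(descriptor_set=1, buffer="C"))
print([t.buffer for t in table.lookup(1)])  # ['A', 'C']
```
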
  Andrew Tulloch committed 5 years ago
13 Sep, 2019 1 commit
- Vulkan2 Runtime API (#3849) · 2536465c
  Andrew Tulloch committed 5 years ago
  
12 Sep, 2019 1 commit

[RFC] [Contrib] Minimal runtime (~12kb .text on ARMv7/x86) for subset of TVM models (#3567) · 1de52bb0

This is an alternative implementation of a subset of the TVM runtime API (and
graph runtime) that focuses entirely on reducing code size, at the expense of
functionality (no tvm.extern(..) calls via PackedFunc, CPU only, etc). It might
be worth incrementally expanding the surface area if there's interest.

The motivation for this work was seeing what the minimal useful subset of the
TVM runtime is. This is relevant for, e.g., severely code-size-constrained
applications in embedded/mobile. The current runtime is more like O(100KiB)
or so, so this might be compelling for some users.

The smaller surface area for auditing might make this relevant for
https://github.com/dmlc/tvm/issues/3159, or the usecases I was thinking about in
https://github.com/dmlc/tvm/issues/2523#issuecomment-459165815 re: the Rust
runtime.

The symbols in the tvm::minimalruntime space (i.e. excluding std:: and
picojson::) are about 5KiB, so I think there's a bunch of room here (i.e. we
could replace picojson:: with [`jsmn`](https://zserge.com/jsmn.html) or
something, and we could replace more of the `std::unordered_map` usage, etc. with
custom primitives as well, similar to the `DynArray`).

committed 5 years ago

03 Sep, 2019 2 commits

Revert "[Runtime] Allow parameter sharing between modules (#3489)" (#3884) · 6b0359b4
```
This reverts commit 224cc243.
```
Tianqi Chen committed 5 years ago

[Runtime] Allow parameter sharing between modules (#3489) · 224cc243

As GraphRuntime does not provide control-flow logic, we have to split
our model into two parts, while still sharing parameters between them
to save memory.

Solution:
1) add "lazy_init_input" in graph's attributes
   "attrs": {
     ... ...
     "lazy_init_input": [
       "list_str",
       [
         "p0"
       ]
     ]
    }
2) allow un-allocated NDArray entry in SetupStorage
3) utilize "set_input_zero_copy" function to set parameters
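
To make the attribute format concrete, here is a minimal Python sketch that reads `lazy_init_input` from graph JSON (graph attributes are stored as `[type_tag, value]` pairs, as in the fragment above) and marks which inputs would be left unallocated for later binding with `set_input_zero_copy`. `lazy_init_inputs` and `plan_storage` are hypothetical helpers, and the feature itself was reverted in #3884 (see the revert entry above).

```python
import json

# Graph fragment in the shape shown in the commit message above.
GRAPH_JSON = """
{
  "attrs": {
    "lazy_init_input": ["list_str", ["p0"]]
  }
}
"""


def lazy_init_inputs(graph_json):
    """Return the set of input names marked lazy_init_input (empty if absent)."""
    attrs = json.loads(graph_json).get("attrs", {})
    entry = attrs.get("lazy_init_input")
    if entry is None:
        return set()
    type_tag, names = entry  # attributes are [type_tag, value] pairs
    if type_tag != "list_str":
        raise ValueError(f"unexpected attribute type: {type_tag}")
    return set(names)


def plan_storage(input_names, lazy):
    # Hypothetical planning step: lazily initialized inputs get no buffer here
    # and are expected to be bound later (e.g. via set_input_zero_copy).
    return {name: (None if name in lazy else "allocated") for name in input_names}


print(plan_storage(["data", "p0", "p1"], lazy_init_inputs(GRAPH_JSON)))
# {'data': 'allocated', 'p0': None, 'p1': 'allocated'}
```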

committed 5 years ago

02 Sep, 2019 1 commit
- [WIP][µTVM] Add OpenOCD Low-Level Device (RISC-V Support) (#3756) · 60de5be1
  Logan Weber committed 5 years ago
  
01 Sep, 2019 1 commit

[Relay][Any] Add shape func for dynamic shape (#3606) · eef35a57

* init shape func in interpreter and vm compiler

* Update interpreter

* fix

* lint

* lint

* fix

* remove hack

* update

* fix

* fix

* update

* address comments & update for shape_of

* fix lint

* update

* fix hybrid

* lint

* fix bug & add take shape func

* lint

* lint

* update

* fix flaky test

* add todo

committed 5 years ago

29 Aug, 2019 1 commit
- [runtime] reduce set_input and set_input_zero_copy overhead (#3805) · 137bf5f4
  hlu1 committed 5 years ago
  
21 Aug, 2019 1 commit

[Relay][VM]VM Profiler (#3727) · 95f12e31

* [Relay][VM]VM debugger

* Report mean/min/max for op duration

* Typos

* Lint

* Lint

* Lint

* Support build debug VM in CMake

* Lint

* Enable VM debug in unit test

* Disable debug vm test until new docker image is built

* Add device sync code

* Fix qnn unit test

* Disable vm debug by default

* Rename files

* Rename classes

* Fix comment

* Fix comment
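
The mean/min/max reporting can be sketched in Python as follows, assuming the profiler records one `(op_name, duration)` sample per op invocation. The `sync` hook stands in for the device synchronization the PR adds, so asynchronous kernels finish before the clock is read; `time_op` and `summarize` are hypothetical names, not the VM profiler's API.

```python
import time
from collections import defaultdict
from statistics import mean


def time_op(fn, *args, sync=lambda: None):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    sync()  # wait for asynchronous device work before stopping the clock
    return result, time.perf_counter() - start


def summarize(samples):
    """Group (op_name, duration) samples and report mean/min/max per op."""
    per_op = defaultdict(list)
    for op, duration in samples:
        per_op[op].append(duration)
    return {
        op: {"mean": mean(ds), "min": min(ds), "max": max(ds), "count": len(ds)}
        for op, ds in per_op.items()
    }


# Usage with synthetic durations in milliseconds.
samples = [("conv2d", 4.0), ("relu", 1.0), ("conv2d", 6.0)]
print(summarize(samples)["conv2d"])
# {'mean': 5.0, 'min': 4.0, 'max': 6.0, 'count': 2}
```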

committed 5 years ago

01 Aug, 2019 1 commit

[Relay][VM] Support execution on devices (#3678) · 5357f49b

* [Relay][VM] Support execution on devices

* Reduce Copy calls

* Cleanup

* Lint

* CR comments

* Merge test into test_vm.py

committed 5 years ago

31 Jul, 2019 1 commit
- [Relay][VM] Relay VM serialization (#3647) · 90455121
```
* relay vm serialization

* fix lint

* load params, fix stream

* lint

* fix typo
```
  Zhi committed 5 years ago
30 Jul, 2019 2 commits
- ROCm: Add SaveToFile and LoadFile (#3665) · d4a51751
```
...and add rocm module_save to the tests.
```
  Thomas Viehmann committed 5 years ago
- Print llvm source by default in ROCMModuleNode::GetSource (#3662) · 52b63b9f
  Thomas Viehmann committed 5 years ago
  
25 Jul, 2019 2 commits

Implementation of uTVM (#3227) · ef909df1

* uTVM interfaces (#14)

* some minor interface changes

* implemented HostLowLevelDevice

* added MicroDeviceAPI

* implemented micro_common and added Python interfaces

* current status, semi implemented micro session

* added micro_common implementation and python interfaces (#18)

* added micro_common implementation and python interfaces (#18)

* current status, semi implemented

* host test working

* updated interfaces for MicroSession arguments allocation

* make somewhat lint compatible

* fix based on comments

* added rounding macro

* fix minor bug

* improvements based on comments

* Clean up `binutil.py` and make Python-3-compatible

* Change argument allocation design

* Address feedback and lint errors

* Improve binutil tests

* Simplify allocator (per @tqchen's suggestions)

* Doc/style fixes

* farts

* mcgee

* rodata section werks

(and so does `test_runtime_micro_workspace.py`)

* simple graph runtime werk

* TEMP

* ResNet works, yo

* First round of cleanup

* More cleanup

* runs a dyson over the code

* Another pass

* Fix `make lint` issues

* ready to pr... probably

* final

* Undo change

* Fix rebase resolution

* Minor fixes

* Undo changes to C codegen tests

* Add `obj_path` in `create_micro_lib`

* TEMP

* Address feedback

* Add missing TODO

* Partially address feedback

* Fix headers

* Switch to enum class for `SectionKind`

* Add missing ASF header

* Fix lint

* Fix lint again

* Fix lint

* Kill lint warnings

* Address feedback

* Change Python interface to MicroTVM

All interaction with the device is now through `Session` objects, which
are used through Python's `with` blocks.

* Reorder LowLevelDevice interface

* Store shared ptr to session in all alloced objects

* Move helper functions out of `tvm.micro`

* Switch static char arr to vector

* Improve general infra and code quality

Does not yet address all of tqchen's feedback

* Forgot a rename

* Fix lint

* Add ASF header

* Fix lint

* Partially address MarisaKirisame's feedback

* Lint

* Expose `MicroSession` as a node to Python

* Revert to using `Session` constructor

* Fix compiler error

* (Maybe) fix CI error

* Debugging

* Remove

* Quell lint

* Switch to stack-based session contexts

* Make uTVM less intrusive to host codegen

And use SSA for operands of generated ternary operators

* Inline UTVMArgs into UTVMTask struct

* Remove `HostLowLevelDevice` header

* Remove `BaseAddr` class

* Address feedback

* Add "utvm" prefix to global vars in runtime

* Fix lint

* Fix CI

* Fix `test_binutil.py`

* Fix submodules

* Remove ResNet tests

* Make `test_binutil.py` work with nose

* Fix CI

* I swear this actually fixes the binutil tests

* lint

* lint

* Add fcompile-compatible cross-compile func

* Add docs for uTVM runtime files

* Move pointer patching into `MicroSession`

* Fix lint

* First attempt at unifying cross-compile APIs

* Fix lint

* Rename `cross_compile` back to `cc`

* Address feedback

* Remove commented code

* Lint

* Figure out failing function

* Remove debugging code

* Change "micro_dev" target to "micro"

* Add checks in tests for whether uTVM is enabled

* Add TODO for 32-bit support

* Rename more "micro_dev" to "micro"

* Undo rename

We already have `tvm.micro` as a namespace.  Can't have it as a method
as well.

* Fix failing CI

Thanks to @tqchen for finding this bug.  Emitting ternary operators for
`min` and `max` causes concurrency bugs in CUDA, so we're moving the
ternary op emissions from `CodeGenC` to `CodeGenCHost`.

* Address feedback

* Fix lint

committed 5 years ago

Add a missing header in cuda_device_api.cc (#3621) · 443d023b
Philip Hyunsu Cho committed 5 years ago

23 Jul, 2019 1 commit

[Runtime] [ThreadPool] Make SpscTaskQueue::Pop(..) spin_count configurable (#3577) · 9b1c2e08

In cases where we have multiple models or threadpools active, spinning around
`sched_yield()` may not be desirable, as it prevents the OS from effectively
scheduling other threads.

Thus, allow users to conditionally disable this behaviour (via an environment
variable `TVM_THREAD_POOL_SPIN_COUNT`, similar to existing environment flags for
the thread pool such as `TVM_BIND_THREADS`, etc).

This substantially improves tail latencies in some of our multi-tenant
workloads in practice.

Unit tests have been added - on my laptop, running:

```
TVM_THREAD_POOL_SPIN_COUNT=0 ./build/threading_backend_test;
TVM_THREAD_POOL_SPIN_COUNT=1 ./build/threading_backend_test;
./build/threading_backend_test;
```

gives https://gist.github.com/ajtulloch/1805ca6cbaa27f5d442d23f9d0021ce6 (i.e.
97ms -> <1ms after this change)
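
The spin-then-block behaviour can be approximated with a Python sketch, assuming a single consumer: `pop` polls the queue up to `spin_count` times, yielding the CPU between polls, and only then blocks on a condition variable, so `TVM_THREAD_POOL_SPIN_COUNT=0` means block immediately. The class name, the default of 1000, and the use of `time.sleep(0)` as a stand-in for `sched_yield()` are assumptions; this is not TVM's actual C++ `SpscTaskQueue`.

```python
import os
import threading
import time
from collections import deque


def spin_count_from_env(default=1000):
    # Hypothetical default; TVM_THREAD_POOL_SPIN_COUNT=0 disables spinning.
    value = os.environ.get("TVM_THREAD_POOL_SPIN_COUNT")
    return default if value is None else int(value)


class SpinThenBlockQueue:
    """Single-producer/single-consumer queue with a configurable spin phase."""

    def __init__(self):
        self._items = deque()
        self._cond = threading.Condition()

    def push(self, item):
        with self._cond:
            self._items.append(item)
            self._cond.notify()

    def pop(self, spin_count):
        # Phase 1: poll without blocking, yielding the CPU between attempts.
        for _ in range(spin_count):
            if self._items:
                with self._cond:
                    return self._items.popleft()
            time.sleep(0)  # rough stand-in for sched_yield()
        # Phase 2: sleep until the producer signals that work is available.
        with self._cond:
            while not self._items:
                self._cond.wait()
            return self._items.popleft()


# Usage: a consumer thread waits for one task pushed by the main thread.
task_queue = SpinThenBlockQueue()
received = []
consumer = threading.Thread(
    target=lambda: received.append(task_queue.pop(spin_count_from_env()))
)
consumer.start()
task_queue.push("task")
consumer.join()
print(received)  # ['task']
```

Spinning trades CPU time for wake-up latency; with several pools sharing cores, a spin count of 0 hands the core back to the OS scheduler immediately.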

committed 5 years ago

16 Jul, 2019 1 commit

[Relay][VM] Port VM, VM compiler, and Object into python (#3391) · b6dc7826

* tmp

* Port vm and object to python

* clean up

* update vm build module

* update

* x

* tweak

* cleanup

* update

* fix rebase

* Rename to VMCompiler

* fix

committed 5 years ago

15 Jul, 2019 1 commit

[Runtime] Enable set_input_zero_copy in GraphRuntime (#3416) · afd4b3e4

* Enable set_input_zero_copy in GraphRuntime

* Fix LoadParams

* Fix

* lint

* Fix remote context issue

* Fix

* Remove LOG

* Remove unused variables

* Add tests

* works

* More test scenarios

* make it simpler

* Remove unnecessary changes

* Address comments

* More comments

* Address comments

* Fix build

committed 5 years ago
