To keep the runtime minimum, we isolated the IR Node support from the deployment runtime. The resulting runtime takes around 200K - 600K depending on how many runtime driver modules (e.g., CUDA) get included.
To keep the runtime minimum, we isolated the IR Object support from the deployment runtime. The resulting runtime takes around 200K - 600K depending on how many runtime driver modules (e.g., CUDA) get included.
The overhead of calling into PackedFunc vs. a normal function is small, as it is only saving a few values on the stack.
So it is OK as long as we don't wrap small functions.
...
@@ -182,7 +182,7 @@ RPC server on iPhone/android/raspberry pi or even the browser. The cross compila
This instant feedback gives us a lot of advantages. For example, to test the correctness of generated code on iPhone, we no longer have to write test-cases in swift/objective-c from scratch -- We can use RPC to execute on iPhone, copy the result back and do verification on the host via numpy. We can also do the profiling using the same script.
TVM Node and Compiler Stack
TVM Object and Compiler Stack
---------------------------
As we mentioned earlier, we build compiler stack API on top of the PackedFunc runtime system.
...
@@ -192,17 +192,17 @@ However, we don't want to change our API from time to time. Besides that, we als
- be able to serialize any language object and IRs
- be able to explore, print, and manipulate the IR objects in front-end language to do quick prototyping.
We introduced a base class, called `Node`_ to solve this problem.
We introduced a base class, called `Object`_ to solve this problem.
All the language object in the compiler stack is a subclass of Node. Each node contains a string type_key that uniquely identifies
All the language object in the compiler stack is a subclass of ``Object``. Each object contains a string type_key that uniquely identifies
the type of object. We choose string instead of int as type key so new Node class can be added in the decentralized fashion without
the type of object. We choose string instead of int as type key so new ``Object`` class can be added in the decentralized fashion without
adding the code back to the central repo. To ease the speed of dispatching, we allocate an integer type_index at runtime for each type_key.
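
The relationship between a string type_key and a runtime-allocated integer type_index can be illustrated with a small sketch. This is not the actual implementation: ``TypeRegistry``, ``get_or_allocate``, ``key_of``, and the keys ``demo.Add`` and ``plugin.Mul`` are hypothetical names chosen for illustration. It only assumes what the paragraph states, namely that keys are strings any module can introduce without central coordination, and that each key receives an integer index the first time it is seen so that dispatch can use a cheap table lookup.

```python
class TypeRegistry:
    """Hypothetical registry mapping string type keys to integer indices."""

    def __init__(self):
        self._key_to_index = {}   # type_key (str) -> type_index (int)
        self._index_to_key = []   # type_index (int) -> type_key (str)

    def get_or_allocate(self, type_key):
        """Return the index for type_key, allocating the next free one if new."""
        index = self._key_to_index.get(type_key)
        if index is None:
            index = len(self._index_to_key)
            self._key_to_index[type_key] = index
            self._index_to_key.append(type_key)
        return index

    def key_of(self, type_index):
        """Recover the string key from an integer index."""
        return self._index_to_key[type_index]


registry = TypeRegistry()

# Two independent modules register their own keys; no shared enum is needed.
add_index = registry.get_or_allocate("demo.Add")
mul_index = registry.get_or_allocate("plugin.Mul")

# Dispatch table indexed by the integer type_index rather than the string.
handlers = [None] * 2
handlers[add_index] = lambda a, b: a + b
handlers[mul_index] = lambda a, b: a * b


def dispatch(type_key, a, b):
    """Look up the handler through the integer index and call it."""
    return handlers[registry.get_or_allocate(type_key)](a, b)
```

In this sketch the string key is the stable, decentralized identifier, while the integer index is only meaningful within one running process, since it depends on registration order.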