A critical piece of a VM is the dispatch loop. The dispatch loop usually dominates the execution time of a
virtual machine, but we have experimentally found this not to be the case for Relay. We have just implemented
a simple ``switch``/``goto`` dispatch loop which dispatches based on instruction op code.
This loop is implemented by ``VirtualMachine::Run()``.
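As a rough illustration of opcode-based dispatch, the following Python sketch runs a tiny register machine. The opcodes, instruction encoding, and the ``run`` helper are hypothetical and chosen only for this example; they are not the Relay VM's instruction set or the actual ``VirtualMachine::Run()`` implementation.

```python
from enum import IntEnum


class Op(IntEnum):
    # Hypothetical opcodes for illustration; not the Relay VM instruction set.
    LOAD_CONST = 0    # regs[dst] <- constants[idx]
    ADD = 1           # regs[dst] <- regs[a] + regs[b]
    GOTO_IF_ZERO = 2  # jump to target when regs[a] == 0
    GOTO = 3          # unconditional jump
    RET = 4           # return regs[a]


def run(code, constants, num_regs):
    """Fetch one instruction at a time and branch on its opcode."""
    regs = [0] * num_regs
    pc = 0
    while True:
        op, *args = code[pc]
        if op == Op.LOAD_CONST:
            dst, idx = args
            regs[dst] = constants[idx]
            pc += 1
        elif op == Op.ADD:
            dst, a, b = args
            regs[dst] = regs[a] + regs[b]
            pc += 1
        elif op == Op.GOTO_IF_ZERO:
            a, target = args
            pc = target if regs[a] == 0 else pc + 1
        elif op == Op.GOTO:
            (target,) = args
            pc = target
        elif op == Op.RET:
            (a,) = args
            return regs[a]
        else:
            raise ValueError(f"unknown opcode: {op}")


# Sum the integers 3 + 2 + 1 with a countdown loop.
CONSTANTS = [3, 0, -1]
PROGRAM = [
    (Op.LOAD_CONST, 0, 0),    # r0 = 3   (counter)
    (Op.LOAD_CONST, 1, 1),    # r1 = 0   (accumulator)
    (Op.LOAD_CONST, 2, 2),    # r2 = -1  (step)
    (Op.GOTO_IF_ZERO, 0, 7),  # exit the loop when the counter reaches 0
    (Op.ADD, 1, 1, 0),        # r1 += r0
    (Op.ADD, 0, 0, 2),        # r0 -= 1
    (Op.GOTO, 3),             # back to the loop test
    (Op.RET, 1),              # return the accumulator
]

result = run(PROGRAM, CONSTANTS, num_regs=3)
```

The ``if``/``elif`` chain plays the role of the ``switch`` statement: each iteration decodes one instruction, executes it, and updates the program counter.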
VM Compiler
~~~~~~~~~~~
An important part of this infrastructure is a compiler from Relay's full IR into a sequence of bytecode.
The VM compiler transforms a ``tvm::relay::Module`` into a ``tvm::relay::vm::Executable``. The executable
contains a set of compiled functions, each represented by a ``tvm::relay::vm::Function``. Each function contains metadata about the function as well as its compiled bytecode. The emitted executable object can then be loaded and run by a ``tvm::relay::vm::VirtualMachine`` object. For full definitions of the data structures, please see `include/tvm/runtime/vm.h`_.
Optimizations
~~~~~~~~~~~~~
There are quite a few optimizations required by the VM compiler. Each of them
is implemented as a pass which is managed by the Relay pass manager.
Optimizations marked with ``TODO`` are not implemented yet.
- A-Normal Form
- Lambda Lift (see `src/relay/vm/lambda_lift.cc`_)
- Inline Primitives (see `src/relay/vm/inline_primitives.cc`_)
- Constant Pool Layout (see `src/relay/backend/vm/compiler.cc`_)
- ADT Tag Allocation (see `src/relay/backend/vm/compiler.cc`_)
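To make the first item in the list concrete, here is a minimal sketch of A-Normal Form conversion on a toy expression language. The tuple-based expression encoding and the ``to_anf`` helper are assumptions made for illustration; this is not Relay's IR or its pass implementation.

```python
import itertools


def to_anf(expr):
    """Flatten a nested expression into let-style bindings with atomic operands.

    Expressions are either atoms (variable names or numbers) or tuples of the
    form (op, operand, ...). This encoding is hypothetical.
    """
    counter = itertools.count()
    bindings = []

    def visit(e):
        # Variables and literals are already atomic.
        if not isinstance(e, tuple):
            return e
        op, *operands = e
        # Name every nested sub-expression before using it as an operand.
        atoms = [visit(o) for o in operands]
        name = f"t{next(counter)}"
        bindings.append((name, (op, *atoms)))
        return name

    result = visit(expr)
    return bindings, result


# (x * 2) + (y * (x + 1))
expr = ("add", ("mul", "x", 2), ("mul", "y", ("add", "x", 1)))
bindings, result = to_anf(expr)
for name, value in bindings:
    print(f"let {name} = {value}")
print(f"result: {result}")
```

After conversion, every operator takes only variables or literals as arguments, which makes the evaluation order explicit and gives each intermediate value a name that a bytecode compiler could map to a register.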
Serializing and deserializing the executable generated by the Relay VM compiler is a must, as
we may want to save the model to disk and perform inference later. Previously, Relay produced
a serialized form in a JSON file for the graph runtime. However, the same format is not directly
applicable to the VM, as it emits bytecode instead of graph-style programs.
Serialization of an executable essentially needs to handle both model-specific
(i.e. weights and kernels) and VM-related (i.e. bytecode and global function names) data.
For kernels, we can conveniently leverage existing TVM infra to save and load
the compiled library module. Here we focus only on serializing the remaining
components in a binary format that is organized into the following sections, in order.
- Global section. This section contains the globals (function names) used by the virtual machine.
- Constant section. This section stores the constant pool (i.e. weights of the model)
  for a virtual machine.
- Primitive name section. This section holds the list of primitive
  operator names that will be invoked by the virtual machine, i.e. the names
  starting with ``fused_``. The primitive names are used as symbols to look up
  function pointers in the compiled kernel library.
- Code section. The VM functions, including bytecode, reside in this section. The dispatch
  loop iterates through this section to fetch instructions for execution.
Hence, unlike the graph runtime artifact that contains weights (.params), graph JSON (.json),
and a compiled kernel library (.so), the serialized executable artifact is composed of the Relay
object file (.ro) and the compiled kernel library (.so).
A ``save`` function is implemented to serialize the executable into the above format and
store it to disk. Meanwhile, a ``load_exec`` function is used to
load the serialized kernel binary and the executable-related binary code, which are then used to
instantiate a VM object. Please refer to the `test_vm_serialization.py`_ file for more
examples.
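The four-section ordering described above can be sketched with a toy binary format. Everything in this example is an assumption made for illustration: the length-prefixed encoding, the use of plain floats in place of weight tensors, the flat integer encoding of bytecode, and the helper names ``save_toy`` and ``load_toy``. It does not reproduce the actual ``.ro`` layout or the real ``save``/``load_exec`` functions.

```python
import io
import struct


def _write_strings(buf, strings):
    # Count, then each string as (byte length, UTF-8 bytes).
    buf.write(struct.pack("<Q", len(strings)))
    for s in strings:
        data = s.encode("utf-8")
        buf.write(struct.pack("<Q", len(data)))
        buf.write(data)


def _read_strings(buf):
    (count,) = struct.unpack("<Q", buf.read(8))
    out = []
    for _ in range(count):
        (size,) = struct.unpack("<Q", buf.read(8))
        out.append(buf.read(size).decode("utf-8"))
    return out


def save_toy(global_names, constants, primitive_names, code):
    """Write the four sections in order and return the bytes."""
    buf = io.BytesIO()
    _write_strings(buf, global_names)                    # 1. global section
    buf.write(struct.pack("<Q", len(constants)))         # 2. constant section
    buf.write(struct.pack(f"<{len(constants)}d", *constants))
    _write_strings(buf, primitive_names)                 # 3. primitive name section
    buf.write(struct.pack("<Q", len(code)))              # 4. code section
    for instructions in code:                            #    one int list per function
        buf.write(struct.pack("<Q", len(instructions)))
        buf.write(struct.pack(f"<{len(instructions)}q", *instructions))
    return buf.getvalue()


def load_toy(blob):
    """Read the sections back in the same order they were written."""
    buf = io.BytesIO(blob)
    global_names = _read_strings(buf)
    (n_const,) = struct.unpack("<Q", buf.read(8))
    constants = list(struct.unpack(f"<{n_const}d", buf.read(8 * n_const)))
    primitive_names = _read_strings(buf)
    (n_funcs,) = struct.unpack("<Q", buf.read(8))
    code = []
    for _ in range(n_funcs):
        (n_instr,) = struct.unpack("<Q", buf.read(8))
        code.append(list(struct.unpack(f"<{n_instr}q", buf.read(8 * n_instr))))
    return global_names, constants, primitive_names, code


blob = save_toy(["main"], [1.5, -2.0], ["fused_add"], [[0, 0, 0, 4, 0]])
restored = load_toy(blob)
```

Because the sections are written in a fixed order with explicit length prefixes, the loader can read them back sequentially without a separate index. The kernel library is deliberately left out of this sketch, mirroring the text's point that kernels are saved separately through existing TVM infrastructure.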
The design premise is that a VM can be efficiently stored to disk and resumed at a later time. This also allows many models to be scheduled efficiently onto a single machine in order to obtain good utilization.