Commit c8e17dd2 by Zhi Chen Committed by Wei Chen

fix vm doc

parent 8d945872
...@@ -94,57 +94,83 @@ Ret ...@@ -94,57 +94,83 @@ Ret
RegName dst RegName dst
RegName result RegName result
Returns the object in register `result` to caller's register `dst`. Returns the object in register ``result`` to caller's register ``dst``.
InvokePacked InvokePacked
^^^^^^^^^^^^ ^^^^^^^^^^^^
**Arguments**: **Arguments**:
:: ::
size_t packed_index Index packed_index
size_t arity Index arity
size_t output_size Index output_size
RegName* packed_args RegName* packed_args
Invoke the packed function denoted by `packed_index`. The `arity` Invoke the packed function denoted by ``packed_index``. The ``arity``
and `output_size` are used to inform the VM how many inputs and and ``output_size`` are used to inform the VM how many inputs and
outputs to expect. `packed_args` stores the list of argument registers. outputs to expect. ``packed_args`` stores the list of argument registers. Note ``Index``
is an alias of ``int64_t``, and it will be used in other instructions as well.
AllocTensor AllocTensor
^^^^^^^^^^^ ^^^^^^^^^^^
**Arguments**: **Arguments**:
:: ::
RegName dst RegName dst
RegName storage
uint32_t ndim
int64_t* shape
DLDataType dtype
Allocate a tensor value using the constant shape (stored in ``shape``) and ``dtype``
from the given storage block, ``storage``. The result is saved to register ``dst``.
AllocTensorReg
^^^^^^^^^^^^^^
**Arguments**:
::
RegName dst
RegName storage
RegName shape_register RegName shape_register
size_t ndim
DLDataType dtype DLDataType dtype
Allocate a tensor value of the appropriate shape (stored in `shape_register`) and `dtype`. The result Allocate a tensor value of the appropriate shape (stored in ``shape_register``)
is saved to register `dst`. and ``dtype`` from the given storage block (stored in ``storage``). The result is saved to register ``dst``.
AllocStorage
^^^^^^^^^^^^
**Arguments**:
::
RegName dst
RegName size
RegName alignment
DLDataType dtype_hint
Allocate a storage block with the given ``size``, ``alignment``, and data type hint, ``dtype_hint``.
The allocated storage block is stored in register ``dst``.
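The relationship between ``AllocStorage`` and ``AllocTensor`` can be illustrated with a minimal Python sketch. It is not the TVM implementation: the ``Storage`` and ``Tensor`` classes, the helper names, and the use of a plain ``itemsize`` in place of ``DLDataType`` are assumptions made for illustration, and the operands are passed as values rather than read from registers.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Storage:
    """Hypothetical stand-in for a VM storage block."""
    buffer: bytearray
    alignment: int


@dataclass
class Tensor:
    """Hypothetical stand-in for a tensor that views part of a storage block."""
    data: memoryview
    shape: tuple
    itemsize: int


def alloc_storage(size, alignment):
    # Mirrors AllocStorage: reserve `size` bytes and remember the alignment.
    return Storage(bytearray(size), alignment)


def alloc_tensor(storage, shape, itemsize):
    # Mirrors AllocTensor: carve a tensor of constant shape out of `storage`.
    nbytes = prod(shape) * itemsize
    if nbytes > len(storage.buffer):
        raise ValueError("storage block is too small for the requested tensor")
    return Tensor(memoryview(storage.buffer)[:nbytes], tuple(shape), itemsize)


# Registers are modelled as a dictionary from register name to value.
registers = {}
registers["storage"] = alloc_storage(size=24, alignment=64)
registers["dst"] = alloc_tensor(registers["storage"], shape=(2, 3), itemsize=4)
```

The tensor does not own memory in this sketch; it only views the bytes of the storage block, which is the separation the two instructions express.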
AllocADT AllocADT
^^^^^^^^^^^^^ ^^^^^^^^
**Arguments**: **Arguments**:
:: ::
RegName dst RegName dst
size_t tag Index tag
size_t num_fields Index num_fields
RegName* datatype_fields RegName* datatype_fields
Allocate a data type with the tag `tag` using the `num_fields` entries Allocate a data type with the tag ``tag`` using the ``num_fields`` entries
from registers `datatype_fields`. The result is saved to register `dst`. from registers ``datatype_fields``. The result is saved to register ``dst``.
AllocClosure AllocClosure
^^^^^^^^^^^^ ^^^^^^^^^^^^
**Arguments**: **Arguments**:
:: ::
RegName dst RegName dst
size_t clo_index Index clo_index
size_t num_freevar Index num_freevar
RegName* free_vars; RegName* free_vars;
Allocate a closure with the VMFunction at `clo_index` as Allocate a closure with the VMFunction at ``clo_index`` as
its code, and the `num_freevar` entries from registers in its code, and the ``num_freevar`` entries from registers in
`free_vars`. The result is saved to register `dst`. ``free_vars``. The result is saved to register ``dst``.
GetField GetField
^^^^^^^^ ^^^^^^^^
...@@ -152,9 +178,9 @@ GetField ...@@ -152,9 +178,9 @@ GetField
:: ::
RegName dst RegName dst
RegName object RegName object
size_t field_index Index field_index
Get the field value with index `field_index` from `object`. And saves the result to register `dst`. Get the field value with index ``field_index`` from ``object`` and save the result to register ``dst``.
If If
^^ ^^
...@@ -162,21 +188,21 @@ If ...@@ -162,21 +188,21 @@ If
:: ::
RegName test RegName test
RegName target RegName target
size_t true_offset Index true_offset
size_t false_offset Index false_offset
Check if the object at register `test` is equal to `target`. Check if the object at register ``test`` is equal to ``target``.
If equal, relative jump by `true_offset`, else relative If equal, relative jump by ``true_offset``, else relative
jump by `false_offset`. jump by ``false_offset``.
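The relative-jump semantics of ``If`` can be sketched in a few lines of Python. This is an illustration only: the function name ``step_if`` and the dictionary-based register file are assumptions, and plain Python equality stands in for the VM's comparison.

```python
def step_if(registers, pc, test, target, true_offset, false_offset):
    """Return the next program counter after executing an If instruction."""
    # Both offsets are relative to the current program counter.
    if registers[test] == registers[target]:
        return pc + true_offset
    return pc + false_offset


registers = {0: 1, 1: 1, 2: 0}
next_pc_taken = step_if(registers, pc=10, test=0, target=1, true_offset=1, false_offset=5)
next_pc_not_taken = step_if(registers, pc=10, test=2, target=1, true_offset=1, false_offset=5)
```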
GetTagi GetTag
^^^^^^^ ^^^^^^
**Arguments**: **Arguments**:
:: ::
RegName object RegName object
RegName dst RegName dst
Get the object tag for ADT object in register `object`. And saves the reult to register `dst`. Get the object tag for the ADT object in register ``object`` and save the result to register ``dst``.
Fatal Fatal
^^^^^ ^^^^^
...@@ -186,17 +212,17 @@ Goto ...@@ -186,17 +212,17 @@ Goto
^^^^ ^^^^
**Arguments**: **Arguments**:
:: ::
size_t pc_offset Index pc_offset
Relative unconditional jump by `pc_offset`. Relative unconditional jump by ``pc_offset``.
Invoke Invoke
^^^^^^ ^^^^^^
**Arguments**: **Arguments**:
:: ::
size_t func_index Index func_index
Invoke function at `func_index`, consumes the number of arguments contained in the VMFunction's Invoke function at ``func_index``, consumes the number of arguments contained in the VMFunction's
arity field. arity field.
InvokeClosure InvokeClosure
...@@ -204,57 +230,44 @@ InvokeClosure ...@@ -204,57 +230,44 @@ InvokeClosure
**Arguments**: **Arguments**:
:: ::
RegName closure RegName closure
size_t num_closure_args Index num_closure_args
RegName* closure_args RegName* closure_args
Invokes `closure`, consuming the number of arguments declared in the closure's VMFunction. Invokes ``closure``, consuming the number of arguments declared in the closure's VMFunction.
LoadConst LoadConst
^^^^^^^^^ ^^^^^^^^^
**Arguments**: **Arguments**:
:: ::
RegName dst RegName dst
size_t const_index Index const_index
Load the constant at `const_index` from the constant pool. The result is saved to register `dst`. Load the constant at ``const_index`` from the constant pool. The result is saved to register ``dst``.
LoadConsti LoadConsti
^^^^^^^^^^ ^^^^^^^^^^
**Arguments**: **Arguments**:
:: ::
size_t val Index val
RegName dst RegName dst
Load the constant integer `val` to register `dst`. The result is a 0-rank tensor. Load the constant integer ``val`` to register ``dst``. The result is a 0-rank tensor.
Object Representation Object Representation
~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~
We use a simple object representation that uses shared pointers and tagging. We leverage the object protocol to represent the objects that are used by the
There is a huge space of possible object representations trade-offs, but we VM.
believe micro-optimizing this code has little to no effect on the end-to-end performance.
:: Currently, three types of objects, ``NDArray``, ``ADT``, and ``Closure`` objects, are used
to represent tensor, tuple/list, and closure data, respectively. More details
for each of them can be found at `include/tvm/runtime/ndarray.h`_,
`include/tvm/runtime/vm.h`_, and `include/tvm/runtime/container.h`_, respectively.
struct ObjectCell { .. _include/tvm/runtime/ndarray.h: https://github.com/apache/incubator-tvm/blob/master/include/tvm/runtime/ndarray.h
ObjectTag tag;
...
};
struct Object { .. _include/tvm/runtime/vm.h: https://github.com/apache/incubator-tvm/blob/master/include/tvm/runtime/vm.h
std::shared_ptr<ObjectCell> ptr;
...
}
See `include/tvm/runtime/vm.h` for more details.
Currently, we support 3 types of objects: tensors, data types, and closures.
::
Object Tensor(const tvm::runtime::NDArray& data);
Object ADT(size_t tag, const std::vector<Object>& fields);
Object Closure(size_t func_index, std::vector<Object> free_vars);
.. _include/tvm/runtime/container.h: https://github.com/apache/incubator-tvm/blob/master/include/tvm/runtime/container.h
Stack and State Stack and State
~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~
...@@ -284,40 +297,76 @@ Dispatch Loop ...@@ -284,40 +297,76 @@ Dispatch Loop
~~~~~~~~~~~~~ ~~~~~~~~~~~~~
A critical piece of a VM is the dispatch loop. The dispatch loop usually dominates the execution time of a A critical piece of a VM is the dispatch loop. The dispatch loop usually dominates the execution time of a
virtual machine, but we have experimentally found this not to be the case for Relay. We have just implemented virtual machine, but we have experimentally found this not to be the case for Relay. We have just implemented
a simple `switch`/`goto` dispatch loop which dispatches based on instruction op code. a simple ``switch``/``goto`` dispatch loop which dispatches based on instruction op code.
This loop is implemented by `VirtualMachine::Run()`. This loop is implemented by ``VirtualMachine::Run()``.
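As a rough illustration of opcode-based dispatch, the following Python sketch runs a tiny program made of four of the instructions described earlier. It is a toy model, not ``VirtualMachine::Run()``: the ``Op`` enum, the tuple encoding of instructions, and the simplified ``Ret`` (which returns a value directly instead of writing to a caller's register) are assumptions.

```python
from enum import Enum, auto


class Op(Enum):
    LOAD_CONSTI = auto()
    IF = auto()
    GOTO = auto()
    RET = auto()


def run(code):
    """Execute `code`, a list of (opcode, *operands) tuples, and return the result."""
    registers = {}
    pc = 0
    while True:
        op, *args = code[pc]
        # Dispatch on the opcode of the current instruction.
        if op is Op.LOAD_CONSTI:
            val, dst = args
            registers[dst] = val
            pc += 1
        elif op is Op.IF:
            test, target, true_offset, false_offset = args
            equal = registers[test] == registers[target]
            pc += true_offset if equal else false_offset
        elif op is Op.GOTO:
            (pc_offset,) = args
            pc += pc_offset
        elif op is Op.RET:
            (result,) = args
            return registers[result]
        else:
            raise ValueError(f"unknown opcode: {op}")


def make_program(flag):
    """Build a program that returns 10 if `flag` equals 1 and 20 otherwise."""
    return [
        (Op.LOAD_CONSTI, flag, 0),  # pc 0: r0 = flag
        (Op.LOAD_CONSTI, 1, 1),     # pc 1: r1 = 1
        (Op.IF, 0, 1, 1, 3),        # pc 2: equal -> pc 3, otherwise -> pc 5
        (Op.LOAD_CONSTI, 10, 2),    # pc 3: r2 = 10
        (Op.GOTO, 2),               # pc 4: jump to pc 6
        (Op.LOAD_CONSTI, 20, 2),    # pc 5: r2 = 20
        (Op.RET, 2),                # pc 6: return r2
    ]
```

Calling ``run(make_program(1))`` follows the true branch and ``run(make_program(0))`` follows the false branch.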
VM Compiler VM Compiler
~~~~~~~~~~~ ~~~~~~~~~~~
An important part of this infrastructure is a compiler from Relay's full IR into a sequence of bytecode. An important part of this infrastructure is a compiler from Relay's full IR into a sequence of bytecode.
The VM compiler transforms a `tvm::relay::Module` into a `tvm::relay::vm::VirtualMachine`. The virtual The VM compiler transforms a ``tvm::relay::Module`` into a ``tvm::relay::vm::Executable``. The executable
machine contains a set of compiled functions, the compiled functions are contained in `tvm::relay::vm::Function`. The functions contain metadata about the the function as well as its compiled bytecode. For full definitions of the data structures see `vm.h`. contains a set of compiled functions; the compiled functions are contained in ``tvm::relay::vm::Function``. The functions contain metadata about the function as well as its compiled bytecode. The emitted executable object can then be loaded and run by a ``tvm::relay::vm::VirtualMachine`` object. For full definitions of the data structures, please see `include/tvm/runtime/vm.h`_.
Optimizations Optimizations
~~~~~~~~~~~~~ ~~~~~~~~~~~~~
There are quite a few optimizations required by the VM compiler. There are quite a few optimizations required by the VM compiler. Each of them
is implemented as a pass which is managed by the Relay pass manager.
We have implemented them in the old pass style, but plan to port them to
the new pass manager (#2546) before merging.
Optimizations marked with `TODO` are not implemented yet. Optimizations marked with `TODO` are not implemented yet.
- A-Normal Form - A-Normal Form
- Lambda Lift (see `src/relay/vm/lambda_lift.cc`) - Lambda Lift (see `src/relay/vm/lambda_lift.cc`_)
- Inline Primitives (see `src/relay/vm/inline_primitives.cc`) - Inline Primitives (see `src/relay/vm/inline_primitives.cc`_)
- Inliner (see `src/relay/pass/inliner.cc`) - Constant Pool Layout (see `src/relay/backend/vm/compiler.cc`_)
- Constant Pool Layout (see `src/relay/backend/vm/compiler.cc`) - ADT Tag Allocation (see `src/relay/backend/vm/compiler.cc`_)
- ADT Tag Allocation (see `src/relay/backend/vm/compiler.cc`)
- Tail Call Optimization (TODO) - Tail Call Optimization (TODO)
- Liveness Analysis (TODO) - Liveness Analysis (TODO)
.. _src/relay/vm/lambda_lift.cc: https://github.com/apache/incubator-tvm/blob/master/src/relay/backend/vm/lambda_lift.cc
.. _src/relay/vm/inline_primitives.cc: https://github.com/apache/incubator-tvm/blob/master/src/relay/backend/vm/inline_primitives.cc
.. _src/relay/backend/vm/compiler.cc: https://github.com/apache/incubator-tvm/blob/master/src/relay/backend/vm/compiler.cc
Serialization Serialization
~~~~~~~~~~~~~ ~~~~~~~~~~~~~
Serializing and deserializing the executable generated by the Relay VM compiler is a must as
we may want to save the model to disk and perform inference later. Previously, Relay produced
a serialized form in a json file for the graph runtime. However, the same format is not directly
applicable to the VM as it emits bytecode instead of graph-style programs.
Serialization of an executable essentially needs to handle both model specific
(i.e. weights and kernels) and VM related (i.e. bytecode and global function names) data.
For kernels, we can conveniently leverage existing TVM infra to save and load
the compiled library module. Here we focus only on serializing the remaining
components in a binary format that is organized into the following sections, in order.
- Global section. This section contains the globals (function names) used by the virtual machine.
- Constant section. This section is used to store the constant pool (i.e. weights of the model)
for a virtual machine.
- Primitive name section. This section is introduced to accommodate the list of primitive
operator names that will be invoked by the virtual machine, i.e. the names
starting with ``fused_``. The primitive names are used as symbols to look up
function pointers in the compiled kernel library.
- Code section. The VM functions, including bytecode, are stored in this section. The dispatching
loop iterates through this section to fetch instructions for execution.
Hence, unlike the graph runtime artifact that contains the weights (.params), graph json (.json),
and compiled kernel library (.so), the serialized executable artifact is composed of the Relay
object file (.ro) and the compiled kernel library (.so).
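The idea of writing the four sections in a fixed order can be sketched with a toy length-prefixed binary format in Python. The byte layout here is purely hypothetical and is not the format used by the Relay VM; the names ``save_toy`` and ``load_toy`` are likewise invented for illustration, and every section is simplified to a list of byte strings.

```python
import io
import struct


def write_section(stream, items):
    """Write a list of byte strings as: count, then (length, payload) pairs."""
    stream.write(struct.pack("<Q", len(items)))
    for item in items:
        stream.write(struct.pack("<Q", len(item)))
        stream.write(item)


def read_section(stream):
    """Read back one section written by write_section."""
    (count,) = struct.unpack("<Q", stream.read(8))
    items = []
    for _ in range(count):
        (length,) = struct.unpack("<Q", stream.read(8))
        items.append(stream.read(length))
    return items


def save_toy(global_names, constants, primitive_names, code):
    """Serialize the four sections in order: globals, constants, primitives, code."""
    stream = io.BytesIO()
    write_section(stream, [name.encode("utf-8") for name in global_names])
    write_section(stream, constants)
    write_section(stream, [name.encode("utf-8") for name in primitive_names])
    write_section(stream, code)
    return stream.getvalue()


def load_toy(blob):
    """Deserialize the sections in the same order they were written."""
    stream = io.BytesIO(blob)
    global_names = [item.decode("utf-8") for item in read_section(stream)]
    constants = read_section(stream)
    primitive_names = [item.decode("utf-8") for item in read_section(stream)]
    code = read_section(stream)
    return global_names, constants, primitive_names, code


blob = save_toy(
    global_names=["main"],
    constants=[b"\x00\x01\x02\x03"],
    primitive_names=["fused_add"],
    code=[b"\x10\x20"],
)
restored = load_toy(blob)
```

Because the reader consumes sections in the same order as the writer, no section needs a tag of its own in this sketch; a real format may carry additional headers or versioning.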
A ``save`` function is implemented to store the executable to the disk and
serialize it into the above format. Meanwhile, a ``load_exec`` function is used to
load the serialized kernel binary and executable related binary code, which will be again used to
instantiate a VM object. Please refer to the `test_vm_serialization.py`_ file for more
examples.
A final and yet-to-be-implemented part of the VM design is serialization. The accompanying PR will introduce both the bytecode and its serialization, as well as VM-level serialization. The design premise is that a VM can be efficiently stored to disk and resumed at a later time. This would also allow us to efficiently schedule many models on to a single machine in order to obtain good utilization. .. _test_vm_serialization.py: https://github.com/apache/incubator-tvm/blob/master/tests/python/relay/test_vm_serialization.py
Unresolved Questions Unresolved Questions
~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~
......