Unverified commit 98e7709f by Cody Yu, committed by GitHub

[DOCS] Fix Sphinx Warnings (RST indent, cross-ref, and image scale) (#4920)

* fix indents

* Fix image scale and cross-ref
parent efd35e86
@@ -21,6 +21,7 @@ Python API
 .. toctree::
    :maxdepth: 2

+   tvm
    runtime
    ndarray
    error
......
@@ -53,8 +53,3 @@ tvm.relay.op
 .. automodule:: tvm.relay.op.nn
    :members:

-.. automodule:: tvm.relay.op.vision.multibox
-   :members:
-
-.. autofunction:: tvm.relay.vision.nms
@@ -27,7 +27,7 @@ tvm.runtime
 .. autoclass:: tvm.runtime.PackedFunc
    :members:
-   :inheritated-members:
+   :inherited-members:

 .. autofunction:: tvm.register_func
......
@@ -29,12 +29,11 @@ This is a quick guide to submit a pull request, please also refer to the detaile
       git rebase upstream/master

 - Make sure the code style check passes by typing the following command, and that all the existing test-cases pass.
-  - ``docker/bash.sh tvmai/ci-lint ./tests/scripts/task_lint.sh``
-    (Note: You must install docker beforehand so you can run a docker image.)
+  - ``docker/bash.sh tvmai/ci-lint ./tests/scripts/task_lint.sh``. (Note: You must install docker beforehand so you can run a docker image.)
 - Add test-cases to cover the new features or bugfix the patch introduces.
 - Document the code you wrote, see more at :ref:`doc_guide`
-- Send the pull request, fix the problems reported by automatic checks.
-  Request code reviews from other contributors and improves your patch according to feedbacks.
+- Send the pull request and fix the problems reported by automatic checks.
+- Request code reviews from other contributors and improve your patch according to feedback.
 - To get your code reviewed quickly, we encourage you to help review others' code so they can do the favor in return.
 - Code review is a shepherding process that helps to improve contributor's code quality.
......
@@ -56,7 +56,6 @@ embedded devices is through TVM's RPC API.
 Here are the links to the related tutorials.

 - :ref:`tutorial-cross-compilation-and-rpc`
-- :ref:`tutorial-deploy-model-on-mali-gpu`
 - :ref:`tutorial-deploy-model-on-rasp`

 After you have finished tuning and benchmarking, you might need to deploy the model on the
@@ -68,3 +67,5 @@ target device without relying on RPC. see the following resources on how to do s
    cpp_deploy
    android
    integrate
+   aocl_fpga
+   aws_fpga
@@ -118,13 +118,11 @@ In the Operation class declaration above, we can see that each operation also ha
 .. image:: https://raw.githubusercontent.com/tvmai/tvmai.github.io/master/images/docs/inferbound/stage_graph.png
    :align: center
-   :scale: 70%

 InferBound makes one pass through the graph, visiting each stage exactly once. InferBound starts from the output stages (i.e., the solid blue nodes in the graph above), and moves upwards (in the opposite direction of the edges). This is achieved by performing a reverse topological sort on the nodes of the graph. Therefore, when InferBound visits a stage, each of its consumer stages has already been visited.

 .. image:: https://raw.githubusercontent.com/tvmai/tvmai.github.io/master/images/docs/inferbound/inferbound_traversal.png
    :align: center
-   :scale: 70%
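
The traversal order itself is simple to express; here is a minimal sketch in Python, using a hypothetical stage-to-consumers mapping rather than TVM's actual data structures:

.. code-block:: python

    # Hypothetical stage graph: each stage maps to the stages that consume it.
    consumers = {"A": ["B"], "B": ["C"], "C": []}  # C is the output stage

    def inferbound_order(consumers):
        """Visit every stage after all of its consumers (reverse topological order)."""
        visited, order = set(), []

        def visit(stage):
            if stage in visited:
                return
            visited.add(stage)
            for c in consumers[stage]:
                visit(c)            # visit all consumers first...
            order.append(stage)     # ...then the stage itself

        for stage in consumers:
            visit(stage)
        return order

    print(inferbound_order(consumers))  # ['C', 'B', 'A']: outputs first, moving upwards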
 The InferBound pass is shown in the following pseudo-code:

@@ -162,7 +160,6 @@ Recall that all IterVars of the stage are related by IterVarRelations. The IterV
 .. image:: https://raw.githubusercontent.com/tvmai/tvmai.github.io/master/images/docs/inferbound/relations.png
    :align: center
-   :scale: 70%

 The above diagram shows the IterVar hyper-graph for one stage. The stage has one root_iter_var, ``i``. It has been split, and the resulting inner axis, ``i.inner``, has been split again. The leaf_iter_vars of the stage are shown in green: ``i.outer``, ``i.inner.outer``, and ``i.inner.inner``.
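
This nesting can be reproduced with a small schedule; a sketch using the pre-0.7 ``tvm`` API, where the extent and the split factors are arbitrary choices:

.. code-block:: python

    import tvm

    n = 16
    A = tvm.placeholder((n,), name="A")
    B = tvm.compute((n,), lambda i: A[i] + 1.0, name="B")

    s = tvm.create_schedule(B.op)
    # Split the root IterVar i, then split the resulting inner axis again.
    i_outer, i_inner = s[B].split(B.op.axis[0], factor=4)
    i_inner_outer, i_inner_inner = s[B].split(i_inner, factor=2)

    # The leaf IterVars are now i.outer, i.inner.outer, and i.inner.inner.
    print(s[B].leaf_iter_vars)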
@@ -208,7 +205,6 @@ As mentioned above, a consumer may only require a small number of elements from
 .. image:: https://raw.githubusercontent.com/tvmai/tvmai.github.io/master/images/docs/inferbound/inferbound_phases.png
    :align: center
-   :scale: 70%

 IntSets
 ~~~~~~~

@@ -323,14 +319,12 @@ A ComputeOp has only a single output Tensor, whose axes correspond to the axis v
 .. image:: https://raw.githubusercontent.com/tvmai/tvmai.github.io/master/images/docs/inferbound/gatherbound.png
    :align: center
-   :scale: 70%

 The union of IntSets is computed by converting each IntSet to an Interval, and then taking the minimum of all of these intervals' minimums, and the maximum of all of their maximums.

 .. image:: https://raw.githubusercontent.com/tvmai/tvmai.github.io/master/images/docs/inferbound/union.png
    :align: center
-   :scale: 70%
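
Over plain ``(min, max)`` pairs the rule looks as follows; a minimal sketch with a hypothetical helper, not TVM's actual ``IntSet`` API:

.. code-block:: python

    def interval_union(intervals):
        """Cover a list of (min, max) intervals with a single interval."""
        return (min(lo for lo, _ in intervals),
                max(hi for _, hi in intervals))

    # Two disjoint ranges collapse into one interval that also covers the gap
    # between them, so elements that are never used may still be computed.
    print(interval_union([(0, 3), (8, 11)]))  # (0, 11)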
 This clearly results in some unnecessary computation, i.e., tensor elements will be computed that are never used.

@@ -340,7 +334,6 @@ Unfortunately, even if we're lucky and the IntervalSet unions do not produce unn
 .. image:: https://raw.githubusercontent.com/tvmai/tvmai.github.io/master/images/docs/inferbound/gatherbound_problem.png
    :align: center
-   :scale: 70%

 .. _InferBoundCA:

@@ -696,7 +689,6 @@ When InferRootBound is working on stage B, it visits B's consumer stage C to fin
 .. image:: https://raw.githubusercontent.com/tvmai/tvmai.github.io/master/images/docs/inferbound/passupdomain_problem.png
    :align: center
-   :scale: 70%

@@ -756,17 +748,14 @@ If the split factor is 4, or 8, in the above example, the region of B needed in
 .. image:: https://raw.githubusercontent.com/tvmai/tvmai.github.io/master/images/docs/inferbound/passupdomain_div.png
    :align: center
-   :scale: 70%

 However, if the split factor is changed from 4 to 3 in the example above, it is easy to see that the region of B that C needs can no longer be described by an independent Range for each of its axes.

 .. image:: https://raw.githubusercontent.com/tvmai/tvmai.github.io/master/images/docs/inferbound/passupdomain_nodiv.png
    :align: center
-   :scale: 70%

 The best that can be done with rectangular regions is shown in the following diagram. The orange regions are the minimum rectangular regions covering the region of B that needs to be computed, at each iteration of the outer loop.

 .. image:: https://raw.githubusercontent.com/tvmai/tvmai.github.io/master/images/docs/inferbound/passupdomain_min.png
    :align: center
-   :scale: 70%
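
The dependence on divisibility is easy to reproduce numerically; a sketch assuming a hypothetical tensor B of width 4 whose two axes have been fused and then split by ``factor``:

.. code-block:: python

    WIDTH = 4  # hypothetical extent of B's second axis

    def region(i_outer, factor, width=WIDTH):
        """2D elements of B touched by one iteration of the outer loop."""
        flat = range(i_outer * factor, (i_outer + 1) * factor)
        return [(f // width, f % width) for f in flat]

    print(region(1, 4))  # [(1, 0), (1, 1), (1, 2), (1, 3)]: one whole row, a rectangle
    print(region(1, 3))  # [(0, 3), (1, 0), (1, 1)]: wraps across two rows, not a rectangle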
@@ -535,7 +535,7 @@ To simplify, we define a graph representation named "ExampleJSON" in this guide.
 Then the ExampleJSON of this subgraph looks like:

-.. code-block:: json
+.. code-block:: none

   subgraph_0
     input 0 10 10
@@ -544,7 +544,7 @@ Then the ExampleJSON of this subgraph looks like:
     input 3 10 10
     add 4 inputs: 0 1 shape: 10 10
     sub 5 inputs: 4 2 shape: 10 10
-    add 6 inputs: 5 3 shape: 10 10
+    mul 6 inputs: 5 3 shape: 10 10

 The ``input`` keyword declares an input tensor with its ID and shape, while the other statements describe computations in ``<op> <output ID> inputs: [input ID] shape: [shape]`` syntax.
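
A tiny parser makes the syntax concrete; a sketch with a hypothetical helper, not part of the codegen developed in this guide:

.. code-block:: python

    def parse_statement(line):
        """Parse one ExampleJSON statement into a dict."""
        tokens = line.split()
        if tokens[0] == "input":
            return {"kind": "input", "id": int(tokens[1]),
                    "shape": [int(t) for t in tokens[2:]]}
        sep = tokens.index("shape:")
        return {"kind": "op", "op": tokens[0], "id": int(tokens[1]),
                "inputs": [int(t) for t in tokens[tokens.index("inputs:") + 1:sep]],
                "shape": [int(t) for t in tokens[sep + 1:]]}

    print(parse_statement("input 0 10 10"))
    print(parse_statement("add 4 inputs: 0 1 shape: 10 10"))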
......
@@ -39,7 +39,6 @@ compile for heterogeneous execution environments (e.g., executing parts of the g
 .. image:: https://raw.githubusercontent.com/tvmai/tvmai.github.io/master/images/relay/dataflow.png
    :align: center
-   :scale: 70%

 You can use Relay to build a computational (dataflow) graph. Specifically, the above code shows how to

@@ -130,7 +129,6 @@ The code example below shows one program with two forms side by side.
 .. image:: https://raw.githubusercontent.com/tvmai/tvmai.github.io/master/images/relay/dataflow_vs_func.png
    :align: center
-   :scale: 70%

 The nested let binding is called A-normal form, and it is commonly used as an IR in functional programming languages.

@@ -155,7 +153,6 @@ which does not use let bindings.
 .. image:: https://raw.githubusercontent.com/tvmai/tvmai.github.io/master/images/relay/let_scope.png
    :align: center
-   :scale: 70%

 The problem comes when we try to decide where we should evaluate node ``%1``. In particular, while the text format seems
 to suggest that we should evaluate node ``%1`` outside the if scope, the AST (as shown in the picture) does not suggest so.
......
@@ -258,6 +258,7 @@ It also allows us to get members of an object easily in front-end language.
 For example, in the following code, we accessed the op field of the TensorNode.

 .. code:: python
+
   import tvm
   x = tvm.placeholder((3,4), name="x")
......
@@ -91,6 +91,7 @@ Ret
 ^^^
 **Arguments**:
 ::
+
   RegName dst
   RegName result

@@ -100,6 +101,7 @@ InvokePacked
 ^^^^^^^^^^^^
 **Arguments**:
 ::
+
   Index packed_index
   Index arity
   Index output_size

@@ -114,6 +116,7 @@ AllocTensor
 ^^^^^^^^^^^
 **Arguments**:
 ::
+
   RegName dst
   RegName storage
   uint32_t ndim

@@ -127,6 +130,7 @@ AllocTensorReg
 ^^^^^^^^^^^^^^
 **Arguments**:
 ::
+
   RegName dst
   RegName storage
   RegName shape_register

@@ -139,6 +143,7 @@ AllocStorage
 ^^^^^^^^^^^^
 **Arguments**:
 ::
+
   RegName dst
   RegName size
   RegName alignment

@@ -151,6 +156,7 @@ AllocADT
 ^^^^^^^^
 **Arguments**:
 ::
+
   RegName dst
   Index tag
   Index num_fields

@@ -163,6 +169,7 @@ AllocClosure
 ^^^^^^^^^^^^
 **Arguments**:
 ::
+
   RegName dst
   Index clo_index
   Index num_freevar

@@ -176,6 +183,7 @@ GetField
 ^^^^^^^^
 **Arguments**:
 ::
+
   RegName dst
   RegName object
   Index field_index

@@ -186,6 +194,7 @@ If
 ^^
 **Arguments**:
 ::
+
   RegName test
   RegName target
   Index true_offset

@@ -199,6 +208,7 @@ GetTag
 ^^^^^^
 **Arguments**:
 ::
+
   RegName object
   RegName dst

@@ -212,6 +222,7 @@ Goto
 ^^^^
 **Arguments**:
 ::
+
   Index pc_offset

 Relative unconditional jump by ``pc_offset``.
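
How a relative jump interacts with the program counter can be sketched with a toy dispatch loop; this is illustrative only, not the Relay VM's implementation:

.. code-block:: python

    def run(program):
        """Toy interpreter: pc indexes into a list of (opcode, operand) tuples."""
        pc, acc = 0, 0
        while pc < len(program):
            op, arg = program[pc]
            if op == "goto":
                pc += arg        # relative unconditional jump by pc_offset
                continue
            if op == "add":
                acc += arg
            pc += 1              # ordinary instructions fall through
        return acc

    # The goto skips the middle add: 1 + 4 = 5
    print(run([("add", 1), ("goto", 2), ("add", 2), ("add", 4)]))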
@@ -220,6 +231,7 @@ Invoke
 ^^^^^^
 **Arguments**:
 ::
+
   Index func_index

 Invoke function at ``func_index``, consumes the number of arguments contained in the VMFunction's

@@ -229,6 +241,7 @@ InvokeClosure
 ^^^^^^^^^^^^^
 **Arguments**:
 ::
+
   RegName closure
   Index num_closure_args
   RegName* closure_args

@@ -239,6 +252,7 @@ LoadConst
 ^^^^^^^^^
 **Arguments**:
 ::
+
   RegName dst
   Index const_index

@@ -248,6 +262,7 @@ LoadConsti
 ^^^^^^^^^^
 **Arguments**:
 ::
+
   Index val
   RegName dst

@@ -277,7 +292,7 @@ previous call. Registers are allocated in a continuous space (virtual register f
 We keep track of a set of Relay functions we have called, a pointer into its bytecode, and an offset into the bytecode (known as the program counter).

-::
+.. code-block:: c

   struct VirtualMachine {
     ...
@@ -331,6 +346,7 @@ Optimizations marked with `TODO` are not implemented yet.
 Serialization
 ~~~~~~~~~~~~~
+
 Serializing and deserializing the executable generated by the Relay VM compiler is a must as
 we may want to save the model to disk and perform inference later. Previously, Relay has produced
 a serialized form in a JSON file for the graph runtime. However, the same format is not directly

@@ -372,14 +388,17 @@ Unresolved Questions
 How do we handle dynamic shapes?
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
 TODO

 How can we modify the VM to support JIT compilation of certain code paths?
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
 In the code generation space there are still many tradeoffs to be analyzed, and the VM is designed
 to be very flexible so we can modify it for future experiments.

 How do we support heterogeneous execution?
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
 Heterogeneous execution should work out of the box assuming we have annotated the appropriate device copies.
 In order to do this properly we need to run the device annotation and copying passes.
@@ -215,6 +215,7 @@ This would result in a ``load-gemm-activate-store`` task pipeline which closely
 Adding more stages has a cost however: it can add storage and extra logic overhead, which is why we opted for a default 3-stage pipeline.

 .. _vta-uarch:
+
 Microarchitectural Overview
 ---------------------------
......
..  Licensed to the Apache Software Foundation (ASF) under one
    or more contributor license agreements. See the NOTICE file
    distributed with this work for additional information
    regarding copyright ownership. The ASF licenses this file
    to you under the Apache License, Version 2.0 (the
    "License"); you may not use this file except in compliance
    with the License. You may obtain a copy of the License at

..  http://www.apache.org/licenses/LICENSE-2.0

..  Unless required by applicable law or agreed to in writing,
    software distributed under the License is distributed on an
    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    KIND, either express or implied. See the License for the
    specific language governing permissions and limitations
    under the License.
VTA Hardware Design Overview
============================
@@ -16,32 +16,29 @@
 # under the License.
 """
 Building a Graph Convolutional Network
-=====================
+======================================
 **Author**: `Yulun Yao <https://yulunyao.io/>`_, \
             `Chien-Yu Lin <https://homes.cs.washington.edu/~cyulin/>`_

 This article is an introductory tutorial to build a Graph Convolutional Network (GCN) with Relay.
 In this tutorial, we will run our GCN on the Cora dataset to demonstrate.
 The Cora dataset is a common benchmark for Graph Neural Networks (GNN) and frameworks that support GNN training and inference.
 We directly load the dataset from the DGL library to do an apples-to-apples comparison against DGL.

 Please refer to DGL doc for DGL installation at
-https://docs.dgl.ai/install/index.html
-and refer to PyTorch guide for PyTorch installation at
-https://pytorch.org/get-started/locally/
+https://docs.dgl.ai/install/index.html.
+Please refer to PyTorch guide for PyTorch installation at
+https://pytorch.org/get-started/locally/.
 """
 ######################################################################
 # Define GCN in DGL with PyTorch backend
-# ------------------
+# --------------------------------------
 #
 # DGL example: https://github.com/dmlc/dgl/tree/master/examples/pytorch/gcn
-# This part reuses the code from the above example
+# This part reuses the code from the above example.
 import torch
 import torch.nn as nn
 import torch.nn.functional as F
@@ -78,7 +75,7 @@ class GCN(nn.Module):
 ######################################################################
 # Define the functions to load dataset and evaluate accuracy
-# ------------------
+# ----------------------------------------------------------
 # You may substitute this part with your own dataset, here we load data from DGL
 from dgl.data import load_data
 from collections import namedtuple
@@ -106,7 +103,7 @@ def evaluate(data, logits):
 ######################################################################
 # Load the data and set up model parameters
-# ------------------
+# -----------------------------------------
 """
 Parameters
 ----------
@@ -136,7 +133,7 @@ num_classes = data.num_labels
 ######################################################################
 # Set up the DGL-PyTorch model and get the golden results
-# ------------------
+# -------------------------------------------------------
 #
 # The weights are trained with https://github.com/dmlc/dgl/blob/master/examples/pytorch/gcn/train.py
 from tvm.contrib.download import download_testdata
@@ -162,7 +159,7 @@ torch_model.load_state_dict(torch.load(model_path))
 ######################################################################
 # Run the DGL model and test for accuracy
-# ------------------
+# ---------------------------------------
 torch_model.eval()
 with torch.no_grad():
     logits_torch = torch_model(features)
@@ -174,9 +171,8 @@ print("Test accuracy of DGL results: {:.2%}".format(acc))
 ######################################################################
 # Define Graph Convolution Layer in Relay
-# ----------------------------
+# ---------------------------------------
 # To run GCN on TVM, we first need to implement Graph Convolution Layer.
-#
 # You may refer to https://github.com/dmlc/dgl/blob/master/python/dgl/nn/mxnet/conv.py for a GraphConv Layer implemented in DGL with MXNet Backend
 #
 # The layer is defined with below operations, note that we apply two transposes to keep adjacency matrix on right hand side of sparse_dense operator,
@@ -251,7 +247,7 @@ def GraphConv(layer_name,
 ######################################################################
 # Prepare the parameters needed in the GraphConv layers
-# ------------------
+# -----------------------------------------------------
 #
 import numpy as np
 import networkx as nx
@@ -282,7 +278,7 @@ assert params['infeats'].shape[0] == params['indptr'].shape[0] - 1
 ######################################################################
 # Put layers together
-# ------------------
+# -------------------
 # Define input features, norms, adjacency matrix in Relay
 infeats = relay.var("infeats", shape=data.features.shape)
@@ -321,7 +317,8 @@ func = relay.Function(relay.analysis.free_vars(output), output)
 ######################################################################
 # Compile and run with TVM
-# ------------------
+# ------------------------
+#
 # Export the weights from PyTorch model to Python Dict
 model_params = {}
 for param_tensor in torch_model.state_dict():
@@ -345,7 +342,7 @@ m.set_input(**params)
 ######################################################################
 # Run the TVM model, test for accuracy and verify with DGL
-# ------------------
+# --------------------------------------------------------
 m.run()
 logits_tvm = m.get_output(0).asnumpy()
 print("Print the first five outputs from TVM execution\n", logits_tvm[:5])
......
@@ -16,13 +16,12 @@
 # under the License.
 """
 Compile TFLite Models
-===================
+=====================
 **Author**: `Zhao Wu <https://github.com/FrozenGene>`_

 This article is an introductory tutorial to deploy TFLite models with Relay.

 To get started, the Flatbuffers and TFLite packages need to be installed as prerequisites.
 A quick solution is to install Flatbuffers via pip

 .. code-block:: bash
@@ -68,7 +67,7 @@ Below you can find an example on how to compile TFLite model using TVM.
 """
 ######################################################################
 # Utils for downloading and extracting zip files
-# ---------------------------------------------
+# ----------------------------------------------
 import os

 def extract(path):
@@ -84,28 +83,28 @@ def extract(path):
 ######################################################################
 # Load pretrained TFLite model
-# ---------------------------------------------
-# we load mobilenet V1 TFLite model provided by Google
+# ----------------------------
+# Load mobilenet V1 TFLite model provided by Google
 from tvm.contrib.download import download_testdata

 model_url = "http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_1.0_224.tgz"

-# we download model tar file and extract, finally get mobilenet_v1_1.0_224.tflite
+# Download model tar file and extract it to get mobilenet_v1_1.0_224.tflite
 model_path = download_testdata(model_url, "mobilenet_v1_1.0_224.tgz", module=['tf', 'official'])
 model_dir = os.path.dirname(model_path)
 extract(model_path)

-# now we have mobilenet_v1_1.0_224.tflite on disk and open it
+# Now we can open mobilenet_v1_1.0_224.tflite
 tflite_model_file = os.path.join(model_dir, "mobilenet_v1_1.0_224.tflite")
 tflite_model_buf = open(tflite_model_file, "rb").read()

-# get TFLite model from buffer
+# Get TFLite model from buffer
 import tflite.Model
 tflite_model = tflite.Model.Model.GetRootAsModel(tflite_model_buf, 0)
 ######################################################################
 # Load a test image
-# ---------------------------------------------
+# -----------------
 # A single cat dominates the examples!
 from PIL import Image
 from matplotlib import pyplot as plt
@@ -118,10 +117,10 @@ plt.imshow(resized_image)
 plt.show()
 image_data = np.asarray(resized_image).astype("float32")

-# after expand_dims, we have format NHWC
+# Add a dimension to the image so that we have NHWC format layout
 image_data = np.expand_dims(image_data, axis=0)

-# preprocess image as described here:
+# Preprocess image as described here:
 # https://github.com/tensorflow/models/blob/edb6ed22a801665946c63d650ab9a0b23d98e1b1/research/slim/preprocessing/inception_preprocessing.py#L243
 image_data[:, :, :, 0] = 2.0 / 255.0 * image_data[:, :, :, 0] - 1
 image_data[:, :, :, 1] = 2.0 / 255.0 * image_data[:, :, :, 1] - 1
@@ -130,50 +129,50 @@ print('input', image_data.shape)
 ######################################################################
 # Compile the model with relay
-# ---------------------------------------------
+# ----------------------------

 # TFLite input tensor name, shape and type
 input_tensor = "input"
 input_shape = (1, 224, 224, 3)
 input_dtype = "float32"

-# parse TFLite model and convert into Relay computation graph
+# Parse TFLite model and convert it to a Relay module
 from tvm import relay
 mod, params = relay.frontend.from_tflite(tflite_model,
                                          shape_dict={input_tensor: input_shape},
                                          dtype_dict={input_tensor: input_dtype})

-# target x86 CPU
+# Build the module against x86 CPU
 target = "llvm"
 with relay.build_config(opt_level=3):
     graph, lib, params = relay.build(mod, target, params=params)
 ######################################################################
 # Execute on TVM
-# ---------------------------------------------
+# --------------
 import tvm
 from tvm.contrib import graph_runtime as runtime

-# create a runtime executor module
+# Create a runtime executor module
 module = runtime.create(graph, lib, tvm.cpu())

-# feed input data
+# Feed input data
 module.set_input(input_tensor, tvm.nd.array(image_data))

-# feed related params
+# Feed related params
 module.set_input(**params)

-# run
+# Run
 module.run()

-# get output
+# Get output
 tvm_output = module.get_output(0).asnumpy()
 ######################################################################
 # Display results
-# ---------------------------------------------
+# ---------------

-# load label file
+# Load label file
 label_file_url = ''.join(['https://raw.githubusercontent.com/',
                           'tensorflow/tensorflow/master/tensorflow/lite/java/demo/',
                           'app/src/main/assets/',
@@ -181,15 +180,15 @@ label_file_url = ''.join(['https://raw.githubusercontent.com/',
 label_file = "labels_mobilenet_quant_v1_224.txt"
 label_path = download_testdata(label_file_url, label_file, module='data')

-# list of 1001 classes
+# List of 1001 classes
 with open(label_path) as f:
     labels = f.readlines()

-# convert result to 1D data
+# Convert result to 1D data
 predictions = np.squeeze(tvm_output)

-# get top 1 prediction
+# Get top 1 prediction
 prediction = np.argmax(predictions)

-# convert id to class name and show the result
+# Convert id to class name and show the result
 print("The image prediction result is: id " + str(prediction) + " name: " + labels[prediction])
@@ -16,7 +16,7 @@
 # under the License.
 """
 Using External Libraries in Relay
-================================
+=================================
 **Author**: `Masahiro Masuda <https://github.com/masahi>`_, `Truman Tian <https://github.com/SiNZeRo>`_

 This is a short tutorial on how to use external libraries such as cuDNN or cuBLAS with Relay.
......
@@ -56,7 +56,7 @@ print(tvm.lower(s, [A0, A1, B0, B1], simple_mode=True))
 # operators, and the inputs will collaborate together, e.g. :code:`argmax`.
 # In the reduction procedure, :code:`argmax` needs to compare the values of
 # the operands and also keep their indices. It can be expressed
-# with :any:`comm_reducer` as below:
+# with :py:func:`tvm.comm_reducer` as below:

 # x and y are the operands of reduction; both of them are tuples of index
 # and value.
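 # For reference, such a reducer can be written as follows; this is a sketch
 # against the pre-0.7 tvm API, not necessarily the exact definition used in
 # the tutorial (named argmax_sketch to avoid clashing with it).
 argmax_sketch = tvm.comm_reducer(
     # Combine two (index, value) pairs, keeping the one with the larger value.
     lambda x, y: (tvm.expr.Select(x[1] >= y[1], x[0], y[0]),
                   tvm.expr.Select(x[1] >= y[1], x[1], y[1])),
     # Identity element: index -1 paired with the smallest representable value.
     lambda t0, t1: (tvm.const(-1, t0), tvm.min_value(t1)),
     name='argmax_sketch')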
......
@@ -18,7 +18,7 @@
 .. _opt-conv-tensorcore:

 How to optimize convolution using TensorCores
-==================================
+=============================================
 **Author**: `Siyuan Feng <https://github.com/Hzfengsy>`_

 In this tutorial, we will demonstrate how to write a high performance convolution
@@ -29,7 +29,7 @@ convolution has a large batch. We strongly recommend covering the :ref:`opt-conv
 ################################################################
 # TensorCore Introduction
-# -------------------------
+# -----------------------
 # Each Tensor Core provides a 4x4x4 matrix processing array that operates
 # :code:`D = A * B + C`, where A, B, C and D are 4x4 matrices as Figure shows.
 # The matrix multiplication inputs A and B are FP16 matrices, while the accumulation
@@ -45,7 +45,7 @@ convolution has a large batch. We strongly recommend covering the :ref:`opt-conv
 ################################################################
 # Preparation and Algorithm
-# --------------------------
+# -------------------------
 # We use a fixed size for the input tensors, with 256 channels and 14 x 14 dimensions.
 # The batch size is 256. Convolution filters contain 512 filters of size 3 x 3.
 # We use stride size 1 and padding size 1 for the convolution. In the example, we use
@@ -126,8 +126,7 @@ s[Apad].compute_inline()
 ###############################################################################
 # Memory Scope
-# ----------------
-#
+# ------------
 # In a traditional GPU schedule, we have global, shared and local memory scopes.
 # To support TensorCores, we add three more special memory scopes: :code:`wmma.matrix_a`,
 # :code:`wmma.matrix_b` and :code:`wmma.accumulator`. On hardware, all fragment scopes
@@ -142,6 +141,7 @@ ConvF = s.cache_write(Conv, 'wmma.accumulator')
 ###############################################################################
 # Define Tensor Intrinsic
+# -----------------------
 # In fact, TensorCore is a special hardware operation. So, we can just use tensorize
 # to replace a unit of computation with the TensorCore instruction. The first thing is
 # that we need to define the tensor intrinsic.
@@ -246,7 +246,6 @@ def intrin_wmma_store_matrix():
 # easiest way to solve this. Then we can bind threadIdx.x to any loops except those that contain
 # TensorCore intrinsics directly or indirectly. Also note that it is not the unique solution.
 # The only thing we should do is to make sure all threads in a warp can call TensorCore at the same time.
-#

 # Define tiling sizes
 block_row_warps = 4
@@ -312,10 +311,9 @@ print(tvm.lower(s, [A, W, Conv], simple_mode=True))
 ###############################################################################
 # Lowering Computation to Intrinsics
-# --------------------------
+# ----------------------------------
 # The last phase is to lower the computation loops down to TensorCore hardware intrinsics
 # by mapping the 2D convolution to tensor intrinsics
-#

 s[AF].tensorize(AF.op.axis[-2], intrin_wmma_load_matrix('wmma.matrix_a'))
 s[WF].tensorize(WF.op.axis[-2], intrin_wmma_load_matrix('wmma.matrix_b'))
@@ -344,5 +342,6 @@ if nvcc.have_tensorcore(ctx.compute_version):
 ###############################################################################
 # Summary
+# -------
 # This tutorial demonstrates how TVM scheduling primitives can be used to
 # call TensorCores on specific GPUs.
@@ -232,7 +232,6 @@ print(tvm.lower(s, [A, B, C], simple_mode=True))
 #
 # .. image:: https://github.com/dmlc/web-data/raw/master/tvm/tutorial/array-packing.png
 #      :align: center
-#      :scale: 100%
 #
......
@@ -18,7 +18,7 @@
 .. _opt-matmul-auto-tensorcore:

 How to optimize matmul with Auto TensorCore CodeGen
-==================================
+===================================================
 **Author**: `Minmin Sun <https://github.com/minminsun>`_, \
             `Lanbo Li <https://github.com/Orion34C>`_, \
             `Chenfan Jia <https://github.com/jcf94>`_, \
@@ -31,12 +31,11 @@ with most transformations done in ir passes.
 Users can also write schedules with tensorization to generate TensorCore code.
 Both solutions use the same tensorcore intrinsics.
 Please refer to the :ref:`opt-conv-tensorcore` tutorial for more details.
 """

 ################################################################
 # Preparation and Algorithm
-# --------------------------
+# -------------------------
 # Two kinds of input data types are supported: float16 and int8.
 # For float16, the accumulator is float32.
 # For int8, the accumulator is int32.
@@ -215,7 +214,7 @@ def test_gemm(N, L, M, dtype, layout):
 ###############################################################################
 # AutoTune and Test
-# --------------------
+# -----------------
 # Finally we use a tuner to tune the schedule, generate code with the best config,
 # and run the kernel to compare with numpy to check whether the results are correct.
############################################################################### ###############################################################################
# Summary # Summary
# -------------------------- # -------
# This tutorial demonstrates how to use the AutoTensorCoreCodeGen of TVM # This tutorial demonstrates how to use the AutoTensorCoreCodeGen of TVM
# to generate tensorcore kernels. # to generate tensorcore kernels.
@@ -18,7 +18,7 @@
 .. _tutorial-relay-quick-start:

 Quick Start Tutorial for Compiling Deep Learning Models
-======================================================
+=======================================================
 **Author**: `Yao Wang <https://github.com/kevinthesun>`_, `Truman Tian <https://github.com/SiNZeRo>`_

 This example shows how to build a neural network with Relay python frontend and
@@ -33,7 +33,6 @@ Notice that you need to build TVM with cuda and llvm enabled.
 #
 # .. image:: https://github.com/dmlc/web-data/raw/master/tvm/tutorial/tvm_support_list.png
 #      :align: center
-#      :scale: 100%
 #
 # In this tutorial, we'll choose cuda and llvm as target backends.
 # To begin with, let's import Relay and TVM.
@@ -47,7 +46,7 @@ from tvm.contrib import graph_runtime
 ######################################################################
 # Define Neural Network in Relay
-# -----------------------------
+# ------------------------------
 # First, let's define a neural network with relay python frontend.
 # For simplicity, we'll use pre-defined resnet-18 network in Relay.
 # Parameters are initialized with Xavier initializer.
......
-.. _tutorial-frontend:
+.. _vta-tutorial-frontend:

 Compile Deep Learning Models
 ----------------------------
@@ -94,7 +94,7 @@ assert model in pack_dict
 ######################################################################
 # Obtain an execution remote
-# ---------------------------------
+# --------------------------
 # When target is 'pynq', reconfigure FPGA and runtime.
 # Otherwise, if target is 'sim', execute locally.
@@ -136,15 +136,16 @@ ctx = remote.ext_dev(0) if device == "vta" else remote.cpu(0)
 # ---------------------------------
 # Grab vision model from Gluon model zoo and compile with Relay.
 # The compilation steps are:
-# 1) Front end translation from MxNet into Relay module.
-# 2) Apply 8-bit quantization: here we skip the first conv layer,
-#    and dense layer which will both be executed in fp32 on the CPU.
-# 3) Perform graph packing to alter the data layout for tensorization.
-# 4) Perform constant folding to reduce number of operators (e.g. eliminate
-#    batch norm multiply).
-# 5) Perform relay build to object file.
-# 6) Load the object file onto remote (FPGA device).
-# 7) Generate graph runtime, `m`.
+#
+# 1. Front end translation from MxNet into Relay module.
+# 2. Apply 8-bit quantization: here we skip the first conv layer,
+#    and dense layer which will both be executed in fp32 on the CPU.
+# 3. Perform graph packing to alter the data layout for tensorization.
+# 4. Perform constant folding to reduce number of operators (e.g. eliminate batch norm multiply).
+# 5. Perform relay build to object file.
+# 6. Load the object file onto remote (FPGA device).
+# 7. Generate graph runtime, `m`.
+#

 # Load pre-configured AutoTVM schedules
 with autotvm.tophub.context(target):
......