Commit 741fdb9e by Marcus Shawcroft, committed by Tianqi Chen

[DOC] Various documentation improvements (#3133)

parent 83cb872e
...@@ -20,7 +20,7 @@ ...@@ -20,7 +20,7 @@
Code Guide and Tips Code Guide and Tips
=================== ===================
This is a document used to record tips in tvm codebase for reviewers and contributors. This is a document used to record tips in TVM codebase for reviewers and contributors.
Most of them are summarized through lessons during the contributing and process. Most of them are summarized through lessons during the contributing and process.
...@@ -42,7 +42,7 @@ Python Code Styles ...@@ -42,7 +42,7 @@ Python Code Styles
Handle Integer Constant Expression Handle Integer Constant Expression
---------------------------------- ----------------------------------
We often need to handle constant integer expressions in tvm. Before we do so, the first question we want to ask is that is it really necessary to get a constant integer. If symbolic expression also works and let the logic flow, we should use symbolic expression as much as possible. So the generated code works for shapes that are not known ahead of time. We often need to handle constant integer expressions in TVM. Before we do so, the first question we want to ask is that is it really necessary to get a constant integer. If symbolic expression also works and let the logic flow, we should use symbolic expression as much as possible. So the generated code works for shapes that are not known ahead of time.
Note that in some cases we cannot know certain information, e.g. sign of symbolic variable, it is ok to make assumptions in certain cases. While adding precise support if the variable is constant. Note that in some cases we cannot know certain information, e.g. sign of symbolic variable, it is ok to make assumptions in certain cases. While adding precise support if the variable is constant.
......
...@@ -19,13 +19,13 @@ ...@@ -19,13 +19,13 @@
Docker Images Docker Images
============= =============
We provide several prebuilt docker images to quickly try out tvm. We provide several prebuilt docker images to quickly try out TVM.
These images are also helpful run through TVM demo and tutorials. These images are also helpful run through TVM demo and tutorials.
You can get the docker images via the following steps. You can get the docker images via the following steps.
We need `docker <https://docs.docker.com/engine/installation/>`_ and We need `docker <https://docs.docker.com/engine/installation/>`_ and
`nvidia-docker <https://github.com/NVIDIA/nvidia-docker/>`_ if we want to use cuda. `nvidia-docker <https://github.com/NVIDIA/nvidia-docker/>`_ if we want to use cuda.
First, clone tvm repo to get the auxiliary scripts First, clone TVM repo to get the auxiliary scripts
.. code:: bash .. code:: bash
......
...@@ -19,13 +19,13 @@ ...@@ -19,13 +19,13 @@
Install from Source Install from Source
=================== ===================
This page gives instructions on how to build and install the tvm package from This page gives instructions on how to build and install the TVM package from
scratch on various systems. It consists of two steps: scratch on various systems. It consists of two steps:
1. First build the shared library from the C++ codes (`libtvm.so` for linux, `libtvm.dylib` for macOS and `libtvm.dll` for windows). 1. First build the shared library from the C++ codes (`libtvm.so` for linux, `libtvm.dylib` for macOS and `libtvm.dll` for windows).
2. Setup for the language packages (e.g. Python Package). 2. Setup for the language packages (e.g. Python Package).
To get started, clone tvm repo from github. It is important to clone the submodules along, with ``--recursive`` option. To get started, clone TVM repo from github. It is important to clone the submodules along, with ``--recursive`` option.
.. code:: bash .. code:: bash
...@@ -63,7 +63,7 @@ The minimal building requirements are ...@@ -63,7 +63,7 @@ The minimal building requirements are
- If you want to use the NNVM compiler, then LLVM is required - If you want to use the NNVM compiler, then LLVM is required
We use cmake to build the library. We use cmake to build the library.
The configuration of tvm can be modified by `config.cmake`. The configuration of TVM can be modified by `config.cmake`.
- First, check the cmake in your system. If you do not have cmake, - First, check the cmake in your system. If you do not have cmake,
...@@ -111,7 +111,7 @@ Building on Windows ...@@ -111,7 +111,7 @@ Building on Windows
TVM support build via MSVC using cmake. The minimum required VS version is **Visual Studio Community 2015 Update 3**. TVM support build via MSVC using cmake. The minimum required VS version is **Visual Studio Community 2015 Update 3**.
In order to generate the VS solution file using cmake, In order to generate the VS solution file using cmake,
make sure you have a recent version of cmake added to your path and then from the tvm directory: make sure you have a recent version of cmake added to your path and then from the TVM directory:
.. code:: bash .. code:: bash
...@@ -159,7 +159,7 @@ Method 1 ...@@ -159,7 +159,7 @@ Method 1
Method 2 Method 2
Install tvm python bindings by `setup.py`: Install TVM python bindings by `setup.py`:
.. code:: bash .. code:: bash
......
...@@ -19,7 +19,7 @@ Installation ...@@ -19,7 +19,7 @@ Installation
============ ============
To install TVM, please read :ref:`install-from-source`. To install TVM, please read :ref:`install-from-source`.
If you are interested in deploying to mobile/embedded devices, If you are interested in deploying to mobile/embedded devices,
you do not need to install the entire tvm stack on your device, you do not need to install the entire TVM stack on your device,
instead, you only need the runtime, please read :ref:`deploy-and-integration`. instead, you only need the runtime, please read :ref:`deploy-and-integration`.
If you would like to quickly try out TVM or do demo/tutorials, checkout :ref:`docker-images` If you would like to quickly try out TVM or do demo/tutorials, checkout :ref:`docker-images`
......
...@@ -94,10 +94,10 @@ class Stage : public NodeRef { ...@@ -94,10 +94,10 @@ class Stage : public NodeRef {
*/ */
EXPORT Stage& compute_root(); // NOLINT(*) EXPORT Stage& compute_root(); // NOLINT(*)
/*! /*!
* \brief Bind the ivar to thread index. * \brief Bind the IterVar to thread index.
* *
* \param ivar The IterVar to be binded. * \param ivar The IterVar to be bound.
* \param thread_ivar The thread axis to be binded. * \param thread_ivar The thread axis to be bound.
* \return reference to self. * \return reference to self.
*/ */
EXPORT Stage& bind(IterVar ivar, IterVar thread_ivar); EXPORT Stage& bind(IterVar ivar, IterVar thread_ivar);
...@@ -107,7 +107,7 @@ class Stage : public NodeRef { ...@@ -107,7 +107,7 @@ class Stage : public NodeRef {
* need one of them to do the store. * need one of them to do the store.
* *
* \note This is a dangerous scheduling primitive that can change behavior of program. * \note This is a dangerous scheduling primitive that can change behavior of program.
* Only do when we are certain that thare are duplicated store. * Only do when we are certain that thare are duplicated stores.
* \param predicate The condition to be checked. * \param predicate The condition to be checked.
* \return reference to self. * \return reference to self.
*/ */
...@@ -155,7 +155,7 @@ class Stage : public NodeRef { ...@@ -155,7 +155,7 @@ class Stage : public NodeRef {
* \param p_target The result target domain. * \param p_target The result target domain.
* *
* \note axes can be an empty array, * \note axes can be an empty array,
* in that case, a singleton itervar is created and * in that case, a singleton IterVar is created and
* inserted to the outermost loop. * inserted to the outermost loop.
* The fuse of empty array is used to support zero-dimension tensors. * The fuse of empty array is used to support zero-dimension tensors.
* *
......
...@@ -110,7 +110,7 @@ class Tensor(NodeBase, _expr.ExprOp): ...@@ -110,7 +110,7 @@ class Tensor(NodeBase, _expr.ExprOp):
@property @property
def value_index(self): def value_index(self):
"""The output value index the tensor corressponds to.""" """The output value index the tensor corresponds to."""
return self.__getattr__("value_index") return self.__getattr__("value_index")
@property @property
...@@ -128,7 +128,7 @@ class Tensor(NodeBase, _expr.ExprOp): ...@@ -128,7 +128,7 @@ class Tensor(NodeBase, _expr.ExprOp):
class Operation(NodeBase): class Operation(NodeBase):
"""Represent an operation that generate a tensor""" """Represent an operation that generates a tensor"""
def output(self, index): def output(self, index):
"""Get the index-th output of the operation """Get the index-th output of the operation
...@@ -197,7 +197,7 @@ class ScanOp(Operation): ...@@ -197,7 +197,7 @@ class ScanOp(Operation):
@register_node @register_node
class ExternOp(Operation): class ExternOp(Operation):
"""Extern operation.""" """External operation."""
@register_node @register_node
......
...@@ -34,7 +34,7 @@ vendor provided library CuDNN in many cases. ...@@ -34,7 +34,7 @@ vendor provided library CuDNN in many cases.
# #
# pip3 install --user psutil xgboost tornado # pip3 install --user psutil xgboost tornado
# #
# To make tvm run faster in tuning, it is recommended to use cython # To make TVM run faster in tuning, it is recommended to use cython
# as FFI of tvm. In the root directory of tvm, execute # as FFI of tvm. In the root directory of tvm, execute
# #
# .. code-block:: bash # .. code-block:: bash
......
...@@ -27,7 +27,7 @@ The operator implementation for ARM CPU in TVM is written in template form. ...@@ -27,7 +27,7 @@ The operator implementation for ARM CPU in TVM is written in template form.
The template has many tunable knobs (tile factor, vectorization, unrolling, etc). The template has many tunable knobs (tile factor, vectorization, unrolling, etc).
We will tune all convolution and depthwise convolution operators We will tune all convolution and depthwise convolution operators
in the neural network. After tuning, we produce a log file which stores in the neural network. After tuning, we produce a log file which stores
the best knob values for all required operators. When the tvm compiler compiles the best knob values for all required operators. When the TVM compiler compiles
these operators, it will query this log file to get the best knob values. these operators, it will query this log file to get the best knob values.
We also released pre-tuned parameters for some arm devices. You can go to We also released pre-tuned parameters for some arm devices. You can go to
...@@ -45,8 +45,8 @@ to see the results. ...@@ -45,8 +45,8 @@ to see the results.
# #
# pip3 install --user psutil xgboost tornado # pip3 install --user psutil xgboost tornado
# #
# To make tvm run faster during tuning, it is recommended to use cython # To make TVM run faster during tuning, it is recommended to use cython
# as FFI of tvm. In the root directory of tvm, execute # as FFI of TVM. In the root directory of TVM, execute
# (change "3" to "2" if you use python2): # (change "3" to "2" if you use python2):
# #
# .. code-block:: bash # .. code-block:: bash
...@@ -134,11 +134,11 @@ def get_network(name, batch_size): ...@@ -134,11 +134,11 @@ def get_network(name, batch_size):
# Register devices to RPC Tracker # Register devices to RPC Tracker
# ----------------------------------- # -----------------------------------
# Now we can register our devices to the tracker. The first step is to # Now we can register our devices to the tracker. The first step is to
# build tvm runtime for the ARM devices. # build the TVM runtime for the ARM devices.
# #
# * For Linux: # * For Linux:
# Follow this section :ref:`build-tvm-runtime-on-device` to build # Follow this section :ref:`build-tvm-runtime-on-device` to build
# tvm runtime on the device. Then register the device to tracker by # the TVM runtime on the device. Then register the device to tracker by
# #
# .. code-block:: bash # .. code-block:: bash
# #
...@@ -148,7 +148,7 @@ def get_network(name, batch_size): ...@@ -148,7 +148,7 @@ def get_network(name, batch_size):
# #
# * For Android: # * For Android:
# Follow this `readme page <https://github.com/dmlc/tvm/tree/master/apps/android_rpc>`_ to # Follow this `readme page <https://github.com/dmlc/tvm/tree/master/apps/android_rpc>`_ to
# install tvm rpc apk on the android device. Make sure you can pass the android rpc test. # install the TVM RPC APK on the android device. Make sure you can pass the android rpc test.
# Then you have already registred your device. During tuning, you have to go to developer option # Then you have already registred your device. During tuning, you have to go to developer option
# and enable "Keep screen awake during changing" and charge your phone to make it stable. # and enable "Keep screen awake during changing" and charge your phone to make it stable.
# #
......
...@@ -27,7 +27,7 @@ The operator implementation for NVIDIA GPU in TVM is written in template form. ...@@ -27,7 +27,7 @@ The operator implementation for NVIDIA GPU in TVM is written in template form.
The template has many tunable knobs (tile factor, unrolling, etc). The template has many tunable knobs (tile factor, unrolling, etc).
We will tune all convolution and depthwise convolution operators We will tune all convolution and depthwise convolution operators
in the neural network. After tuning, we produce a log file which stores in the neural network. After tuning, we produce a log file which stores
the best knob values for all required operators. When the tvm compiler compiles the best knob values for all required operators. When the TVM compiler compiles
these operators, it will query this log file to get the best knob values. these operators, it will query this log file to get the best knob values.
We also released pre-tuned parameters for some NVIDIA GPUs. You can go to We also released pre-tuned parameters for some NVIDIA GPUs. You can go to
...@@ -45,7 +45,7 @@ to see the results. ...@@ -45,7 +45,7 @@ to see the results.
# #
# pip3 install --user psutil xgboost tornado # pip3 install --user psutil xgboost tornado
# #
# To make tvm run faster during tuning, it is recommended to use cython # To make TVM run faster during tuning, it is recommended to use cython
# as FFI of tvm. In the root directory of tvm, execute: # as FFI of tvm. In the root directory of tvm, execute:
# #
# .. code-block:: bash # .. code-block:: bash
......
...@@ -27,7 +27,7 @@ The operator implementation for Mobile GPU in TVM is written in template form. ...@@ -27,7 +27,7 @@ The operator implementation for Mobile GPU in TVM is written in template form.
The template has many tunable knobs (tile factor, vectorization, unrolling, etc). The template has many tunable knobs (tile factor, vectorization, unrolling, etc).
We will tune all convolution, depthwise convolution and dense operators We will tune all convolution, depthwise convolution and dense operators
in the neural network. After tuning, we produce a log file which stores in the neural network. After tuning, we produce a log file which stores
the best knob values for all required operators. When the tvm compiler compiles the best knob values for all required operators. When the TVM compiler compiles
these operators, it will query this log file to get the best knob values. these operators, it will query this log file to get the best knob values.
We also released pre-tuned parameters for some arm devices. You can go to We also released pre-tuned parameters for some arm devices. You can go to
...@@ -45,7 +45,7 @@ to see the results. ...@@ -45,7 +45,7 @@ to see the results.
# #
# pip3 install --user psutil xgboost tornado # pip3 install --user psutil xgboost tornado
# #
# To make tvm run faster during tuning, it is recommended to use cython # To make TVM run faster during tuning, it is recommended to use cython
# as FFI of tvm. In the root directory of tvm, execute # as FFI of tvm. In the root directory of tvm, execute
# (change "3" to "2" if you use python2): # (change "3" to "2" if you use python2):
# #
...@@ -135,11 +135,11 @@ def get_network(name, batch_size): ...@@ -135,11 +135,11 @@ def get_network(name, batch_size):
# Register devices to RPC Tracker # Register devices to RPC Tracker
# ----------------------------------- # -----------------------------------
# Now we can register our devices to the tracker. The first step is to # Now we can register our devices to the tracker. The first step is to
# build tvm runtime for the ARM devices. # build the TVM runtime for the ARM devices.
# #
# * For Linux: # * For Linux:
# Follow this section :ref:`build-tvm-runtime-on-device` to build # Follow this section :ref:`build-tvm-runtime-on-device` to build
# tvm runtime on the device. Then register the device to tracker by # the TVM runtime on the device. Then register the device to tracker by
# #
# .. code-block:: bash # .. code-block:: bash
# #
...@@ -149,7 +149,7 @@ def get_network(name, batch_size): ...@@ -149,7 +149,7 @@ def get_network(name, batch_size):
# #
# * For Android: # * For Android:
# Follow this `readme page <https://github.com/dmlc/tvm/tree/master/apps/android_rpc>`_ to # Follow this `readme page <https://github.com/dmlc/tvm/tree/master/apps/android_rpc>`_ to
# install tvm rpc apk on the android device. Make sure you can pass the android rpc test. # install TVM RPC APK on the android device. Make sure you can pass the android RPC test.
# Then you have already registred your device. During tuning, you have to go to developer option # Then you have already registred your device. During tuning, you have to go to developer option
# and enable "Keep screen awake during changing" and charge your phone to make it stable. # and enable "Keep screen awake during changing" and charge your phone to make it stable.
# #
......
...@@ -20,7 +20,7 @@ Auto-tuning a convolutional network for x86 CPU ...@@ -20,7 +20,7 @@ Auto-tuning a convolutional network for x86 CPU
**Author**: `Yao Wang <https://github.com/kevinthesun>`_, `Eddie Yan <https://github.com/eqy>`_ **Author**: `Yao Wang <https://github.com/kevinthesun>`_, `Eddie Yan <https://github.com/eqy>`_
This is a tutorial about how to tune convolution neural network This is a tutorial about how to tune convolution neural network
for x86 cpu. for x86 CPU.
""" """
import os import os
import numpy as np import numpy as np
...@@ -70,7 +70,7 @@ def get_network(name, batch_size): ...@@ -70,7 +70,7 @@ def get_network(name, batch_size):
return net, params, input_shape, output_shape return net, params, input_shape, output_shape
# Replace "llvm" with the correct target of your cpu. # Replace "llvm" with the correct target of your CPU.
# For example, for AWS EC2 c5 instance with Intel Xeon # For example, for AWS EC2 c5 instance with Intel Xeon
# Platinum 8000 series, the target should be "llvm -mcpu=skylake-avx512". # Platinum 8000 series, the target should be "llvm -mcpu=skylake-avx512".
# For AWS EC2 c4 instance with Intel Xeon E5-2666 v3, it should be # For AWS EC2 c4 instance with Intel Xeon E5-2666 v3, it should be
...@@ -83,7 +83,7 @@ model_name = "resnet-18" ...@@ -83,7 +83,7 @@ model_name = "resnet-18"
log_file = "%s.log" % model_name log_file = "%s.log" % model_name
# Set number of threads used for tuning based on the number of # Set number of threads used for tuning based on the number of
# physical cpu cores on your machine. # physical CPU cores on your machine.
num_threads = 1 num_threads = 1
os.environ["TVM_NUM_THREADS"] = str(num_threads) os.environ["TVM_NUM_THREADS"] = str(num_threads)
...@@ -91,7 +91,7 @@ os.environ["TVM_NUM_THREADS"] = str(num_threads) ...@@ -91,7 +91,7 @@ os.environ["TVM_NUM_THREADS"] = str(num_threads)
################################################################# #################################################################
# Configure tensor tuning settings and create tasks # Configure tensor tuning settings and create tasks
# ------------------------------------------------- # -------------------------------------------------
# To get better kernel execution performance on x86 cpu, # To get better kernel execution performance on x86 CPU,
# we need to change data layout of convolution kernel from # we need to change data layout of convolution kernel from
# "NCHW" to "NCHWc". To deal with this situation, we define # "NCHW" to "NCHWc". To deal with this situation, we define
# conv2d_NCHWc operator in topi. We will tune this operator # conv2d_NCHWc operator in topi. We will tune this operator
......
...@@ -38,8 +38,8 @@ The whole workflow is illustrated by a matrix multiplication example. ...@@ -38,8 +38,8 @@ The whole workflow is illustrated by a matrix multiplication example.
# #
# pip3 install --user psutil xgboost # pip3 install --user psutil xgboost
# #
# To make tvm run faster in tuning, it is recommended to use cython # To make TVM run faster in tuning, it is recommended to use cython
# as FFI of tvm. In the root directory of tvm, execute # as FFI of TVM. In the root directory of TVM, execute
# (change "3" to "2" if you use python2): # (change "3" to "2" if you use python2):
# #
# .. code-block:: bash # .. code-block:: bash
...@@ -61,7 +61,7 @@ from tvm import autotvm ...@@ -61,7 +61,7 @@ from tvm import autotvm
###################################################################### ######################################################################
# Step 1: Define the search space # Step 1: Define the search space
# -------------------------------- # --------------------------------
# In this section, we will rewrite a deterministic tvm schedule code to a # In this section, we will rewrite a deterministic TVM schedule code to a
# tunable schedule template. You can regard the process of search space definition # tunable schedule template. You can regard the process of search space definition
# as the parameterization of our existing schedule code. # as the parameterization of our existing schedule code.
# #
...@@ -288,7 +288,7 @@ logging.getLogger('autotvm').setLevel(logging.DEBUG) ...@@ -288,7 +288,7 @@ logging.getLogger('autotvm').setLevel(logging.DEBUG)
logging.getLogger('autotvm').addHandler(logging.StreamHandler(sys.stdout)) logging.getLogger('autotvm').addHandler(logging.StreamHandler(sys.stdout))
# There are two steps for measuring a config: build and run. # There are two steps for measuring a config: build and run.
# By default, we use all cpu cores to compile program. Then measure them sequentially. # By default, we use all CPU cores to compile program. Then measure them sequentially.
# We measure 5 times and take average to reduce variance. # We measure 5 times and take average to reduce variance.
measure_option = autotvm.measure_option( measure_option = autotvm.measure_option(
builder='local', builder='local',
......
...@@ -35,7 +35,7 @@ and Firefly-RK3399 for opencl example. ...@@ -35,7 +35,7 @@ and Firefly-RK3399 for opencl example.
# Build TVM Runtime on Device # Build TVM Runtime on Device
# --------------------------- # ---------------------------
# #
# The first step is to build tvm runtime on the remote device. # The first step is to build the TVM runtime on the remote device.
# #
# .. note:: # .. note::
# #
...@@ -43,8 +43,8 @@ and Firefly-RK3399 for opencl example. ...@@ -43,8 +43,8 @@ and Firefly-RK3399 for opencl example.
# executed on the target device, e.g. Raspberry Pi. And we assume it # executed on the target device, e.g. Raspberry Pi. And we assume it
# has Linux running. # has Linux running.
# #
# Since we do compilation on local machine, the remote device is only used # Since we do compilation on the local machine, the remote device is only used
# for running the generated code. We only need to build tvm runtime on # for running the generated code. We only need to build the TVM runtime on
# the remote device. # the remote device.
# #
# .. code-block:: bash # .. code-block:: bash
......
...@@ -52,7 +52,7 @@ from tvm.contrib.download import download_testdata ...@@ -52,7 +52,7 @@ from tvm.contrib.download import download_testdata
# docker run --pid=host -h tvm -v $PWD:/workspace \ # docker run --pid=host -h tvm -v $PWD:/workspace \
# -w /workspace -p 9190:9190 --name tvm -it tvm.demo_android bash # -w /workspace -p 9190:9190 --name tvm -it tvm.demo_android bash
# #
# You are now inside the container. The cloned tvm directory is mounted on /workspace. # You are now inside the container. The cloned TVM directory is mounted on /workspace.
# At this time, mount the 9190 port used by RPC described later. # At this time, mount the 9190 port used by RPC described later.
# #
# .. note:: # .. note::
...@@ -74,7 +74,7 @@ from tvm.contrib.download import download_testdata ...@@ -74,7 +74,7 @@ from tvm.contrib.download import download_testdata
# .. # ..
# make -j10 # make -j10
# #
# After building tvm successfully, Please set PYTHONPATH. # After building TVM successfully, Please set PYTHONPATH.
# #
# .. code-block:: bash # .. code-block:: bash
# #
...@@ -106,7 +106,7 @@ from tvm.contrib.download import download_testdata ...@@ -106,7 +106,7 @@ from tvm.contrib.download import download_testdata
# Now we can register our Android device to the tracker. # Now we can register our Android device to the tracker.
# #
# Follow this `readme page <https://github.com/dmlc/tvm/tree/master/apps/android_rpc>`_ to # Follow this `readme page <https://github.com/dmlc/tvm/tree/master/apps/android_rpc>`_ to
# install tvm rpc apk on the android device. # install TVM RPC APK on the android device.
# #
# Here is an example of config.mk. I enabled OpenCL and Vulkan. # Here is an example of config.mk. I enabled OpenCL and Vulkan.
# #
......
...@@ -38,7 +38,7 @@ from tvm.contrib.download import download_testdata ...@@ -38,7 +38,7 @@ from tvm.contrib.download import download_testdata
# Build TVM Runtime on Device # Build TVM Runtime on Device
# --------------------------- # ---------------------------
# #
# The first step is to build tvm runtime on the remote device. # The first step is to build the TVM runtime on the remote device.
# #
# .. note:: # .. note::
# #
......
...@@ -43,7 +43,7 @@ from gluoncv import model_zoo, data, utils ...@@ -43,7 +43,7 @@ from gluoncv import model_zoo, data, utils
# To get best inference performance on CPU, change # To get best inference performance on CPU, change
# target argument according to your device and # target argument according to your device and
# follow the :ref:`tune_relay_x86` to tune x86 CPU and # follow the :ref:`tune_relay_x86` to tune x86 CPU and
# :ref:`tune_relay_arm` for arm cpu. # :ref:`tune_relay_arm` for arm CPU.
# #
# To get best performance fo SSD on Intel graphics, # To get best performance fo SSD on Intel graphics,
# change target argument to 'opencl -device=intel_graphics' # change target argument to 'opencl -device=intel_graphics'
......
...@@ -86,7 +86,7 @@ from tvm import relay ...@@ -86,7 +86,7 @@ from tvm import relay
func, params = relay.frontend.from_caffe2(resnet50.init_net, resnet50.predict_net, shape_dict, dtype_dict) func, params = relay.frontend.from_caffe2(resnet50.init_net, resnet50.predict_net, shape_dict, dtype_dict)
# compile the model # compile the model
# target x86 cpu # target x86 CPU
target = 'llvm' target = 'llvm'
with relay.build_config(opt_level=3): with relay.build_config(opt_level=3):
graph, lib, params = relay.build(func, target, params=params) graph, lib, params = relay.build(func, target, params=params)
...@@ -97,7 +97,7 @@ with relay.build_config(opt_level=3): ...@@ -97,7 +97,7 @@ with relay.build_config(opt_level=3):
# The process is no different from other examples. # The process is no different from other examples.
import tvm import tvm
from tvm.contrib import graph_runtime from tvm.contrib import graph_runtime
# context x86 cpu, use tvm.gpu(0) if you run on GPU # context x86 CPU, use tvm.gpu(0) if you run on GPU
ctx = tvm.cpu(0) ctx = tvm.cpu(0)
# create a runtime executor module # create a runtime executor module
m = graph_runtime.create(graph, lib, ctx) m = graph_runtime.create(graph, lib, ctx)
......
...@@ -135,7 +135,7 @@ print("Tensorflow protobuf imported to relay frontend.") ...@@ -135,7 +135,7 @@ print("Tensorflow protobuf imported to relay frontend.")
# Results: # Results:
# graph: Final graph after compilation. # graph: Final graph after compilation.
# params: final params after compilation. # params: final params after compilation.
# lib: target library which can be deployed on target with tvm runtime. # lib: target library which can be deployed on target with TVM runtime.
with relay.build_config(opt_level=3): with relay.build_config(opt_level=3):
graph, lib, params = relay.build(sym, target=target, target_host=target_host, params=params) graph, lib, params = relay.build(sym, target=target, target_host=target_host, params=params)
......
...@@ -151,7 +151,7 @@ func, params = relay.frontend.from_tflite(tflite_model, ...@@ -151,7 +151,7 @@ func, params = relay.frontend.from_tflite(tflite_model,
shape_dict={input_tensor: input_shape}, shape_dict={input_tensor: input_shape},
dtype_dict={input_tensor: input_dtype}) dtype_dict={input_tensor: input_dtype})
# targt x86 cpu # target x86 CPU
target = "llvm" target = "llvm"
with relay.build_module.build_config(opt_level=3): with relay.build_module.build_config(opt_level=3):
graph, lib, params = relay.build(func, target, params=params) graph, lib, params = relay.build(func, target, params=params)
......
...@@ -25,7 +25,7 @@ the pipeline. For example, we might want to use cuDNN for ...@@ -25,7 +25,7 @@ the pipeline. For example, we might want to use cuDNN for
some of the convolution kernels and define the rest of the stages. some of the convolution kernels and define the rest of the stages.
TVM supports these black box function calls natively. TVM supports these black box function calls natively.
Specfically, tvm support all the tensor functions that are DLPack compatible. Specfically, TVM support all the tensor functions that are DLPack compatible.
Which means we can call any function with POD types(pointer, int, float) Which means we can call any function with POD types(pointer, int, float)
or pointer to DLTensor as argument. or pointer to DLTensor as argument.
""" """
...@@ -46,7 +46,7 @@ from tvm.contrib import cblas ...@@ -46,7 +46,7 @@ from tvm.contrib import cblas
# The compute function takes list of symbolic placeholder for the inputs, # The compute function takes list of symbolic placeholder for the inputs,
# list of symbolic placeholder for the outputs and returns the executing statement. # list of symbolic placeholder for the outputs and returns the executing statement.
# #
# In this case we simply call a registered tvm function, which invokes a CBLAS call. # In this case we simply call a registered TVM function, which invokes a CBLAS call.
# TVM does not control internal of the extern array function and treats it as blackbox. # TVM does not control internal of the extern array function and treats it as blackbox.
# We can further mix schedulable TVM calls that add a bias term to the result. # We can further mix schedulable TVM calls that add a bias term to the result.
# #
...@@ -95,7 +95,7 @@ s = tvm.create_schedule(D.op) ...@@ -95,7 +95,7 @@ s = tvm.create_schedule(D.op)
# Since we can call into any PackedFunc in TVM. We can use the extern # Since we can call into any PackedFunc in TVM. We can use the extern
# function to callback into python. # function to callback into python.
# #
# The following example registers a python function into tvm runtime system # The following example registers a python function into TVM runtime system
# and use it to complete one stage of the computation. # and use it to complete one stage of the computation.
# This makes TVM much more flexible. For example, we can insert front-end # This makes TVM much more flexible. For example, we can insert front-end
# callbacks to inspect the intermediate results or mix customized code # callbacks to inspect the intermediate results or mix customized code
......
...@@ -77,7 +77,7 @@ print(tvm.lower(s, [X, s_scan], simple_mode=True)) ...@@ -77,7 +77,7 @@ print(tvm.lower(s, [X, s_scan], simple_mode=True))
###################################################################### ######################################################################
# Build and Verify # Build and Verify
# ---------------- # ----------------
# We can build the scan kernel like other tvm kernels, here we use # We can build the scan kernel like other TVM kernels, here we use
# numpy to verify the correctness of the result. # numpy to verify the correctness of the result.
# #
fscan = tvm.build(s, [X, s_scan], "cuda", name="myscan") fscan = tvm.build(s, [X, s_scan], "cuda", name="myscan")
......
...@@ -143,10 +143,10 @@ fadd = tvm.build(s, [A, B, C], tgt, target_host=tgt_host, name="myadd") ...@@ -143,10 +143,10 @@ fadd = tvm.build(s, [A, B, C], tgt, target_host=tgt_host, name="myadd")
# We provide an minimum array API in python to aid quick testing and prototyping. # We provide an minimum array API in python to aid quick testing and prototyping.
# The array API is based on `DLPack <https://github.com/dmlc/dlpack>`_ standard. # The array API is based on `DLPack <https://github.com/dmlc/dlpack>`_ standard.
# #
# - We first create a gpu context. # - We first create a GPU context.
# - Then tvm.nd.array copies the data to gpu. # - Then tvm.nd.array copies the data to GPU.
# - fadd runs the actual computation. # - fadd runs the actual computation.
# - asnumpy() copies the gpu array back to cpu and we can use this to verify correctness # - asnumpy() copies the GPU array back to CPU and we can use this to verify correctness
# #
ctx = tvm.context(tgt, 0) ctx = tvm.context(tgt, 0)
...@@ -161,7 +161,7 @@ tvm.testing.assert_allclose(c.asnumpy(), a.asnumpy() + b.asnumpy()) ...@@ -161,7 +161,7 @@ tvm.testing.assert_allclose(c.asnumpy(), a.asnumpy() + b.asnumpy())
# Inspect the Generated Code # Inspect the Generated Code
# -------------------------- # --------------------------
# You can inspect the generated code in TVM. The result of tvm.build # You can inspect the generated code in TVM. The result of tvm.build
# is a tvm Module. fadd is the host module that contains the host wrapper, # is a TVM Module. fadd is the host module that contains the host wrapper,
# it also contains a device module for the CUDA (GPU) function. # it also contains a device module for the CUDA (GPU) function.
# #
# The following code fetches the device module and prints the content code. # The following code fetches the device module and prints the content code.
......