Commit e3d29280 by Marcus Shawcroft, committed by Tianqi Chen

[DOC] minor gramatical improvements to tensor_expr_get_started (#3330)

parent 30f757ed
@@ -19,7 +19,7 @@ Get Started with Tensor Expression
==================================
**Author**: `Tianqi Chen <https://tqchen.github.io>`_
This is an introductory tutorial to the Tensor expression language in TVM.
TVM uses a domain-specific tensor expression for efficient kernel construction.
In this tutorial, we will demonstrate the basic workflow to use
@@ -48,15 +48,16 @@ tgt="cuda"
# ------------------------
# As a first step, we need to describe our computation.
# TVM adopts tensor semantics, with each intermediate result
# represented as a multi-dimensional array. The user needs to describe
# the computation rule that generates the tensors.
#
# We first define a symbolic variable n to represent the shape.
# We then define two placeholder Tensors, A and B, with given shape (n,).
#
# We then describe the result tensor C, with a compute operation. The
# compute function takes the shape of the tensor, as well as a lambda
# function that describes the computation rule for each position of
# the tensor.
#
# No computation happens during this phase, as we are only declaring how
# the computation should be done.
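#
# A minimal sketch of the declaration described above, assuming the
# legacy tvm.var/tvm.placeholder/tvm.compute API (and the tutorial's
# import tvm preamble); the names follow the prose:
n = tvm.var("n")
A = tvm.placeholder((n,), name='A')
B = tvm.placeholder((n,), name='B')
C = tvm.compute(A.shape, lambda i: A[i] + B[i], name="C")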
@@ -70,9 +71,10 @@ print(type(C))
######################################################################
# Schedule the Computation
# ------------------------
# While the above lines describe the computation rule, we can compute
# C in many ways since the axis of C can be computed in a data
# parallel manner. TVM asks the user to provide a description of the
# computation called a schedule.
#
# A schedule is a set of transformations of the computation that
# transform the loops of the computation in the program.
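#
# A short sketch of the schedule creation discussed above, assuming the
# legacy tvm.create_schedule API; the split factor of 64 is illustrative:
s = tvm.create_schedule(C.op)
# split the outer axis so it can later be bound to GPU blocks and threads
bx, tx = s[C].split(C.op.axis[0], factor=64)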
@@ -120,33 +122,33 @@ if tgt == "cuda" or tgt.startswith('opencl'):
# -----------
# After we have finished specifying the schedule, we can compile it
# into a TVM function. By default TVM compiles into a type-erased
# function that can be directly called from the python side.
#
# In the following line, we use tvm.build to create a function.
# The build function takes the schedule, the desired signature of the
# function (including the inputs and outputs) as well as the target
# language we want to compile to.
#
# The result of compilation, fadd, is a GPU device function (if a GPU
# is involved) as well as a host wrapper that calls into the GPU
# function. fadd is the generated host wrapper function; internally it
# contains a reference to the generated device function.
#
fadd = tvm.build(s, [A, B, C], tgt, target_host=tgt_host, name="myadd")
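#
# A sketch of how the host/device split described above can be
# inspected, assuming the legacy imported_modules/get_source module API:
if tgt == "cuda":
    dev_module = fadd.imported_modules[0]
    print("-----GPU (device) code-----")
    print(dev_module.get_source())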
######################################################################
# Run the Function
# ----------------
# The compiled TVM function exposes a concise C API
# that can be invoked from any language.
#
# We provide a minimal array API in python to aid quick testing and prototyping.
# The array API is based on the `DLPack <https://github.com/dmlc/dlpack>`_ standard.
#
# - We first create a GPU context.
# - Then tvm.nd.array copies the data to the GPU.
# - fadd runs the actual computation.
# - asnumpy() copies the GPU array back to the CPU and we can use this to verify correctness.
#
ctx = tvm.context(tgt, 0)
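# A sketch of the elided run-and-verify step, assuming the legacy
# tvm.nd.array API; the concrete size 1024 is illustrative:
import numpy as np

n = 1024
a = tvm.nd.array(np.random.uniform(size=n).astype(A.dtype), ctx)
b = tvm.nd.array(np.random.uniform(size=n).astype(B.dtype), ctx)
c = tvm.nd.array(np.zeros(n, dtype=C.dtype), ctx)
fadd(a, b, c)
tvm.testing.assert_allclose(c.asnumpy(), a.asnumpy() + b.asnumpy())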
@@ -176,14 +178,14 @@ else:
######################################################################
# .. note:: Code Specialization
#
# As you may have noticed, the declarations of A, B and C all
# take the same shape argument, n. TVM will take advantage of this
# to pass only a single shape argument to the kernel, as you will find in
# the printed device code. This is one form of specialization.
#
# On the host side, TVM will automatically generate code that checks
# the constraints on the parameters. So if you pass
# arrays with different shapes into fadd, an error will be raised.
#
# We can do more specializations. For example, we can write
# :code:`n = tvm.convert(1024)` instead of :code:`n = tvm.var("n")`,
@@ -195,13 +197,13 @@ else:
# Save Compiled Module
# --------------------
# Besides runtime compilation, we can save the compiled modules into
# a file and load them back later. This is called ahead-of-time compilation.
#
# The following code performs these steps:
#
# - It saves the compiled host module into an object file.
# - Then it saves the device module into a ptx file.
# - cc.create_shared calls a compiler (gcc) to create a shared library.
#
from tvm.contrib import cc
from tvm.contrib import util
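# A sketch of the save steps listed above, assuming the legacy
# util.tempdir and Module.save APIs referenced by the surrounding code:
temp = util.tempdir()
fadd.save(temp.relpath("myadd.o"))
if tgt == "cuda":
    fadd.imported_modules[0].save(temp.relpath("myadd.ptx"))
cc.create_shared(temp.relpath("myadd.so"), [temp.relpath("myadd.o")])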
@@ -218,9 +220,9 @@ print(temp.listdir())
######################################################################
# .. note:: Module Storage Format
#
# The CPU (host) module is directly saved as a shared library (.so).
# There can be multiple customized formats of the device code.
# In our example, the device code is stored in ptx, as well as a
# metadata json file. They can be loaded and linked separately via import.
#
@@ -228,8 +230,8 @@ print(temp.listdir())
# Load Compiled Module
# --------------------
# We can load the compiled module from the file system and run the code.
# The following code loads the host and device module separately and
# re-links them together. We can verify that the newly loaded function works.
#
fadd1 = tvm.module.load(temp.relpath("myadd.so"))
if tgt == "cuda":
@@ -261,11 +263,11 @@ tvm.testing.assert_allclose(c.asnumpy(), a.asnumpy() + b.asnumpy())
# .. note:: Runtime API and Thread-Safety
#
# The compiled modules of TVM do not depend on the TVM compiler.
# Instead, they only depend on a minimal runtime library.
# The TVM runtime library wraps the device drivers and provides
# thread-safe and device-agnostic calls into the compiled functions.
#
# This means that you can call the compiled TVM functions from any thread,
# on any GPU.
#
@@ -275,7 +277,7 @@ tvm.testing.assert_allclose(c.asnumpy(), a.asnumpy() + b.asnumpy())
# TVM provides code generation for multiple backends; we can also
# generate OpenCL code or LLVM code that runs on CPU backends.
#
# The following code blocks generate OpenCL code, create an array on an
# OpenCL device, and verify the correctness of the code.
#
if tgt.startswith('opencl'):
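    # (sketch, assuming the legacy imported_modules/get_source API)
    # rebuild for the OpenCL target and inspect the generated kernel:
    fadd_cl = tvm.build(s, [A, B, C], tgt, name="myadd")
    print("------OpenCL code------")
    print(fadd_cl.imported_modules[0].get_source())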
@@ -296,12 +298,12 @@ if tgt.startswith('opencl'):
# This tutorial provides a walkthrough of the TVM workflow using
# a vector add example. The general workflow is:
#
# - Describe your computation via a series of operations.
# - Describe how we want the computation to be performed using schedule primitives.
# - Compile to the target function we want.
# - Optionally, save the function to be loaded later.
#
# You are more than welcome to check out other examples and
# tutorials to learn more about the supported operations, scheduling primitives,
# and other features in TVM.
#