Commit e806cd15 by Thierry Moreau Committed by Tianqi Chen

[DOC, HARDWARE] Hardware developer guide, migrating to use Vivado 2018.2 (#1473)

parent efe2f6a2
VTA Configuration
=================
The VTA stack incorporates both a hardware accelerator stack and
a TVM based software stack.
VTA incorporates flexibility out of the box: by modifying the
``vta/config/vta_config.json`` high-level configuration file,
the user can change the shape of the tensor intrinsic,
clock frequency, pipelining, data type width, and on-chip buffer sizes.
Parameters Overview
-------------------
We explain the parameters listed in the ``vta_config.json`` file in the table
below.
+-----------------------+------------+--------------------------------------------------------+
| Attribute             | Format     | Description                                            |
+=======================+============+========================================================+
| ``TARGET``            | String     | The TVM device target.                                 |
+-----------------------+------------+--------------------------------------------------------+
| ``HW_TARGET``         | Int        | FPGA frequency in MHz.                                 |
+-----------------------+------------+--------------------------------------------------------+
| ``HW_CLK_TARGET``     | Int        | FPGA clock period in ns target for HLS tool.           |
+-----------------------+------------+--------------------------------------------------------+
| ``HW_VER``            | String     | VTA hardware version number.                           |
+-----------------------+------------+--------------------------------------------------------+
| ``LOG_INP_WIDTH``     | Int (log2) | Input data type signed integer width.                  |
+-----------------------+------------+--------------------------------------------------------+
| ``LOG_WGT_WIDTH``     | Int (log2) | Weight data type signed integer width.                 |
+-----------------------+------------+--------------------------------------------------------+
| ``LOG_ACC_WIDTH``     | Int (log2) | Accumulator data type signed integer width.            |
+-----------------------+------------+--------------------------------------------------------+
| ``LOG_OUT_WIDTH``     | Int (log2) | Output data type signed integer width.                 |
+-----------------------+------------+--------------------------------------------------------+
| ``LOG_BATCH``         | Int (log2) | VTA matrix multiply intrinsic output dimension 0.      |
+-----------------------+------------+--------------------------------------------------------+
| ``LOG_BLOCK_IN``      | Int (log2) | VTA matrix multiply reduction dimension.               |
+-----------------------+------------+--------------------------------------------------------+
| ``LOG_BLOCK_OUT``     | Int (log2) | VTA matrix multiply intrinsic output dimension 1.      |
+-----------------------+------------+--------------------------------------------------------+
| ``LOG_UOP_BUFF_SIZE`` | Int (log2) | Micro-op on-chip buffer in Bytes.                      |
+-----------------------+------------+--------------------------------------------------------+
| ``LOG_INP_BUFF_SIZE`` | Int (log2) | Input on-chip buffer in Bytes.                         |
+-----------------------+------------+--------------------------------------------------------+
| ``LOG_WGT_BUFF_SIZE`` | Int (log2) | Weight on-chip buffer in Bytes.                        |
+-----------------------+------------+--------------------------------------------------------+
| ``LOG_ACC_BUFF_SIZE`` | Int (log2) | Accumulator on-chip buffer in Bytes.                   |
+-----------------------+------------+--------------------------------------------------------+
.. note::
  When a parameter name is preceded with ``LOG``, it means that it describes a value that can only be expressed as a power of two.
  For that reason we describe these parameters by their log2 value.
  For instance, to describe an integer width of 8 bits for the input data types, we set ``LOG_INP_WIDTH`` to 3, which is the log2 of 8.
  Similarly, to describe a 64 kB micro-op buffer, we would set ``LOG_UOP_BUFF_SIZE`` to 16.
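As a small illustration of the log2 convention in the note above, the sketch below decodes a few ``LOG_`` entries into the values they stand for. The JSON snippet is a hypothetical excerpt written for this example (it is not copied from the shipped ``vta_config.json``), and ``decode_log_params`` is an illustrative helper, not part of the VTA API.

```python
import json

# Hypothetical excerpt of a vta_config.json-style file; the two values
# mirror the examples given in the note (8-bit inputs, 64 kB uop buffer).
EXAMPLE_CONFIG = """
{
  "LOG_INP_WIDTH": 3,
  "LOG_UOP_BUFF_SIZE": 16
}
"""


def decode_log_params(cfg):
    """Map every LOG_<NAME> entry to <NAME> = 2 ** value."""
    decoded = {}
    for key, value in cfg.items():
        if key.startswith("LOG_"):
            decoded[key[len("LOG_"):]] = 1 << value
    return decoded


params = decode_log_params(json.loads(EXAMPLE_CONFIG))
print(params["INP_WIDTH"])               # 8 (bits)
print(params["UOP_BUFF_SIZE"] // 1024)   # 64 (kB)
```

Storing the exponent rather than the value makes it impossible to write a size that is not a power of two, which is the constraint the note describes.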
We provide additional detail below regarding each parameter:
- ``TARGET``: Can be set to ``"pynq"`` or ``"sim"``.
- ``HW_TARGET``: In pynq mode, can be set to ``100``, ``142``, ``167``, or ``200`` MHz.
- ``HW_CLK_TARGET``: The lower the target, the more pipeline stages HLS will insert to achieve timing closure during place and route (this can also slightly decrease performance).
- ``HW_VER``: Hardware version which increments every time the VTA hardware design changes. This parameter is used to uniquely identify hardware bitstreams.
- ``LOG_OUT_WIDTH``: We recommend matching ``LOG_OUT_WIDTH`` to ``LOG_INP_WIDTH``.
- ``LOG_BATCH``: Equivalent to A in multiplication of shape (A, B) x (B, C), or typically, the batch dimension.
- ``LOG_BLOCK_IN``: Equivalent to B in multiplication of shape (A, B) x (B, C), or typically, the input channel dimension.
- ``LOG_BLOCK_OUT``: Equivalent to C in multiplication of shape (A, B) x (B, C), or typically, the output channel dimension.
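To make the (A, B) x (B, C) correspondence in the list above concrete, here is a minimal pure-Python sketch. The three ``LOG_`` values are assumed example settings, not necessarily the defaults shipped with VTA, and ``matmul`` is a plain reference loop rather than anything VTA executes.

```python
# Assumed example settings (hypothetical, for illustration only).
LOG_BATCH, LOG_BLOCK_IN, LOG_BLOCK_OUT = 0, 4, 4

BATCH = 1 << LOG_BATCH          # A: output dimension 0
BLOCK_IN = 1 << LOG_BLOCK_IN    # B: reduction dimension
BLOCK_OUT = 1 << LOG_BLOCK_OUT  # C: output dimension 1


def matmul(lhs, rhs):
    """Plain (A, B) x (B, C) matrix multiply on nested lists."""
    inner = len(rhs)
    assert all(len(row) == inner for row in lhs), "reduction dims must match"
    cols = len(rhs[0])
    return [
        [sum(row[k] * rhs[k][j] for k in range(inner)) for j in range(cols)]
        for row in lhs
    ]


lhs = [[1] * BLOCK_IN for _ in range(BATCH)]      # shape (A, B)
rhs = [[2] * BLOCK_OUT for _ in range(BLOCK_IN)]  # shape (B, C)
out = matmul(lhs, rhs)

print(len(out), len(out[0]))  # 1 16
print(out[0][0])              # 32
```

With these values a single intrinsic call consumes a 1 x 16 activation tile and a 16 x 16 weight tile and produces a 1 x 16 output tile.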
VTA Design and Developer Guide
==============================
This developer guide details the complete VTA-TVM hardware-software stack.
.. image:: http://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_stack.png
   :align: center
   :width: 60%

.. toctree::
   :maxdepth: 2

   config
   hardware
\ No newline at end of file
VTA: Deep Learning Accelerator Stack VTA: Deep Learning Accelerator Stack
==================================== ====================================
Specialized accelerators are key enablers of future deep learning workloads. TVM stack targets specialized accelerators.
VTA(versatile tensor accelerator) is a generic, modular open-source deep learning accelerator. The Versatile Tensor Accelerator (VTA) is an open, generic, and customizable deep learning accelerator with a complete TVM-based compiler stack. We designed VTA to expose the most salient and common characteristics of mainstream deep learning accelerators. Together TVM and VTA form an end-to-end hardware-software deep learning system stack that includes hardware design, drivers, a JIT runtime, and an optimizing compiler stack based on TVM.
.. image:: http://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_overview.png
   :align: center
   :width: 60%
VTA has the following key features:
- Generic, modular, open-source hardware.
- Streamlined workflow to deploy to FPGAs.
- Simulator support to prototype compilation passes on regular workstations.
- Pynq-based driver and JIT runtime for both simulated and FPGA hardware back-end.
- End to end TVM stack integration.
This page contains links to all the resources related to VTA: This page contains links to all the resources related to VTA:
.. toctree:: .. toctree::
:maxdepth: 1 :maxdepth: 1
install install
dev/index
tutorials/index tutorials/index
Features Literature
-------- ----------
VTA have the following key features:
- Generic, modular open-source hardware - Read the VTA `release blog post`_.
- Streamlined workflow to deploy to FPGAs. - Read the VTA tech report: `An Open Hardware Software Stack for Deep Learning`_.
- Simulator support to protoype compilation passes on regular workstations.
- Driver and JIT runtime for both simulated and FPGA hardware backend. .. _release blog post: https://tvm.ai/2018/07/12/vta-release-announcement.html
- End to end TVM stack integration .. _An Open Hardware Software Stack for Deep Learning: https://arxiv.org/abs/1807.04188
\ No newline at end of file
...@@ -6,7 +6,7 @@ ...@@ -6,7 +6,7 @@
# #
# Check if script is running in correct Vivado version. # Check if script is running in correct Vivado version.
set scripts_vivado_version 2017.1 set scripts_vivado_version 2018.2
set current_vivado_version [version -short] set current_vivado_version [version -short]
if { [string first $scripts_vivado_version $current_vivado_version] == -1 } { if { [string first $scripts_vivado_version $current_vivado_version] == -1 } {
...@@ -53,7 +53,8 @@ if { [llength $argv] eq 12 } { ...@@ -53,7 +53,8 @@ if { [llength $argv] eq 12 } {
} }
} else { } else {
puts "Arg list incomplete: <path to ip dir> <num threads> <clock freq> \ puts "Arg list incomplete: <path to ip dir> <num threads> <clock freq> \
<inp width> <wgt_width> <out_width> <batch> <in_block / 1024> <out_block>" <inp width> <wgt_width> <out_width> <batch> <batch> <out_block> <in_block
<inp_mem_size> <wgt_mem_size> <out_mem_size>"
return 1 return 1
} }
...@@ -66,6 +67,7 @@ if {[expr $inp_part == 0]} { ...@@ -66,6 +67,7 @@ if {[expr $inp_part == 0]} {
set inp_bus_width $inp_mem_width set inp_bus_width $inp_mem_width
} }
set inp_mem_depth [expr $inp_mem_size * 8 / ($inp_mem_width * $inp_part)] set inp_mem_depth [expr $inp_mem_size * 8 / ($inp_mem_width * $inp_part)]
# Derive weight mem parameters # Derive weight mem parameters
set wgt_mem_width [expr $wgt_width * $out_block * $in_block] set wgt_mem_width [expr $wgt_width * $out_block * $in_block]
set wgt_bus_width 1024 set wgt_bus_width 1024
...@@ -75,6 +77,7 @@ if {[expr $wgt_part == 0]} { ...@@ -75,6 +77,7 @@ if {[expr $wgt_part == 0]} {
set wgt_bus_width $wgt_mem_width set wgt_bus_width $wgt_mem_width
} }
set wgt_mem_depth [expr $wgt_mem_size * 8 / ($wgt_mem_width * $wgt_part)] set wgt_mem_depth [expr $wgt_mem_size * 8 / ($wgt_mem_width * $wgt_part)]
# Derive output mem parameters # Derive output mem parameters
set out_mem_width [expr $out_width * $batch * $out_block] set out_mem_width [expr $out_width * $batch * $out_block]
set out_bus_width 1024 set out_bus_width 1024
...@@ -252,7 +255,7 @@ proc create_root_design { parentCell clk inp_part wgt_part out_part inp_bus_widt ...@@ -252,7 +255,7 @@ proc create_root_design { parentCell clk inp_part wgt_part out_part inp_bus_widt
] $fetch_0 ] $fetch_0
# Create instance: g2l_queue, and set properties # Create instance: g2l_queue, and set properties
set g2l_queue [ create_bd_cell -type ip -vlnv xilinx.com:ip:fifo_generator:13.1 g2l_queue ] set g2l_queue [ create_bd_cell -type ip -vlnv xilinx.com:ip:fifo_generator:13.2 g2l_queue ]
set_property -dict [ list \ set_property -dict [ list \
CONFIG.Empty_Threshold_Assert_Value_axis {1022} \ CONFIG.Empty_Threshold_Assert_Value_axis {1022} \
CONFIG.Empty_Threshold_Assert_Value_rach {14} \ CONFIG.Empty_Threshold_Assert_Value_rach {14} \
...@@ -273,7 +276,7 @@ proc create_root_design { parentCell clk inp_part wgt_part out_part inp_bus_widt ...@@ -273,7 +276,7 @@ proc create_root_design { parentCell clk inp_part wgt_part out_part inp_bus_widt
] $g2l_queue ] $g2l_queue
# Create instance: g2s_queue, and set properties # Create instance: g2s_queue, and set properties
set g2s_queue [ create_bd_cell -type ip -vlnv xilinx.com:ip:fifo_generator:13.1 g2s_queue ] set g2s_queue [ create_bd_cell -type ip -vlnv xilinx.com:ip:fifo_generator:13.2 g2s_queue ]
set_property -dict [ list \ set_property -dict [ list \
CONFIG.Empty_Threshold_Assert_Value_axis {1022} \ CONFIG.Empty_Threshold_Assert_Value_axis {1022} \
CONFIG.Empty_Threshold_Assert_Value_rach {14} \ CONFIG.Empty_Threshold_Assert_Value_rach {14} \
...@@ -294,7 +297,7 @@ proc create_root_design { parentCell clk inp_part wgt_part out_part inp_bus_widt ...@@ -294,7 +297,7 @@ proc create_root_design { parentCell clk inp_part wgt_part out_part inp_bus_widt
] $g2s_queue ] $g2s_queue
# Create instance: gemm_queue, and set properties # Create instance: gemm_queue, and set properties
set gemm_queue [ create_bd_cell -type ip -vlnv xilinx.com:ip:fifo_generator:13.1 gemm_queue ] set gemm_queue [ create_bd_cell -type ip -vlnv xilinx.com:ip:fifo_generator:13.2 gemm_queue ]
set_property -dict [ list \ set_property -dict [ list \
CONFIG.Empty_Threshold_Assert_Value_axis {510} \ CONFIG.Empty_Threshold_Assert_Value_axis {510} \
CONFIG.Empty_Threshold_Assert_Value_rach {14} \ CONFIG.Empty_Threshold_Assert_Value_rach {14} \
...@@ -318,7 +321,7 @@ proc create_root_design { parentCell clk inp_part wgt_part out_part inp_bus_widt ...@@ -318,7 +321,7 @@ proc create_root_design { parentCell clk inp_part wgt_part out_part inp_bus_widt
] $gemm_queue ] $gemm_queue
# Create instance: l2g_queue, and set properties # Create instance: l2g_queue, and set properties
set l2g_queue [ create_bd_cell -type ip -vlnv xilinx.com:ip:fifo_generator:13.1 l2g_queue ] set l2g_queue [ create_bd_cell -type ip -vlnv xilinx.com:ip:fifo_generator:13.2 l2g_queue ]
set_property -dict [ list \ set_property -dict [ list \
CONFIG.Empty_Threshold_Assert_Value_axis {1022} \ CONFIG.Empty_Threshold_Assert_Value_axis {1022} \
CONFIG.Empty_Threshold_Assert_Value_rach {14} \ CONFIG.Empty_Threshold_Assert_Value_rach {14} \
...@@ -345,7 +348,7 @@ proc create_root_design { parentCell clk inp_part wgt_part out_part inp_bus_widt ...@@ -345,7 +348,7 @@ proc create_root_design { parentCell clk inp_part wgt_part out_part inp_bus_widt
] $load_0 ] $load_0
# Create instance: load_queue, and set properties # Create instance: load_queue, and set properties
set load_queue [ create_bd_cell -type ip -vlnv xilinx.com:ip:fifo_generator:13.1 load_queue ] set load_queue [ create_bd_cell -type ip -vlnv xilinx.com:ip:fifo_generator:13.2 load_queue ]
set_property -dict [ list \ set_property -dict [ list \
CONFIG.Empty_Threshold_Assert_Value_axis {510} \ CONFIG.Empty_Threshold_Assert_Value_axis {510} \
CONFIG.Empty_Threshold_Assert_Value_rach {14} \ CONFIG.Empty_Threshold_Assert_Value_rach {14} \
...@@ -406,7 +409,7 @@ proc create_root_design { parentCell clk inp_part wgt_part out_part inp_bus_widt ...@@ -406,7 +409,7 @@ proc create_root_design { parentCell clk inp_part wgt_part out_part inp_bus_widt
] $processing_system7_1 ] $processing_system7_1
# Create instance: s2g_queue, and set properties # Create instance: s2g_queue, and set properties
set s2g_queue [ create_bd_cell -type ip -vlnv xilinx.com:ip:fifo_generator:13.1 s2g_queue ] set s2g_queue [ create_bd_cell -type ip -vlnv xilinx.com:ip:fifo_generator:13.2 s2g_queue ]
set_property -dict [ list \ set_property -dict [ list \
CONFIG.Empty_Threshold_Assert_Value_axis {1022} \ CONFIG.Empty_Threshold_Assert_Value_axis {1022} \
CONFIG.Empty_Threshold_Assert_Value_rach {14} \ CONFIG.Empty_Threshold_Assert_Value_rach {14} \
...@@ -433,7 +436,7 @@ CONFIG.C_M_AXI_DATA_PORT_CACHE_VALUE {"1111"} \ ...@@ -433,7 +436,7 @@ CONFIG.C_M_AXI_DATA_PORT_CACHE_VALUE {"1111"} \
] $store_0 ] $store_0
# Create instance: store_queue, and set properties # Create instance: store_queue, and set properties
set store_queue [ create_bd_cell -type ip -vlnv xilinx.com:ip:fifo_generator:13.1 store_queue ] set store_queue [ create_bd_cell -type ip -vlnv xilinx.com:ip:fifo_generator:13.2 store_queue ]
set_property -dict [ list \ set_property -dict [ list \
CONFIG.Empty_Threshold_Assert_Value_axis {510} \ CONFIG.Empty_Threshold_Assert_Value_axis {510} \
CONFIG.Empty_Threshold_Assert_Value_rach {14} \ CONFIG.Empty_Threshold_Assert_Value_rach {14} \
...@@ -466,7 +469,7 @@ CONFIG.NUM_PORTS {5} \ ...@@ -466,7 +469,7 @@ CONFIG.NUM_PORTS {5} \
if {${inp_part} > 1} { if {${inp_part} > 1} {
for {set i 0} {$i < ${inp_part}} {incr i} { for {set i 0} {$i < ${inp_part}} {incr i} {
# Create instance: inp_mem, and set properties # Create instance: inp_mem, and set properties
set inp_mem [ create_bd_cell -type ip -vlnv xilinx.com:ip:blk_mem_gen:8.3 inp_mem_${i} ] set inp_mem [ create_bd_cell -type ip -vlnv xilinx.com:ip:blk_mem_gen:8.4 inp_mem_${i} ]
set_property -dict [ list \ set_property -dict [ list \
CONFIG.Byte_Size {8} \ CONFIG.Byte_Size {8} \
CONFIG.Enable_32bit_Address {true} \ CONFIG.Enable_32bit_Address {true} \
...@@ -494,7 +497,7 @@ CONFIG.NUM_PORTS {5} \ ...@@ -494,7 +497,7 @@ CONFIG.NUM_PORTS {5} \
} }
} else { } else {
# Create instance: inp_mem, and set properties # Create instance: inp_mem, and set properties
set inp_mem [ create_bd_cell -type ip -vlnv xilinx.com:ip:blk_mem_gen:8.3 inp_mem ] set inp_mem [ create_bd_cell -type ip -vlnv xilinx.com:ip:blk_mem_gen:8.4 inp_mem ]
set_property -dict [ list \ set_property -dict [ list \
CONFIG.Byte_Size {8} \ CONFIG.Byte_Size {8} \
CONFIG.Enable_32bit_Address {true} \ CONFIG.Enable_32bit_Address {true} \
...@@ -525,7 +528,7 @@ CONFIG.NUM_PORTS {5} \ ...@@ -525,7 +528,7 @@ CONFIG.NUM_PORTS {5} \
if {${wgt_part} > 1} { if {${wgt_part} > 1} {
for {set i 0} {$i < ${wgt_part}} {incr i} { for {set i 0} {$i < ${wgt_part}} {incr i} {
# Create instance: wgt_mem, and set properties # Create instance: wgt_mem, and set properties
set wgt_mem [ create_bd_cell -type ip -vlnv xilinx.com:ip:blk_mem_gen:8.3 wgt_mem_${i} ] set wgt_mem [ create_bd_cell -type ip -vlnv xilinx.com:ip:blk_mem_gen:8.4 wgt_mem_${i} ]
set_property -dict [ list \ set_property -dict [ list \
CONFIG.Assume_Synchronous_Clk {true} \ CONFIG.Assume_Synchronous_Clk {true} \
CONFIG.Byte_Size {8} \ CONFIG.Byte_Size {8} \
...@@ -553,7 +556,7 @@ CONFIG.NUM_PORTS {5} \ ...@@ -553,7 +556,7 @@ CONFIG.NUM_PORTS {5} \
} }
} else { } else {
# Create instance: wgt_mem, and set properties # Create instance: wgt_mem, and set properties
set wgt_mem [ create_bd_cell -type ip -vlnv xilinx.com:ip:blk_mem_gen:8.3 wgt_mem ] set wgt_mem [ create_bd_cell -type ip -vlnv xilinx.com:ip:blk_mem_gen:8.4 wgt_mem ]
set_property -dict [ list \ set_property -dict [ list \
CONFIG.Assume_Synchronous_Clk {true} \ CONFIG.Assume_Synchronous_Clk {true} \
CONFIG.Byte_Size {8} \ CONFIG.Byte_Size {8} \
...@@ -584,7 +587,7 @@ CONFIG.NUM_PORTS {5} \ ...@@ -584,7 +587,7 @@ CONFIG.NUM_PORTS {5} \
if {${out_part} > 1} { if {${out_part} > 1} {
for {set i 0} {$i < ${out_part}} {incr i} { for {set i 0} {$i < ${out_part}} {incr i} {
# Create instance: out_mem, and set properties # Create instance: out_mem, and set properties
set out_mem [ create_bd_cell -type ip -vlnv xilinx.com:ip:blk_mem_gen:8.3 out_mem_${i} ] set out_mem [ create_bd_cell -type ip -vlnv xilinx.com:ip:blk_mem_gen:8.4 out_mem_${i} ]
set_property -dict [ list \ set_property -dict [ list \
CONFIG.Byte_Size {8} \ CONFIG.Byte_Size {8} \
CONFIG.Enable_32bit_Address {true} \ CONFIG.Enable_32bit_Address {true} \
...@@ -612,7 +615,7 @@ CONFIG.NUM_PORTS {5} \ ...@@ -612,7 +615,7 @@ CONFIG.NUM_PORTS {5} \
} }
} else { } else {
# Create instance: out_mem, and set properties # Create instance: out_mem, and set properties
set out_mem [ create_bd_cell -type ip -vlnv xilinx.com:ip:blk_mem_gen:8.3 out_mem ] set out_mem [ create_bd_cell -type ip -vlnv xilinx.com:ip:blk_mem_gen:8.4 out_mem ]
set_property -dict [ list \ set_property -dict [ list \
CONFIG.Byte_Size {8} \ CONFIG.Byte_Size {8} \
CONFIG.Enable_32bit_Address {true} \ CONFIG.Enable_32bit_Address {true} \
......
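The memory-parameter arithmetic in the Tcl hunks above can be restated in Python as a rough sketch. Only the width, bus-width fallback, and depth formulas are visible in the diff; how the partition count is computed is not shown, so the ratio used below is an assumption, and ``derive_mem`` and the buffer sizes in the usage lines are hypothetical.

```python
BUS_WIDTH = 1024  # bits, matching the "set ..._bus_width 1024" lines


def derive_mem(elem_width, dim0, dim1, mem_size_bytes, bus_width=BUS_WIDTH):
    """Return (mem_width, bus_width, part, depth) for one on-chip memory."""
    # One memory word holds a full tile: element width times both tile dims.
    mem_width = elem_width * dim0 * dim1
    # Assumption: the partition count is the integer ratio of word width
    # to bus width (this line is not visible in the hunks above).
    part = mem_width // bus_width
    if part == 0:
        # Word narrower than the bus: shrink the bus to the word width.
        # Assumption: a single partition is used in this case.
        part = 1
        bus_width = mem_width
    # Same expression as the Tcl: size * 8 / (width * part), integer math.
    depth = mem_size_bytes * 8 // (mem_width * part)
    return mem_width, bus_width, part, depth


# Hypothetical sizes: 8-bit weights, 16 x 16 tile, 256 kB buffer.
print(derive_mem(8, 16, 16, 1 << 18))  # (2048, 1024, 2, 512)
# Hypothetical sizes: 8-bit inputs, 1 x 16 tile, 32 kB buffer.
print(derive_mem(8, 1, 16, 1 << 15))   # (128, 128, 1, 2048)
```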
...@@ -30,7 +30,7 @@ from tvm import rpc ...@@ -30,7 +30,7 @@ from tvm import rpc
from tvm.contrib import util from tvm.contrib import util
from vta.testing import simulator from vta.testing import simulator
# Load VTA parameters from the config.json file # Load VTA parameters from the vta/config/vta_config.json file
env = vta.get_env() env = vta.get_env()
# We read the Pynq RPC host IP address and port number from the OS environment # We read the Pynq RPC host IP address and port number from the OS environment
...@@ -38,7 +38,7 @@ host = os.environ.get("VTA_PYNQ_RPC_HOST", "192.168.2.99") ...@@ -38,7 +38,7 @@ host = os.environ.get("VTA_PYNQ_RPC_HOST", "192.168.2.99")
port = int(os.environ.get("VTA_PYNQ_RPC_PORT", "9091")) port = int(os.environ.get("VTA_PYNQ_RPC_PORT", "9091"))
# We configure both the bitstream and the runtime system on the Pynq # We configure both the bitstream and the runtime system on the Pynq
# to match the VTA configuration specified by the config.json file. # to match the VTA configuration specified by the vta_config.json file.
if env.TARGET == "pynq": if env.TARGET == "pynq":
# Make sure that TVM was compiled with RPC=1 # Make sure that TVM was compiled with RPC=1
......
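The tutorials read the Pynq RPC endpoint from the environment with fallback defaults. The sketch below isolates that pattern so it can be tried without a board; ``read_rpc_endpoint`` is a hypothetical helper, while the variable names and default values are the ones shown in the hunk above.

```python
import os


def read_rpc_endpoint(environ=None):
    """Return (host, port) using the tutorial's variables and defaults."""
    environ = os.environ if environ is None else environ
    host = environ.get("VTA_PYNQ_RPC_HOST", "192.168.2.99")
    # Environment values are strings, so the port is converted explicitly.
    port = int(environ.get("VTA_PYNQ_RPC_PORT", "9091"))
    return host, port


# With nothing set, the defaults apply.
print(read_rpc_endpoint({}))  # ('192.168.2.99', 9091)

# Overriding both values, as one would with `export` in a shell.
custom = {"VTA_PYNQ_RPC_HOST": "10.0.0.5", "VTA_PYNQ_RPC_PORT": "9190"}
print(read_rpc_endpoint(custom))  # ('10.0.0.5', 9190)
```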
...@@ -26,7 +26,7 @@ from tvm import rpc ...@@ -26,7 +26,7 @@ from tvm import rpc
from tvm.contrib import util from tvm.contrib import util
from vta.testing import simulator from vta.testing import simulator
# Load VTA parameters from the config.json file # Load VTA parameters from the vta/config/vta_config.json file
env = vta.get_env() env = vta.get_env()
# We read the Pynq RPC host IP address and port number from the OS environment # We read the Pynq RPC host IP address and port number from the OS environment
...@@ -34,7 +34,7 @@ host = os.environ.get("VTA_PYNQ_RPC_HOST", "192.168.2.99") ...@@ -34,7 +34,7 @@ host = os.environ.get("VTA_PYNQ_RPC_HOST", "192.168.2.99")
port = int(os.environ.get("VTA_PYNQ_RPC_PORT", "9091")) port = int(os.environ.get("VTA_PYNQ_RPC_PORT", "9091"))
# We configure both the bitstream and the runtime system on the Pynq # We configure both the bitstream and the runtime system on the Pynq
# to match the VTA configuration specified by the config.json file. # to match the VTA configuration specified by the vta_config.json file.
if env.TARGET == "pynq": if env.TARGET == "pynq":
# Make sure that TVM was compiled with RPC=1 # Make sure that TVM was compiled with RPC=1
...@@ -95,7 +95,7 @@ elif env.TARGET == "sim": ...@@ -95,7 +95,7 @@ elif env.TARGET == "sim":
# :width: 480px # :width: 480px
# #
# The dimensions of that matrix-matrix multiplication are specified in # The dimensions of that matrix-matrix multiplication are specified in
# the :code:`config.json` configuration file. # the :code:`vta_config.json` configuration file.
# The activation matrix has a :code:`(BATCH, BLOCK_IN)` shape # The activation matrix has a :code:`(BATCH, BLOCK_IN)` shape
# and the transposed weight matrix has a :code:`(BLOCK_OUT, BLOCK_IN)` shape, # and the transposed weight matrix has a :code:`(BLOCK_OUT, BLOCK_IN)` shape,
# thus inferring that the resulting output matrix has a # thus inferring that the resulting output matrix has a
...@@ -131,7 +131,7 @@ elif env.TARGET == "sim": ...@@ -131,7 +131,7 @@ elif env.TARGET == "sim":
# dimension of VTA's tensor core, but also to match the specific data types # dimension of VTA's tensor core, but also to match the specific data types
# expected by VTA. # expected by VTA.
# VTA for now only supports fixed point data types, whose integer width is # VTA for now only supports fixed point data types, whose integer width is
# specified in the :code:`config.json` file by :code:`INP_WIDTH` and # specified in the :code:`vta_config.json` file by :code:`INP_WIDTH` and
# :code:`WGT_WIDTH` for the activations and weights data types respectively. # :code:`WGT_WIDTH` for the activations and weights data types respectively.
# In addition, the accumulator data type integer width is specified by # In addition, the accumulator data type integer width is specified by
# :code:`ACC_WIDTH`. # :code:`ACC_WIDTH`.
...@@ -284,7 +284,7 @@ print(tvm.lower(s, [A, B, C], simple_mode=True)) ...@@ -284,7 +284,7 @@ print(tvm.lower(s, [A, B, C], simple_mode=True))
# that stores input matrices of shape :code:`(env.BATCH, env.BLOCK_IN)` # that stores input matrices of shape :code:`(env.BATCH, env.BLOCK_IN)`
# of type :code:`env.inp_dtype`. The input buffer contains # of type :code:`env.inp_dtype`. The input buffer contains
# `2 ^ LOG_INP_BUFF_SIZE` matrix elements (as specified in the # `2 ^ LOG_INP_BUFF_SIZE` matrix elements (as specified in the
# :code:`config.json` file). # :code:`vta_config.json` file).
# - :code:`env.wgt_scope`: Weight buffer, which is a read-only SRAM buffer # - :code:`env.wgt_scope`: Weight buffer, which is a read-only SRAM buffer
# that stores weight matrices of shape :code:`(env.BLOCK_OUT, env.BLOCK_IN)` # that stores weight matrices of shape :code:`(env.BLOCK_OUT, env.BLOCK_IN)`
# of type :code:`env.wgt_dtype`. The weight buffer contains # of type :code:`env.wgt_dtype`. The weight buffer contains
......
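The tutorial comments above distinguish the input and weight widths from a separate accumulator width. One way to see why the accumulator is wider is to bound the worst-case dot product; the sketch below does this for assumed example values (8-bit inputs and weights, a reduction length of 16, a 32-bit accumulator). The helper names are hypothetical.

```python
def signed_range(width):
    """Inclusive (min, max) of a two's-complement integer of `width` bits."""
    return -(1 << (width - 1)), (1 << (width - 1)) - 1


def fits(value, width):
    """True if `value` is representable as a signed `width`-bit integer."""
    lo, hi = signed_range(width)
    return lo <= value <= hi


INP_WIDTH, WGT_WIDTH, ACC_WIDTH = 8, 8, 32  # assumed example widths
BLOCK_IN = 16                                # assumed reduction length

inp_min, _ = signed_range(INP_WIDTH)
wgt_min, _ = signed_range(WGT_WIDTH)
# Largest magnitude: every product is (-128) * (-128), summed BLOCK_IN times.
worst_case = BLOCK_IN * inp_min * wgt_min

print(worst_case)                   # 262144
print(fits(worst_case, 16))         # False
print(fits(worst_case, ACC_WIDTH))  # True
```

Under these assumptions a 16-bit accumulator could overflow within a single tile, while a 32-bit accumulator leaves headroom for accumulating across many tiles.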
...@@ -29,7 +29,7 @@ from tvm import rpc ...@@ -29,7 +29,7 @@ from tvm import rpc
from tvm.contrib import util from tvm.contrib import util
from vta.testing import simulator from vta.testing import simulator
# Load VTA parameters from the config.json file # Load VTA parameters from the vta/config/vta_config.json file
env = vta.get_env() env = vta.get_env()
# We read the Pynq RPC host IP address and port number from the OS environment # We read the Pynq RPC host IP address and port number from the OS environment
...@@ -37,7 +37,7 @@ host = os.environ.get("VTA_PYNQ_RPC_HOST", "192.168.2.99") ...@@ -37,7 +37,7 @@ host = os.environ.get("VTA_PYNQ_RPC_HOST", "192.168.2.99")
port = int(os.environ.get("VTA_PYNQ_RPC_PORT", "9091")) port = int(os.environ.get("VTA_PYNQ_RPC_PORT", "9091"))
# We configure both the bitstream and the runtime system on the Pynq # We configure both the bitstream and the runtime system on the Pynq
# to match the VTA configuration specified by the config.json file. # to match the VTA configuration specified by the vta_config.json file.
if env.TARGET == "pynq": if env.TARGET == "pynq":
# Make sure that TVM was compiled with RPC=1 # Make sure that TVM was compiled with RPC=1
......
...@@ -38,7 +38,7 @@ from io import BytesIO ...@@ -38,7 +38,7 @@ from io import BytesIO
from matplotlib import pyplot as plt from matplotlib import pyplot as plt
from PIL import Image from PIL import Image
# Load VTA parameters from the config.json file # Load VTA parameters from the vta/config/vta_config.json file
env = vta.get_env() env = vta.get_env()
# Helper to crop an image to a square (224, 224) # Helper to crop an image to a square (224, 224)
...@@ -180,7 +180,7 @@ host = os.environ.get("VTA_PYNQ_RPC_HOST", "192.168.2.99") ...@@ -180,7 +180,7 @@ host = os.environ.get("VTA_PYNQ_RPC_HOST", "192.168.2.99")
port = int(os.environ.get("VTA_PYNQ_RPC_PORT", "9091")) port = int(os.environ.get("VTA_PYNQ_RPC_PORT", "9091"))
# We configure both the bitstream and the runtime system on the Pynq # We configure both the bitstream and the runtime system on the Pynq
# to match the VTA configuration specified by the config.json file. # to match the VTA configuration specified by the vta_config.json file.
if env.TARGET == "pynq": if env.TARGET == "pynq":
# Make sure that TVM was compiled with RPC=1 # Make sure that TVM was compiled with RPC=1
......
...@@ -29,12 +29,12 @@ import numpy as np ...@@ -29,12 +29,12 @@ import numpy as np
# VTA is a modular and customizable design. Consequently, the user # VTA is a modular and customizable design. Consequently, the user
# is free to modify high-level hardware parameters that affect # is free to modify high-level hardware parameters that affect
# the hardware design layout. # the hardware design layout.
# These parameters are specified in the :code:`config.json` file by their # These parameters are specified in the :code:`vta_config.json` file by their
# :code:`log2` values. # :code:`log2` values.
# These VTA parameters can be loaded with the :code:`vta.get_env` # These VTA parameters can be loaded with the :code:`vta.get_env`
# function. # function.
# #
# Finally, the TVM target is specified in the :code:`config.json` file. # Finally, the TVM target is also specified in the :code:`vta_config.json` file.
# When set to *sim*, execution will take place inside of a behavioral # When set to *sim*, execution will take place inside of a behavioral
# VTA simulator. # VTA simulator.
# If you want to run this tutorial on the Pynq FPGA development platform, # If you want to run this tutorial on the Pynq FPGA development platform,
...@@ -58,7 +58,7 @@ host = os.environ.get("VTA_PYNQ_RPC_HOST", "192.168.2.99") ...@@ -58,7 +58,7 @@ host = os.environ.get("VTA_PYNQ_RPC_HOST", "192.168.2.99")
port = int(os.environ.get("VTA_PYNQ_RPC_PORT", "9091")) port = int(os.environ.get("VTA_PYNQ_RPC_PORT", "9091"))
# We configure both the bitstream and the runtime system on the Pynq # We configure both the bitstream and the runtime system on the Pynq
# to match the VTA configuration specified by the config.json file. # to match the VTA configuration specified by the vta_config.json file.
if env.TARGET == "pynq": if env.TARGET == "pynq":
# Make sure that TVM was compiled with RPC=1 # Make sure that TVM was compiled with RPC=1
...@@ -110,11 +110,11 @@ elif env.TARGET == "sim": ...@@ -110,11 +110,11 @@ elif env.TARGET == "sim":
# For VTA's general purpose operations such as vector adds, the tile size is # For VTA's general purpose operations such as vector adds, the tile size is
# :code:`(env.BATCH, env.BLOCK_OUT)`. # :code:`(env.BATCH, env.BLOCK_OUT)`.
# The dimensions are specified in # The dimensions are specified in
# the :code:`config.json` configuration file and are set by default to # the :code:`vta_config.json` configuration file and are set by default to
# a (1, 16) vector. # a (1, 16) vector.
# #
# In addition, A and B's data types also need to match the :code:`env.acc_dtype` # In addition, A and B's data types also need to match the :code:`env.acc_dtype`
# which is set by the :code:`config.json` file to be a 32-bit integer. # which is set by the :code:`vta_config.json` file to be a 32-bit integer.
# Output channel factor m - total 64 x 16 = 1024 output channels # Output channel factor m - total 64 x 16 = 1024 output channels
m = 64 m = 64
......
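The last hunk mentions a (1, 16) tile and an output channel factor of 64, giving 64 x 16 = 1024 output channels. A rough sketch of that tiling arithmetic follows; ``tiled_shape`` and ``tile_index`` are hypothetical helpers, and the tile size is the default stated in the tutorial comment rather than a value read from a real configuration.

```python
BATCH, BLOCK_OUT = 1, 16  # default tile stated in the tutorial comment


def tiled_shape(rows, cols, batch=BATCH, block_out=BLOCK_OUT):
    """Split a (rows, cols) tensor into (o, m, batch, block_out) tiles."""
    if rows % batch or cols % block_out:
        raise ValueError("shape must be a multiple of the tile size")
    return rows // batch, cols // block_out, batch, block_out


def tile_index(r, c, batch=BATCH, block_out=BLOCK_OUT):
    """Map a flat (row, col) coordinate to (o, m, b, n) in the tiled layout."""
    return r // batch, c // block_out, r % batch, c % block_out


print(tiled_shape(1, 1024))  # (1, 64, 1, 16)
print(tile_index(0, 1000))   # (0, 62, 0, 8)
```

The second element of the tiled shape is the factor ``m = 64`` from the tutorial: 1024 output channels divided into tiles of 16.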