1. 12 Mar, 2020 1 commit
  2. 09 Mar, 2020 1 commit
    • [VTA][Chisel,de10nano] Chisel fixes and de10nano support (#4986) · 5b4cf5df
      * [VTA][de10nano] Enable user defined target frequency.
      
      Issue:
      The VTA target frequency on the DE10-Nano is hardcoded to 50MHz
      unnecessarily limiting performance.
      
      Solution:
      Add a PLL to the FPGA sub-system along with support for the
      selection of a user specified frequency at build time. The board
      successfully builds and runs at 100MHz.
      
      * Added a PLL in the soc_system.tcl platform designer generator
        script.
      
      * Modified the Makefile to automatically set the target frequency
        from that specified in the pkg_config.py file.
      
      * Modified the Makefile to generate a bitstream with an RBF
        format that enables programming of the FPGA directly from
        the on-board processor. Specifically, the RBF is generated in
        FastParallel32 mode with compression, which corresponds to the
        default MSEL switch setting on the board, i.e. 01010.
      
      * Added a false path override to file set_clocks.sdc to turn off
        unconstrained path warnings on the VTA pulse LED.
      
      * [VTA][TSIM] Add more debug and tracing options.
      
      * Modified the Makefile to change the default config to DefaultDe10Config.
      
      * Added option in Makefile to produce more detailed tracing
        for extra observability in debugging complex scenarios.
      
      * Added option in Makefile to produce traces in FST format which
        are 2 orders of magnitude smaller, although much slower to
        generate.
      
      * Added option in Makefile to build the simulator with GCC address
        sanitizer.
      
      * Modified the Makefile to not lint the Scala code by default, to avoid
        unintentionally introducing wrong indentation. Linting is better
        performed manually on a per-need basis.
      
      * [VTA][de10nano] Enable remote programming of FPGA.
      
      Issue:
      The Cyclone V FPGA on board the DE10-Nano can only be programmed
      through the JTAG port, which is a limiting option for users.
      
      Solution:
      Add support for the remote programming of the FPGA implementing
      the FPGA programming manager protocol published in the Cyclone V
      user manual.
      
      * Added file de10nano_mgr.h implementing an FPGA manager class
        that supports handling of control and status registers as well
        as a push-button option to program the FPGA. The class can be
        easily extended to include more registers if needed.
      
      * Used an instance of the FPGA manager to implement the function
        VTAProgram, which also warns users when incompatible bitstream
        files are used.
      
      * Registered VTAProgram as a global function and modified
        the program_bitstream python class to use it.
      
      * [VTA][de10nano] Enhance de10nano runtime support.
      
      Issue:
      The de10nano target has incomplete, non-working support
      for runtime reconfiguration, bitstream programming, and
      examples of usage.
      
      Solution:
      Complete runtime support for the de10nano target.
      
      * Modified VTA.cmake to comment out a default override that set
        VTA_MAX_XFER to 21 bits wide.
      
      * Modified VTA.cmake to add needed de10nano include dirs.
      
      * Modified the relevant files to support de10nano the same way as
        other targets for VTA runtime reconfiguration and FPGA
        programming.
      
      * Added test_program_rpc.py as a runtime FPGA programming
        example. Note that, unlike the pynq target, no bitstream
        is downloaded or programmed when the bitstream argument
        is set to None.
      
      * Cosmetic changes to vta config files.
      
      * [VTA][Chisel] LoadUop FSM bug fix.
      
      Issue:
      The LoadUop FSM incorrectly advances the address of the next
      uop to read from DRAM when the DRAM data valid bit is deasserted
      and asserted at the end of a read. This is caused by a mismatch
      in the logic of the state and output portions of the FSM.
      This is one of two issues that were gating the correct operation
      of VTA on the DE10-Nano target.
      
      Solution:
      Modify the logic of the output section of the FSM to include
      a check on the DRAM read valid bit, or fold the output assignment
      into the state section.
      
      * Folded the assignment of the next uop address into the state
        section of the FSM.
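      The failure mode can be reproduced with a toy cycle-level model in
      Python. This is a hypothetical sketch, not the Chisel LoadUop FSM:
      read_uops, the valid_trace list and the check_valid flag are
      illustrative names. With check_valid=False the address advances on
      every cycle, including cycles where the valid bit is low, so uops are
      skipped; with the check, the address advances only on valid beats.

```python
from typing import List, Sequence


def read_uops(dram: Sequence[int], base: int, count: int,
              valid_trace: Sequence[bool], check_valid: bool = True) -> List[int]:
    """Toy model of a uop fetch loop (hypothetical, not the Chisel FSM).

    valid_trace[i] is the DRAM read-data-valid bit seen in cycle i.
    check_valid=False mimics the bug: the next-address update ignores valid.
    """
    addr = base
    uops: List[int] = []
    for valid in valid_trace:
        if len(uops) == count:
            break
        if valid:
            uops.append(dram[addr])  # data is only meaningful when valid is high
        if valid or not check_valid:
            addr += 1  # with check_valid=True, advance only after a valid beat
    return uops


dram = [10, 11, 12, 13, 14, 15]
valid = [True, False, True, True]  # valid drops for one cycle mid-read
print(read_uops(dram, 0, 3, valid))                     # [10, 11, 12]
print(read_uops(dram, 0, 3, valid, check_valid=False))  # [10, 12, 13]
```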
      
      * [VTA][Chisel] Dynamically adjust DMA transfer size.
      
      Issue:
      In the DE10-Nano target, and possibly in others, DMA transfers that
      cross memory page boundaries result in incorrect reads and writes
      from and to DRAM. When this happens, depending on the input values,
      VTA loads and stores produce incorrect results for DMA pulses at the
      end of a transfer. This is one of two issues that were gating the
      DE10-Nano target from functioning correctly, but it may also affect
      other Chisel-based targets.
      
      Solution:
      Add support for dynamically adjustable DMA transfer sizes in load
      and store operations. For a more elegant and modular implementation,
      the feature can be enabled at compile time with a static constant
      that can be passed as a configuration option.
      
      * Modified the load and store finite state machines to dynamically
        adjust the size of initial and stride DMA transfers. The feature
        is enabled by default by virtue of the static constant
        ADAPTIVE_DMA_XFER_ENABLE.
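      The core arithmetic of keeping a transfer inside a page can be
      sketched in Python. This is an illustration only, not the Chisel
      load/store FSM: split_at_pages is a hypothetical helper, and the
      4 KiB boundary is an assumption (it matches the AXI rule that a
      burst must not cross a 4 KB address boundary). The first chunk is
      shortened to end exactly at the next boundary, and the following
      chunks start page-aligned.

```python
from typing import List, Tuple

PAGE_SIZE = 4096  # assumption: 4 KiB page / AXI 4 KB burst boundary


def split_at_pages(addr: int, nbytes: int,
                   page_size: int = PAGE_SIZE) -> List[Tuple[int, int]]:
    """Split [addr, addr + nbytes) into (start, length) chunks that never
    cross a page boundary (hypothetical helper for illustration)."""
    chunks = []
    while nbytes > 0:
        room = page_size - (addr % page_size)  # bytes left before the next boundary
        length = min(nbytes, room)
        chunks.append((addr, length))
        addr += length
        nbytes -= length
    return chunks


print(split_at_pages(4000, 200))  # [(4000, 96), (4096, 104)]
```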
      
      * [VTA][Chisel] Improve FSIM/TSIM/FPGA xref debug.
      
      Issue:
      Cross-referencing FSIM, TSIM, and Chisel-based FPGA traces
      is an invaluable instrument that enables fast analysis on FSIM,
      and analysis/debug on TSIM and FPGA, especially for complex flows
      like conv2d or full inferences. Currently this cannot be done
      easily since a suitable reference is missing. The clock cycle
      event counter cannot be used since it is undefined in FSIM and
      not reliable between TSIM and FPGA because of different latencies.
      
      Solution:
      Introduce a new event counter that preserves program order across
      FSIM, TSIM, and FPGA. We propose adding an accumulator write event
      counter to the Chisel EventCounter class and a simple instrumentation
      in the FSIM runtime code. Note that this technique enabled finding the
      Chisel issues reported in the PR, which would otherwise have been
      far more difficult.
      
      * Added the acc_wr_count event counter and changed interfaces
        accordingly.
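      How such a counter serves as a cross-reference can be sketched with
      a small trace comparator. The trace format, a list of (cycle,
      acc_wr_count, value) tuples, and the first_mismatch helper are
      hypothetical; the point is that records are matched by accumulator
      write count rather than by cycle, since cycle numbers are undefined
      in FSIM and differ between TSIM and FPGA.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical trace record: (cycle, acc_wr_count, value_written).
Record = Tuple[int, int, int]


def first_mismatch(ref: Iterable[Record], dut: Iterable[Record]) -> Optional[int]:
    """Return the first acc_wr_count at which two traces disagree, or None.

    Cycle numbers are ignored on purpose; the write count follows program
    order in every execution mode, so it lines the traces up.
    """
    ref_vals: Dict[int, int] = {count: val for _, count, val in ref}
    dut_vals: Dict[int, int] = {count: val for _, count, val in dut}
    for count in sorted(ref_vals.keys() & dut_vals.keys()):
        if ref_vals[count] != dut_vals[count]:
            return count
    return None


fsim = [(0, 1, 5), (0, 2, 7), (0, 3, 9)]         # no meaningful cycle count in FSIM
fpga = [(120, 1, 5), (310, 2, 8), (455, 3, 9)]
print(first_mismatch(fsim, fpga))  # 2
```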
      
      * [VTA][de10nano] Comply with linting rules.
      
      * [VTA] Appease make lint.
      
      * [VTA] Disable pylint import not top level error.
      
      * [VTA][Chisel,de10nano] Linting changes.
      
      * Use CamelCase class names.
      
      * Use C++ style C include header files.
      
      * Add comments to Chisel makefile.
      
      * [VTA][de10nano]
      
      * Reorder C and C++ includes in de10nano_mgr.h.
      
      * Restore lint as default target in Chisel Makefile.
      
      * [VTA][de10nano] Do not use f string in pkg_config.py.
      
      * [VTA][de10nano] Remove overlooked f strings in pkg_config.py.
      
      * [VTA][de10nano] Fixed typo.
      
      * [VTA][TSIM] Check if gcc has align-new.
      
      * [VTA][Chisel] Make adaptive DMA transfer default.
      
      * [VTA][RPC] Renamed VTA_PYNQ_RPC_* to VTA_RPC_*.
      
      Issue:
      With more FPGA targets coming online, the initial method of using
      individual environment variables to specify the target IP and port
      does not scale well.
      
      Solution:
      Use a single VTA_RPC_HOST, VTA_RPC_PORT pair that is changed every
      time a different target is used, for instance in a script used to
      benchmark all targets.
      
      * Replaced every instance of VTA_PYNQ_RPC_HOST and VTA_PYNQ_RPC_PORT
        with VTA_RPC_HOST and VTA_RPC_PORT, respectively.
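      A script can then resolve the board endpoint from the single pair of
      variables, as in the minimal sketch below. The helper name
      vta_rpc_endpoint and the default host and port are placeholders
      assumed for illustration; set the variables for the board in use.

```python
import os
from typing import Tuple


def vta_rpc_endpoint(default_host: str = "192.168.2.99",
                     default_port: int = 9091) -> Tuple[str, int]:
    """Read the target board's RPC endpoint from VTA_RPC_HOST / VTA_RPC_PORT.

    Hypothetical helper; the defaults are placeholders only.
    """
    host = os.environ.get("VTA_RPC_HOST", default_host)
    port = int(os.environ.get("VTA_RPC_PORT", str(default_port)))
    return host, port
```

      A benchmarking script that covers several boards can export a
      different VTA_RPC_HOST/VTA_RPC_PORT pair before each run and pass the
      result to tvm.rpc.connect(host, port), with no per-board variable names.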
      
      * [VTA][Chisel] Comply with new linter.
      Pasquale Cocchini committed
  3. 27 Feb, 2020 2 commits
    • [DOCS] Sphinx -- Introduce alias detection. (#4954) · 1dbdcfb5
      * [DOCS] Sphinx -- Introduce alias detection.
      
      Background: some of our namespaces import functions from another
      namespace. For example, tvm.te imports most of the operators from tvm.tir.
      
      Previously, we manually excluded these aliases from the docs.
      However, that meant we could not link to them by their alias names.
      
      This PR adds a Sphinx callback plugin that detects such aliases and creates a rubric block,
      `Alias of the original class`, at the bottom of the alias's current docstring.
      It is done in a way that lets us refer to the generated docs.
      
      We also fixed a few docs errors.
      
      * Fix most of the issues
      Tianqi Chen committed
    • [REFACTOR][PY][API-CHANGE] Remove legacy python files. (#4943) · 9816efc2
      * [REFACTOR][PY][API-CHANGE] Remove legacy python files.
      
      Remove legacy python files.
      Use the te namespace for most of the tensor expression primitives.
      
      - tvm.create_schedule -> tvm.te.create_schedule
      - tvm.placeholder -> tvm.te.placeholder
      - tvm.compute -> tvm.te.compute
      
      * Remove top-level exposures.
      Tianqi Chen committed
  4. 26 Feb, 2020 1 commit
    • [VTA] YoloV3 Support (#4887) · 09c55fd1
      * [VTA] YoloV3 Support
      
      Issue:
      YoloV3 uses some operators and logic that are not well supported by
      the existing VTA logic, such as nn.pad, upsample, and 255 output channels.
      
      Solution:
      Add the related logic so that Darknet YoloV3 can run on VTA.
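      The 255 output channels come from the COCO detection head:
      3 anchors x (80 classes + 4 box coordinates + 1 objectness score).
      That is not a multiple of VTA's output block, so the channel count
      must be rounded up before packing. The sketch below shows only that
      arithmetic; the block size of 16 is an assumed default, and padding
      with zero weights and slicing the extra channel off afterwards is one
      possible approach, not necessarily the one taken in this PR.

```python
def round_up(value: int, multiple: int) -> int:
    """Smallest multiple of `multiple` that is >= value."""
    return -(-value // multiple) * multiple


NUM_ANCHORS, NUM_CLASSES = 3, 80               # COCO YoloV3 detection head
channels = NUM_ANCHORS * (NUM_CLASSES + 5)     # 4 box coords + 1 objectness per anchor
BLOCK_OUT = 16                                 # assumed VTA output block size

padded = round_up(channels, BLOCK_OUT)
print(channels, padded, padded - channels)  # 255 256 1
```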
      
      * Fix the issue with small detection frames (height/width of 0 or 1).
      
      * add yolov3-tiny tutorial
      
      * add os import
      
      * address review comments.
      
      * rename tutorial file with a short name.
      
      * rename deploy_vision_on_vta.py into deploy_classification.py.
      
      * address review comment, fix pylint error in deploy_detection.py
      Hua Jiang committed
  5. 20 Feb, 2020 1 commit
  6. 07 Feb, 2020 1 commit
    • [REFACTOR][PY][API-Change] Polish tvm.runtime, tvm.runtime.module API update (#4837) · e0122c0e
      * [REFACTOR][PY-API] Polish tvm.runtime, tvm.runtime.module API update
      
      This PR updates the tvm.runtime to use the new FFI style.
      
      - Remove top-level tvm.module to avoid confusion between runtime.Module and IRModule
      - API changes w.r.t. runtime.Module
        - tvm.module.load -> tvm.runtime.load_module
        - tvm.module.enabled -> tvm.runtime.enabled
        - tvm.module.system_lib -> tvm.runtime.system_lib
      - Remove dep on api_internal from runtime.
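      Code that has to run against TVM builds from both before and after
      this change can pick the loader at runtime. This is a hypothetical
      compatibility shim, not a TVM API: get_load_module prefers the new
      tvm.runtime.load_module and falls back to the removed tvm.module.load.

```python
def get_load_module(tvm_pkg):
    """Return a module loader for either the new or the legacy Python API.

    New API (this change): tvm.runtime.load_module(path)
    Legacy API (removed):  tvm.module.load(path)
    """
    runtime = getattr(tvm_pkg, "runtime", None)
    if runtime is not None and hasattr(runtime, "load_module"):
        return runtime.load_module
    return tvm_pkg.module.load
```

      Usage, assuming an exported library file: `import tvm` followed by
      `lib = get_load_module(tvm)("deploy_lib.so")`.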
      
      * Update module.load in the latest API
      Tianqi Chen committed
  7. 17 Jan, 2020 1 commit
    • [VTA][TSIM] Enable TSIM CI Testing (#4407) · 2738eddf
      * Update task_python_vta.sh
      
      * install sbt=1.1.1 with apt-get
      
      * update verilator_opt
      
      * install verilator with major version 4.0
      
      * disable multi-threading for now
      
      * bug fix for correcting uop fetch address in LoadUop module
      
      * bug fix for correcting uop fetch address in LoadUop module
      
      * adjustment to read from dram_offset
      
      * enable USE_THREADS with verilator 4.x
      
      * DEBUG: try to avoid core dump with verilator 4.x
      
      * bug fix in LoadUop module
      
      * log mega cycles in tsim
      
      * download cat.png to avoid fetching in each run
      
      * bug fix in LoadUop module
      
      * solve dram_even/sram_even issue
      
      * bug fix
      
      * introduce scalalint in ci
      
      * speedup tsim in ci
      
      * bug fix
      
      * lint scala code before building
      
      * disable multi-threading
      
      * split fsim/tsim script
      
      * update Jenkins settings
      
      * duplicate task_python_vta_fsim.sh as task_python_vta.sh for now
      
      Co-authored-by: Thierry Moreau <tmoreau@octoml.ai>
      Liangfu Chen committed
  8. 06 Jan, 2020 1 commit
  9. 27 Nov, 2019 1 commit
  10. 05 Sep, 2019 1 commit
    • [VTA][Relay] Extending Vision model coverage compilation for VTA (#3740) · 028f47ce
      * adding support for graphpack over multiply op
      
      * increasing resnet model coverage
      
      * fix indentation
      
      * lint
      
      * moving recursion limit fix into graphpack pass
      
      * moving recursionlimit to relay init
      
      * pooling on NCHWnc format
      
      * adding more models
      
      * deploy_resnet_on_vta.py
      
      * trailing line
      
      * generalizing to vision models
      
      * merge conflicts
      
      * fix, apply quantization to VTA only
      
      * improving comments
      
      * trimming models that have runtime issues for the moment
      
      * lint
      
      * lint
      
      * lint
      Thierry Moreau committed
  11. 30 Jul, 2019 1 commit
    • [VTA] Support for batched inference (#3661) · 6c7f0c4d
      * fix in IR pass to support padding on 6-d tensors
      
      * support for both N>1 and N==1 for padding
      
      * batch size > 1 tuning and base config
      
      * output formatting
      
      * batch conv2d
      
      * print all category results
      
      * revert to single-batch config
      
      * pick record best
      
      * fix conv test
      
      * improving reporting
      
      * address batching bug in fast simulator
      
      * fix
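      The 6-d tensors mentioned above come from packing NCHW data into a
      blocked (N/b, C/c, H, W, b, c) layout. The NumPy sketch below
      illustrates that layout for both a single-batch (b=1) and a batched
      (b>1) configuration; pack_nchw is a hypothetical helper, and b and c
      stand in for VTA's batch and block sizes. N and C must be divisible
      by them, otherwise padding is needed first.

```python
import numpy as np


def pack_nchw(data: np.ndarray, bfactor: int, cfactor: int) -> np.ndarray:
    """Pack an NCHW tensor into a 6-d (N/b, C/c, H, W, b, c) layout."""
    n, c, h, w = data.shape
    assert n % bfactor == 0 and c % cfactor == 0, "pad N and C first"
    data = data.reshape(n // bfactor, bfactor, c // cfactor, cfactor, h, w)
    # Move the inner batch and channel factors to the two innermost axes.
    return data.transpose(0, 2, 4, 5, 1, 3)


x = np.arange(2 * 32 * 3 * 3).reshape(2, 32, 3, 3)
print(pack_nchw(x, 1, 16).shape)  # (2, 2, 3, 3, 1, 16): batch not blocked
print(pack_nchw(x, 2, 16).shape)  # (1, 2, 3, 3, 2, 16): batch blocked into inner axis
```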
      Thierry Moreau committed
  12. 29 Jul, 2019 1 commit
    • [VTA] Refactor to increase platform coverage (Ultra96 etc.) (#3496) · f55609b4
      * hardware refactor for increased FPGA coverage, small optimizations
      
      * fix header
      
      * cleaning up parameters that won't be needed for now
      
      * streamlining makefile, and simplifying tcl scripts
      
      * moving parameter derivation into pkg_config.py, keeping tcl scripts lightweight
      
      * refactoring tcl script to avoid global variables
      
      * deriving AXI signals in pkg_config.py
      
      * unifying address map definition for hardware and software drivers
      
      * single channel design for ultra96 to simplify build
      
      * enable alu by default, no mul opcode for now
      
      * hardware fix
      
      * new bitstream; vta version
      
      * avoid error when env variable is not set
      
      * ultra96 cleanup
      
      * further cleaning up tcl script for bitstream generation
      
      * preliminary rpc server support on ultra96
      
      * rpc server tracker scripts
      
      * ultra96 ldflag
      
      * ultra96 support
      
      * ultra96 support
      
      * cleanup line
      
      * cmake support for ultra96
      
      * simplify memory instantiation
      
      * cleaning up IP parameter initialization
      
      * fix queue instantiation
      
      * 2019.1 transition
      
      * fix macro def
      
      * removing bus width from config
      
      * cleanup
      
      * fix
      
      * turning off testing for now
      
      * cleanup ultra96 ps instantiation
      
      * minor refactor
      
      * adding comments
      
      * upgrading to tophub v0.6
      
      * model used in TVM target now refers to a specific version of VTA for better autoTVM scheduling
      
      * revert change due to bug
      
      * rename driver files to be for zynq-type devices
      
      * streamlining address mapping
      
      * unifying register map offset values between driver and hardware generator
      
      * rely on cma library for cache flush/invalidation
      
      * coherence management
      
      * not make buffer packing depend on data types that can be wider than 64bits
      
      * refactor config derivation to minimize free parameters
      
      * fix environment/pkg config interaction
      
      * adding cfg dump property to pkgconfig
      
      * fix rpc reconfig
      
      * fix spacing
      
      * cleanup
      
      * fix spacing
      
      * long line fix
      
      * fix spacing and lint
      
      * fix line length
      
      * cmake fix
      
      * environment fix
      
      * renaming after pynq since the driver stack relies on the pynq library - see pynq.io
      
      * update doc
      
      * adding parameterization to  name
      
      * space
      
      * removing reg width
      
      * vta RPC
      
      * update doc on how to edit vta_config.json
      
      * fix path
      
      * fix path
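      The idea behind "refactor config derivation to minimize free
      parameters" can be sketched as follows: the config stores only log2
      values, and every size is derived from them. derive_config is a
      hypothetical helper and the key names are illustrative. The derived
      input-buffer depth assumes one buffer entry holds BATCH x BLOCK_IN
      elements of INP_WIDTH bits, so depth = buffer bytes / entry bytes,
      computed in log2 form.

```python
def derive_config(cfg: dict) -> dict:
    """Expand a minimal, log2-encoded config into derived parameters (illustrative keys)."""
    derived = dict(cfg)
    for key, value in cfg.items():
        if key.startswith("LOG_"):
            derived[key[len("LOG_"):]] = 1 << value  # e.g. LOG_BLOCK_IN=4 -> BLOCK_IN=16
    # log2(bytes per input-buffer entry); the -3 converts bits to bytes.
    log_entry_bytes = cfg["LOG_BATCH"] + cfg["LOG_BLOCK_IN"] + cfg["LOG_INP_WIDTH"] - 3
    derived["INP_BUFF_DEPTH"] = 1 << (cfg["LOG_INP_BUFF_SIZE"] - log_entry_bytes)
    return derived


cfg = {"TARGET": "ultra96", "LOG_INP_WIDTH": 3, "LOG_BATCH": 0,
       "LOG_BLOCK_IN": 4, "LOG_INP_BUFF_SIZE": 15}
d = derive_config(cfg)
print(d["BLOCK_IN"], d["INP_BUFF_SIZE"], d["INP_BUFF_DEPTH"])  # 16 32768 2048
```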
      Thierry Moreau committed
  13. 19 Jul, 2019 1 commit
  14. 08 Jul, 2019 1 commit
    • [VTA] TSIM improvements and fixes (#3505) · a31dd162
      * add tsim init function
      
      * add sim device
      
      * test wait and resume
      
      * launch simulation thread from DPILoader
      
      * add VTASimDPI module to handle all simulation related stuff
      
      * test tsim init
      
      * move exit to simdpi module
      
      * update vta driver
      
      * add chisel DPI module
      
      * get back simshell
      
      * update vta to support dpi sim
      
      * update unittests
      
      * add tsim to integration-conv2d test
      
      * run resnet on tsim
      
      * remove max-cycles
      
      * match tsim counters with sim counters
      
      * use env in simulator to switch between sim and tsim
      
      * update unittest
      
      * rollback conv2d test
      
      * update resnet
      
      * add stats to matrix multiply
      
      * add stats
      
      * print stats after assert
      
      * update other tests
      
      * add stats to gemm
      
      * add return and remove unused libs
      
      * add missing arg
      
      * return lib
      
      * update comments for linter
      
      * add more comments to VTASimDPI module
      
      * remove trailing spaces
      
      * remove trailing spaces
      Luis Vega committed
  15. 06 Jul, 2019 1 commit
  16. 03 Jul, 2019 1 commit
  17. 02 Jul, 2019 1 commit
  18. 28 Jun, 2019 1 commit