- 10 Apr, 2020 1 commit
-
-
* [RUNTIME] Initial implementation of Hexagon runtime support. This is only the TVM runtime. The FastRPC libraries, simulator driver, etc. will be provided in subsequent commits. * Fix pylint complaints * Fix some more pylint complaints * Add link to the Hexagon SDK website * Extract VTCM marker into a common variable * Implement device->device memory copy * Disable unsigned PDs by default * Ensure that --hvx_length is present in sim_args if HVX is enabled * Remove the line about clang from README.md. Apparently things work with libstdc++. * Mention setting USE_RPC=OFF when building libtvm_runtime.so for Hexagon * Remember to use codegen_hvx in validate_hvx_length * Add a line about minimum version of LLVM
Krzysztof Parzyszek committed
-
- 07 Apr, 2020 1 commit
-
-
* Add implementation of TVMDSOOp * feat: Update cmake script to work with c++11 and in-repo build * feat: Use libtvm as oplib dependency * fix: Add missing link dependency to libtvm * feat: Update tf tvmdso op by review comments * fix: Update with pr comments * fix: Fix lint * feat: Add test script and fix gpu shape * feat: Add test script and fix gpu shape * fix: Conditional build tftvm op for gpu * fix: Conditional build tftvm op for gpu * fix: Fix pylint of tf_op module.py * fix: Fix pylint of tf_op module.py * feat: Conditional enable gpu test for tftvm op * feat: Conditional enable gpu test for tftvm op * feat: Add tf_tvmdsoop test script as an app test * fix: Fix gpu/cpu enabled check on tvm in test script * fix: Make tf tvmdso op test script runnable with pytest * remove unused test script test_tfop_module.py * fix: Remove pushd & popd in tfdsoop test script * fix: Upgrade tftvmop use python3 to find TensorFlow * fix: Upgrade tftvmop use python3 to find TensorFlow * fix: Change target_link_options to target_link_libraries * fix: Add tftvmop build script's c++ option * fix: Add tvm library path to tf op test library path * fix: Debug ci build for tftvm dso op * fix: Fix cmake error and skip tfop test * fix: Fix typo and indentation issues * feat: Use TF list input op def * fix: Fix style and unexpected changes Co-authored-by: baoxinqi <baoxinqi@4paradigm.com> Co-authored-by: Chen Dihao <chendihao@4paradigm.com> Co-authored-by: wrongtest <wrongtest@4paradigm.com>
tobe committed
-
- 31 Mar, 2020 1 commit
-
-
* refactor * path update
Thierry Moreau committed
-
- 30 Mar, 2020 1 commit
-
-
* [CI] Improve VTA build message and scripts. * Use absolute path to set the env var
Tianqi Chen committed
-
- 29 Mar, 2020 2 commits
-
-
Tianqi Chen committed
-
Thierry Moreau committed
-
- 23 Mar, 2020 1 commit
-
-
* add argsort_nms_thrust * consider valid count in thrust nms sort * make thrust optional * typo * typo * fix pylint * address some of the comments * address more comments * fix lint * address more comments * address more comments
Leyuan Wang committed
-
- 20 Mar, 2020 1 commit
-
-
* [TOPI][OP] Use Thrust sort for argsort and topk. The current GPU sort implementation (odd-even transposition sort) is too slow when the number of elements is large. This PR introduces a Thrust implementation of sort, which is much faster. Note that this change requires CMake 3.8 or later, since nvcc has to be used to compile Thrust code. * cmake: make CUDA optional * allow .cu files to be added to the repository * pylint fix and cleanup * require cmake 3.8 only when thrust is enabled * fix nvcc compiler error when passing -pthread * add missing include * add USE_THRUST option in config.cmake * retrigger CI * retrigger CI
MORITA Kazutaka committed
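For context on the commit above, here is a minimal sequential sketch of odd-even transposition sort, the algorithm the message names as the previous GPU sort. It is only an illustration of why the work grows quadratically with the number of elements; it is not the TVM or Thrust implementation, and the function names are hypothetical.

```python
def odd_even_transposition_sort(values):
    """Sort a list with odd-even transposition sort (sequential simulation)."""
    data = list(values)
    n = len(data)
    # n phases are sufficient. Each phase compares disjoint neighbor pairs,
    # which is why a parallel device can run one whole phase at once.
    for phase in range(n):
        start = phase % 2  # even phase: pairs (0,1),(2,3)...; odd phase: (1,2),(3,4)...
        for i in range(start, n - 1, 2):
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
    return data


def comparison_count(n):
    """Total compare-exchange steps over all n phases, which equals n * (n - 1) / 2."""
    return sum(len(range(phase % 2, n - 1, 2)) for phase in range(n))
```

Even with every phase fully parallelized, the number of phases grows linearly with the input size and the total number of compare-exchange steps grows quadratically, which is consistent with the slowdown on large inputs described in the commit message.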
-
- 12 Mar, 2020 1 commit
-
-
Thierry Moreau committed
-
- 10 Mar, 2020 1 commit
-
-
* Add Nick's changes, squashed * Fix frontend compilation * Re-enable Rust CI * Add changes which conflicted badly * Restructure import_module! macro in order to avoid unstable features * Kill old unstable feature enablement * Refactor common to use new APIs * Move the code to stable * Fix warning Co-authored-by: Nick Hynes <nhynes@oasislabs.com>
Jared Roesch committed
-
- 09 Mar, 2020 1 commit
-
-
* [VTA][de10nano] Enable user defined target frequency.
  Issue: The VTA target frequency on the DE10-Nano is hardcoded to 50MHz, unnecessarily limiting performance.
  Solution: Add a PLL to the FPGA sub-system along with support for the selection of a user specified frequency at build time. The board successfully builds and runs at 100MHz.
  * Added a PLL in the soc_system.tcl platform designer generator script.
  * Modified the Makefile to automatically set the target frequency from that specified in the pkg_config.py file.
  * Modified the Makefile to generate a bitstream with an RBF format that enables programming of the FPGA directly from the on-board processor. Specifically, the RBF is generated in FastParallel32 mode with compression, which corresponds to the default MSEL switch setting on the board, i.e. 01010.
  * Added a false path override to file set_clocks.sdc to turn off unconstrained path warnings on the VTA pulse LED.
* [VTA][TSIM] Add more debug and tracing options.
  * Modified Makefile to change default config to DefaultDe10Config.
  * Added option in Makefile to produce more detailed tracing for extra observability in debugging complex scenarios.
  * Added option in Makefile to produce traces in FST format, which are 2 orders of magnitude smaller, although much slower to generate.
  * Added option in Makefile to build the simulator with GCC address sanitizer.
  * Modified Makefile to not lint the Scala code by default, avoiding unintended wrong indentation. Linting is better performed manually on a per-need basis.
* [VTA][de10nano] Enable remote programming of FPGA.
  Issue: The Cyclone V FPGA on board the DE10-Nano can only be programmed using the JTAG port, which is a limiting option for users.
  Solution: Add support for remote programming of the FPGA by implementing the FPGA programming manager protocol published in the Cyclone V user manual.
  * Added file de10nano_mgr.h implementing an FPGA manager class that supports handling of control and status registers as well as a push-button option to program the FPGA. The class can be easily extended to include more registers if needed.
  * Used an instance of the FPGA manager to implement function VTAProgram, also warning users when incompatible bitstream files are used.
  * Registered VTAProgram as a global function and modified the program_bitstream python class to use it.
* [VTA][de10nano] Enhance de10nano runtime support.
  Issue: The de10nano target has incomplete, non-working support for runtime reconfiguration, bitstream programming, and examples of usage.
  Solution: Complete runtime support for the de10nano target.
  * Modified VTA.cmake to comment out a default override for VTA_MAX_XFER to 21 bit wide.
  * Modified VTA.cmake to add needed de10nano include dirs.
  * Modified relevant files to support de10nano the same way as other targets for VTA runtime reconfiguration and FPGA programming.
  * Added test_program_rpc.py example as a runtime FPGA programming example. Note that unlike the pynq target no bitstream is either downloaded or programmed when the bitstream argument is set to None.
  * Cosmetic changes to vta config files.
* [VTA][Chisel] LoadUop FSM bug fix.
  Issue: The LoadUop FSM incorrectly advances the address of the next uop to read from DRAM when the DRAM data valid bit is deasserted and asserted at the end of a read. This is caused by a mismatch in the logic of the state and output portions of the FSM. This is one of two issues that were gating the correct operation of VTA on the DE10-Nano target.
  Solution: Modify the logic of the output section of the FSM to include a check on the DRAM read valid bit, or fold the output assignment into the state section.
  * Folded the assignment of the next uop address into the state section of the FSM.
* [VTA][Chisel] Dynamically adjust DMA transfer size.
  Issue: In the DE10-Nano target and possibly in others, DMA transfers that cross the boundaries of memory pages result in incorrect reads and writes from and to DRAM. When this happens, depending on different input values, VTA loads and stores exhibit incorrect results for DMA pulses at the end of a transfer. This is one of two issues that were gating the DE10-Nano target from functioning correctly, but may affect other Chisel based targets.
  Solution: Add support for dynamically adjustable DMA transfer sizes in load and store operations. For a more elegant and modular implementation the feature can be enabled at compile time with a static constant that can be passed as a configuration option.
  * Modified the load and store finite state machines to dynamically adjust the size of initial and stride DMA transfers. The feature is enabled by default by virtue of the static constant ADAPTIVE_DMA_XFER_ENABLE.
* [VTA][Chisel] Improve FSIM/TSIM/FPGA xref debug.
  Issue: Cross reference between FSIM, TSIM, and Chisel based FPGA traces is an invaluable instrument that enables fast analysis on FSIM, and analysis/debug on TSIM and FPGA, especially for complex flows like conv2d or full inferences. Currently this cannot be done easily since a suitable reference is missing. The clock cycle event counter cannot be used since it is undefined in FSIM and not reliable between TSIM and FPGA because of different latencies.
  Solution: Introduce a new event counter that preserves a program order across FSIM, TSIM, FPGA. We propose adding the accumulator write event counter in the Chisel EventCounter class and a simple instrumentation in the FSIM runtime code. Note that this technique enabled finding the Chisel issues reported in the PR, which would have been otherwise far more difficult.
  * Added the acc_wr_count event counter and changed interfaces accordingly.
* [VTA][de10nano] Comply with linting rules.
* [VTA] Appease make lint.
* [VTA] Disable pylint import not top level error.
* [VTA][Chisel,de10nano] Linting changes.
  * Use CamelCase class names.
  * Use C++ style C include header files.
  * Add comments to Chisel makefile.
* [VTA][de10nano]
  * Reorder C and C++ includes in de10nano_mgr.h.
  * Restore lint as default target in Chisel Makefile.
* [VTA][de10nano] Do not use f string in pkg_config.py.
* [VTA][de10nano] Remove overlooked f strings in pkg_config.py.
* [VTA][de10nano] Fixed typo.
* [VTA][TSIM] Check if gcc has align-new.
* [VTA][Chisel] Make adaptive DMA transfer default.
* [VTA][RPC] Renamed VTA_PYNQ_RPC_* to VTA_RPC_*.
  Issue: With more FPGA targets coming online, the initial method of using individual environment variables to specify target IP and port does not scale well.
  Solution: Use a single VTA_RPC_HOST, VTA_RPC_PORT pair to be changed every time a different target is used, for instance in a script used to benchmark all targets.
  * Replaced every instance of VTA_PYNQ_RPC_HOST and VTA_PYNQ_RPC_PORT with VTA_RPC_HOST and VTA_RPC_PORT, respectively.
* [VTA][Chisel] Comply with new linter.
Pasquale Cocchini committed
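The DMA fix in the commit above concerns transfers that cross memory page boundaries. The address arithmetic behind that kind of fix can be sketched as follows. This is a software illustration only, not the Chisel load/store FSM from the commit; the 4096-byte page size, the byte-addressed interface, and the function name are all assumptions made for the example.

```python
PAGE_SIZE = 4096  # assumed page size in bytes; hypothetical, not taken from the commit


def split_at_page_boundaries(addr, length, page_size=PAGE_SIZE):
    """Split the transfer [addr, addr + length) into chunks that never cross a page boundary."""
    chunks = []
    while length > 0:
        # Bytes remaining in the page that contains addr.
        room = page_size - (addr % page_size)
        size = min(room, length)
        chunks.append((addr, size))
        addr += size
        length -= size
    return chunks
```

For example, under these assumptions a 300-byte transfer starting at address 4000 is issued as a short initial chunk of 96 bytes that ends exactly at the page boundary, followed by a 204-byte chunk in the next page. This mirrors the idea of adjusting the size of the initial transfer rather than using one fixed size.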
-
- 05 Feb, 2020 1 commit
-
-
Haichen Shen committed
-
- 27 Jan, 2020 1 commit
-
-
* Explicitly link to cublasLt * Only link cublasLt if it's found Co-authored-by: Jon Soifer <jonso@microsoft.com>
Jon Soifer committed
-
- 19 Jan, 2020 1 commit
-
-
This PR moves the codegen-related code into the target folder, as it is target-specific functionality. We also adopt the term "compiler driver", which is used in common compiler infrastructure such as Rust, GHC and clang. As a result, build_module is moved into the driver folder.
Tianqi Chen committed
-
- 16 Jan, 2020 1 commit
-
-
Thierry Moreau committed
-
- 10 Jan, 2020 1 commit
-
-
Zhao Wu committed
-
- 18 Dec, 2019 1 commit
-
-
Zhi committed
-
- 04 Dec, 2019 1 commit
-
-
ziheng committed
-
- 02 Dec, 2019 1 commit
-
-
HarryWu committed
-
- 22 Nov, 2019 1 commit
-
-
Zhi committed
-
- 18 Nov, 2019 1 commit
-
-
Tianqi Chen committed
-
- 15 Nov, 2019 1 commit
-
-
* [Contrib] Add MKL DNN * update * update
Haichen Shen committed
-
- 31 Oct, 2019 2 commits
-
-
Tianqi Chen committed
-
* [CI] Update the ci-gpu to use cuda10 * [CI] Enforce tensorcore gpu for unittest
Tianqi Chen committed
-
- 30 Oct, 2019 1 commit
-
-
* [CI] use llvm9 for the gpu tests * Update Docker script to support new nvidia docker
Tianqi Chen committed
-
- 27 Oct, 2019 1 commit
-
-
Tianqi Chen committed
-
- 24 Oct, 2019 1 commit
-
-
* Support setting path to ANTLR jar * Update comment
Jon Soifer committed
-
- 20 Oct, 2019 1 commit
-
-
Haichen Shen committed
-
- 08 Oct, 2019 1 commit
-
-
Issue: After cloning the latest TVM/VTA and running VTA on a Xilinx FPGA board, the application crashes with a "call stack overflow" caused by an infinitely recursive function call. This issue occurred before and was addressed by PR 3843. Analysis: The de10-nano driver PR appears to have been based on an old code base, so the logic change from PR 3843 was lost. Solution: Add the logic back.
Hua Jiang committed
-
- 17 Sep, 2019 1 commit
-
-
Junru Shao committed
-
- 14 Sep, 2019 1 commit
-
-
Junru Shao committed
-
- 13 Sep, 2019 1 commit
-
-
Andrew Tulloch committed
-
- 12 Sep, 2019 1 commit
-
-
This is an alternative implementation of a subset of the TVM runtime API (and graph runtime) that focuses entirely on reducing code size, at the expense of functionality (no tvm.extern(..) calls via PackedFunc, CPU only, etc). It might be worth incrementally expanding the surface area if there's interest. The motivation for this work was seeing what the minimal useful subset of the TVM runtime is. This is relevant for e.g. heavily code-size-constrained applications in embedded/mobile. The current runtime is more like O(100KiB) or so, so this might be compelling for some users. The smaller surface area for auditing might make this relevant for https://github.com/dmlc/tvm/issues/3159, or the use cases I was thinking about in https://github.com/dmlc/tvm/issues/2523#issuecomment-459165815 re: the Rust runtime. The symbols in the tvm::minimalruntime space (i.e. excluding std:: and picojson::) are about 5KiB, so I think there's a bunch of room here (i.e. we could replace picojson:: with [`jsmn`](https://zserge.com/jsmn.html) or something, and we could replace more of the `std::unordered_map` usage, etc. with custom primitives as well, similar to the `DynArray`).
Andrew Tulloch committed
-
- 07 Sep, 2019 2 commits
-
-
* fix cmake for mac os * rename
Haichen Shen committed
-
* [VTA] Support TLPP in function simulator. Issue: Currently the VTA function simulator only executes instructions serially, so the dependency logic of the runtime ISA, which is used for task-level pipeline parallelism (TLPP), cannot be verified by the function simulator. Solution: Make the simulator driver multi-threaded and support TLPP. Benefit: TLPP support in the VTA function simulator makes VTA logic testing, debugging and changes easier. Replace boost lockfree queue. Add configure control to enable or disable simulator TLPP. Change code style to Google style. Wrap queue read/write and sync logic to make function calls simpler. Add some comments. Remove MT logic, change to single-thread mode. Address review comments. Change code style to match Google code style and add comments. Add cmake macro to enable/disable simulator TLPP logic. Submodule update. Correct file name mentioned in comments. * remove USE_VTA_FSIM_TLPP.
Hua Jiang committed
-
- 06 Sep, 2019 1 commit
-
-
Installed through pypi
Jason Knight committed
-
- 05 Sep, 2019 1 commit
-
-
* rework; * `de10-nano` -> `de10nano`; * fix compilation error; * bug fix; * Update install.md * Update install.md * Update install.md * update with current runtime; * add debug messages; * bug fix in cma kernel module;
Liangfu Chen committed
-
- 02 Sep, 2019 1 commit
-
-
Luis Vega committed
-
- 29 Aug, 2019 2 commits
-
-
Issue: When trying VTA on an FPGA board, an infinite recursion in device_api.ext_dev causes a stack overflow and VTA fails. Analysis: The device_api.ext_dev function in rpc_server.py is used to load the VTA library. Once the VTA library is loaded, device_api.ext_dev should be replaced with the VTA function by the VTA library; VTA's device_api.cc does this work. However, because of a logic issue in VTA.cmake, that file is not compiled, so VTA keeps failing in rpc_server.py. Solution: Fix the logic issue in VTA.cmake.
Hua Jiang committed
-
Jon Soifer committed
-