- 30 Apr, 2020 1 commit
-
-
Co-authored-by: Zhang Hao <zhanghao@4paradigm.com>
ZHANG Hao committed
-
- 22 Apr, 2020 1 commit
-
-
Substitute now takes a std::function to customize more replacing behaviors. Co-authored-by: Siyuan Feng <hzfengsy@sjtu.edu.cn> Co-authored-by: Siyuan Feng <hzfengsy@sjtu.edu.cn>
Tianqi Chen committed
-
- 21 Apr, 2020 2 commits
-
-
Tianqi Chen committed
-
The legacy Simplify/CanonicalSimplify are now a thin wrapper around the Analyzer. This PR removes these functions and migrated every place that requires simplification to enforce Analyzer creation. The new API would encourage more Analyzer sharing and potentially enable context-aware analyzer-based simplification.
Tianqi Chen committed
-
- 20 Apr, 2020 1 commit
-
-
* [TIR][REFACTIR] RewriteForTensorCore -> te/schedule RewriteForTensor depends on the schedule information, which makes it differ from a typical pass(which should get all the information from the input TIR). As a result, we refactor it as a SchedulePostProc step for now. We should revisit it later as we introduce more support for tensor core patterns in the TIR. * Fix VTA to fit the new IR Pattern
Tianqi Chen committed
-
- 15 Apr, 2020 2 commits
-
-
* [TIR] Remove ProducerConsumer and AllocateNode::new_expr This PR removes two legacy IR parts in TIR that are deprecated. ProducerConsumer node only serves as a hint markup and may no longer be informative after extensive transformations in the pass. If necessary, we can add related info via AttrStmt. The new_expr field in the AllocateNode is deprecated since it can just be replaced by a LetStmt. - Remove dependencies of passes on ProducerConsumer. - Remove ProducerConsumer from the IR. - Remove the deprecated fields (new_expr, free_function) from AllocateNode. * Fix additional testcases
Tianqi Chen committed -
Tianqi Chen committed
-
- 13 Apr, 2020 1 commit
-
-
* [RUNTIME] Allow non-nullable ObjectRef, introduce Optional<T>. We use ObjectRef and their sub-classes extensively throughout our codebase. Each of ObjectRef's sub-classes are nullable, which means they can hold nullptr as their values. While in some places we need nullptr as an alternative value. The implicit support for nullptr in all ObjectRef creates additional burdens for the developer to explicitly check defined in many places of the codebase. Moreover, it is unclear from the API's intentional point of view whether we want a nullable object or not-null version(many cases we want the later). Borrowing existing wisdoms from languages like Rust. We propose to introduce non-nullable ObjectRef, and Optional<T> container that represents a nullable variant. To keep backward compatiblity, we will start by allowing most ObjectRef to be nullable. However, we should start to use Optional<T> as the type in places where we know nullable is a requirement. Gradually, we will move most of the ObjectRef to be non-nullable and use Optional<T> in the nullable cases. Such explicitness in typing can help reduce the potential problems in our codebase overall. Changes in this PR: - Introduce _type_is_nullable attribute to ObjectRef - Introduce Optional<T> - Change String to be non-nullable. - Change the API of function->GetAttr to return Optional<T> * Address review comments * Upgrade all compiler flags to c++14 * Update as per review comment
Tianqi Chen committed
-
- 02 Apr, 2020 1 commit
-
-
* [REFACTOR][TIR] Introduce ExprDeepEqual, Remove IRDeepCompare This PR introduces ExprDeepEqual which reuses the StructuralEqual infra. We migrated the usecases of ir_pass::Equal to ExprDeepEqual and StructuralEqual. * Address comments
Tianqi Chen committed
-
- 31 Mar, 2020 1 commit
-
-
* refactor * path udpate
Thierry Moreau committed
-
- 30 Mar, 2020 4 commits
-
-
* [VTA] Set the correct type for synchronize * Fix the legacy API * Temporary remove the structural equal
Tianqi Chen committed -
* [DOCS] Point docs to the ASF site. We have migrated the main docs to the ASF site, which will be periodically updated using the docs generated by the CI. Points the docs to the ASF version. * [CI] Improve the docs generation script
Tianqi Chen committed -
* [CI] Improve VTA build message and scripts. * Use absolute path to set the env var
Tianqi Chen committed -
* [DOCS] Various sphinx related fix. - Use :ref: for reference. - Use :py:class: to refer to API docs. - Update installation guide to also refer to the download page. - Only move html contents in doxygen. * Address review comments * Update wording
Tianqi Chen committed
-
- 29 Mar, 2020 2 commits
-
-
Tianqi Chen committed
-
Thierry Moreau committed
-
- 18 Mar, 2020 1 commit
-
-
Tianqi Chen committed
-
- 12 Mar, 2020 2 commits
-
-
* init * fix template * tweak naming
Haichen Shen committed -
Thierry Moreau committed
-
- 09 Mar, 2020 1 commit
-
-
* [VTA][de10nano] Enable user defined target frequency. Issue: The VTA target frequency on the DE10-Nano is hardcoded to 50MHz unnecessarily limiting performance. Solution: Add a PLL to the FPGA sub-system along with support for the selection of a user specified frequency at build time. The board successfully builds and runs at 100MHz. * Added a PLL in the soc_system.tcl platform designer generator script. * Modified the Makefile to automatically set the target frequency from that specified in the pkg_config.py file. * Modified the Makefile to generate a bitstream with an RBF format that enables programming of the FPGA directly from the on-board processor. Specifically, the RBF is generated in FastParallel32 mode with compression, which corresponds to the default MSEL switch setting on the board, i.e. 01010. * Added a false path override to file set_clocks.sdc to turn off unconstrained path warnings on the VTA pulse LED. * [VTA][TSIM] Add more debug and tracing options. * Modified Makefile to change default config to DafaultDe10Config. * Added option in Makefile to produce more detailed tracing for extra observability in debugging complex scenarios. * Added option in Makefile to produce traces in FST format which are 2 orders of magnitude smaller, although much slower to generate. * Added option in Makefile to build the simulator with GCC address sanitizer. * Modified Makefile to not lint the scala code by default avoiding unintended wrong indentation. Linting should be better performed manually on a per-need basis. * [VTA][de10nano] Enable remote programming of FPGA. Issue: The Cyclone V FPGA on board of the DE10-Nano can only be programmed using the JTAG port, which is a limiting option for users. Solution: Add support for the remote programming of the FPGA implementing the FPGA programming manager protocol published in the Cyclone V user manual. * Added file de10nano_mgr.h implementing an FPGA manager class that supports handling of control and status registers as well as a push-button option to program the FPGA. The class can be easily extended to include more registers if needed. * Used an instance of the FPGA manager to implement function VTAProgram also warning users when incompatible bitstream files are used. * Registered VTAProgram as a global function and modified the program_bitstream python class to use it. * [VTA][de10nano] Enhance de10nano runtime support. Issue: The de10nano target has incomplete, non-working support for runtime reconfiguration, bitstream programming, and examples of usage. Solution: Complete runtime support for the de10nano target. * Modified VTA.cmake to comment out a default override for VTA_MAX_XFER to 21 bit wide. * Modified VTA.cmake to add needed de10nano include dirs. * Modified relevant files to support de10nano same way as other targets for VTA runtime reconfiguration and FPGA programming. * Added test_program_rpc.py example as a runtime FPGA programming example. Note that unlike the pynq target no bitstream is either downloaded or programmed when the bitstream argument is set to None. * Cosmetic changes to vta config files. * [VTA][Chisel] LoadUop FSM bug fix. Issue: The LoadUop FSM incorrectly advances the address of the next uop to read from DRAM when the DRAM data valid bit is deasserted and asserted at the end of a read. This is caused by a mismatch in the logic of the state and output portions of the FSM. This is one of two issues that was gating the correct operation of VTA on the DE10-Nano target. Solution: Modify the logic of the output section of the FSM to include a check on the DRAM read valid bit or fold the output assignemnt into the state section. * Folded the assignemnt of the next uop address in the state section of the FSM. * [VTA][Chisel] Dynamically adjust DMA tranfer size. Issue: In the DE10-Nano target and possibly in others, DMA transfers that cross the boundaries of memory pages result in incorrect reads and writes from and to DRAM. When this happens depending on different input values, VTA loads and stores exhibit incorrect results for DMA pulses at the end of a transfer. This is one of two issues that were gating the DE10-Nano target from functioning correctly, but may affect other Chisel based targets. Solution: Add support for dynamically adjustble DMA transfer sizes in load and store operations. For a more elegant and modular implementation the feature can be enabled at compile time with a static constant that can be passed as a configuration option. * Modified the load and store finite state machines to dynamically adjust the size of initial and stride DMA transfers. The feature is enabled by default by virtue of the static constant ADAPTIVE_DMA_XFER_ENABLE. * [VTA][Chisel] Improve FSIM/TSIM/FPGA xref debug. Issue: Cross reference between FSIM, TSIM, and Chisel based FPGA traces is an invaluable instrument that enables fast analysis on FSIM, and analysis/debug on TSIM and FPGA, especially for complex flows like conv2d or full inferences. Currently this cannot be done easily since a suitable reference is missing. The clock cycle event counter cannot be used since it is undefined in FSIM and not reliable between TSIM and FPGA because of different latencies. Solution: Introduce a new event counter that preserves a program order across FSIM, TSIM, FPGA. We propose adding the accumulator write event counter in the Chisel EventCounter class and a simple instrumentation in the FSIM runtime code. Note that this technique enabled finding the Chisel issues reportes in the PR, which would have been otherwise far more difficult. * Added the acc_wr_count event counter and changed interfaces accordingly. * [VTA][de10nano] Comply with linting rules. * [VTA] Appease make lint. * [VTA] Disable pylint import not top level error. * [VTA][Chisel,de10nano] Linting changes. * Use CamelCase class names. * Use C++ style C include header files. * Add comments to Chisel makefile. * [VTA][de10nano] * Reorder C and C++ includes in de10nano_mgr.h. * Restore lint as default target in Chisel Makefile. * [VTA][de10nano] Do not use f string in pkg_config.py. * [VTA][de10nano] Remove overlooked f strings in pkg_config.py. * [VTA][de10nano] Fixed typo. * [VTA][TSIM] Check if gcc has align-new. * [VTA][Chisel] Make adaptive DMA transfer default. * [VTA][RPC] Renamed VTA_PYNQ_RPC_* to VTA_RPC_*. Issue: With more FPGA targets coming online the initial method of using individual environment variables to specify target IP and port does not scale well. Solution: Use a single VTA_RPC_HOST, VTA_RPC_PORT pair to be changed every time a different target is used. For instance in a script used to benchmark all targets. * Replaced every instance of VTA_PYNQ_RPC_HOST and VTA_PYNQ_RPC_PORT with VTA_RPC_HOST and VTA_RPC_PORT, respectively. * [VTA][Chisel] Comply with new linter.
Pasquale Cocchini committed
-
- 07 Mar, 2020 1 commit
-
-
* scalafmt => scalastyle Change-Id: Ifc590e7cb63585f35dfdc9efcf3c6287b1afb1dd * scalafmt => scalastyle Change-Id: I8aff2632dadda05d2896e28bdaf6f780a160a15a * add indentation constraint Change-Id: Ibeb00c11a5718ea47322ea2b82e757828af8af91 * trigger ci again
Liangfu Chen committed
-
- 27 Feb, 2020 2 commits
-
-
* [DOCS] Sphinx -- Introduce alias detection. Background: some of our namespaces import function from another namespace. For example tvm.te imports most of the operators from tvm.tir. Previously we manually exclude these aliases from the doc. However that means we can not link them by the alias name. This PR adds a sphinx callback plugin to detect such aliases, and create a rubric block on the button of its current docstring `Alias of the original class`. It is done in a way so that we can refer to the generated docs. We also fixed a few docs errors. * Fix most of the issues
Tianqi Chen committed -
* [REFACTOR][PY][API-CHANGE] Remove legacy python files. Remove legacy python files. Use the te namespace for most of the tensor expression primitives. - tvm.create_schedule -> tvm.te.create_schedule - tvm.placeholder -> tvm.te.placeholder - tvm.compute -> tvm.te.compute * Remove top-level exposures.
Tianqi Chen committed
-
- 26 Feb, 2020 2 commits
-
-
* [VTA] YoloV3 Support Issue: YoloV3 use some operator and logic that not get good support by existing vta logic, like nn.pad, upsample, and 255 output channel. Solution: add related logic to let darknet YoloV3 can running on VTA * Fix small(0, or 1 heigh/width) detect frame issue. * add yolov3-tiny turtorial * add os import * address review comments. * rename tutorial file with a short name. * rename deploy_vision_on_vta.py into deploy_classification.py. * address review comment, fix plint eror in deploy_detection.py
Hua Jiang committed -
* [DOCS] Fix Sphinx Warnings: the target found for cross-reference warnings * Fix the warning: undefined label
Neo Chien committed
-
- 24 Feb, 2020 1 commit
-
-
* relay op strategy fix lint bitpack strategy bitserial_dense (#6) * update strategy * address comments fix a few topi test Dense strategy (#5) * dense * add biforst; remove comments * address comment Refactor x86 conv2d_NCHWc (#4) * Refactor x86 conv2d * Add x86 depthwise_conv2d_NCHWc * Add back topi x86 conv2d_nchw * Merge x86 conv2d_nchw and conv2d_NCHWc * Minor fix for x86 conv2d fix more strategy Add x86 conv2d_NCHWc_int8 strategy (#8) * Add x86 conv2d_NCHWc_int8 strategy * Remove contrib_conv2d_nchwc_int8 * Fix generic conv2d_NCHWc for int8 * Fix topi arm_cpu conv2d_NCHWc_int8 update x86 conv2d enable specify relay ops to be tuned for autotvm add cuda conv2d strategy add conv2d strategy for rocm add conv2d strategy for hls add conv2d strategy for arm cpu add conv2d strategy for mali add conv2d strategy for bifrost add conv2d strategy for intel graphics clean up and fix lint remove template keys from autotvm remove 2 in the func name address comments fix * fix bugs * lint * address comments * add name to op implement * Modify topi tests (#9) * Add pooling, reorg, softmax and vision * Add lrn * fix topi test * fix more topi test * lint * address comments * x * fix more tests & bugs * Modify more tests (#10) * Modify tests for bitserial_conv2d, bitserial_dense, bitserial_conv2d_rasp and bnn * Minor fix * More minor fix * fix more test * try to update vta using strategy * fix cpptest * x * fix rebase err * Fix two tests (#11) * change autotvm log format * lint * minor fix * try fix vta test * fix rebase err * tweak * tmp hack for vta pass * fix tutorial * fix * fix more tutorials * fix vta tutorial * minor * address comments * fix * address comments * fix cpptest * fix docs * change data structure name and api * address comments * lint * fix rebase err * updates * fix winograd test * fix doc * rebase * upgrade tophub version number * fix bug * re-enable vta tsim test after tophub is upgraded * fix vta test to use the correct args so the config can be found in tophub Co-authored-by: Yao Wang <kevinthesunwy@gmail.com>
Haichen Shen committed
-
- 20 Feb, 2020 1 commit
-
-
* fix indents * Fix image scale and cross-ref
Cody Yu committed
-
- 18 Feb, 2020 2 commits
-
-
Tianqi Chen committed
-
- Move the related files to tvm.te - Move build_module.py to tvm.driver
Tianqi Chen committed
-
- 14 Feb, 2020 1 commit
-
-
- Move related files into the corresponding location as in C++ - Keep the top-level TVM API backward compatible to make minimum changes in topi
tqchen committed
-
- 13 Feb, 2020 1 commit
-
-
Move the related target modules into tvm.target. API change: - tvm.target.current_target -> tvm.target.Target.current - tvm.datatype -> tvm.target.datatype
tqchen committed
-
- 12 Feb, 2020 1 commit
-
-
* [REFACTOR][PY][API-CHANGE] establish tvm.ir, migrate corresponding relay files. This PR establishes tvm.ir and migrates the corresponding relay files into the new folder. API Change: - relay.Module -> tvm.IRModule * Update with ADT * Migrate transform * address comments * Migrate module * Migrate json_compact * Migrate attrs * Move LoweredFunc to stmt temporarily * temp migrate container * Finish migrate container
Tianqi Chen committed
-
- 09 Feb, 2020 1 commit
-
-
Tianqi Chen committed
-
- 07 Feb, 2020 2 commits
-
-
Cody Yu committed
-
* [REFACTOR][PY-API] Polish tvm.runtime, tvm.runtime.module API update This PR updates the tvm.runtime to use the new FFI style. - Remove top-level tvm.module to avoid confusion between runtime.Module and IRModule - API changes wrt to runtime.Module - tvm.module.load -> tvm.runtime.load_module - tvm.module.enabled -> tvm.runtime.enabled - tvm.module.system_lib -> tvm.runtime.system_lib - Remove dep on api_internal from runtime. * Update module.load in the latest API
Tianqi Chen committed
-
- 04 Feb, 2020 2 commits
-
-
Tianqi Chen committed
-
* [LINT] Fix -Wextra * Fix virtual-dtor
Tianqi Chen committed
-
- 23 Jan, 2020 1 commit
-
-
* [VTA] Support network which have no unique operator as start/stop name for graph pack. [Issue] Current vta use 'start' and 'stop' name to define the pack start point and end point, but this method not work for these network which have no 2 unique operator as start point and stop point. [Solution] In this solution we give 2 addtional parameters start_name_indx and stop_name_indx to make vta pack logic work with the said network, for exampl for following networks which have no unique operator, %0 = nn.add %1 = nn.conv2d %2 = nn.batch_norm %3 = nn.leaky_relu %4 = nn.add %5 = nn.conv2d %6 = nn.batch_norm %7 = nn.leaky_relu %8 = nn.add with this solution we can use following parameter format to make vta work on it. relay_prog = graph_pack( //.... start_name="nn.add", stop_name="nn.add", start_name_idx=0, stop_name_idx=4) to apply on new network, by printing the network we can get index information like following. print(mod.astext(show_meta_data=False)) relay_prog = graph_pack(mod ... start_name="nn.add", stop_name="nn.add", start_name_idx=0, stop_name_idx=4) * address review comments and fix index count bug issue: when do print(mod), the output not only the Call is also have other type like Var, need add logic to count all except meta. solution: add related logic * address review comments. * address review comments * add more detail comments.
Hua Jiang committed
-
- 19 Jan, 2020 1 commit
-
-
This PR moves the codegen related code into the target folder, as they are target specific functionalities. We also adopt the term "compiler driver" in common compiler infra such as rust, GHC and clang. As a result, build_module is moved into the driver folder.
Tianqi Chen committed
-
- 17 Jan, 2020 1 commit
-
-
* Update task_python_vta.sh * install sbt=1.1.1 with apt-get * update verilator_opt * install verilator with major version 4.0 * disable multi-threading for now * bug fix for correcting uop fetch address in LoadUop module * bug fix for correcting uop fetch address in LoadUop module * adjustment to read from dram_offset * enable USE_THREADS with verilator 4.x * DEBUG: try avoid core dump with verilator 4.x * bug fix in LoadUop module * log mega cycles in tsim * download cat.png to avoid fetching in each run * bug fix in LoadUop module * solve dram_even/sram_even issue * bug fix * introduce scalalint in ci * speedup tsim in ci * bug fix * lint scala code before building * disable multi-threading * split fsim/tsim script * update Jenkins settings * duplicate task_python_vta_fsim.sh as task_python_vta.sh for now Co-authored-by: Thierry Moreau <tmoreau@octoml.ai>
Liangfu Chen committed
-