- 13 Mar, 2020 1 commit
-
-
Andrew Liu committed
-
- 12 Mar, 2020 14 commits
-
-
- Use fuzzy comparison for double. - Removed the hack for BatchNormAttrs and DictAttr. Also removed a warning from text printer printing.
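The fuzzy comparison named in the message above can be illustrated with a tolerance-based equality check. This is a minimal Python sketch rather than the code from the commit; the helper name `fuzzy_equal` and both tolerance values are assumptions chosen for illustration.

```python
import math


def fuzzy_equal(lhs: float, rhs: float,
                rel_tol: float = 1e-9, abs_tol: float = 1e-12) -> bool:
    """Return True when two doubles are equal within a tolerance.

    Hypothetical helper; the tolerances are illustrative assumptions.
    """
    # Exact equality is the cheap common case and also covers infinities.
    if lhs == rhs:
        return True
    # Otherwise accept values whose difference is within the tolerances.
    return math.isclose(lhs, rhs, rel_tol=rel_tol, abs_tol=abs_tol)
```

A comparison like this treats `0.1 + 0.2` and `0.3` as equal, which a bitwise `==` on doubles does not.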
Tianqi Chen committed -
Tianqi Chen committed
-
The ubuntu_install_llvm.sh script started failing because of an HTTP to HTTPS redirect. This patch adds the package that allows apt to handle HTTPS transport. Change-Id: I70bcba32a9fc75d02c54f4f21f288b2f46226689
Marcus Shawcroft committed -
* init * fix template * tweak naming
Haichen Shen committed -
* [CUDA] Op strategy changes for Int8 schedules. * Applying Haichen's suggestions. * Make 4D output work for task extraction. * Make x86 work. * Fix lint. * Lint fixes. * Tests, comments, out channel a multiple of 4. * Topi test. Co-authored-by: Ubuntu <ubuntu@ip-172-31-38-96.us-west-2.compute.internal>
Animesh Jain committed -
ANSHUMAN TRIPATHY committed
-
pankratz committed
-
* [REFACTOR] Streamline Function Attr interface.
  There have been quite a few recent changes that depend heavily on the function attr interface. This PR streamlines that interface by introducing two APIs that cover most of the usages.
  - GetAttr, which gets a typed object for a given key.
    - HasNonzeroAttr is a quick helper that calls GetAttr to quickly check an attribute.
  - WithAttr, which creates a new function object with the given attr.
    - The API comes with copy on write optimization to avoid multiple copies.
    - We deliberately pick the prefix With (instead of Set) to indicate this function does not mutate the original input.
  On the python side:
  - We allow read access via func.attrs (which is a DictAttr).
  - func.with_attrs to create a new instance with updated attrs.
  We also get rid of the small wrapper functions and make sure the API is centered around the GetAttr and HasNonzeroAttr interface.
  This PR also changes the function construction to follow the new convention.
* Address review comments
* Address review comments
* Fix doxygen path
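The non-mutating "With" convention described in the message above can be sketched in plain Python. The `Function` class and its method names below are hypothetical stand-ins, not TVM's actual classes or signatures, and the sketch always copies the attribute dictionary instead of modelling the copy on write optimization.

```python
from types import MappingProxyType
from typing import Any, Mapping, Optional


class Function:
    """Toy immutable function object (hypothetical, not TVM's class)."""

    def __init__(self, name: str, attrs: Optional[Mapping[str, Any]] = None):
        self.name = name
        # Store a private copy behind a read-only view so callers cannot
        # mutate the attributes in place.
        self._attrs = MappingProxyType(dict(attrs or {}))

    @property
    def attrs(self) -> Mapping[str, Any]:
        """Read-only access to the attributes."""
        return self._attrs

    def get_attr(self, key: str, default: Any = None) -> Any:
        """Return the attribute stored under key, or default if absent."""
        return self._attrs.get(key, default)

    def has_nonzero_attr(self, key: str) -> bool:
        """Quick check built on get_attr; a missing key counts as zero."""
        return self.get_attr(key, 0) != 0

    def with_attr(self, key: str, value: Any) -> "Function":
        """Return a new Function with key set; self is left untouched."""
        updated = dict(self._attrs)
        updated[key] = value
        return Function(self.name, updated)
```

For example, `g = f.with_attr("Primitive", 1)` yields a new object for which `g.has_nonzero_attr("Primitive")` is true while `f` keeps its original attributes, which is the behaviour the "With" prefix is meant to signal.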
Tianqi Chen committed -
* [TFLITE][FRONTEND]Reduce_any op parsing support * Testcase check added to run in tf version above 1.14.0 & review comments * Review comment, checked updated to 1.15
Samuel committed -
Samuel committed
-
Fernand Pajot committed
-
Set split node's range to minimum of ext and split factor or split nparts, but only when PassDownDomain is called with allow_missing == false, i.e. by InferBound. Add a helper PassUpThreadBinding() to get a map telling whether an IterVar has at least one leaf IterVar deriving from it binding to a thread. Add two unit tests. (#5044)
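One reading of the range rule in the message above is that the side of a split fixed by `factor` or `nparts` is clamped to the extent of the parent axis. The sketch below works through that arithmetic under that assumption; `split_extents` is a hypothetical helper, not a TVM function, and it ignores the `allow_missing` condition.

```python
from typing import Optional, Tuple


def split_extents(extent: int,
                  factor: Optional[int] = None,
                  nparts: Optional[int] = None) -> Tuple[int, int]:
    """Return (outer_extent, inner_extent) for a split of one axis.

    Hypothetical helper illustrating the clamping rule; exactly one of
    factor or nparts must be given.
    """
    if (factor is None) == (nparts is None):
        raise ValueError("specify exactly one of factor or nparts")
    if factor is not None:
        # Split by factor: the inner axis never exceeds the parent extent.
        inner = min(extent, factor)
        outer = -(-extent // factor)  # ceiling division
    else:
        # Split by nparts: the outer axis never exceeds the parent extent.
        outer = min(extent, nparts)
        inner = -(-extent // nparts)  # ceiling division
    return outer, inner
```

With an extent of 3 and a factor of 8, the inner extent is 3 rather than 8, so no iterations beyond the parent axis are implied.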
yongfeng-nv committed -
Thierry Moreau committed
-
* [refactor][relay pass] Separate analysis and transform passes into different subfolders * remove pass folder
Zhi committed
-
- 11 Mar, 2020 11 commits
-
-
Wei Chen committed
-
* Conditions updated to cover better user scenarios * [1] New test case added * [2] New test case added * [3] Proper variable name used * [4] Review Comments handled * [5] Review comments handled * [6] Review comments handled
ANSHUMAN TRIPATHY committed -
Wei Chen committed
-
* Support 3d Convolution with the ONNX frontend * add unit tests for conv3d in onnx frontend respond to PR formatting requests add x86 schedules to conv3d ncdhw test fix a doc string format issue refactor for changed upstream API * first attempt at conv3d autotuning add default schedule for conv3d_ncdhw fill in autotvm integration add a fallback for invalid schedules fix fallback fix reduction order to get simd working correctly
Matthew Brookhart committed -
* [intrin] exp2 * [intrin] exp10 * [intrin] log2/10 * [intrins] exp10 * [test] math intrin
Bing Xu committed -
This reverts commit 585f9ce6.
Lianmin Zheng committed -
This reverts commit fe74b37a.
Tianqi Chen committed -
* [QNN] Support 4D padding. * Empty commit. Co-authored-by: Ubuntu <ubuntu@ip-172-31-38-96.us-west-2.compute.internal>
Animesh Jain committed -
* [TFLITE]elu, leaky_relu, lrn, log_softmax activation functions * removed ops present in pr 4805 * review_comments updated
Samuel committed -
- This patch allows the CUDA backend to emit correct code for selects with vector conditions, which may be produced by floordiv op lowering etc. - This already works for the llvm BE, as the llvm select instruction supports vector conditions. Signed-off-by: Wei Pan <weip@nvidia.com>
Wei Pan committed -
* Add relay operation relay.op.tan. * Update tan implementation in TVM. * Update tests. * Add shape function for tan. * Add missing main test to python/frontend/tensorflow/test_forward. * Revert, back to sin/cos. * Revert "Revert, back to sin/cos." This reverts commit 4da5b503b921585ba9d80944b29136142b575c40. * Fix implementation of tan in cuda. Do not support tan for float16. Simplify topi/tests/python/test_topi_math. Add testing for tan with float32 and float64. Finally implement tan as sin/cos in llvm.
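The last item in the message above, implementing tan as sin/cos, rests on the identity tan(x) = sin(x) / cos(x). A minimal Python sketch of that identity follows; `tan_via_sin_cos` is a hypothetical name and this is not the LLVM lowering itself.

```python
import math


def tan_via_sin_cos(x: float) -> float:
    """Compute tan(x) as sin(x) / cos(x) (illustrative helper)."""
    return math.sin(x) / math.cos(x)
```

For ordinary inputs the result agrees with `math.tan` to within floating-point rounding.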
notoraptor committed
-
- 10 Mar, 2020 8 commits
-
-
Tianqi Chen committed
-
* Add support for prim::If and prim::Loop with test cases * rebase and fix tests * add some comments * simplifying, fix float cast * parse -> convert * recursively retrieve ops in get_all_op_names * use multiple return values from block correctly, simplify loop convert * choose dtype properly for zeros and ones * simplifying, replace convert_inputs with _get_relay_input_vars * fix for while loop with non input dependent init cond * add assert on loop var update * move the condition around * better testing for seg models * rebase fix, disable inception v3 in quant test as it is too slow to load with torch-1.4 + torchvision 0.5 * simplify and add more comparison op converter
masahi committed -
* A composite function should not be primitive since we still may need to perform passes on it. Change-Id: If62d06d265234861a6ec0df7749dc1c339c1055c
lhutton1 committed -
* [1] New test case added for fuse * [2] New test case added for fuse * [3] New test case added for fuse * [4] New test case added for fuse * [5] Early check added
ANSHUMAN TRIPATHY committed -
* implement kDLCPUPinned * Fix line endings * Fix whitespace for linter * cleanup up allocdataspace method
jmorrill committed -
* Add Nick's changes squashed * Fix frontend compilation * Re-enable Rust CI * Add changes which conflicted badly * Restructure import_module! macro in order to avoid unstable features * Kill old unstable feature enablement * Refactor common to use new APIs * Move the code to stable * Fix warning Co-authored-by: Nick Hynes <nhynes@oasislabs.com>
Jared Roesch committed
-
- 09 Mar, 2020 4 commits
-
-
This reverts commit fc7f0783.
Animesh Jain committed -
雾雨魔理沙 committed
-
* implementation of MISRA-C compliant TVM runtime
* working on bundle_deploy_c demo
* move header files into include dir
* fix compatibility issues
* fix compatibility issues
* resolve most of the warnings and errors
* implement c_backend_api
* introduce bridge
* working well
* move to header files and bundle.c into src/runtime/crt
* clean up
* satisfy linter
* clean up
* test with the cat image
* remove synset
* refactoring
* refactoring
* refactoring
* initial crt_runtime_api.c
* improved compatibility with g++
* using exposed API in c_runtime_api.h
* call from c_runtime_api.h
* clean up
* lint
* merge into apps/bundle_deploy directory
  Change-Id: I51904db81b8589e65d107d8ca77b47452e3812b5
* make the demo run in ci
  Change-Id: I2c24f8b592508833d3555311c2b24d1931f19385
* address review comments
  Change-Id: I027ddff15c31fb4da0bd0e461427dce619de1f93
* release
  Change-Id: I5ad5bb8426468aac9fc8d074e56ddea358a7fd91
* fix ci testing
  Change-Id: Ic2e82fb3051b6c254ef32a964f976b61e3e5fe4d
* add test case for misra c runtime
  Change-Id: Ie0dfd0ade6be4665b4384db7d260a6c69b35010f
* fread files in testing to avoid calling xxd
  Change-Id: Ie7fbc16b4b0b9509918d986a841f443900813bef
Liangfu Chen committed -
* [VTA][de10nano] Enable user defined target frequency.
  Issue: The VTA target frequency on the DE10-Nano is hardcoded to 50MHz, unnecessarily limiting performance.
  Solution: Add a PLL to the FPGA sub-system along with support for the selection of a user specified frequency at build time. The board successfully builds and runs at 100MHz.
  * Added a PLL in the soc_system.tcl platform designer generator script.
  * Modified the Makefile to automatically set the target frequency from that specified in the pkg_config.py file.
  * Modified the Makefile to generate a bitstream with an RBF format that enables programming of the FPGA directly from the on-board processor. Specifically, the RBF is generated in FastParallel32 mode with compression, which corresponds to the default MSEL switch setting on the board, i.e. 01010.
  * Added a false path override to file set_clocks.sdc to turn off unconstrained path warnings on the VTA pulse LED.
* [VTA][TSIM] Add more debug and tracing options.
  * Modified Makefile to change default config to DefaultDe10Config.
  * Added option in Makefile to produce more detailed tracing for extra observability in debugging complex scenarios.
  * Added option in Makefile to produce traces in FST format, which are 2 orders of magnitude smaller, although much slower to generate.
  * Added option in Makefile to build the simulator with GCC address sanitizer.
  * Modified Makefile to not lint the scala code by default, avoiding unintended wrong indentation. Linting is better performed manually on a per-need basis.
* [VTA][de10nano] Enable remote programming of FPGA.
  Issue: The Cyclone V FPGA on board the DE10-Nano can only be programmed using the JTAG port, which is a limiting option for users.
  Solution: Add support for the remote programming of the FPGA, implementing the FPGA programming manager protocol published in the Cyclone V user manual.
  * Added file de10nano_mgr.h implementing an FPGA manager class that supports handling of control and status registers as well as a push-button option to program the FPGA. The class can be easily extended to include more registers if needed.
  * Used an instance of the FPGA manager to implement function VTAProgram, also warning users when incompatible bitstream files are used.
  * Registered VTAProgram as a global function and modified the program_bitstream python class to use it.
* [VTA][de10nano] Enhance de10nano runtime support.
  Issue: The de10nano target has incomplete, non-working support for runtime reconfiguration, bitstream programming, and examples of usage.
  Solution: Complete runtime support for the de10nano target.
  * Modified VTA.cmake to comment out a default override for VTA_MAX_XFER to 21 bit wide.
  * Modified VTA.cmake to add needed de10nano include dirs.
  * Modified relevant files to support de10nano the same way as other targets for VTA runtime reconfiguration and FPGA programming.
  * Added test_program_rpc.py example as a runtime FPGA programming example. Note that unlike the pynq target no bitstream is either downloaded or programmed when the bitstream argument is set to None.
  * Cosmetic changes to vta config files.
* [VTA][Chisel] LoadUop FSM bug fix.
  Issue: The LoadUop FSM incorrectly advances the address of the next uop to read from DRAM when the DRAM data valid bit is deasserted and asserted at the end of a read. This is caused by a mismatch in the logic of the state and output portions of the FSM. This is one of two issues that were gating the correct operation of VTA on the DE10-Nano target.
  Solution: Modify the logic of the output section of the FSM to include a check on the DRAM read valid bit, or fold the output assignment into the state section.
  * Folded the assignment of the next uop address in the state section of the FSM.
* [VTA][Chisel] Dynamically adjust DMA transfer size.
  Issue: In the DE10-Nano target and possibly in others, DMA transfers that cross the boundaries of memory pages result in incorrect reads and writes from and to DRAM. When this happens, depending on different input values, VTA loads and stores exhibit incorrect results for DMA pulses at the end of a transfer. This is one of two issues that were gating the DE10-Nano target from functioning correctly, but may affect other Chisel based targets.
  Solution: Add support for dynamically adjustable DMA transfer sizes in load and store operations. For a more elegant and modular implementation the feature can be enabled at compile time with a static constant that can be passed as a configuration option.
  * Modified the load and store finite state machines to dynamically adjust the size of initial and stride DMA transfers. The feature is enabled by default by virtue of the static constant ADAPTIVE_DMA_XFER_ENABLE.
* [VTA][Chisel] Improve FSIM/TSIM/FPGA xref debug.
  Issue: Cross reference between FSIM, TSIM, and Chisel based FPGA traces is an invaluable instrument that enables fast analysis on FSIM, and analysis/debug on TSIM and FPGA, especially for complex flows like conv2d or full inferences. Currently this cannot be done easily since a suitable reference is missing. The clock cycle event counter cannot be used since it is undefined in FSIM and not reliable between TSIM and FPGA because of different latencies.
  Solution: Introduce a new event counter that preserves a program order across FSIM, TSIM, FPGA. We propose adding the accumulator write event counter in the Chisel EventCounter class and a simple instrumentation in the FSIM runtime code. Note that this technique enabled finding the Chisel issues reported in the PR, which would have been otherwise far more difficult.
  * Added the acc_wr_count event counter and changed interfaces accordingly.
* [VTA][de10nano] Comply with linting rules.
* [VTA] Appease make lint.
* [VTA] Disable pylint import not top level error.
* [VTA][Chisel,de10nano] Linting changes.
  * Use CamelCase class names.
  * Use C++ style C include header files.
  * Add comments to Chisel makefile.
* [VTA][de10nano]
  * Reorder C and C++ includes in de10nano_mgr.h.
  * Restore lint as default target in Chisel Makefile.
* [VTA][de10nano] Do not use f string in pkg_config.py.
* [VTA][de10nano] Remove overlooked f strings in pkg_config.py.
* [VTA][de10nano] Fixed typo.
* [VTA][TSIM] Check if gcc has align-new.
* [VTA][Chisel] Make adaptive DMA transfer default.
* [VTA][RPC] Renamed VTA_PYNQ_RPC_* to VTA_RPC_*.
  Issue: With more FPGA targets coming online the initial method of using individual environment variables to specify target IP and port does not scale well.
  Solution: Use a single VTA_RPC_HOST, VTA_RPC_PORT pair to be changed every time a different target is used, for instance in a script used to benchmark all targets.
  * Replaced every instance of VTA_PYNQ_RPC_HOST and VTA_PYNQ_RPC_PORT with VTA_RPC_HOST and VTA_RPC_PORT, respectively.
* [VTA][Chisel] Comply with new linter.
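The single VTA_RPC_HOST and VTA_RPC_PORT pair described in the message above could be read from a script roughly as follows. This is a sketch under stated assumptions: the helper name `get_vta_rpc_endpoint` is hypothetical, and the fallback host and port are illustrative defaults, not values taken from the commit.

```python
import os
from typing import Mapping, Optional, Tuple


def get_vta_rpc_endpoint(env: Optional[Mapping[str, str]] = None) -> Tuple[str, int]:
    """Read the RPC host and port from VTA_RPC_HOST / VTA_RPC_PORT.

    Hypothetical helper. The default values below are assumptions used
    only so the sketch runs without any environment set up.
    """
    # Allow a mapping to be passed in so the lookup can be exercised
    # without touching the real process environment.
    env = os.environ if env is None else env
    host = env.get("VTA_RPC_HOST", "192.168.2.99")
    port = int(env.get("VTA_RPC_PORT", "9091"))
    return host, port
```

A benchmarking script can then switch boards by exporting a different host and port pair before each run, without a per-target variable name.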
Pasquale Cocchini committed
-
- 08 Mar, 2020 2 commits
-
-
ANSHUMAN TRIPATHY committed
-
Haichen Shen committed
-