- 21 Nov, 2019 3 commits
Previously, we relied on later phases to error out (often for using too much shared memory). This enables for ROCm the IR checks that already exist for CUDA and OpenCL.
Thomas Viehmann committed
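For context, a minimal sketch of invoking this kind of IR-level device check from Python, using the 0.6-era API; the limit values below are illustrative assumptions, not the actual ROCm defaults:

```python
import tvm

n = 1024
A = tvm.placeholder((n,), name="A")
B = tvm.compute((n,), lambda i: A[i] + 1.0, name="B")

s = tvm.create_schedule(B.op)
bx, tx = s[B].split(B.op.axis[0], factor=64)
s[B].bind(bx, tvm.thread_axis("blockIdx.x"))
s[B].bind(tx, tvm.thread_axis("threadIdx.x"))

stmt = tvm.lower(s, [A, B], simple_mode=True)
# VerifyGPUCode returns False when the lowered IR exceeds the stated limits.
ok = tvm.ir_pass.VerifyGPUCode(stmt, {
    "max_shared_memory_per_block": 64 * 1024,  # illustrative limits
    "max_threads_per_block": 256,
})
print("IR within device limits:", ok)
```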
Animesh Jain committed
Zhi committed
- 20 Nov, 2019 7 commits
Tianqi Chen committed
* [ThreadPool] Solve thread transitions issue
* Use pthread_atfork to avoid the master thread's CPU affinity being inherited by the child.
* Code format
* Comment on exclude_worker0_
* Set full CPU affinity
* Remove redundant blank line
* CPPLint
* CPPLint namespace
* CPPLint
* Fix the wrong logic of binding the master thread.
Zhao Wu committed
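The commit uses pthread_atfork in C++; the analogous hook in Python (an assumed illustration of the same idea, not the code from the commit) looks like:

```python
import os

def _reset_affinity_in_child():
    # Undo any affinity the parent's worker threads pinned, so the
    # forked child is free to run on every CPU again (Linux-only calls).
    os.sched_setaffinity(0, range(os.cpu_count()))

# Registered handlers run in the child right after fork(), which is
# the same hook point pthread_atfork provides in C.
os.register_at_fork(after_in_child=_reset_affinity_in_child)
```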
Alexander Pivovarov committed
Yizhi Liu committed
masahi committed
Liang ZOU committed
Tianqi Chen committed
- 19 Nov, 2019 7 commits
Yizhi Liu committed
* [Relay][Quantize] Integrate data-aware calibration into quantization
* Update _calibrate.py
* Trigger CI
* Address comments
* Address comments
Wuwei Lin committed
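A hedged sketch of how data-aware calibration is driven from Python after this change; `mod`, `params`, and `batches` are assumed to already exist:

```python
from tvm import relay

# `mod` and `params` are an assumed Relay module and its weights;
# `batches` is an assumed iterable of input arrays for calibration.
def calibrate_dataset():
    for batch in batches:
        yield {"data": batch}

with relay.quantize.qconfig(calibrate_mode="kl_divergence",
                            weight_scale="max"):
    qmod = relay.quantize.quantize(mod, params,
                                   dataset=calibrate_dataset())
```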
* [PERF] parallel reduction in cpu
* fix
* x
* update
* lint
* fix
Haichen Shen committed
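This is not the topi schedule from the commit, but a minimal sketch of one common way to parallelize a CPU reduction in the 0.6-era API, via rfactor plus parallel:

```python
import tvm

n, m = 1024, 1024
A = tvm.placeholder((n, m), name="A")
k = tvm.reduce_axis((0, m), name="k")
B = tvm.compute((n,), lambda i: tvm.sum(A[i, k], axis=k), name="B")

s = tvm.create_schedule(B.op)
# Factor the reduction axis so partial sums can run in parallel;
# the final stage then combines the partial results.
ko, ki = s[B].split(B.op.reduce_axis[0], factor=64)
BF = s.rfactor(B, ko)
s[BF].parallel(BF.op.axis[0])
f = tvm.build(s, [A, B], target="llvm")
```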
* [tutorial] nnvm -> relay
* Use relay workload
* Delete mobilenetv2 option
Yizhi Liu committed
Alexander Pivovarov committed
Animesh Jain committed
* Add rule for clean
* Update clean rule

  It seems the lib/ directory is not created by the Makefile, so don't delete the directory, just its contents.
miheer vaidya committed
- 18 Nov, 2019 6 commits
Yizhi Liu committed
Cody Hao Yu committed
Tianqi Chen committed
* Add tf FloorMod
* Add floor_div/mod into topi and relay
* Add to rst
* Fix test
Yao Wang committed
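A small sketch of the new Relay ops; floor semantics follow Python's %, so the result of floor_mod takes the sign of the divisor, unlike truncation-based mod:

```python
from tvm import relay

x = relay.var("x", shape=(4,), dtype="float32")
y = relay.var("y", shape=(4,), dtype="float32")
# floor_divide(x, y) = floor(x / y)
# floor_mod(x, y)    = x - floor(x / y) * y
d = relay.floor_divide(x, y)
m = relay.floor_mod(x, y)
f = relay.Function([x, y], relay.Tuple([d, m]))
```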
* [Relay][Frontend][Tensorflow] Add conv2d_transpose
* Add a transformation from NHWC to NCHW to be compatible with TVM's conv2d_transpose implementation
* Remove the 'dilations' parameter to be compatible with TF 1.3
optima2005 committed
When getting a CUDA schedule, passing a single tensor seems to work, but after changing the target to "llvm" it triggers an assert. Passing a list, on the other hand, keeps both the cuda and llvm targets happy. See https://discuss.tvm.ai/t/solved-simple-example-error-attributeerror-tensorslice-object-has-no-attribute-op/2245/3
miheer vaidya committed
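A minimal reproduction of the working form, assuming the 0.6-era API:

```python
import tvm

n = 1024
A = tvm.placeholder((n,), name="A")
B = tvm.compute((n,), lambda i: A[i] * 2.0, name="B")
s = tvm.create_schedule(B.op)

# Pass the argument buffers as a list; a bare tensor happened to
# work for the cuda path but trips an assert for "llvm".
f = tvm.build(s, [A, B], target="llvm")
```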
- 16 Nov, 2019 6 commits
Philip Hyunsu Cho committed
* Add qnn conv2d attributes for input_tensor_scale and kernel_tensor_scale.

  The lowering in the tflite frontend loses the input_tensor_scale and the kernel_tensor_scale by multiplying them together and folding the product into the Requantize operation. This means that any graph-partitioning passes, or other passes that need this information, no longer have it available in the qnn dialect.

* Store input tensor scale and weight tensor scale for Dense as well.

  As for conv2d, the tflite frontend drops the input tensor scale and the weight tensor scale from the relay op. Store them as separate fields there.

* Fix unintentional tab
* Rename input_tensor_scale to input_scale and kernel_tensor_scale to kernel_scale for conv2d.
* input_tensor_scale -> input_scale, weight_tensor_scale -> weight_scale
* Rework dense testcase and use input_scale and kernel_scale
* Be consistent in use of input_scale and kernel_scale values
* Fix up qnn conv2d tests for input_scale and kernel_scale
* Make pydoc identical between conv2d and dense for weight_tensor
* Fix up conv2d parameters to be in the same order between C++ and Python
* Fix ordering of parameters for dense.
* Add input_scale and output_scale to try and satisfy CI gods
* Delete input_scale and kernel_scale: nn.conv2d does not contain them, so we need to delete them when lowering to nn.conv2d.
* Add input_scale and kernel_scale for qnn.conv2d
Ramana Radhakrishnan committed
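A hedged sketch of constructing qnn.conv2d with the scales carried on the op after this change; the shapes and values are made up, and the exact keyword set is assumed from the description above:

```python
from tvm import relay

data = relay.var("data", shape=(1, 3, 32, 32), dtype="uint8")
weight = relay.var("weight", shape=(8, 3, 3, 3), dtype="uint8")

# input_scale/kernel_scale now ride along on the op so later passes
# can still see them; they are not used in the convolution itself.
conv = relay.qnn.op.conv2d(
    data, weight,
    input_zero_point=0, kernel_zero_point=0,
    input_scale=0.5, kernel_scale=0.25,
    kernel_size=(3, 3), channels=8)
```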
Animesh Jain committed
Peter Yeh committed
Cody Hao Yu committed
* AutoTVM: selecting tuning templates when extracting task

  Make the procedure of trying new templates easier.
  Test: tests/python/relay/test_autotvm_task_extraction.py

* Use dict to match key for topi ops
* Fix lint issue
* Be more pythonic :)
黎明灰烬 committed
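A hedged usage sketch, assuming the 0.6-era extraction API and an existing `mod`/`params` pair:

```python
from tvm import autotvm, relay

# `mod` and `params` are an assumed Relay module and its weights.
# Restricting `ops` selects which templates/tasks get extracted.
tasks = autotvm.task.extract_from_program(
    mod["main"], target="llvm", params=params,
    ops=(relay.op.nn.conv2d,))
for t in tasks:
    print(t)
```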
- 15 Nov, 2019 11 commits
When we did not set the workgroup size, LLVM would use too many registers for kernel launches with many threads, which resulted in "invalid ISA" errors. Here we set the maximum workgroup size to the maximum threads per block reported by the device API. Of course, one might look into letting configurations that launch fewer threads at runtime use more registers.
Thomas Viehmann committed
…factors, and the resulting nested loop is broken. This is because we create zero-extent loops that are fixed up afterwards; the unroll pass, however, breaks on the zero-extent loop.
Kimish Patel committed
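A minimal sketch of the triggering pattern, in the 0.6-era API: a split whose factor does not divide the extent, followed by unroll of the inner loop:

```python
import tvm

n = 10
A = tvm.placeholder((n,), name="A")
B = tvm.compute((n,), lambda i: A[i] + 1.0, name="B")

s = tvm.create_schedule(B.op)
# 4 does not divide 10, so lowering must guard the tail iteration;
# this is the kind of pattern that produced the zero-extent loops.
xo, xi = s[B].split(B.op.axis[0], factor=4)
s[B].unroll(xi)
print(tvm.lower(s, [A, B], simple_mode=True))
```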
[Relay][VM][Interpreter] Enable first-class constructors in VM and interpreter via eta expansion (#4218)

* Fix constructor pretty printing
* Make Module::HasDef name consistent with API
* Add VM constructor compilation via eta expansion
* Lint
* Fix CI
* Fix failing test
* Address comment
* Retrigger CI
* Retrigger CI
Logan Weber committed
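Eta expansion turns a bare constructor into an ordinary function value; a language-agnostic illustration in Python with a made-up Cons constructor:

```python
# A stand-in ADT constructor of arity 2, e.g. Cons(head, tail).
class Cons:
    def __init__(self, head, tail):
        self.head, self.tail = head, tail

# Eta expansion wraps the bare constructor in a plain function value,
#   Cons  ~>  \head, tail -> Cons(head, tail)
# so higher-order code (e.g. map/fold in the VM) can apply it
# like any other closure.
eta_cons = lambda head, tail: Cons(head, tail)

# Usage: passing the expanded constructor where a function is expected.
cells = list(map(eta_cons, [1, 2], [None, None]))
```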
* [COMMUNITY] Add DISCLAIMER, KEYS for ASF release
* Add file name spec
Tianqi Chen committed
T.J. Mercier committed
Alex Gladkov committed
Zhao Wu committed
ziyu-guo committed
* Bug fix for padded load with large inputs
* Update TensorLoad.scala
* Update test_vta_insn.py
Liangfu Chen committed
Jian Weng committed
Neo Chien committed