- 31 Jan, 2020 1 commit
-
-
Animesh Jain committed
-
- 30 Jan, 2020 4 commits
-
-
masahi committed
-
jmorrill committed
-
Add support for: greater_equal, less, less_equal, equal, not_equal. Add tests for the elemwise relational ops.
Ina Dobreva committed -
abergeron committed
-
- 29 Jan, 2020 2 commits
- 28 Jan, 2020 2 commits
-
-
* Implement pass tracing API * Set is_before correctly * Add docs for trace function * Fix lint * Remove PDB * Ensure trace_func is set before calling * Fix conditional
Jared Roesch committed -
Cody Yu committed
-
- 27 Jan, 2020 4 commits
-
-
* ONNX frontend broadcast condition * fix * fix style Co-authored-by: Jon Soifer <jonso@microsoft.com>
Jon Soifer committed -
Co-authored-by: Jon Soifer <jonso@microsoft.com>
Jon Soifer committed -
* Explicitly link to cublasLt * Only link cublasLt if it's found Co-authored-by: Jon Soifer <jonso@microsoft.com>
Jon Soifer committed -
fixed a spelling mistake.
Kaiyan Chang committed
-
- 25 Jan, 2020 1 commit
-
-
HUAN-PING SU committed
-
- 24 Jan, 2020 5 commits
-
-
* fix formula for calculating end indices when size[i] == -1 * add a test case for size[i] == -1 * discard expanding dimension of begin_value & end_value since it is needed only if you pass them as scalars not as tensors. * discard 'slice_tensor' variable so that implementation matches the tf parser pattern
Ina Dobreva committed -
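The end-index rule this commit touches can be sketched in plain Python. This is a minimal illustration of TensorFlow-style slice semantics, where a size of -1 means "everything remaining in that dimension"; the helper name `slice_end_indices` is hypothetical and this is not the frontend's actual code.

```python
def slice_end_indices(shape, begin, size):
    """Return exclusive end indices for a TensorFlow-style slice.

    Hypothetical helper for illustration only.
    """
    end = []
    for dim, start, length in zip(shape, begin, size):
        if length == -1:
            # size[i] == -1: take all remaining elements, so the end is the dimension size.
            end.append(dim)
        else:
            end.append(start + length)
    return end


print(slice_end_indices([5, 6], [1, 2], [-1, 3]))
```

For a tensor of shape (5, 6), begin (1, 2) and size (-1, 3), the slice runs to the end of the first dimension and covers three elements of the second.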
masahi committed
-
* remove cpp upsampling * remove cpp resize
masahi committed -
Alex Gladkov committed
-
hlu1 committed
-
- 23 Jan, 2020 2 commits
-
-
* [VTA] Support networks that have no unique operator as the start/stop name for graph pack.
  [Issue] VTA currently uses the 'start' and 'stop' names to define where packing starts and ends, but this does not work for networks that lack two unique operators to serve as the start and stop points.
  [Solution] Add two additional parameters, start_name_idx and stop_name_idx, so that the VTA pack logic works with such networks. For example, the following network has no unique operator:
    %0 = nn.add
    %1 = nn.conv2d
    %2 = nn.batch_norm
    %3 = nn.leaky_relu
    %4 = nn.add
    %5 = nn.conv2d
    %6 = nn.batch_norm
    %7 = nn.leaky_relu
    %8 = nn.add
  With this change, the following parameters make VTA work on it:
    relay_prog = graph_pack( //.... start_name="nn.add", stop_name="nn.add", start_name_idx=0, stop_name_idx=4)
  To apply this to a new network, print the network to get the index information:
    print(mod.astext(show_meta_data=False))
    relay_prog = graph_pack(mod ... start_name="nn.add", stop_name="nn.add", start_name_idx=0, stop_name_idx=4)
* Address review comments and fix an index count bug. Issue: the output of print(mod) contains not only Call nodes but also other types such as Var, so the logic needs to count everything except meta. Solution: add the related logic.
* address review comments.
* address review comments
* add more detail comments.
Hua Jiang committed -
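The idea of disambiguating a repeated operator name with an index can be sketched without TVM. The helper below, `select_pack_range`, is hypothetical and is not the VTA implementation. It assumes the index is the operator's position in the printed network, and that the stop operator itself is excluded from the returned range; both are assumptions made for illustration.

```python
def select_pack_range(op_names, start_name, stop_name, start_name_idx, stop_name_idx):
    """Return the operators from the start point up to (not including) the stop point.

    Hypothetical helper: op_names stands in for the printed network, one name per node.
    """
    # The name alone is ambiguous, so the index picks the node and the name validates it.
    if op_names[start_name_idx] != start_name:
        raise ValueError("operator at start_name_idx is not %s" % start_name)
    if op_names[stop_name_idx] != stop_name:
        raise ValueError("operator at stop_name_idx is not %s" % stop_name)
    return op_names[start_name_idx:stop_name_idx]


# The network from the commit message, listed by node index %0 .. %8.
ops = ["nn.add", "nn.conv2d", "nn.batch_norm", "nn.leaky_relu",
       "nn.add", "nn.conv2d", "nn.batch_norm", "nn.leaky_relu", "nn.add"]
print(select_pack_range(ops, "nn.add", "nn.add", 0, 4))
```

Validating the name at each index catches the case where the printed network changed and the indices no longer point at the intended operators.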
Alexander Pivovarov committed
-
- 22 Jan, 2020 4 commits
-
-
- combine pad and dilate; - fix for the issue https://discuss.tvm.ai/t/compile-error-for-cuda-target/4164 - fix for the issue https://github.com/apache/incubator-tvm/pull/4472
Alex Gladkov committed -
Alexander Pivovarov committed
-
Alexander Pivovarov committed
-
"driver" normally refers to the "main" function. Rationale: the header exposes a set of APIs to drive compilation and should be named as driver api to best reflect its usage.
Tianqi Chen committed
-
- 21 Jan, 2020 4 commits
-
-
* BYOC Tutorial -- part 2 * Fix comments * Address comments
Cody Yu committed -
Tianqi Chen committed
-
Bring up namespace te -- Tensor expression language DSL.
Tianqi Chen committed -
* [REFACTOR] Establish printer in the source folder. As we move towards the unified IR, we will eventually want to build unified printers for both relay and TIR. This PR isolates the printer component into a separate folder in src as a first step. - Refactored the Doc DSL using Object, clean up APIs. - Isolate out the meta data into a header. - move printer into relay_text_printer, add comments about further TODOs. * Rename NodePrinter -> ReprPrinter to distinguish it from other printers
Tianqi Chen committed
-
- 20 Jan, 2020 3 commits
-
-
* expose BindParamByName to python * fixed alpha equal test
masahi committed -
* [REFACTOR][TYPE] Finish moving all types to IR. - Move definition of Ref and TensorType to ir - Move type_functor.h to public header. - Rename RefType -> RelayRefType for clarity. * Add atol
Tianqi Chen committed -
Alex Gladkov committed
-
- 19 Jan, 2020 3 commits
-
-
This PR moves the codegen related code into the target folder, as they are target specific functionalities. We also adopt the term "compiler driver" in common compiler infra such as rust, GHC and clang. As a result, build_module is moved into the driver folder.
Tianqi Chen committed -
HUAN-PING SU committed
-
TIR is the new namespace for low-level IR for tensor-level optimizations and loop transformations. This PR establishes the namespace and files. - lowered_func.h,buffer.h,data_layout.h -> tir/buffer.h,tir/data_layout.h,tir/lowered_func.h - ir.h -> tir/expr.h, tir/stmt.h - ir_functor_ext.h -> tir/expr_functor.h, tir/stmt_functor.h
Tianqi Chen committed
-
- 18 Jan, 2020 3 commits
-
-
Haichen Shen committed
-
* unify vm and interpreter objects * move closure back vm * adt/closure back to vm.adt/vm.closure * closure base
Zhi committed -
- Fixes issues to enable fp16 vectorizer. Now correct packing and unpacking CUDA code will be emitted. Enabled more unit tests.
- Do not emit code to read the first lane from an undef variable
    int _3;
    _3 = _3 & ~(0x000000ff << 0) | ...
  and emit the following code instead:
    _3 = (((0x000000ff & (_1 >> 0))+(0x000000ff & (_2 >> 0))) << 0);
  Note that nvcc 10.2 is forgiving and emits the same code for both cases. A warning appears in test_codegen_cuda.py.
Signed-off-by: Wei Pan <weip@nvidia.com>
wpan11nv committed
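The lane-wise pattern in the generated code above can be mimicked in plain Python. This is only an illustrative sketch: the function name `packed_add_u8x4`, the choice of four 8-bit lanes in a 32-bit word, and masking each lane sum back to 8 bits are assumptions, not taken from the TVM code generator.

```python
def packed_add_u8x4(a, b):
    """Add four 8-bit lanes packed in 32-bit integers, lane by lane.

    Hypothetical helper for illustration only.
    """
    out = 0
    for lane in range(4):
        shift = 8 * lane
        x = 0xFF & (a >> shift)
        y = 0xFF & (b >> shift)
        lane_sum = (x + y) & 0xFF  # assumption: each lane wraps around at 8 bits
        if lane == 0:
            # First lane: assign directly instead of reading the not-yet-defined result.
            out = lane_sum << shift
        else:
            # Later lanes: clear the lane's bits in the result, then merge the new value.
            out = (out & ~(0xFF << shift) & 0xFFFFFFFF) | (lane_sum << shift)
    return out


print(hex(packed_add_u8x4(0x01020304, 0x10203040)))
```

Writing the first lane by plain assignment mirrors the fix described in the commit: the result is never read before it holds a defined value.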
-
- 17 Jan, 2020 2 commits
-
-
* Update task_python_vta.sh * install sbt=1.1.1 with apt-get * update verilator_opt * install verilator with major version 4.0 * disable multi-threading for now * bug fix for correcting uop fetch address in LoadUop module * bug fix for correcting uop fetch address in LoadUop module * adjustment to read from dram_offset * enable USE_THREADS with verilator 4.x * DEBUG: try avoid core dump with verilator 4.x * bug fix in LoadUop module * log mega cycles in tsim * download cat.png to avoid fetching in each run * bug fix in LoadUop module * solve dram_even/sram_even issue * bug fix * introduce scalalint in ci * speedup tsim in ci * bug fix * lint scala code before building * disable multi-threading * split fsim/tsim script * update Jenkins settings * duplicate task_python_vta_fsim.sh as task_python_vta.sh for now Co-authored-by: Thierry Moreau <tmoreau@octoml.ai>
Liangfu Chen committed -
Move the conversion extensions to the specific class definitions so that we no longer need to include packed_func_ext.
Tianqi Chen committed
-