- 27 Jan, 2020 2 commits
* Explicitly link to cublasLt
* Only link cublasLt if it's found
Co-authored-by: Jon Soifer <jonso@microsoft.com>
Jon Soifer committed
fixed a spelling mistake.
Kaiyan Chang committed
- 25 Jan, 2020 1 commit
HUAN-PING SU committed
- 24 Jan, 2020 5 commits
* fix the formula for calculating end indices when size[i] == -1
* add a test case for size[i] == -1
* drop the dimension expansion of begin_value & end_value, since it is needed only when they are passed as scalars, not as tensors
* drop the 'slice_tensor' variable so that the implementation matches the tf parser pattern
Ina Dobreva committed
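For reference, in TensorFlow's `Slice` op a `size[i]` of -1 means "all remaining elements along dimension i", so the exclusive end index for that dimension is the dimension length rather than `begin[i] + size[i]`. The sketch below only illustrates that rule; `slice_end_indices` is a hypothetical helper, not the code from this commit.

```python
def slice_end_indices(begin, size, input_shape):
    """Compute exclusive end indices for a TF-style Slice (illustrative sketch).

    size[i] == -1 means "take everything from begin[i] to the end of dimension i".
    """
    end = []
    for b, s, dim in zip(begin, size, input_shape):
        if s == -1:
            end.append(dim)  # slice through to the end of this dimension
        else:
            end.append(b + s)  # ordinary case: begin plus slice length
    return end


print(slice_end_indices([1, 0, 2], [2, -1, -1], [4, 5, 6]))  # prints [3, 5, 6]
```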
masahi committed
* remove cpp upsampling
* remove cpp resize
masahi committed
Alex Gladkov committed
hlu1 committed
- 23 Jan, 2020 2 commits
* [VTA] Support networks that have no unique operator as start/stop name for graph pack.
  [Issue] The current VTA uses the 'start' and 'stop' names to define the pack start point and end point, but this method does not work for networks that lack two unique operators to serve as the start and stop points.
  [Solution] This change adds two parameters, start_name_idx and stop_name_idx, so that the VTA pack logic works with such networks. For example, the following network has no unique operator:
    %0 = nn.add
    %1 = nn.conv2d
    %2 = nn.batch_norm
    %3 = nn.leaky_relu
    %4 = nn.add
    %5 = nn.conv2d
    %6 = nn.batch_norm
    %7 = nn.leaky_relu
    %8 = nn.add
  With this change, the following parameters make VTA work on it:
    relay_prog = graph_pack(
        //....
        start_name="nn.add", stop_name="nn.add",
        start_name_idx=0, stop_name_idx=4)
  To apply this to a new network, print the network to get the index information, then pass the indices:
    print(mod.astext(show_meta_data=False))
    relay_prog = graph_pack(mod
        ...
        start_name="nn.add", stop_name="nn.add",
        start_name_idx=0, stop_name_idx=4)
* address review comments and fix an index count bug
  Issue: the output of print(mod) contains not only Call nodes but also other types such as Var, so the logic must count all of them except meta.
  Solution: add the related counting logic.
* address review comments.
* address review comments
* add more detailed comments.
Hua Jiang committed
Alexander Pivovarov committed
- 22 Jan, 2020 4 commits
- combine pad and dilate;
- fix for the issue https://discuss.tvm.ai/t/compile-error-for-cuda-target/4164
- fix for the issue https://github.com/apache/incubator-tvm/pull/4472
Alex Gladkov committed
Alexander Pivovarov committed
Alexander Pivovarov committed
"driver" normally refers to the "main" function. Rationale: the header exposes a set of APIs that drive compilation, so it should be named the driver API to best reflect its usage.
Tianqi Chen committed
- 21 Jan, 2020 4 commits
* BYOC Tutorial -- part 2
* Fix comments
* Address comments
Cody Yu committed
Tianqi Chen committed
Bring up namespace te -- Tensor expression language DSL.
Tianqi Chen committed
* [REFACTOR] Establish printer in the source folder.
  As we move towards the unified IR, we will eventually want to build unified printers for both relay and TIR. As a first step, this PR isolates the printer component into a separate folder in src.
  - Refactored the Doc DSL using Object, cleaned up APIs.
  - Isolated the meta data into a header.
  - Moved the printer into relay_text_printer, added comments about further TODOs.
* Rename NodePrinter -> ReprPrinter to distinguish it from other printers
Tianqi Chen committed
- 20 Jan, 2020 3 commits
* expose BindParamByName to python
* fixed alpha equal test
masahi committed
* [REFACTOR][TYPE] Finish moving all types to IR.
  - Move the definitions of Ref and TensorType to ir
  - Move type_functor.h to a public header.
  - Rename RefType -> RelayRefType for clarity.
* Add atol
Tianqi Chen committed
Alex Gladkov committed
- 19 Jan, 2020 3 commits
This PR moves the codegen-related code into the target folder, as it implements target-specific functionality. We also adopt the term "compiler driver" used in common compiler infrastructures such as Rust, GHC and Clang. As a result, build_module is moved into the driver folder.
Tianqi Chen committed
HUAN-PING SU committed
TIR is the new namespace for low-level IR for tensor-level optimizations and loop transformations. This PR establishes the namespace and files.
- lowered_func.h, buffer.h, data_layout.h -> tir/buffer.h, tir/data_layout.h, tir/lowered_func.h
- ir.h -> tir/expr.h, tir/stmt.h
- ir_functor_ext.h -> tir/expr_functor.h, tir/stmt_functor.h
Tianqi Chen committed
- 18 Jan, 2020 3 commits
Haichen Shen committed
* unify vm and interpreter objects
* move closure back to vm
* adt/closure back to vm.adt/vm.closure
* closure base
Zhi committed
- Fixes issues to enable the fp16 vectorizer. Correct packing and unpacking CUDA code is now emitted. Enabled more unit tests.
- Do not emit code that reads the first lane from an undef variable:
    int _3;
    _3 = _3 & ~(0x000000ff << 0) | ...
  and emit the following code instead:
    _3 = (((0x000000ff & (_1 >> 0))+(0x000000ff & (_2 >> 0))) << 0);
  Note that nvcc 10.2 is forgiving and emits the same code for both cases. A warning appears in test_codegen_cuda.py.
Signed-off-by: Wei Pan <weip@nvidia.com>
wpan11nv committed
- 17 Jan, 2020 9 commits
* Update task_python_vta.sh
* install sbt=1.1.1 with apt-get
* update verilator_opt
* install verilator with major version 4.0
* disable multi-threading for now
* bug fix for correcting uop fetch address in LoadUop module
* bug fix for correcting uop fetch address in LoadUop module
* adjustment to read from dram_offset
* enable USE_THREADS with verilator 4.x
* DEBUG: try to avoid core dump with verilator 4.x
* bug fix in LoadUop module
* log mega cycles in tsim
* download cat.png to avoid fetching in each run
* bug fix in LoadUop module
* solve dram_even/sram_even issue
* bug fix
* introduce scalalint in ci
* speedup tsim in ci
* bug fix
* lint scala code before building
* disable multi-threading
* split fsim/tsim script
* update Jenkins settings
* duplicate task_python_vta_fsim.sh as task_python_vta.sh for now
Co-authored-by: Thierry Moreau <tmoreau@octoml.ai>
Liangfu Chen committed
Move the conversion extensions to the specific class definitions so that we no longer need to include packed_func_ext.
Tianqi Chen committed
Animesh Jain committed
During the Unified IR refactor we will change the structure of IRs. As a result, certain historical modules stored via json will no longer be loadable by the current version. This PR introduces a backward-compatibility layer that makes a best effort to upgrade json from a previous version (in this case 0.6) to the current version. We mainly aim to support upgrading the high-level IR (relay).
Tianqi Chen committed
* [QNN] Conv2D type checking for kernel per-channel scales.
* Address comments.
* Address comments.
* Adding safety checks for downcasts.
Co-authored-by: shoubhik <shoubhikbhatti@gmail.com>
Animesh Jain committed
* [VTA] Update Jenkinsfile for VTA test with TSIM
* duplicate task_python_vta.sh multiple copies for now
Liangfu Chen committed
vexilligera committed
hlu1 committed
- Remove operator bool from the base object ref macro
  - Rationale: operator bool can be dangerous for subclasses that also overload other operators (e.g. ==).
  - If bool is still needed, use explicit operator bool.
- Use absolute includes when necessary
- Move type-related utils to data_type
- Isolate stackvm code from the compiler
Tianqi Chen committed
- 16 Jan, 2020 4 commits
* [Docs] Convert Layout pass.
* Address comments. Section 3 massaging.
* Address comments.
Animesh Jain committed
* [REFACTOR] introduce top - Tensor Operation DSL.
  Historically we put Tensor, Schedule and compute under the root tvm namespace. This is no longer a good idea as the project's scope grows larger than the tensor operation DSL. This PR introduces top -- a namespace for tensor operational DSL concepts such as schedule, tensor, compute. We moved the related files to the new top subfolder.
* Move relevant files into include/tvm/top and src/top
Tianqi Chen committed
* BYOC tutorial: codegen C
* Address comments
* Address comments
* Add build option
* Address comments
* Use TVM_DLL_EXPORT_TYPED_FUNC
Cody Yu committed
Thierry Moreau committed