- 12 Apr, 2020 2 commits
* [Intrinsic] Add log1p, ldexp, atan2, hypot, nextafter, copysign
* Lint
Junru Shao committed
Zhi committed
- 11 Apr, 2020 1 commit
* [LLVM] Fix generation of LLVM intrinsics. The type list in the call to llvm::Intrinsic::getDeclaration is not the intrinsic's signature; it is the list of overloaded types. Without this fix, the updated unit test would cause the following error:
  TVMError: LLVM module verification failed with the following errors: Intrinsic name not mangled correctly for type arguments! Should be: llvm.ctlz.i32 i32 (i32, i1)* @llvm.ctlz.i32.i1
* Special handling for llvm.prefetch; signature matching for overloaded intrinsics only. The prefetch intrinsic returns void in LLVM, while it returns i32 in TVM. This case needs special handling, because rule-based intrinsic translation would create an invalid LLVM type. Do the signature matching only for overloaded intrinsics; it is not needed for non-overloaded ones, so this saves a bit of compile time.
* Include the intrinsic name in the error message
* Fix the number of arguments for llvm.fmuladd and llvm.pow
Krzysztof Parzyszek committed
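A minimal sketch of the corrected pattern, assuming a codegen with an llvm::Module* and an llvm::IRBuilder<> at hand (the EmitCtlz helper is hypothetical): only the overloaded types go into the list passed to getDeclaration, while the full argument list is still supplied at the call site.

```cpp
#include <llvm/IR/IRBuilder.h>
#include <llvm/IR/Intrinsics.h>
#include <llvm/IR/Module.h>

// llvm.ctlz is overloaded on a single type (its operand type), so the type
// list must contain exactly that type. Passing the whole signature
// {i32, i1} would yield the badly mangled @llvm.ctlz.i32.i1 from the
// error message above.
llvm::Value* EmitCtlz(llvm::Module* module, llvm::IRBuilder<>* builder,
                      llvm::Value* x) {
  llvm::Function* fn = llvm::Intrinsic::getDeclaration(
      module, llvm::Intrinsic::ctlz, {builder->getInt32Ty()});
  // The i1 flag (is_zero_undef) is an argument, not an overloaded type.
  return builder->CreateCall(fn, {x, builder->getFalse()});
}
```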
- 10 Apr, 2020 3 commits
Tianqi Chen committed
* Use runtime::String
* Move string to tvm namespace
* Add const char* constructor
* Implicit cast from std::string
Zhi committed
* [RUNTIME] Initial implementation of Hexagon runtime support. This is only the TVM runtime; the FastRPC libraries, simulator driver, etc. will be provided in subsequent commits.
* Fix pylint complaints
* Fix some more pylint complaints
* Add link to the Hexagon SDK website
* Extract VTCM marker into a common variable
* Implement device->device memory copy
* Disable unsigned PDs by default
* Ensure that --hvx_length is present in sim_args if HVX is enabled
* Remove the line about clang from README.md; apparently things work with libstdc++
* Mention setting USE_RPC=OFF when building libtvm_runtime.so for Hexagon
* Remember to use codegen_hvx in validate_hvx_length
* Add a line about the minimum version of LLVM
Krzysztof Parzyszek committed
- 08 Apr, 2020 1 commit
In newer versions of LLVM, this header is no longer included by one of the already included headers in llvm_common.h, so include it explicitly.
Krzysztof Parzyszek committed
- 07 Apr, 2020 3 commits
This intrinsic was removed in LLVM 11.
Krzysztof Parzyszek committed
LLVM 11 added support for scalable vectors, and the number of elements in a vector is now represented by an llvm::ElementCount class, not just a number.
Krzysztof Parzyszek committed
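A sketch of one way the version split can be handled when creating fixed-width vector types (the VecType helper is hypothetical; TVM_LLVM_VERSION is TVM's LLVM version macro):

```cpp
#include <llvm/IR/DerivedTypes.h>

// Pre-LLVM 11, a vector type is created from a plain lane count; from
// LLVM 11 on, lane counts are llvm::ElementCount values, and the common
// non-scalable case is covered by llvm::FixedVectorType.
llvm::Type* VecType(llvm::Type* elem, unsigned lanes) {
#if TVM_LLVM_VERSION >= 110
  return llvm::FixedVectorType::get(elem, lanes);
#else
  return llvm::VectorType::get(elem, lanes);
#endif
}
```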
LLVM 11 introduces a separate class to represent alignment. The functions in IRBuilder that create aligned loads and stores and accept the alignment as an unsigned value have been deprecated (and now cause warnings to be emitted).
Krzysztof Parzyszek committed
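A sketch of the corresponding change for aligned loads (the AlignedLoad helper is hypothetical):

```cpp
#include <llvm/IR/IRBuilder.h>
#if TVM_LLVM_VERSION >= 110
#include <llvm/Support/Alignment.h>
#endif

llvm::Value* AlignedLoad(llvm::IRBuilder<>* builder, llvm::Value* ptr,
                         int align_bytes) {
#if TVM_LLVM_VERSION >= 110
  // LLVM 11+: wrap the alignment in llvm::Align; the unsigned overload
  // is deprecated and emits a warning.
  return builder->CreateAlignedLoad(ptr, llvm::Align(align_bytes));
#else
  return builder->CreateAlignedLoad(ptr, align_bytes);
#endif
}
```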
- 05 Apr, 2020 1 commit
* [REFACTOR][TIR] Migrate all low-level passes to the Pass Manager. This PR migrates tvm.lower to return an IRModule of PrimFuncs instead of LoweredFuncs.
* Remove LoweredFunc
Tianqi Chen committed
- 03 Apr, 2020 2 commits
* [REFACTOR][TIR] Migrate most of the low-level build to use the Pass Manager:
  - SplitHostDevice
  - ThreadSync
  - BindDevice
  - LowerThreadAllreduce
  - Provide a temporary fix for printing IRModule with PrimFunc before the formal text printer lands
* Address comments, fix tests
* Fix relay tests
* Explicit move
Tianqi Chen committed
- Support vectorized casts
- It is incorrect to extract elements from int8x4 with 0x000000ff & (x >> i * 8), as this value is of type int in C/C++. If this expression is used for sign extensions, the sign bit will be wrong. Simply use C-style casts instead and the sign bits will just work.
Signed-off-by: Wei Pan <weip@nvidia.com>
Wei Pan committed
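A minimal standalone illustration of the sign-bit problem described above (the values are hypothetical; the packed int8x4 lives in a plain int):

```cpp
#include <cstdio>

int main() {
  int x = (int)0xFFFFFF80;  // lowest int8 lane holds -128
  int i = 0;
  // Mask-and-shift: the expression has type int and never sign-extends
  // the lane, so a negative lane comes out as 128.
  int wrong = 0x000000ff & (x >> (i * 8));
  // C-style cast: truncating to signed char keeps the sign bit, and the
  // promotion back to int sign-extends it, giving -128 as expected.
  int right = (signed char)(x >> (i * 8));
  printf("%d %d\n", wrong, right);  // prints: 128 -128
  return 0;
}
```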
- 02 Apr, 2020 2 commits
- Migrate LowerTVMBuiltin
- Migrate InferFragment, LowerThreadAllreduce
- Migrate ThreadSync
- Refactor target::Build to directly take IRModule
- Remove unused legacy functions
Tianqi Chen committed
Haozheng Fan committed
- 01 Apr, 2020 1 commit
* [TIR][TRANSFORM] Migrate LowerIntrin
* LowerDeviceStorageAccessInfo
* Migrate LowerWarpMemory
Tianqi Chen committed
- 26 Mar, 2020 1 commit
Ruizhe Zhao committed
- 24 Mar, 2020 1 commit
* [REFACTOR][TIR] Introduce PrimFuncPass
  - Introduce PrimFuncPass
  - Convert one pass to the unified Pass API
* Address comments
* Fix comments
Tianqi Chen committed
- 23 Mar, 2020 1 commit
* isfinite doc update
* isfinite expr
* isfinite schedule registration
* isfinite python binding
* relay register isfinite
* isfinite type relation
* intrin isfinite
* topi isfinite
* testcase topi isfinite
* tf frontend isfinite
* tf frontend isfinite testcase
* test case relay isfinite
* small fixes
* test forward tf isfinite
* test cases injective for cuda
* remove float16 test case
* add support for isinf
* remove unwanted import
* fix conflict
Mahesh Ambule committed
- 22 Mar, 2020 1 commit
- This allows emitting vectorized loads/stores for CUDA math intrinsics.
- A few intrinsics should be lowered as CUDAMath, not CUDAFastMath ones.
- Fixed the code block indentation.
Wei Pan committed
- 20 Mar, 2020 1 commit
As part of the unified IR refactor, this PR refactors the target codegen to use IRModule containing tir::PrimFuncs. In order to break the refactor into several steps without breaking the codebase, we built a conversion pass to convert Array<LoweredFunc> into IRModule. The follow-up refactors will gradually move the passes covered by IRModule up until we cover all the passes. Then we can remove the additional redundant concepts such as LoweredFunc.
Tianqi Chen committed
- 18 Mar, 2020 1 commit
* [CODEGEN][OPENCL] Explicitly cast min/max operands
* Retrigger CI
MORITA Kazutaka committed
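For context, the same class of error exists in standard C++, where std::max cannot deduce a single type from mixed operands; OpenCL's built-in min/max are similarly strict about operand types, hence the explicit casts this commit adds (a hedged analogy, not the emitted OpenCL itself):

```cpp
#include <algorithm>

float ClampBelowZero(float x) {
  // std::max(x, 0) fails to compile: T cannot be both float and int.
  // Casting the literal to the operand's type resolves it, mirroring
  // the explicit casts added to the OpenCL codegen.
  return std::max(x, static_cast<float>(0));
}
```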
- 16 Mar, 2020 1 commit
* Use a dummy func when no lowered_funcs exists in the Relay mod
* Dummy func -> CSourceModule with empty code str
* Added comments describing the empty CSourceModule
* Always import external modules w/o assertions
* Use CSourceModule as a fallback for LLVMModule
* Changed cond for target == llvm
* Create an empty LLVM module w/o using dummy func
* Avoid using IR str concat to create LLVM module
* Improved comments for codegen.LLVMModuleCreate
* Satisfy the linter for LLVMModuleCreate
Ruizhe Zhao committed
- 11 Mar, 2020 3 commits
* [intrin] exp2
* [intrin] exp10
* [intrin] log2/10
* [intrins] exp10
* [test] math intrin
Bing Xu committed
- This patch allows the CUDA backend to emit correct code for selects with vector conditions, which may be produced by floordiv op lowering, etc.
- This already works for the LLVM backend, as LLVM's select instruction supports vector conditions.
Signed-off-by: Wei Pan <weip@nvidia.com>
Wei Pan committed
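A sketch of what per-lane selection means, using hypothetical stand-ins for CUDA's built-in vector types (the real emitted code may differ):

```cpp
// Stand-ins for CUDA's int4/float4 vector types.
struct Int4 { int x, y, z, w; };
struct Float4 { float x, y, z, w; };

// LLVM's select instruction takes a vector condition directly, but the
// C/CUDA ternary operator does not, so the CUDA backend must emit one
// scalar select per lane.
Float4 SelectLanes(const Int4& cond, const Float4& a, const Float4& b) {
  Float4 r;
  r.x = cond.x ? a.x : b.x;
  r.y = cond.y ? a.y : b.y;
  r.z = cond.z ? a.z : b.z;
  r.w = cond.w ? a.w : b.w;
  return r;
}
```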
* Add relay operation relay.op.tan
* Update tan implementation in TVM
* Update tests
* Add shape function for tan
* Add missing main test to python/frontend/tensorflow/test_forward
* Revert, back to sin/cos
* Revert "Revert, back to sin/cos." This reverts commit 4da5b503b921585ba9d80944b29136142b575c40.
* Fix implementation of tan in cuda. Do not support tan for float16. Simplify topi/tests/python/test_topi_math. Add testing for tan with float32 and float64. Finally implement tan as sin/cos in llvm.
notoraptor committed
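The LLVM lowering mentioned in the last bullet relies on the standard identity, shown here as a plain C++ sketch (function name hypothetical):

```cpp
#include <cmath>

// tan(x) = sin(x) / cos(x): lowers tan to operations the backend
// already supports.
double TanViaSinCos(double x) { return std::sin(x) / std::cos(x); }
```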
- 10 Mar, 2020 1 commit
- 09 Mar, 2020 1 commit
雾雨魔理沙 committed
- 06 Mar, 2020 1 commit
* Add relay operation relay.op.tan
* Update tan implementation in TVM
* Update tests
* Add shape function for tan
* Add missing main test to python/frontend/tensorflow/test_forward
* Revert, back to sin/cos
* Revert "Revert, back to sin/cos." This reverts commit 4da5b503b921585ba9d80944b29136142b575c40.
* Fix implementation of tan in cuda. Do not support tan for float16. Simplify topi/tests/python/test_topi_math. Add testing for tan with float32 and float64. Try again to implement tan as sin/cos in llvm.
Yao Wang committed
- 25 Feb, 2020 1 commit
- llvm::StringRef to std::string conversion is explicit now.
Signed-off-by: Wei Pan <wpan11nv@nvidia.com>
wpan11nv committed
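A sketch of the explicit conversion (the helper name is hypothetical):

```cpp
#include <llvm/ADT/StringRef.h>
#include <string>

std::string ToStdString(llvm::StringRef ref) {
  // Older LLVM allowed an implicit conversion here; newer LLVM requires
  // an explicit one, e.g. via StringRef::str().
  return ref.str();
}
```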
- 21 Feb, 2020 1 commit
* Support cuda tensorcore subbyte int data type in auto tensorcore
* Add license
* Pass cpplint
* Fix code review comments
* Merge the int4/int1 codegen tutorial into the existing auto tensorcore tutorial
* Use master's new API
* Disable tuning when cuda is not enabled
* Address CR comment
* Do not run the tuning
* Fix test failure
* Fix cpplint error
* Fix bool type reduction bug
* Fix an index bug; fix the returned bytes value of int1/int4/uint4
* Fix typo
Orion34C committed
- 18 Feb, 2020 1 commit
- Move the related files to tvm.te
- Move build_module.py to tvm.driver
Tianqi Chen committed
- 16 Feb, 2020 1 commit
- Do not emit __shared__ etc. as part of the type for casting.
- Fix fp16 reduction kernels that fail to compile with: no operator "+" matches these operands (volatile half + volatile half). This patch inserts casts to remove the volatile type qualifier following volatile loads (fp16 only). CUDA fp16 library headers should add volatile member functions.
- Update have_fp16 to include compute 6.1 GPUs, which do support fp16, although their fp16 throughput is low. Updated tests.
Signed-off-by: Wei Pan <weip@nvidia.com>
wpan11nv committed
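The failure mode can be reproduced in plain C++ with any type whose operator+ is not volatile-qualified; CUDA's half behaves the same way (a sketch with a hypothetical Half stand-in; the generated CUDA uses C-style casts to the same effect):

```cpp
struct Half {  // stand-in for CUDA's half: operator+ is not volatile-qualified
  float v;
  Half operator+(const Half& o) const { return Half{v + o.v}; }
};

volatile Half red_buf[2];  // reduction buffers are declared volatile

Half Reduce() {
  // Error: no operator "+" matches these operands
  //        (volatile Half + volatile Half)
  // return red_buf[0] + red_buf[1];

  // Fix: strip the volatile qualifier right after the load, which is
  // what the inserted casts do for fp16 reductions.
  return const_cast<Half&>(red_buf[0]) + const_cast<Half&>(red_buf[1]);
}
```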
- 13 Feb, 2020 1 commit
Move the related target modules into tvm.target. API change:
- tvm.target.current_target -> tvm.target.Target.current
- tvm.datatype -> tvm.target.datatype
tqchen committed
- 11 Feb, 2020 1 commit
hlu1 committed
- 07 Feb, 2020 1 commit
* [REFACTOR][PY-API] Polish tvm.runtime, tvm.runtime.module API update. This PR updates tvm.runtime to use the new FFI style.
  - Remove top-level tvm.module to avoid confusion between runtime.Module and IRModule
  - API changes w.r.t. runtime.Module:
    - tvm.module.load -> tvm.runtime.load_module
    - tvm.module.enabled -> tvm.runtime.enabled
    - tvm.module.system_lib -> tvm.runtime.system_lib
  - Remove dep on api_internal from runtime
* Update module.load in the latest API
Tianqi Chen committed
- 04 Feb, 2020 1 commit
* [LINT] Fix -Wextra
* Fix virtual-dtor
Tianqi Chen committed
- 21 Jan, 2020 1 commit
* [REFACTOR] Establish printer in the source folder. As we move towards the unified IR, we will eventually want to build a unified printer for both relay and TIR. This PR isolates the printer component into a separate folder in src as a first step.
  - Refactored the Doc DSL using Object, cleaned up APIs
  - Isolated the metadata into a header
  - Moved the printer into relay_text_printer; added comments about further TODOs
* Rename NodePrinter -> ReprPrinter to distinguish it from other printers
Tianqi Chen committed
- 19 Jan, 2020 2 commits
This PR moves the codegen-related code into the target folder, as these are target-specific functionalities. We also adopt the term "compiler driver", common in compiler infrastructure such as Rust, GHC, and Clang. As a result, build_module is moved into the driver folder.
Tianqi Chen committed
TIR is the new namespace for the low-level IR for tensor-level optimizations and loop transformations. This PR establishes the namespace and files:
- lowered_func.h, buffer.h, data_layout.h -> tir/buffer.h, tir/data_layout.h, tir/lowered_func.h
- ir.h -> tir/expr.h, tir/stmt.h
- ir_functor_ext.h -> tir/expr_functor.h, tir/stmt_functor.h
Tianqi Chen committed