- 04 Apr, 2020 3 commits
-
-
* [ONNX]Pool3d and Upsample3d op updated * Pool3d and Upsample3d testcase * Review comments fixed * Review comments
Samuel committed -
* Fix x86 conv2d and depthwise conv2d auto tuning * Fix depthwise conv2d infer layout * Use random data instead of empty data for autotvm * Fix pylint * Keep empty array for now for autotvm
Yao Wang committed -
* Support mixing normal and cross-thread reduction * minor improvements
Tang, Shizhi committed
-
- 03 Apr, 2020 8 commits
-
-
* [REFACTOR][TIR] Migrate most of low-level build to use the Pass Manager. - SplitHostDevice - ThreadSync - BindDevice - LowerThreadAllreduce - Provide a temp fix for printing IRModule with PrimFunc before the formal text printer. * Address comments, fix tests. * Fix relay tests * Explicit move
Tianqi Chen committed -
Tianqi Chen committed
-
* First pass a defining a non-recursive Graph Vistor and Rewriter autoformat remove a currently empty test until testing is solidfied * Make CalcDep from Dead Code Elimination non-recursive * Partially working, not passing all tests yet passes tests when disabling GetExprRefCount, I think I have a bug in visit counting fix GetExprRefCount Fix a subtle bug with nested recursive/non-recursive scopes * Refactor * improve comments * respond to review comments on comments * Fix a problem with default recursion for dataflow nodes mark DataflowVisitor methods as override * implement ScopeMutator * convert forward_rewrite to ScopeMutator, remove DataflowMutator * rewrite ExprRewriter and convert fast_math to use it * switch BiasAddSimplifier to ExprRewriter fix a clang warning fix cpp lint fix doc param error * respond to review comments * fix a typo in the iterative looping * add a regression test for GetExprRefCount issue * Normalize naming * fix lint * First pass a defining a non-recursive Graph Vistor and Rewriter autoformat remove a currently empty test until testing is solidfied * Make CalcDep from Dead Code Elimination non-recursive * Partially working, not passing all tests yet passes tests when disabling GetExprRefCount, I think I have a bug in visit counting fix GetExprRefCount Fix a subtle bug with nested recursive/non-recursive scopes * Refactor * improve comments * respond to review comments on comments * Fix a problem with default recursion for dataflow nodes mark DataflowVisitor methods as override * implement ScopeMutator * convert forward_rewrite to ScopeMutator, remove DataflowMutator * rewrite ExprRewriter and convert fast_math to use it * switch BiasAddSimplifier to ExprRewriter fix a clang warning fix cpp lint fix doc param error * respond to review comments * fix a typo in the iterative looping * add a regression test for GetExprRefCount issue * Normalize naming * fix lint * respond to review comments
Matthew Brookhart committed -
Animesh Jain committed
-
For certain network topologies, MCR could hang. This patch fixes that case. Change-Id: I3edd8a8a6b452b2b838b777720adea22a3b995b4
mbaret committed -
* [KERAS]upsampling3d and zeropadding3d op * [KERAS]upsampling3d and zeropadding3d test case * Review comments updated
Samuel committed -
Samuel committed
-
- Support vectorized casts - It is incorrect to extract elements from int8x4 with 0x000000ff & (x >> i * 8) as this value is of type int in C/C++. If this expression is used for sign extensions, the sign bit will be wrong. Simply use C style casts instead and sign bits will just work. Signed-off-by: Wei Pan <weip@nvidia.com>
Wei Pan committed
-
- 02 Apr, 2020 12 commits
-
-
Rationale: The current hybrid module is more aligned with the te part. We might consider add a new varient of hybrid script that support the unified IR later. This refactor paves for the potential later changes.
Tianqi Chen committed -
- Reduce CI docs task log size. - Update the relation to halide to the latest state.
Tianqi Chen committed -
* [PYTORCH]AvgPool3d, MaxPool3d and Squeeze op support * Testcases added * review comments
Samuel committed -
- Migrate LowerTVMBultin - Migrate inferFragment, LowerThreadAllreduce - Migrate ThreadSync - Refactor target::Build to directly take IRModule. - Remove un-used legacy functions.
Tianqi Chen committed -
Co-authored-by: Siyuan Feng <hzfengsy@sjtu.edu.cn> This PR introduces BufferLoad/Store to TIR. The new nodes will replace Provide and Call with Tensor arguments in the subsequent refactors.
Tianqi Chen committed -
Haozheng Fan committed
-
* expose runtime::String to Python * retrigger ci
Zhi committed -
* expose runtime::String to Python * kExternalSymbol -> kGlobalSymbol
Zhi committed -
* [Frontend][Torch] Simplify operator input handling * [Frontend][Torch] Allow user supplied input names to override graph inputs * Fix pylint issues * Updates from code review feedback * Fix tutorial to use shape list input * Disable intermittent test failure in topi vision test
Jeremy Johnson committed -
Signed-off-by: Wei Pan <weip@nvidia.com>
Wei Pan committed -
Tianqi Chen committed
-
* [REFACTOR][TIR] Introduce ExprDeepEqual, Remove IRDeepCompare This PR introduces ExprDeepEqual which reuses the StructuralEqual infra. We migrated the usecases of ir_pass::Equal to ExprDeepEqual and StructuralEqual. * Address comments
Tianqi Chen committed
-
- 01 Apr, 2020 8 commits
-
-
* [RELAY] Fixed issues with MergeCompilerRegions This PR addresses a few outstanding issues with the implementation of MergeCompilerRegions. In particular, it now handles TupleGetItem nodes properly and other minor bugs related to region merging have been fixed. Change-Id: I07783afc56183a6f798a510209f23b0a5f252255 * Fixed issue using pre-merged regions Change-Id: I0a844ac59bda1089ae0c67cef52f0b0c7ab2cbd7 * Removed some debugging logic Change-Id: Ib6f2eede6f38bbb270073eb8d4c4dc19f60832c6 * Remove default annotations Change-Id: I9b7696a51c95871491cbea33c40f92ec327e417f * Annotate default 'if's Change-Id: I0098bd1bf6788dd6366810dcefa84f1ebbffaab0 * Clang format Change-Id: I944365cd3080a97a9261f643a8f1efa5a63cf82b * Use src/dest in merge Change-Id: Ie43113492bda8f1ce63eaf9615cb645bb9e2ee86 * Fixed partition test Change-Id: I46f9e349b1a813a9140f7e4f8a2241687e2df73b * Removed comments Change-Id: I309afdd1951d7e796e41d13788aa487707e0ac4c
mbaret committed -
MORITA Kazutaka committed
-
* [PYTORCH]Dropouts And InstanceNorm support added * Review comments fixed
Samuel committed -
* [BUGFIX]bugfix in tensorflow space_to_batch_nd * Test case added
Samuel committed -
* [DOCS] Use https link * use http for sphinx
Tianqi Chen committed -
* [RELAY] Codestyle fixes for Graph Partitioner *ran through clang-format * *formatting comments * *further codestyle changes (after clang-format)
manupa-arm committed -
* [PYTORCH]Activations for pytorch * Review comments updated
Samuel committed -
* [TIR][TRANSFORM] Migrate LowerIntrin * LowerDeviceStorageAccessInfo * Migrate LowerWarpMemory
Tianqi Chen committed
-
- 31 Mar, 2020 9 commits
-
-
Animesh Jain committed
-
* [RELAY] Re-wrote the Graph Partitioner to support multiple outputs Input : A Relay module that have functions with disjoint annotated regions using compiler_begin and compiler_end. There could be multiple outputs. Output : A Relay module with global functions for such disjoint annotated regions with calls inserted at the respective location Dependencies : AnnotatedRegionSet Utility class. Methodology : 1) The AnnotatedRegionSet utility class is able to construct a collection of nodes that are bound by a give annotation -- here we use compiler_begin and compiler_end 2) Initially, for each function in the module AnnotatedRegionSets are populated. 3) Then, Vistor pass is traversed until a compiler_end node is encountered that belongs to a "region". 4) When the first compiler_end of a given annotated region is found, a function is formed and inserted. a) if the region has multiple outputs, a Tuple node (capturing all outputs) is returned. 5) Thereafter, if we encounter an another output of the same annotated region, it is important to note that the function is already formed. Therefore, it will lookup the function and add a TupleGetItemNode is inserted. a) We will use the location index of "rets" of each "Region" of AnnotatedRegionSet as TupleGetItemNode index. 6) Therefore, functions will be created for all annotated regions. The name for each global function is created using "Region" id and the compiler name. Change-Id: I1372f02a845b6d3da03b561763e03a378dca263c * [RELAY] Re-wrote the Graph Partitioner to support multiple outputs *removed the expected use-case as we are taking broken-down PR approach *code style fixes *some trivial one liners * [RELAY] Re-wrote the Graph Partitioner to support multiple outputs *fixed an implicit copy to a move * [RELAY] Re-wrote the Graph Partitioner to support multiple outputs *code style changes for comments *renamed test case multiple outputs --> mixed single multiple outputs Since the existing test case checks for both single and multiple output scenarios *added a new test case with conv2d + batch_norm *some var name changes in the test * [RELAY] Re-wrote the Graph Partitioner to support multiple outputs *rebased
manupa-arm committed -
Thomas Viehmann committed
-
* [Torch] Add support for split * fix * fix test class
Wang Yucheng committed -
* [KERAS]Pool3d support added * Keras pool3d testcase added
Samuel committed -
* refactor * path udpate
Thierry Moreau committed -
Andrew Reusch committed
-
hlu1 committed
-
* [TE] reverse-mode autodiff without any optimization Co-authored-by: Sergei Grechanik <sergei.grechanik+h@gmail.com> * address review comments * add comments and retrigger CI * move unittest to debug ci * move test back and add seed Co-authored-by: Sergei Grechanik <sergei.grechanik+h@gmail.com>
Yizhi Liu committed
-