- 25 Feb, 2020 5 commits
-
-
- llvm::StringRef to std::string conversion is explicit now. Signed-off-by: Wei Pan <wpan11nv@nvidia.com>
wpan11nv committed -
* Support int args and no extra buffers * Fixes * remove testing code * fix style * more style * use const args * style Co-authored-by: Jon Soifer <jonso@microsoft.com>
Jon Soifer committed -
* Add a PyTorch to Relay parser * Add alexnet, googlenet, mnasnet, shufflenet wip * Fix lint * Remove fix for shufflenet * Lower check * Pull changes from neo-ai/tvm changes * Remove commented out section * Use infer_shape everywhere * Change back to using trace instead of path in from_pytorch * Parse state_dict to add param names * Umbrella single_op under test_forwards * Remove print and cleanup call * Check if update to test broke CI * Retrigger CI * Add back in updated tests * Try splitting up tests * First pass at flexible typing, implemented for ones * Add int32 for all ops * Remove print statements * Fix lint * Broad except * Add other tensor types * Temporarily use old tests * Retrigger CI * Lower type names * Use numpy to convert in dense op * Fix lint * Remove print * Need to cleanup but verify int32 works for add * Rough tests for different types, a lot of types are not supported on CPU * Probably doesn't build, need to save work as I have to switch branches (constantly) * Parse param type * Remove print stmt in parser * Clean up some code * Working on flaot32 for bn * Add resnet18 double type * Fix lint * Temporarily move PT tests first * Temporarily add back refactored tests to fix mem issue * Add more type test and temp remove some tests * Comment out tests, hopefully CI prints a trace * Get stack trace * Remove operator dict key, rename op_name to node_id, remove dead code * Make relay map a list * Remove some hacky string stuff * Move to PyTorch 1.4 * Remove input_type as param * Remove _get_fill_value, fix full ops * Remove unused code and combine ops for identity and none * Remove fn_param * Clean up main loop * Remove useless if/else for outputs * Remove ir_names, only used once * Remove some string hacking * Remove string parsing to get output name * Fix bug with output sizes of nodes * Use attributeNames in parse ops * Remove continue and add_op in parse_op * Do this everywhere, use assert instead of explciitly type casting * Remove unnecessary swap * Slight refactor for elemwise input parse * Use a copy of graph everywhere * Rename nid_to_node_name * Refactor parse import prereqs * Clean up input node kind check * Clean up conditionals * Clean up add_op * Cleanup type for ones and zeros op * Fix lint * Add torch install to CI * Actually use torch * Try moving import torch to only where it's needed * Import torch for CI * Use take op for select * Temporarily add ignore for jit inline pass for CI * Use CompleteTensorType, might be a PT 1.2 only thing * Use different types in elemwise op * Use float16 ones * Fix float16 test * Remove the temp docker changes * Remove temp test * Temporarily comment out original tests * Remove file * Empty cache after each test * Add some prints and lower input sizes * Try using no grad * Trying to globally set grad off * Use no grad for torchvision * Remove xfail tests * Remove VGG and AlexNet due to some issues * Combine pooling tests * Remove extra test file * Remove single op, remove larger pooling tests * Remove maxpool3 * Remove debug prints * Remove inference call and add no_grad in measure latency * Use standard string start char * Remove redundant infer_shape in slice * Convert most to checks to just expr * Remove extra paren * More refactor of isinstance * Add helper for creating typed constants * Assert instead of return when no matching type * Remove network variants * Add no_grad when forward, remove deatch, fix lint * Change isinstance to expr in transpose * Use opnotimplemented, refactor * Fix full ops, remove duplicate tests * Never use shape field unless we know the type * Remove comma, retrigger CI * Add paren, retrigger CI * Use inline if-else for flags * Throw exception instead of assert * Remove version check for CI * Check version when doing inline pass * Fix lint * Lower more input sizes * Add new line, conv2d only accepts weight as expr * Use tvm.runtime.ndarray * Remove change to torch version install * Try no grad for mobilenet * Fix lint * Fix lint again * Revert to last passing * Delete test files * Ignore lint * Revert back * Comment out mobilenet * Clean up compare compiled and baseline outputs * Use IRModule * Add todos * Refactor use_bias * Add todo for fix conv op channels * Change input to data type * Remove todo * Handle channel multiplier > 1
Alex Wong committed -
* Use opencv reisze method for preprocessing of image in darknet * Use opencv reisze method for preprocessing of image in darknet * Fix pylint issues
vizero1 committed -
GaussianDropout & GaussianNoise are active only during training time. This can be skipped during inference.
Samuel committed
-
- 24 Feb, 2020 1 commit
-
-
* relay op strategy fix lint bitpack strategy bitserial_dense (#6) * update strategy * address comments fix a few topi test Dense strategy (#5) * dense * add biforst; remove comments * address comment Refactor x86 conv2d_NCHWc (#4) * Refactor x86 conv2d * Add x86 depthwise_conv2d_NCHWc * Add back topi x86 conv2d_nchw * Merge x86 conv2d_nchw and conv2d_NCHWc * Minor fix for x86 conv2d fix more strategy Add x86 conv2d_NCHWc_int8 strategy (#8) * Add x86 conv2d_NCHWc_int8 strategy * Remove contrib_conv2d_nchwc_int8 * Fix generic conv2d_NCHWc for int8 * Fix topi arm_cpu conv2d_NCHWc_int8 update x86 conv2d enable specify relay ops to be tuned for autotvm add cuda conv2d strategy add conv2d strategy for rocm add conv2d strategy for hls add conv2d strategy for arm cpu add conv2d strategy for mali add conv2d strategy for bifrost add conv2d strategy for intel graphics clean up and fix lint remove template keys from autotvm remove 2 in the func name address comments fix * fix bugs * lint * address comments * add name to op implement * Modify topi tests (#9) * Add pooling, reorg, softmax and vision * Add lrn * fix topi test * fix more topi test * lint * address comments * x * fix more tests & bugs * Modify more tests (#10) * Modify tests for bitserial_conv2d, bitserial_dense, bitserial_conv2d_rasp and bnn * Minor fix * More minor fix * fix more test * try to update vta using strategy * fix cpptest * x * fix rebase err * Fix two tests (#11) * change autotvm log format * lint * minor fix * try fix vta test * fix rebase err * tweak * tmp hack for vta pass * fix tutorial * fix * fix more tutorials * fix vta tutorial * minor * address comments * fix * address comments * fix cpptest * fix docs * change data structure name and api * address comments * lint * fix rebase err * updates * fix winograd test * fix doc * rebase * upgrade tophub version number * fix bug * re-enable vta tsim test after tophub is upgraded * fix vta test to use the correct args so the config can be found in tophub Co-authored-by: Yao Wang <kevinthesunwy@gmail.com>
Haichen Shen committed
-
- 21 Feb, 2020 5 commits
-
-
* get_valid_count accuracy issue fixed for individual tests but not for all tests running together * minor fix * initialize valid_count and PrefixSum buffers * test updated * udpate relay test as well * update document * fix lint * address comment * fix lint * correct atomicAdd identifier name
Leyuan Wang committed -
* [TEST][FLAKY] topi/tests/python/test_topi_sort.py::test_argsort * upadate test function of argsort like topk * Shuffle index and get data from shuffled index * Replace the random.uniform with np.arange
Neo Chien committed -
Tianqi Chen committed
-
* add TFLite version check for 'ceil' and 'cos' * fix name check of test_op for positive inputs * add error message for operator not found in the installed fbs schema
Ina Dobreva committed -
* support cuda tensorcore subbyte int data type in auto tensorcore * add lisence * pass cpplint * fix code review comments * merge the int4/int1 codegen tutorial into the existing auto tensorcore tutorial * using master's new API * disable tuning when cuda is not enabled * address cr comment * do not run the tuning * fix test failure * fix cpplint error * fix bool type reduction bug * 1. fix a index bug 2. fix returned bytes value of int1/int4/uint4 * fix typo
Orion34C committed
-
- 20 Feb, 2020 3 commits
- 19 Feb, 2020 3 commits
-
-
* [REFACTOR] Polish ffi convention. - Remove the src/api, keep registration local to the c++ function. - Remove the api_internal as it is no longer needed. * Update the codebase walk through
Tianqi Chen committed -
hcyang committed
-
Andrew committed
-
- 18 Feb, 2020 8 commits
-
-
Tianqi Chen committed
-
* [Relay] Expose FunctionGetAttr to Python * add test Co-authored-by: Jon Soifer <jonso@microsoft.com>
Jon Soifer committed -
* Basic test working * Almost all tests working. * all tests passing. * Fixed lint. * Improved Style.
Josh Fromm committed -
Tianqi Chen committed
-
Tianqi Chen committed
-
Tianqi Chen committed
-
Fixed bugs that occured when using bitwise operators on floating point type expressions. Further crash when using ops <<, >>, %. Finally added regression tests for both types of bug. (#4892)
pankratz committed -
- Move the related files to tvm.te - Move build_module.py to tvm.driver
Tianqi Chen committed
-
- 17 Feb, 2020 5 commits
-
-
* Fix bug in re-processing call node * Add test * Add to main * temp changes to work from another machine * fix rest of tests * fix test_reuse_call_merge * fix merge Co-authored-by: Jon Soifer <jonso@microsoft.com>
Jon Soifer committed -
Tianqi Chen committed
-
Alex Gladkov committed
-
various minor editorial updates - style, grammar, typos.
Baden Hughes committed -
Zhi committed
-
- 16 Feb, 2020 3 commits
-
-
Tianqi Chen committed
-
* add additional switch to handle nested call node * Fix VM compiler for while loop with free var
masahi committed -
- Do not emit __shared__ etc. as part of type for casting - Fix fp16 reduction kernels with compiler errors: "no operator "+" matches these operands, volatile half + volatile half This patch inserts casts to remove volatile type qualifier following volatile loads (fp16 only). CUDA fp16 library headers should add volatile member functions. - Update have_fp16 to include compute 6.1 GPUs, which do support fp16, although their fp16 throughput is low. Updated tests. Signed-off-by: Wei Pan <weip@nvidia.com>
wpan11nv committed
-
- 15 Feb, 2020 3 commits
- 14 Feb, 2020 3 commits
-
-
* [QNN] Doc fix on quantize and convolution * update test
masahi committed -
- This allows to better utilize the memory bandwidth - Note that not all cases are vectorized for fp16 datatype. For instance, when the size is not a multiple of 1024, the inner loop may be an expression that cannot be vectorized. In this case, a small inner loop is still benefical for latency hidding. Signed-off-by: Wei Pan <weip@nvidia.com>
wpan11nv committed -
- Move related files into the corresponding location as in C++ - Keep the top-level TVM API backward compatible to make minimum changes in topi
tqchen committed
-
- 13 Feb, 2020 1 commit
-
-
Co-Authored-By: Wei Chen <ipondering.weic@gmail.com>
Zhi committed
-