1. 14 Oct, 2019 1 commit
  2. 25 Sep, 2019 3 commits
    • Added tensorization for avx2 based gemm. (#3982) · 23727eb4
      * Added tensorization for avx2 based gemm.
      
      Summary:
      Tensorized the same region as avx512. Namely, it produces 16x1 int32 results.
      It does so by issuing two sets of AVX2 instructions, each doing a reduction
      on an 8x4 int8 kernel with 1x4 data.
      
      Test Plan:
      on avx2 machine:
      python tests/python/contrib/test_gemm_avx2_acc32.py
      
      * Fix lint errors. Removed commented out code.
      Kimish Patel committed
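The summary of #3982 describes the tensorized region only in words. As a rough plain-Python reference for what that region computes (not the AVX2 intrinsic itself), assuming a 16x4 kernel tile handled as two 8x4 halves and a 1x4 data vector; the function name and argument layout are illustrative and not taken from the commit:

```python
def dot_16x1_reference(data, kernel):
    """Reference result for one 16x1 output tile.

    data:   4 integers (the 1x4 data vector)
    kernel: 16 rows of 4 integers (two stacked 8x4 tiles)
    """
    assert len(data) == 4
    assert len(kernel) == 16 and all(len(row) == 4 for row in kernel)
    out = []
    for half in range(2):  # one pass per 8x4 half of the kernel tile
        for row in kernel[half * 8:(half + 1) * 8]:
            # Reduce over the 4-element axis; Python ints stand in for int32.
            out.append(sum(d * w for d, w in zip(data, row)))
    return out
```

A reference like this can be compared against the tensorized output in a test, since both should agree element by element.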
    • Changes to make tensorize work. These changes also fix the previously broken test. (#3981) · b410df8c
      * Changes to make tensorize work. These changes also fix the previously
      broken test.
      
      Summary:
      Tensorize was breaking for a few reasons.
      1) An assert fails at src/op/tensorize.cc:234: CHECK(is_one(e.region[j]->extent))
      In some cases this cannot be proven, e.g.:
      expected shape=[16, 4], given region=[range(min=((ax1.outer*16)/16), ext=(((((ax1.outer*16) + 15)/16) + 1) - ax1.outer)), range(min=((k.outer*4)/4), ext=(((((k.outer*4) + 3)/4) + 1) - k.outer)), range(min=0, ext=16), range(min=0, ext=4)]
      The unprovable one is: ext=(((((ax1.outer*16) + 15)/16) + 1) - ax1.outer).
      This can be simplified, but it is not: to simplify the division, the
      simplifier must prove ax1.outer > 0, and since it is a var it cannot. The
      fix is to find all the vars in the expr and replace them with some const value.
      
      2) Equivalence between the tensorized expr and the one being asked to tensorize fails.
      For example, the error would be:
      TVMError: Check failed: Equal(lhs, rhs):
      Failed to match the compute with TensorIntrin tensor_intrin's declaration
      provided= reduce(combiner=comm_reducer(result=[(x + y)], lhs=[x], rhs=[y], identity_element=[(int16)0]), source=[(int16(data(k))*int16(kernel(((((((((k.outer.outer*64) + (k.outer.inner*2)) + k)/2)*128) + i) - (k.outer.inner*128)) - (k.outer.outer*4096)), ((((k.outer.outer*64) + (k.outer.inner*2)) + k) % 2))))], axis=[iter_var(k, range(min=0, ext=2))], where=(bool)1, value_index=0),
      intrin=  reduce(combiner=comm_reducer(result=[(x + y)], lhs=[x], rhs=[y], identity_element=[(int16)0]), source=[(int16(data(k))*int16(kernel(i, k)))], axis=[iter_var(k, range(min=0, ext=2))], where=(bool)1, value_index=0)
      Difference is mainly in the source part:
      source=[(int16(data(k))*int16(kernel(((((((((k.outer.outer*64) + (k.outer.inner*2)) + k)/2)*128) + i) - (k.outer.inner*128)) - (k.outer.outer*4096)), ((((k.outer.outer*64) + (k.outer.inner*2)) + k) % 2))))]
      source=[(int16(data(k))*int16(kernel(i, k)))], axis=[iter_var(k, range(min=0, ext=2))]
      This was not being simplified because compute_intrin_iter_space (the map
      from iter var to range) did not contain the leaf iter vars.
      
      3) Here it fails with:
      Check failed: is_one(Simplify(value->shape[i])): Argument b_buffer shape mismatch[16, 4] vs [(((((ax1.outer*16) + 15)/16) + 1) - ax1.outer), (((((k.outer*4) + 3)/4) + 1) - k.outer), 16, 4]
      This is in buffer binding, where the expected shape and the bound buffer's
      shape are considered different, although they would match if the expr
      could be simplified.
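The first and third failures hinge on the same extent expression. A small check, assuming C-style truncating integer division (Python's `//` floors instead, so a helper is used), suggests why the simplifier needs range information: the extent is 1 for non-negative values of the outer variable but not for negative ones. `trunc_div` and `region_extent` are names made up for this sketch.

```python
def trunc_div(a: int, b: int) -> int:
    """Integer division that truncates toward zero, as in C/C++."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def region_extent(outer: int, factor: int) -> int:
    """Extent from the error message with the var replaced by a constant.

    Mirrors ((((outer*factor) + (factor - 1)) / factor) + 1) - outer.
    """
    return (trunc_div(outer * factor + (factor - 1), factor) + 1) - outer


def extent_is_one(factor: int, values) -> bool:
    """True if the extent evaluates to 1 for every sampled value."""
    return all(region_extent(v, factor) == 1 for v in values)
```

Under these assumptions `extent_is_one(16, range(64))` and `extent_is_one(4, range(64))` both hold, while `region_extent(-1, 16)` evaluates to 2.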
      
      Test Plan:
      On skylake avx512 machine:
      python tests/python/contrib/test_gemm_acc16.py
      
      * Implemented a bounded analyzer which traverses the tree and, for
      reduce/for statements, binds the bounds of their variables in the analyzer.
      These bounds are later used to simplify expressions. Inspired by
      ir_mutator_with_analyzer.
      
      * Addressed comments.
      
      * Added ASF header + define macro for the header file: TVM_ARITHMETIC_IR_VISITOR_WITH_ANALYZER_H_
      Some lint fixes as well.
      
      * Relax the assumption that dom_map must always contain all leaf itervars.
      
      * Disable copy constructor and move to raw ptr.
      Kimish Patel committed
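For the second failure described in #3981, the mismatch disappears once the ranges of the loop variables are known. The brute-force check below evaluates the two kernel indices from the `provided=` expression over non-negative outer variables with `k` in [0, 2) and compares them with `(i, k)` from the intrinsic declaration. The sampled ranges for the outer variables and `i` are arbitrary assumptions, and the helper names are hypothetical.

```python
from itertools import product


def provided_kernel_index(k_oo, k_oi, k, i):
    """Kernel indices as printed in the 'provided=' expression."""
    fused = (k_oo * 64) + (k_oi * 2) + k
    # For non-negative operands, Python's // matches truncating division.
    row = (((fused // 2) * 128) + i) - (k_oi * 128) - (k_oo * 4096)
    col = fused % 2
    return row, col


def matches_intrin(max_outer=4, max_inner=32, max_i=128):
    """Check that the provided indices reduce to (i, k) when 0 <= k < 2."""
    samples = product(range(max_outer), range(max_inner), range(2), range(max_i))
    return all(
        provided_kernel_index(k_oo, k_oi, k, i) == (i, k)
        for k_oo, k_oi, k, i in samples
    )
```

Without the bound on `k` the equality does not hold in general, for example `provided_kernel_index(0, 0, 2, 0)` gives `(128, 0)`, which is why the leaf iter var ranges have to be visible to the simplifier.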
  3. 15 Sep, 2019 1 commit
  4. 13 Sep, 2019 1 commit
  5. 12 Sep, 2019 1 commit
  6. 08 Sep, 2019 1 commit
  7. 12 Aug, 2019 1 commit
  8. 25 Jul, 2019 1 commit
    • Implementation of uTVM (#3227) · ef909df1
      * uTVM interfaces (#14)
      
      * some minor interface changes
      
      * implemented HostLowLevelDevice
      
      * added MicroDeviceAPI
      
      * implemented micro_common and added Python interfaces
      
      * current status, semi implemented micro session
      
      * added micro_common implementation and python interfaces (#18)
      
      * current status, semi implemented
      
      * host test working
      
      * updated interfaces for MicroSession arguments allocation
      
      * make somewhat lint compatible
      
      * fix based on comments
      
      * added rounding macro
      
      * fix minor bug
      
      * improvements based on comments
      
      * Clean up `binutil.py` and make Python-3-compatible
      
      * Change argument allocation design
      
      * Address feedback and lint errors
      
      * Improve binutil tests
      
      * Simplify allocator (per @tqchen's suggestions)
      
      * Doc/style fixes
      
      * farts
      
      * mcgee
      
      * rodata section werks
      
      (and so does `test_runtime_micro_workspace.py`)
      
      * simple graph runtime werk
      
      * TEMP
      
      * ResNet works, yo
      
      * First round of cleanup
      
      * More cleanup
      
      * runs a dyson over the code
      
      * Another pass
      
      * Fix `make lint` issues
      
      * ready to pr... probably
      
      * final
      
      * Undo change
      
      * Fix rebase resolution
      
      * Minor fixes
      
      * Undo changes to C codegen tests
      
      * Add `obj_path` in `create_micro_lib`
      
      * TEMP
      
      * Address feedback
      
      * Add missing TODO
      
      * Partially address feedback
      
      * Fix headers
      
      * Switch to enum class for `SectionKind`
      
      * Add missing ASF header
      
      * Fix lint
      
      * Fix lint again
      
      * Fix lint
      
      * Kill lint warnings
      
      * Address feedback
      
      * Change Python interface to MicroTVM
      
      All interaction with the device is now through `Session` objects, which
      are used through Python's `with` blocks.
      
      * Reorder LowLevelDevice interface
      
      * Store shared ptr to session in all alloced objects
      
      * Move helper functions out of `tvm.micro`
      
      * Switch static char arr to vector
      
      * Improve general infra and code quality
      
      Does not yet address all of tqchen's feedback
      
      * Forgot a rename
      
      * Fix lint
      
      * Add ASF header
      
      * Fix lint
      
      * Partially address MarisaKirisame's feedback
      
      * Lint
      
      * Expose `MicroSession` as a node to Python
      
      * Revert to using `Session` constructor
      
      * Fix compiler error
      
      * (Maybe) fix CI error
      
      * Debugging
      
      * Remove
      
      * Quell lint
      
      * Switch to stack-based session contexts
      
      * Make uTVM less intrusive to host codegen
      
      And use SSA for operands of generated ternary operators
      
      * Inline UTVMArgs into UTVMTask struct
      
      * Remove `HostLowLevelDevice` header
      
      * Remove `BaseAddr` class
      
      * Address feedback
      
      * Add "utvm" prefix to global vars in runtime
      
      * Fix lint
      
      * Fix CI
      
      * Fix `test_binutil.py`
      
      * Fix submodules
      
      * Remove ResNet tests
      
      * Make `test_binutil.py` work with nose
      
      * Fix CI
      
      * I swear this actually fixes the binutil tests
      
      * lint
      
      * lint
      
      * Add fcompile-compatible cross-compile func
      
      * Add docs for uTVM runtime files
      
      * Move pointer patching into `MicroSession`
      
      * Fix lint
      
      * First attempt at unifying cross-compile APIs
      
      * Fix lint
      
      * Rename `cross_compile` back to `cc`
      
      * Address feedback
      
      * Remove commented code
      
      * Lint
      
      * Figure out failing function
      
      * Remove debugging code
      
      * Change "micro_dev" target to "micro"
      
      * Add checks in tests for whether uTVM is enabled
      
      * Add TODO for 32-bit support
      
      * Rename more "micro_dev" to "micro"
      
      * Undo rename
      
      We already have `tvm.micro` as a namespace.  Can't have it as a method
      as well.
      
      * Fix failing CI
      
      Thanks to @tqchen for finding this bug.  Emitting ternary operators for
      `min` and `max` causes concurrency bugs in CUDA, so we're moving the
      ternary op emissions from `CodeGenC` to `CodeGenCHost`.
      
      * Address feedback
      
      * Fix lint
      Logan Weber committed
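The messages in #3227 mention that device interaction goes through `Session` objects used in Python `with` blocks, later backed by stack-based session contexts. A generic sketch of that pattern follows; the class and method names are hypothetical and this is not the `tvm.micro` API.

```python
class Session:
    """Toy session that tracks the active session on a class-level stack."""

    _stack = []

    def __init__(self, name):
        self.name = name

    def __enter__(self):
        # Entering a `with` block makes this session the current one.
        Session._stack.append(self)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Leaving the block restores whichever session was active before.
        Session._stack.pop()
        return False  # do not swallow exceptions

    @classmethod
    def current(cls):
        if not cls._stack:
            raise RuntimeError("no active session")
        return cls._stack[-1]
```

With a stack, nested `with` blocks behave predictably: the inner session is current inside the inner block, and the outer session becomes current again once the inner block exits.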
  9. 21 May, 2019 1 commit
  10. 16 May, 2019 1 commit
  11. 29 Apr, 2019 1 commit
    • [Relay][TOPI] GluonCV SSD support on the GPU (#2784) · a706ad16
      * ssd gluoncv gpu op updated
      
      * tutorials and tests modified
      
      * fix lint
      
      * address comment
      
      * multibox bug fixed
      
      * space line added
      
      * use less threads per block
      
      * less threads per block for get valid count
      
      * merge with master
      
      * Revert "less threads per block for get valid count"
      
      This reverts commit 08896cfccc34b0b2a1646d01d01ea4cad73941c4.
      
      * typo fixed
      
      * elem length made to a variable
      
      * fix lint error
      
      * lint fixed
      
      * bug fixed
      
      * lint fixed
      
      * error fixed
      
      * test ci
      
      * separate argsort to be an independent op
      
      * fix lint
      
      * remove unsupported models
      
      * typo fixed
      
      * argsort added to relay
      
      * solve conflicts with master
      
      * fix lint
      
      * fix lint
      
      * test push
      
      * Revert "test push"
      
      This reverts commit 6db00883fab6cc06bddf564c926bb27c874397d8.
      
      * fix lint error
      
      * fix more lint
      
      * cpu test_sort updated
      
      * debug ci
      
      * nms fixed
      
      * expose argsort to relay frontend
      
      * test ci
      
      * fix lint
      
      * sort register error fixed
      
      * fix nnvm
      
      * nms type fixed
      
      * adaptive pooling added to relay
      
      * Revert "adaptive pooling added to relay"
      
      This reverts commit 1119f1f2c055753e0cc5611627597749134c5c8c.
      
      * fix lint
      
      * expose argsort op
      
      * fix lint
      
      * fix lint
      
      * fix lint
      
      * sort test updated
      
      * sort bug fixed
      
      * nnvm error fixed
      
      * fix argsort default data type returned to be float instead of int
      
      * fix lint
      
      * fix lint
      
      * test fixed
      
      * fix valid count
      
      * fix titanx bug
      
      * tutorial add both targets
      
      * titanx error fixed
      
      * try to fix CI old gpu error
      
      * try to solve CI GPU error
      
      * get_valid_count added
      
      * reverse get_valid_count
      
      * get valid count optimized
      
      * address comments
      
      * fix ci error
      
      * remove unnecessary block sync
      
      * add back one sync
      
      * address comments
      
      * address more comments
      
      * more comments
      
      * move sort to be independent algorithm
      
      * typo fixed
      
      * more typos
      
      * comments addressed
      
      * doc updated
      
      * fix pylint
      
      * address final comments
      
      * apache license added
      Leyuan Wang committed
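Several messages in #2784 concern splitting argsort out as its own op. For readers unfamiliar with the operation, a plain-Python sketch of what an argsort computes follows. The function name, the `ascending` flag, and the `as_float` switch (suggested by the message about the default returned data type) are illustrative assumptions, not the TOPI or Relay signature.

```python
def argsort(values, ascending=True, as_float=True):
    """Return the indices that would sort `values`.

    Ties keep their original relative order because `sorted` is stable.
    """
    order = sorted(
        range(len(values)),
        key=lambda idx: values[idx],
        reverse=not ascending,
    )
    # Optionally return indices as floats, mirroring a float default dtype.
    return [float(idx) for idx in order] if as_float else order
```

In a detection pipeline this kind of op is typically applied to box scores in descending order, so that later stages can walk boxes from most to least confident.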
  12. 08 Apr, 2019 1 commit
    • [HEADER] Add Header to Comply with ASF Release Policy (#2982) · cffb4fba
      * [HEADER] ASF header dir=include
      
      * [HEADER] ASF Header dir=src
      
      * [HEADER] ASF Header -dir=python
      
      * [HEADER] ASF header dir=topi
      
      * [HEADER] ASF Header dir=nnvm
      
      * [HEADER] ASF Header -dir=tutorials
      
      * [HEADER] ASF Header dir=tests
      
      * [HEADER] ASF Header -dir=docker
      
      * fix whitespace
      
      * [HEADER] ASF Header -dir=jvm
      
      * [HEADER] ASF Header -dir=web
      
      * [HEADER] ASF Header --dir=apps
      
      * [HEADER] ASF Header --dir=vta
      
      * [HEADER] ASF Header -dir=go
      
      * temp
      
      * [HEADER] ASF Header --dir=rust
      
      * [HEADER] Add ASF Header --dir=cmake
      
      * [HEADER] ASF Header --dir=docs
      
      * [HEADER] Header for Jenkinsfile
      
      * [HEADER] ASF Header to toml and md
      
      * [HEADER] ASF Header to gradle
      
      * Finalize rat cleanup
      
      * Fix permission
      
      * Fix java test
      
      * temporarily remove nnvm onnx test
      Tianqi Chen committed
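A quick way to spot files that still lack the header added in #2982, as a minimal sketch: it only looks for the opening phrase of the standard ASF license header near the top of each file and is no substitute for Apache RAT. The function names, the suffix list, and the 30-line window are assumptions made for this example.

```python
import os

# Opening phrase of the standard ASF source header.
ASF_MARKER = "Licensed to the Apache Software Foundation (ASF)"


def has_asf_header(text, max_lines=30):
    """True if the marker phrase appears within the first `max_lines` lines."""
    head = "\n".join(text.splitlines()[:max_lines])
    return ASF_MARKER in head


def files_missing_header(root, suffixes=(".py", ".h", ".cc")):
    """Walk `root` and list files with the given suffixes that lack the header."""
    missing = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffixes):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    if not has_asf_header(handle.read()):
                        missing.append(path)
    return sorted(missing)
```

Running `files_missing_header` on one directory at a time matches the per-directory way the header commits above were split.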
  13. 23 Mar, 2019 1 commit
  14. 06 Dec, 2018 1 commit
  15. 15 Nov, 2018 2 commits
  16. 10 Nov, 2018 1 commit
  17. 21 Oct, 2018 1 commit
  18. 04 Oct, 2018 1 commit
  19. 06 Sep, 2018 1 commit
  20. 10 Aug, 2018 1 commit
  21. 09 Jul, 2018 1 commit
  22. 06 Jul, 2018 1 commit
  23. 12 May, 2018 1 commit
  24. 06 Apr, 2018 2 commits
  25. 05 Apr, 2018 2 commits
  26. 07 Mar, 2018 1 commit
  27. 25 Feb, 2018 1 commit
  28. 03 Jan, 2018 1 commit
  29. 02 Jan, 2018 1 commit
  30. 27 Dec, 2017 1 commit
  31. 26 Dec, 2017 1 commit
  32. 24 Dec, 2017 1 commit
    • [ROCM] MIOpen contrib for convolution kernels (#722) · 3b9f1652
      * first working miopen support
      
      * do FindFwdAlgo during build time
      
      * fix lint
      
      * update doc string
      
      * import topi after checking if rocm is enabled
      
      * add miopen namespace
      
      * fixed descriptor overwrite bug
      
      * add use_miopen option
      
      * fix lint
      
      * better miopen option handling
      
      * fix typo
      
      * fix options handling
      masahi committed
  33. 01 Dec, 2017 1 commit
  34. 21 Nov, 2017 1 commit
  35. 15 Aug, 2017 1 commit