1. 14 Nov, 2019 2 commits
  2. 13 Nov, 2019 2 commits
  3. 12 Nov, 2019 7 commits
  4. 11 Nov, 2019 7 commits
  5. 10 Nov, 2019 5 commits
  6. 09 Nov, 2019 1 commit
    • Auto TensorCore CodeGen (#4234) · d64bf6b5
      * Add Auto TensorCore Unit Test
      
      * Rebase to tvm master branch & Add auto tensor core
      
      * Code Refine
      
      * Add tensor core switch by pragma
      
      * Add pragma in tensor core example code
      
      * Get real tile size to replace hard coded 16
      
      * support more than 2 dimensions (e.g. batchmatmul) for buffer bind scope
      
      * support batch matmul
      
      * Move cuda env check to tensor_core.cc
      
      * Code refine for tensor_core.cc
      
      * Refine comments
      
      * Some refinements of code and comment
      
      * Update TensorCore UT to pass the CPU test
      
      * remove redundant code
      
      * matmul's storage align for different layout
      
      * Add support for different positions of type cast
      
      * Add formal tutorial for auto tensorcore codegen
      
      * move tensorcore check up to tutorial code
      
      * code and doc refine
      
      * comment out tune_and_evaluate in tutorial
      
      * fix cpplint error
      Minmin Sun (孙敏敏) committed
  7. 08 Nov, 2019 2 commits
  8. 07 Nov, 2019 2 commits
  9. 06 Nov, 2019 4 commits
  10. 05 Nov, 2019 2 commits
  11. 04 Nov, 2019 4 commits
  12. 02 Nov, 2019 2 commits
    • [VTA] Performance optimize, remove unnecessary contiguous memory use. (#4246) · 008aa838
      * [VTA] Performance optimize, remove unnecessary contiguous memory use.
      
      Issue:
      The uop queue maintains a cache vector used to copy uop data into contiguous
      DRAM memory for FPGA/Simulator use, but this cache vector is not cleared after
      the FPGA/Simulator core runs. In the Resnet18 case, printing the cache size in
      the UopQueue::ReadBarrier function shows that the cache size keeps increasing,
      which causes useless data copies and unnecessary contiguous DRAM memory
      allocation.
      
      Analysis:
      The issue is caused by the cache_ vector not being cleared when
      uop_queue_.Reset() is called.
      
      Solution:
      Override the BaseQueue Reset function in UopQueue and add logic to clear
      cache_.
      
      * address review comments, remove spacing.
      Hua Jiang committed
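
The fix described in the commit above can be illustrated with a minimal C++ sketch. This is not the actual VTA runtime code: the `Kernel` struct, the `Push` method, the `size()`/`cache_size()` accessors, and the use of a virtual `Reset` are assumptions made for illustration. Only the names `BaseQueue`, `UopQueue`, `Reset`, and `cache_` come from the commit message.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a uop kernel; the real VTA type differs.
struct Kernel {
  std::vector<int> uops;
};

// Simplified base queue (assumption): Reset() only clears queued entries.
template <typename T>
class BaseQueue {
 public:
  virtual ~BaseQueue() = default;
  virtual void Reset() { queue_.clear(); }
  std::size_t size() const { return queue_.size(); }

 protected:
  std::vector<T> queue_;
};

// Simplified uop queue (assumption): keeps a side cache of pushed kernels.
class UopQueue : public BaseQueue<Kernel*> {
 public:
  // Hypothetical helper: record the kernel in both the queue and the cache.
  void Push(Kernel* kernel) {
    queue_.push_back(kernel);
    cache_.push_back(kernel);
  }

  // The fix: clear cache_ as well, then fall back to the base Reset().
  // Without the cache_.clear() call, cache_ would grow on every run.
  void Reset() override {
    cache_.clear();
    BaseQueue<Kernel*>::Reset();
  }

  std::size_t cache_size() const { return cache_.size(); }

 private:
  std::vector<Kernel*> cache_;
};
```

With this shape, calling `Reset()` on a `UopQueue` after each run leaves both the queue and the cache empty, so the cache no longer accumulates entries across runs.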
    • Support reshape for dynamic shape in tf converter (#4185) · e9039d04
      * Support reshape for dynamic shape in tf converter
      
      * Only allow reshape directly after shape function for symbolic input shape
      
      * Fix lint
      Yao Wang committed