1. 21 Feb, 2020 1 commit
    • [CODEGEN] Support cuda tensorcore subbyte int data type in auto tensorcore (#4546) · f23ac969
      * support cuda tensorcore subbyte int data type in auto tensorcore
      
      * add license
      
      * pass cpplint
      
      * fix code review comments
      
      * merge the int4/int1 codegen tutorial into the existing auto tensorcore tutorial
      
      * using master's new API
      
      * disable tuning when cuda is not enabled
      
      * address cr comment
      
      * do not run the tuning
      
      * fix test failure
      
      * fix cpplint error
      
      * fix bool type reduction bug
      
      * 1. fix an index bug 2. fix the returned bytes value of int1/int4/uint4
      
      * fix typo
      Orion34C committed
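The last bullets fix the byte count reported for sub-byte types. The essence of such a fix is rounding storage up to whole bytes; a minimal, hypothetical sketch (not TVM's actual implementation):

```python
import math

def dtype_bytes(bits: int, lanes: int = 1) -> int:
    """Bytes needed to store `lanes` elements of a `bits`-wide integer
    type, rounding sub-byte types (int1/int4/uint4) up to whole bytes."""
    return math.ceil(bits * lanes / 8)
```

For example, `dtype_bytes(4, 8)` gives 4 bytes for an int4x8 vector, while a lone int1 or int4 scalar still occupies one byte.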
  2. 19 Jan, 2020 1 commit
    • [REFACTOR] Establish tir (#4740) · cf59b206
      TIR is the new namespace for low-level IR
      for tensor-level optimizations and loop transformations.
      
      This PR establishes the namespace and files.
      
      - lowered_func.h, buffer.h, data_layout.h -> tir/buffer.h, tir/data_layout.h, tir/lowered_func.h
      - ir.h -> tir/expr.h, tir/stmt.h
      - ir_functor_ext.h -> tir/expr_functor.h, tir/stmt_functor.h
      Tianqi Chen committed
  3. 09 Jan, 2020 1 commit
    • [REFACTOR][IR] tvm::Expr -> PrimExpr(Primitive Expr) (#4669) · d6a23cf5
      * [REFACTOR][IR] tvm::Expr -> PrimExpr(Primitive Expr)
      
      As part of unified IR, we will need to unify relay::Expr
      and the current tvm::Expr under the same base type.
      
      From the technical point of view, tvm::Expr is a "primitive"
      expression that only contains POD types and handles and does
      not do life-cycle management.
      
      This PR renames Expr->PrimExpr to clarify that.
      We will send a subsequent PR to introduce the base expr class.
      
      * Remove legacy VarExpr and ExprHash/Equal
      Tianqi Chen committed
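The unification direction described above can be pictured as a small class hierarchy. A hedged Python sketch, assuming a common base (called BaseExpr here, a name this PR does not yet introduce) above both expression families:

```python
class BaseExpr:
    """Assumed common base for all expressions in the unified IR."""

class PrimExpr(BaseExpr):
    """Primitive expression (the renamed tvm::Expr): holds only POD
    values and handles, and does no life-cycle management."""

class RelayExpr(BaseExpr):
    """High-level expression in the style of relay::Expr."""
```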
  4. 08 Jan, 2020 1 commit
    • [REFACTOR][IR] Add Node suffix to low-level IR nodes (#4649) · f4c5f93b
      * [REFACTOR][IR] Variable -> VarNode
      
      * [REFACTOR][IR] Add/Sub/Mul/Div -> AddNode/SubNode etc.
      
      * [REFACTOR][IR] Min/Max/FloorDiv/FloorMod -> MinNode/MaxNode etc.
      
      * [REFACTOR][IR] EQ/NE/LT/LE/GT/GE/Select -> EQNode/NENode etc.
      
      * [REFACTOR][IR] Add Node suffix to Select/Call/Load/Ramp/Shuffle/Let
      
      * [REFACTOR][IR] Add node suffix to IntImm/UIntImm/FloatImm/StringImm
      
      * [REFACTOR][IR] Add Node suffix to Any, AttrStmt, AssertStmt
      
      * [REFACTOR][IR] Add Node suffix to Store/Provide/Allocate/Free
      
      * [REFACTOR][IR] Add Node suffix to ProducerConsumer
      
      * Fix lint
      
      * style updates, test fixes
      Tianqi Chen committed
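The renames all follow one pattern: the container class gains a Node suffix, freeing the short name for a reference type. A toy Python sketch of that node/reference split (the reference class is illustrative, not TVM code):

```python
class Object:
    """Base of all node containers."""

class ObjectRef:
    """Lightweight handle holding a pointer to an Object."""
    def __init__(self, node: Object):
        self._node = node

class AddNode(Object):
    """The payload class: Add becomes AddNode, as in this PR."""
    def __init__(self, a, b):
        self.a, self.b = a, b

class Add(ObjectRef):
    """The short name is reserved for the user-facing reference."""
    def __init__(self, a, b):
        super().__init__(AddNode(a, b))
```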
  5. 03 Jan, 2020 1 commit
    • [REFACTOR] Migrate Low-level IR Passes into the New Stmt/Expr Mutator (#4607) · 203ca7a0
      * CombineContextCall
      
      * Migrate BoundChecker
      
      * Migrate CoprocSync
      
      * Migrate detect_device
      
      * Migrate loop_partition
      
      * Migrate infer_fragment
      
      * Migrate inject_copy_intrin
      
      * Migrate inject double buffer
      
      * Migrate lower_intrin and simplify
      
      * Migrate storage flatten
      
      * Migrate inject prefetch
      
      * Migrate inject_virtual_thread
      
      * migrate inline
      
      * Migrate lift attr scope
      
      * Migrate custom datatypes
      
      * migrate lower_thread_all_reduce
      
      * Migrate lower_tvm_builtin
      
      * migrate lower_warp memory
      
      * Migrate make_api.cc
      
      * Migrate remap_thread_axis
      
      * Migrate remove_no_op
      
      * migrate rewrite_unsafe_select
      
      * Migrate skip_assert simple_passes
      
      * Migrate split_host_device
      
      * Migrate ssa
      
      * Migrate storage_access
      
      * Migrate storage_rewrite
      
      * Migrate tensor_core
      
      * Migrate unroll_loop
      
      * Migrate vectorize
      
      * Migrate verify compact_buffer gpu_code
      
      * Migrate verify_memory
      
      * Migrate storage_sync
      
      * Remove unused refs to mutator
      
      * Migrate hybrid_op
      
      * Migrate tensorize
      
      * Migrate schedule ops
      
      * Migrate schedule_dataflow_rewrite
      
      * Migrate auto_inline_elemwise
      
      * Remove unnecessary ref to visitor
      
      * remove unnecessary ref
      
      * Migrate bound_deducer
      
      * Migrate domain_touched
      
      * Migrate autotvm feature touch extractor
      
      * Add annotations
      Tianqi Chen committed
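Each bullet above rewrites one pass onto the new Stmt/Expr mutator interface. The shape of such a migration can be sketched with a toy expression tree and a mutator-style pass (names and structure are illustrative, not TVM's):

```python
class Expr:
    """Toy expression base."""

class Const(Expr):
    def __init__(self, value):
        self.value = value

class Add(Expr):
    def __init__(self, a, b):
        self.a, self.b = a, b

class ExprMutator:
    """Recursive rewriter in the spirit of the new mutators: passes
    override per-node hooks and return a (possibly new) node."""
    def visit(self, e):
        method = getattr(self, f"visit_{type(e).__name__.lower()}", None)
        return method(e) if method else e

    def visit_add(self, e):
        return Add(self.visit(e.a), self.visit(e.b))

class FoldConstAdd(ExprMutator):
    """A toy 'pass' migrated onto the mutator: folds Const + Const."""
    def visit_add(self, e):
        a, b = self.visit(e.a), self.visit(e.b)
        if isinstance(a, Const) and isinstance(b, Const):
            return Const(a.value + b.value)
        return Add(a, b)
```

Running `FoldConstAdd().visit(...)` over a nested addition of constants collapses it bottom-up into a single constant.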
  6. 31 Dec, 2019 1 commit
    • [REFACTOR][OBJECT] Consolidate NodePtr/Ref/Hash/Equal to Object (#4603) · a8c36921
      * [REFACTOR][OBJECT] Consolidate NodePtr/Ref/Hash/Equal and macros to Object.
      
      Historically, we have had classes like NodePtr/Ref/HashEqual.
      After the unified object protocol, these names are just aliases
      of their Object counterparts. Moreover, helper macros for
      defining these objects were scattered all over the place.
      
      This PR consolidates the terminology into the corresponding names
      in the Object system so we have a clean and consistent API moving forward.
      
      * Update include/tvm/attrs.h
      
      Co-Authored-By: Wei Chen <ipondering.weic@gmail.com>
      
      * fix compilation
      
      Co-authored-by: Wei Chen <ipondering.weic@gmail.com>
      Tianqi Chen committed
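The consolidation means a reference no longer needs separate NodeHash/NodeEqual helpers: identity-based hashing and equality live with the object system itself. A toy Python sketch of that idea (not TVM code):

```python
class Object:
    """Unified object base; pointer identity is the default notion of
    sameness, replacing separate NodeHash/NodeEqual helpers."""
    def same_as(self, other: "Object") -> bool:
        return self is other

class ObjectRef:
    """Handle whose hash and equality delegate to the held Object."""
    def __init__(self, obj: Object):
        self._obj = obj
    def __hash__(self):
        return id(self._obj)  # identity-based hash
    def __eq__(self, other):
        return isinstance(other, ObjectRef) and self._obj is other._obj
```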
  7. 24 Nov, 2019 1 commit
  8. 24 Oct, 2019 1 commit
    • TensorCore Support using Intrinsic (#4136) · 324a9607
      * add tensor core support
      
      * avoid memory bank conflict
      
      * fix thread sync & better performance
      
      * better performance
      
      * add schedule test for conv2d
      
      * extend into BatchMatMul
      
      * support config fragment shape and layout using intrinsic
      
      * add TensorCore tutorial
      
      * add int support and fix lint
      
      * address comment
      
      * add 32*16*8 TensorCore test
      
      * fix wmma include logic
      Siyuan Feng committed
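The intrinsic-based design works by tiling the matmul loop nest into fixed-shape fragments that map onto WMMA operations. A plain-Python analogue of that tiling, with toy fragment sizes rather than real WMMA shapes:

```python
def tiled_matmul(A, B, m=2, n=2, k=2):
    """Toy CPU analogue of fragment-based matmul: the full loop nest
    is tiled into fixed (m, n, k) 'fragments', mirroring how tensor
    core intrinsics consume fixed fragment shapes."""
    M, K = len(A), len(A[0])
    N = len(B[0])
    C = [[0] * N for _ in range(M)]
    for i0 in range(0, M, m):
        for j0 in range(0, N, n):
            for k0 in range(0, K, k):
                # one 'fragment' multiply-accumulate
                for i in range(i0, min(i0 + m, M)):
                    for j in range(j0, min(j0 + n, N)):
                        for kk in range(k0, min(k0 + k, K)):
                            C[i][j] += A[i][kk] * B[kk][j]
    return C
```

On real hardware the innermost three loops are replaced by a single WMMA call, which is why the fragment shape and layout must be configurable, as the commit above describes.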