1. 19 Jan, 2020 1 commit
    • [REFACTOR] Establish tir (#4740) · cf59b206
      TIR is the new namespace for low-level IR
      for tensor-level optimizations and loop transformations.
      
      This PR establishes the namespace and files.
      
      - lowered_func.h,buffer.h,data_layout.h -> tir/buffer.h,tir/data_layout.h,tir/lowered_func.h
      - ir.h -> tir/expr.h, tir/stmt.h
      - ir_functor_ext.h -> tir/expr_functor.h, tir/stmt_functor.h
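The header moves listed above can be expressed as a small lookup table. The following is a minimal sketch of an include-rewriting helper, not part of the PR itself: the `tvm/` include prefix, the angle-bracket include style, and the names `HEADER_MAP` and `rewrite_includes` are assumptions made for illustration.

```python
import re

# Old header -> new header(s), following the list in the commit message.
# The "tvm/" prefix is an assumption for illustration.
HEADER_MAP = {
    "tvm/lowered_func.h": ["tvm/tir/lowered_func.h"],
    "tvm/buffer.h": ["tvm/tir/buffer.h"],
    "tvm/data_layout.h": ["tvm/tir/data_layout.h"],
    "tvm/ir.h": ["tvm/tir/expr.h", "tvm/tir/stmt.h"],
    "tvm/ir_functor_ext.h": ["tvm/tir/expr_functor.h", "tvm/tir/stmt_functor.h"],
}

# Matches lines such as `#include <tvm/ir.h>` and captures the prefix and path.
INCLUDE_RE = re.compile(r"^(\s*#include\s*)<([^>]+)>\s*$")


def rewrite_includes(source: str) -> str:
    """Replace old include lines with their new equivalents (hypothetical helper)."""
    out = []
    for line in source.splitlines():
        match = INCLUDE_RE.match(line)
        if match and match.group(2) in HEADER_MAP:
            # One old header may split into several new ones (e.g. ir.h).
            out.extend(f"{match.group(1)}<{new}>" for new in HEADER_MAP[match.group(2)])
        else:
            out.append(line)
    return "\n".join(out)


rewritten = rewrite_includes("#include <tvm/ir.h>\nint x;")
```

Note that `ir.h` expands to two includes, so a one-to-one rename would not be enough for this particular move.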
      Tianqi Chen committed
  2. 16 Jan, 2020 1 commit
  3. 09 Jan, 2020 1 commit
    • [REFACTOR][IR] tvm::Expr -> PrimExpr(Primitive Expr) (#4669) · d6a23cf5
      * [REFACTOR][IR] tvm::Expr -> PrimExpr(Primitive Expr)
      
      As part of unified IR, we will need to unify relay::Expr
      and the current tvm::Expr under the same base type.
      
      From the technical point of view, tvm::Expr is a "primitive"
      expression that only contains POD types and handles and does
      not do life-cycle management.
      
      This PR renames Expr->PrimExpr to clarify that.
      We will send a subsequent PR to introduce the base expr class.
      
      * Remove legacy VarExpr and ExprHash/Equal
      Tianqi Chen committed
  4. 08 Jan, 2020 1 commit
    • [REFACTOR][IR] Add Node suffix to low-level IR nodes (#4649) · f4c5f93b
      * [REFACTOR][IR] Variable -> VarNode
      
      * [REFACTOR][IR] Add/Sub/Mul/Div -> AddNode/SubNode etc.
      
      * [REFACTOR][IR] Min/Max/FloorDiv/FloorMod -> MinNode/MaxNode etc.
      
      * [REFACTOR][IR] EQ/NE/LT/LE/GT/GE/Select -> EQNode/NENode etc.
      
      * [REFACTOR][IR] Add Node suffix to Select/Call/Load/Ramp/Shuffle/Let
      
      * [REFACTOR][IR] Add node suffix to IntImm/UIntImm/FloatImm/StringImm
      
      * [REFACTOR][IR] Add Node suffix to Any, AttrStmt, AssertStmt
      
      * [REFACTOR][IR] Add Node suffix to Store/Provide/Allocate/Free
      
      * [REFACTOR][IR] Add Node suffix to ProducerConsumer
      
      * Fix lint
      
      * style updates, test fixes
      Tianqi Chen committed
  5. 03 Jan, 2020 1 commit
    • [REFACTOR] Migrate Low-level IR Passes into the New Stmt/Expr Mutator (#4607) · 203ca7a0
      * CombineContextCall
      
      * Migrate BoundChecker
      
      * Migrate CoprocSync
      
      * Migrate detect_device
      
      * Migrate loop_partition
      
      * Migrate infer_fragment
      
      * Migrate inject_copy_intrin
      
      * Migrate inject double buffer
      
      * Migrate lower_intrin and simplify
      
      * Migrate storage flatten
      
      * Migrate inject prefetch
      
      * Migrate inject_virtual_thread
      
      * migrate inline
      
      * Migrate lift attr scope
      
      * Migrate custom datatypes
      
      * migrate lower_thread_all_reduce
      
      * Migrate lower_tvm_builtin
      
      * migrate lower_warp memory
      
      * Migrate make_api.cc
      
      * Migrate remap_thread_axis
      
      * Migrate remove_no_op
      
      * migrate rewrite_unsafe_select
      
      * Migrate skip_assert simple_passes
      
      * Migrate split_host_device
      
      * Migrate ssa
      
      * Migrate storage_access
      
      * Migrate storage_rewrite
      
      * Migrate tensor_core
      
      * Migrate unroll_loop
      
      * Migrate vectorize
      
      * Migrate verify compact_buffer gpu_code
      
      * Migrate verify_memory
      
      * Migrate storage_sync
      
      * Remove unused refs to mutator
      
      * Migrate hybrid_op
      
      * Migrate tensorize
      
      * Migrate schedule ops
      
      * Migrate schedule_dataflow_rewrite
      
      * Migrate auto_inline_elemwise
      
      * Remove unnecessary ref to visitor
      
      * remove unnecessary ref
      
      * Migrate bound_deducer
      
      * Migrate domain_touched
      
      * Migrate autotvm feature touch extractor
      
      * Add annotations
      Tianqi Chen committed
  6. 21 Oct, 2019 1 commit
    • [REFACTOR][NODE][RUNTIME] Move Node to the new Object protocol. (#4161) · 7895adb2
      * [REFACTOR][NODE][RUNTIME] Move Node to the new Object protocol.
      
      This PR removes the original node system and makes Node a subclass of Object.
      This is a major refactor towards a better unified runtime object system.
      
      List of changes in the refactor:
      
      - We now hide the data_ field; use Downcast explicitly to get a sub-class object.
      - Removed the node system FFI in python.
      - Removed the node C API, instead use PackedFunc for list and get attrs.
      - Change relay::Op::set_attr_type_key(attr_key_name) to relay::Op::set_attr_type<AttrType>().
        - This change was necessary because of the new Object registration mechanism.
        - Subsequent changes to the op registrations
        - The change revealed a few previous problems that are now fixed.
      - Patched up a few missing node type registrations.
        - We now raise an error if an object type is used without being registered.
      - The original node.h and container.h are kept in the same location.
      - Calling convention: kObjectHandle now equals the old kNodeHandle, kNodeHandle is removed.
      - IRFunctor now dispatches on ObjectRef.
      - Update to the new type checking API: is_type, derived_from are replaced by IsInstance.
      - Removed .hash member function, instead use C++ convention hasher functors.
      
      * Address review comments
      Tianqi Chen committed
  7. 25 Sep, 2019 1 commit
    • Changes to make tensorize work. These changes also fix the previously broken test. (#3981) · b410df8c
      * Changes to make tensorize work. These changes also fix the previously
      broken test.
      
      Summary:
      Tensorize was breaking for a few reasons.
      1) Assert at: src/op/tensorize.cc:234 CHECK(is_one(e.region[j]->extent))
      In some cases this cannot be proven, e.g.:
      expected shape=[16, 4], given region=[range(min=((ax1.outer*16)/16), ext=(((((ax1.outer*16) + 15)/16) + 1) - ax1.outer)), range(min=((k.outer*4)/4), ext=(((((k.outer*4) + 3)/4) + 1) - k.outer)), range(min=0, ext=16), range(min=0, ext=4)]
      The unprovable one is: ext=(((((ax1.outer*16) + 15)/16) + 1) - ax1.outer).
      This could be simplified, but it is not: to simplify the division, the
      simplifier must prove ax1.outer > 0, and since it is a var it cannot. The
      fix is to find all the vars in the expr and replace them with some const value.
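The extent from item 1 can be checked numerically. This is a minimal sketch, assuming `/` denotes C-style truncating integer division (an assumption about the IR semantics, not stated in the commit message); `trunc_div` and `extent` are hypothetical helper names.

```python
def trunc_div(a: int, b: int) -> int:
    """Integer division rounding toward zero (C-style)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def extent(outer: int) -> int:
    """ext = (((outer*16 + 15) / 16) + 1) - outer, with truncating division."""
    return (trunc_div(outer * 16 + 15, 16) + 1) - outer


# For every non-negative value of the var, the extent is exactly 1.
nonneg_extents = {extent(o) for o in range(0, 1000)}

# Without a sign constraint the identity breaks under truncating division.
negative_example = extent(-1)
```

Under that assumption, the extent is 1 for every non-negative `ax1.outer` but not for `ax1.outer = -1`, which illustrates why a simplifier that knows nothing about the var cannot reduce the expression, while substituting a constant makes it evaluable.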
      
      2) Equivalence check between the tensorized expr and the one being asked to tensorize. For example,
      the error would be:
      TVMError: Check failed: Equal(lhs, rhs):
      Failed to match the compute with TensorIntrin tensor_intrin's declaration
      provided= reduce(combiner=comm_reducer(result=[(x + y)], lhs=[x], rhs=[y], identity_element=[(int16)0]), source=[(int16(data(k))*int16(kernel(((((((((k.outer.outer*64) + (k.outer.inner*2)) + k)/2)*128) + i) - (k.outer.inner*128)) - (k.outer.outer*4096)), ((((k.outer.outer*64) + (k.outer.inner*2)) + k) % 2))))], axis=[iter_var(k, range(min=0, ext=2))], where=(bool)1, value_index=0),
      intrin=  reduce(combiner=comm_reducer(result=[(x + y)], lhs=[x], rhs=[y], identity_element=[(int16)0]), source=[(int16(data(k))*int16(kernel(i, k)))], axis=[iter_var(k, range(min=0, ext=2))], where=(bool)1, value_index=0)
      The difference is mainly in the source part:
      source=[(int16(data(k))*int16(kernel(((((((((k.outer.outer*64) + (k.outer.inner*2)) + k)/2)*128) + i) - (k.outer.inner*128)) - (k.outer.outer*4096)), ((((k.outer.outer*64) + (k.outer.inner*2)) + k) % 2))))]
      source=[(int16(data(k))*int16(kernel(i, k)))], axis=[iter_var(k, range(min=0, ext=2))]
      This was not being simplified because compute_intrin_iter_space (the map
      from iter var to range) did not contain the leaf iter vars.
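A brute-force check shows why the range of the leaf iter var matters for item 2. This is a minimal sketch: the range of `k` (min=0, ext=2) comes from the error message above, while the ranges used for `k.outer.outer`, `k.outer.inner`, and `i`, and the helper name `kernel_indices`, are assumptions chosen for illustration.

```python
def kernel_indices(koo: int, koi: int, k: int, i: int):
    """Evaluate the two kernel(...) index expressions from the 'provided' side.

    All operands are non-negative here, so Python's // and % agree with
    truncating division and modulo.
    """
    fused = koo * 64 + koi * 2 + k
    row = ((fused // 2) * 128 + i) - koi * 128 - koo * 4096
    col = fused % 2
    return row, col


# With k restricted to range(min=0, ext=2), the indices reduce to (i, k).
all_match = all(
    kernel_indices(koo, koi, k, i) == (i, k)
    for koo in range(4)    # assumed range
    for koi in range(32)   # assumed range
    for k in range(2)      # from axis=[iter_var(k, range(min=0, ext=2))]
    for i in range(16)     # assumed range
)

# Outside that range the reduction no longer holds.
out_of_range = kernel_indices(0, 0, 2, 0)
```

The index expressions equal `kernel(i, k)` only when `k` is known to lie in `[0, 2)`, which is consistent with the simplifier needing the leaf iter var ranges.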
      
      3) Here it fails with:
      Check failed: is_one(Simplify(value->shape[i])): Argument b_buffer shape mismatch[16, 4] vs [(((((ax1.outer*16) + 15)/16) + 1) - ax1.outer), (((((k.outer*4) + 3)/4) + 1) - k.outer), 16, 4]
      This is in buffer binding, where the expected shape and the bound buffer
      shape are considered different. If the expr could be simplified, this
      would not be the case.
      
      Test Plan:
      On skylake avx512 machine:
      python tests/python/contrib/test_gemm_acc16.py
      
      * Implemented a bounded analyzer that traverses the tree and, for
      reduce/for statements, binds their bounds in the analyzer. These bounds
      are later used to simplify expressions. Inspired by ir_mutator_with_analyzer.
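The idea of binding loop bounds during traversal can be sketched with a toy IR. This is not TVM's implementation: every class and method name below (`Var`, `For`, `Evaluate`, `BoundedVisitor`, and so on) is hypothetical, and plain interval arithmetic stands in for the real analyzer.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Add:
    a: "Expr"
    b: "Expr"


@dataclass(frozen=True)
class Mul:
    a: "Expr"
    b: "Expr"


Expr = Union[Var, Const, Add, Mul]


@dataclass(frozen=True)
class Evaluate:
    value: Expr


@dataclass(frozen=True)
class For:
    loop_var: Var
    min_value: int
    extent: int
    body: Union["For", Evaluate]


class BoundedVisitor:
    """Toy visitor: binds each loop var's range while visiting the loop body."""

    def __init__(self) -> None:
        self.bounds: Dict[str, Tuple[int, int]] = {}
        self.results: List[Tuple[int, int]] = []

    def visit(self, stmt) -> None:
        if isinstance(stmt, For):
            # Bind the inclusive range [min, min + extent - 1] for the body only.
            name = stmt.loop_var.name
            self.bounds[name] = (stmt.min_value, stmt.min_value + stmt.extent - 1)
            self.visit(stmt.body)
            del self.bounds[name]
        elif isinstance(stmt, Evaluate):
            self.results.append(self.interval(stmt.value))
        else:
            raise TypeError(f"unsupported statement: {stmt!r}")

    def interval(self, expr) -> Tuple[int, int]:
        """Inclusive (low, high) interval of expr under the bound ranges."""
        if isinstance(expr, Const):
            return (expr.value, expr.value)
        if isinstance(expr, Var):
            return self.bounds[expr.name]
        lo_a, hi_a = self.interval(expr.a)
        lo_b, hi_b = self.interval(expr.b)
        if isinstance(expr, Add):
            return (lo_a + lo_b, hi_a + hi_b)
        if isinstance(expr, Mul):
            corners = [lo_a * lo_b, lo_a * hi_b, hi_a * lo_b, hi_a * hi_b]
            return (min(corners), max(corners))
        raise TypeError(f"unsupported expression: {expr!r}")


i, k = Var("i"), Var("k")
program = For(i, 0, 16, For(k, 0, 4, Evaluate(Add(Mul(i, Const(4)), k))))
visitor = BoundedVisitor()
visitor.visit(program)
```

In this sketch the bounds are removed when the visitor leaves a loop, so a var is only constrained inside the body where its range is actually valid.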
      
      * Addressed comments.
      
      * Added ASF header + define macro for the header file: TVM_ARITHMETIC_IR_VISITOR_WITH_ANALYZER_H_
      Some lint fixes as well.
      
      * Relax the assumption that dom_map must always contain all leaf itervars.
      
      * Disable copy constructor and move to raw ptr.
      
      Kimish Patel committed