Commit 90db723d by Jian Weng Committed by Tianqi Chen

[FRONTEND] A Python hybrid frontend (#1251)

parent a55bc290
tvm.hybrid
----------
.. automodule:: tvm.hybrid
.. autosummary::
tvm.hybrid.parse
tvm.hybrid.script
tvm.hybrid.popcount
tvm.hybrid.sigmoid
.. autofunction:: tvm.hybrid.parse
.. autofunction:: tvm.hybrid.script
.. autofunction:: tvm.hybrid.popcount
.. autofunction:: tvm.hybrid.sigmoid
......@@ -21,3 +21,4 @@ Python API
dev
topi
nnvm/index
hybrid
Hybrid Frontend Developer Guide
===============================
If you are a developer:
1. who is trying to write some preliminary patterns that have not been supported by TVM yet,
maybe :ref:`hybrid-langref-label` is a better place for you.
2. who wants to know the implementation details of this module, you are right here!
Features
--------
Software emulation
~~~~~~~~~~~~~~~~~~
In software emulation, the most interesting piece is the decorator ``tvm.hybrid.script``.
This decorator does two things:

1. Importing runtime variables
2. Overloading the function according to the types of the arguments passed

I believe the way 1. is implemented is dangerous, but there is currently no better
choice: the required names are added to the Python dict ``func.__globals__``, and after
the call to ``func`` is done, those names are cleaned up.
Overloading is simple: the decorator checks the types of the arguments and determines
which function should actually be called.
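The injection-and-cleanup described above can be sketched in plain Python. This is a simplified illustration, not the module's actual implementation; ``run_with_runtime`` is a hypothetical helper name:

```python
# Sketch: put runtime helpers into the function's globals, call it,
# then remove the injected names and restore anything we shadowed.
def run_with_runtime(func, runtime, *args):
    saved = []  # (name, previous value) pairs we temporarily shadow
    for name, value in runtime.items():
        if name in func.__globals__:
            saved.append((name, func.__globals__[name]))
        func.__globals__[name] = value
    try:
        return func(*args)
    finally:
        # Clean up: drop injected names, then restore shadowed bindings
        for name in runtime:
            func.__globals__.pop(name, None)
        for name, old in saved:
            func.__globals__[name] = old
```

After the call returns, the caller's global namespace looks exactly as it did before, which is why users never see keyword pollution.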
Backend Compilation
~~~~~~~~~~~~~~~~~~~
Compilation is a large module; see ``python/tvm/hybrid/var_decl.py`` and
``python/tvm/hybrid/parser.py`` for more details. The first stage determines the
usage, or more accurately the declaration, of each variable, and the second stage
does the actual IR generation.
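The two stages can be illustrated with Python's own ``ast`` module. This is a toy sketch of the idea, not the real parser:

```python
import ast

src = """
def f(a, b):
    for i in range(10):
        b[i] = a[i]
"""
tree = ast.parse(src)

# Stage 1 (cf. var_decl.py): find every name that is written to --
# in Python, the first store effectively acts as the declaration.
stored = {n.id for n in ast.walk(tree)
          if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)}

# Stage 2 (cf. parser.py): a second traversal would emit HalideIR for
# each statement; here we only list the statement kinds to translate.
stmt_kinds = [type(n).__name__ for n in ast.walk(tree)
              if isinstance(n, ast.stmt)]
```

Note that in ``b[i] = a[i]`` only the subscript node carries a store context, so ``stored`` contains just the loop variable ``i``; buffer writes are handled separately from scalar declarations.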
Attributes
~~~~~~~~~~
So far, ONLY tensors' ``shape`` attribute is supported. See ``visit_Subscript``
in ``python/tvm/hybrid/parser.py`` for more details. This is a hacky solution: the
attribute is simply checked when a subscript is visited.
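The check boils down to inspecting the AST node; here is a standalone sketch using Python's ``ast`` module:

```python
import ast

# `a.shape[0]` parses as Subscript(value=Attribute(value=Name('a'),
# attr='shape'), ...); the parser only needs to check the attribute name.
node = ast.parse("x = a.shape[0]").body[0].value
is_subscript = isinstance(node, ast.Subscript)
is_shape_attr = isinstance(node.value, ast.Attribute) and node.value.attr == 'shape'
```

Any attribute other than ``shape`` under a subscript can then be rejected with an error.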
Loops
~~~~~
In HalideIR, loops have in total 4 types: ``serial``, ``unrolled``, ``parallel``, and ``vectorized``.
.. note::

    Unlike in HalideIR, in ``loop_type(a, b)``, ``a`` is the starting point and ``b``
    is the end point, i.e. ``loop_type(a, b)`` indicates the half-open range ``[a, b)``.
    Thus, when lowering it to HalideIR, we need to do ``start, extent = a, b - a``.

.. note::

    In HalideIR those are enums, and they are in passive form.
    Here we use the active form to annotate loops, because they are ready to run.
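The conversion described in the note above can be written as a small helper (illustrative; ``to_halide_bounds`` is a hypothetical name, not part of the module):

```python
def to_halide_bounds(a, b):
    """Convert a hybrid-script range [a, b) into HalideIR's
    (start, extent) pair."""
    start, extent = a, b - a
    return start, extent
```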
Variables
~~~~~~~~~
Because there are no variables in ``HalideIR``, all mutable variables are lowered to arrays of size 1.
The first store of a variable is taken as its declaration.
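Conceptually, the lowering behaves like this plain-Python sketch:

```python
# Conceptual lowering of a mutable scalar (illustrative):
#   s = 0          ->  s_buf[0] = 0          (first store = declaration)
#   s = s + a[i]   ->  s_buf[0] = s_buf[0] + a[i]
a = [1, 2, 3]
s_buf = [0]  # the size-1 "array" that replaces the scalar s
for i in range(len(a)):
    s_buf[0] = s_buf[0] + a[i]
total = s_buf[0]
```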
Math intrinsics
~~~~~~~~~~~~~~~
So far, the math intrinsics ``log``, ``exp``, ``sigmoid``, ``tanh``, ``power``, and ``popcount`` are supported.
Math intrinsics are imported by the decorator. Most of the intrinsics are borrowed from the
library implementation, except ``popcount`` and ``sigmoid``, which I implemented manually.
......@@ -10,3 +10,4 @@ In this part of documentation, we share the rationale for the specific choices m
runtime
nnvm_json_spec
nnvm_overview
hybrid_script
.. _hybrid-langref-label:
Hybrid Frontend Language Reference
==================================
Overview
--------
This hybrid frontend allows users to write preliminary versions of some idioms that have not yet
been supported by TVM officially.
Features
--------
Software Emulation
~~~~~~~~~~~~~~~~~~
Both software emulation and compilation are supported. To define a function,
you need to use the ``tvm.hybrid.script`` decorator to indicate that it is a hybrid function:

.. code-block:: python

    @tvm.hybrid.script
    def outer_product(a, b, c):
        for i in range(a.shape[0]):
            for j in range(b.shape[0]):
                c[i, j] = a[i] * b[j]

    a = numpy.random.rand(100)
    b = numpy.random.rand(99)
    c = numpy.zeros((100, 99))
    outer_product(a, b, c)
This decorator automatically imports the `Keywords`_ required during software emulation.
After software emulation is done, the imported keywords are cleaned up, so users do not need to
worry about keyword conflicts or pollution.
Every element passed for software emulation in the argument list is either a Python variable
or a ``numpy`` numeric type.
Backend Compilation
~~~~~~~~~~~~~~~~~~~
The current parse interface looks like:

.. code-block:: python

    a = tvm.placeholder((100, ), name='a')
    b = tvm.placeholder((99, ), name='b')
    c = tvm.placeholder((100, 99), name='c')
    tvm.hybrid.parse(outer_product, [a, b, c]) # return an ir root of this function

If we pass these tvm tensors to this function, it returns an op node:

**Under construction, we are still deciding what kind of node should be returned.**

.. code-block:: python

    a = tvm.placeholder((100, ), name='a')
    b = tvm.placeholder((99, ), name='b')
    c = tvm.placeholder((100, 99), name='c')
    op = outer_product(a, b, c) # return the corresponding op node
Tuning
~~~~~~
**Under construction, not truly supported yet.**
Following the example above, you can use some TVM-like interfaces to tune the code:

.. code-block:: python

    sch = tvm.create_schedule(op)
    jo, ji = sch.split(j, 4)
    sch.vectorize(ji)

``split``, ``reorder``, and loop annotation will be supported!
Loops
~~~~~
In HalideIR, loops have in total 4 types: ``serial``, ``unrolled``, ``parallel``, and ``vectorized``.
Here we use four keywords, ``range`` (aka ``serial``), ``unroll``, ``parallel``, and ``vectorize``,
to annotate the corresponding types of for loops.
The usage is roughly the same as Python's standard ``range``.
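During software emulation, all four keywords iterate exactly like ``range``; only the generated HalideIR differs. A toy pure-Python sketch of this behavior (the class and constructor names here are illustrative, loosely mirroring the package's internal ``_range`` helper in ``intrin.py``):

```python
# Each keyword iterates like range(ext) but records which HalideIR
# for-type it should lower to during backend compilation.
class _loop(object):
    def __init__(self, ext, for_type):
        self.ext = ext
        self.for_type = for_type  # remembered for IR generation
    def __iter__(self):
        return iter(range(self.ext))

unroll = lambda n: _loop(n, 'unrolled')
parallel = lambda n: _loop(n, 'parallel')
vectorize = lambda n: _loop(n, 'vectorized')

acc = [i for i in unroll(4)]  # iterates 0, 1, 2, 3, like range(4)
```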
Variables
~~~~~~~~~
All mutable variables are lowered to arrays of size 1.
The first store of a variable is regarded as its declaration.

.. note::

    Unlike conventional Python, in hybrid script, the declared variable
    can only be used in the scope level where it is declared.

.. note::

    Currently, you can ONLY use basic-typed variables, i.e. the type of the
    variable should be either ``float32`` or ``int32``.

.. code-block:: python

    for i in range(5):
        s = 0 # declaration, this s will be a 1-array in lowered IR
        for j in range(5):
            s += a[i, j] # do something with s
        b[i] = s # you can still use s in this level
    a[0] = s # you CANNOT use s here, even though it is allowed in conventional Python
    b = (1, 2) # this has NOT been supported yet!
Attributes
~~~~~~~~~~
So far, ONLY tensors' ``shape`` attribute is supported! The ``shape`` attribute is essentially a
tuple, so you MUST access it as an array. Also, currently, only constant-indexed access is supported.

.. code-block:: python

    x = a.shape[2] # OK!
    for i in range(3):
        for j in a.shape[i]: # BAD! i is not a constant!
            # do something
Conditional Statement and Expression
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code-block:: python

    if condition:
        # do something

    a = b if condition else c

However, the ``True`` and ``False`` keywords are not supported yet.
Math Intrinsics
~~~~~~~~~~~~~~~
So far, the math intrinsics ``log``, ``exp``, ``sigmoid``,
``tanh``, ``power``, and ``popcount`` are supported.
No import is required; as mentioned in `Software Emulation`_, just use them!
Array Allocation
~~~~~~~~~~~~~~~~
**Under construction, this function will be supported later!**
Use a function call ``allocate(shape, type, share/local)`` to declare an array buffer.
The basic usage is roughly the same as that of a normal array.
Thread Bind
~~~~~~~~~~~
You can also do loop-thread binding by writing code like this:

.. code-block:: python

    for tx in bind("threadIdx.x", 100):
        a[tx] = b[tx]
Keywords
~~~~~~~~
- For keywords: ``serial``, ``range``, ``unroll``, ``parallel``, ``vectorize``, ``bind``
- Math keywords: ``log``, ``exp``, ``sigmoid``, ``tanh``, ``power``, ``popcount``
......@@ -2,3 +2,8 @@ Language Reference
==================
This document provides references to the
embedded languages in the TVM stack.
.. toctree::
:maxdepth: 2
hybrid_script
......@@ -332,12 +332,20 @@ def lower(sch,
lower_phase1 = [x[1] for x in add_lower_pass if x[0] == 1]
lower_phase2 = [x[1] for x in add_lower_pass if x[0] == 2]
lower_phase3 = [x[1] for x in add_lower_pass if x[0] > 2]
# normalize schedule first
sch = sch.normalize()
# Phase 0
bounds = schedule.InferBound(sch)
stmt = schedule.ScheduleOps(sch, bounds)
stmt = ir_pass.InjectPrefetch(stmt)
if isinstance(sch, schedule.Schedule):
# normalize schedule first
sch = sch.normalize()
bounds = schedule.InferBound(sch)
stmt = schedule.ScheduleOps(sch, bounds)
stmt = ir_pass.InjectPrefetch(stmt)
else:
# So far there is no op for hybrid script, so a plain IR body is given
if not isinstance(sch, _stmt.Stmt):
raise ValueError("sch should be either a Schedule or a Stmt")
stmt = sch
for f in lower_phase0:
stmt = f(stmt)
# Phase 1
......
"""Hybrid Programming APIs of TVM Python Package.
This package maps a subset of Python to HalideIR so that:
1. Users can write preliminary versions of computation patterns that
have not been supported yet, and verify them against both the real execution
and the Python semantic emulation.
2. Developers can build HalideIR by writing Python code.
"""
from .api import script, parse
"""APIs of lowering the Python subset to HalideIR"""
from __future__ import absolute_import as _abs
import types
import decorator
from .parser import parse_python
@decorator.decorator
def script(func, *args):
"""If the arguments are tvm types, compile it to HalideIR.
O.W. return the python emulated result"""
from .util import _enter_hybrid_runtime, _restore_runtime, _is_tvm_arg_types
if _is_tvm_arg_types(args):
return parse(func, args)
else:
intersect = _enter_hybrid_runtime(func)
func(*args)
_restore_runtime(func, intersect)
return func
def parse(func, args):
"""Parse a subset of Python to HalideIR
Parameters
----------
func : str or types.FunctionType
If it is a string, parse the source code
If it is a function, parse the function
args : list of Buffer or Tensor or Var
The argument lists to the function.
Leave it None if no buffer is related to the function to be parsed
Returns
-------
root : Stmt
The result Halide IR and the parser class instance.
"""
from .util import _pruned_source
if isinstance(func, str):
src = func
else:
assert isinstance(func, types.FunctionType)
src = _pruned_source(func)
return parse_python(src, args)
"""Intrinsics of TVM-Python Hybrid Script for Python runtime"""
import numpy
from ..stmt import For
class _range(object):
"""Base class of the loop ranges in hybrid script"""
def __init__(self, a, b=None):
if b is None:
self.low = 0
self.ext = a
else:
# _range(a, b) denotes the half-open range [a, b),
# so the extent is b - a, matching the lowered HalideIR
self.low = a
self.ext = b - a
def __iter__(self):
i = 0
while i < self.ext:
yield i + self.low
i += 1
class bind(_range): #pylint: disable=invalid-name
def __init__(self, tag, ext):
super(bind, self).__init__(ext)
self.tag = tag
unroll = vectorize = parallel = _range #pylint: disable=invalid-name
def allocate(shape, dtype='float32'):
"""Allocate a buffer with given shape
Parameters
----------
shape: Tuple
The shape of the tensor to be allocated
dtype: string
The data type of the tensor
Returns
-------
tensor: numpy.array
The tensor allocated
"""
return numpy.zeros(shape).astype(dtype)
def popcount(x):
"""
Count ones in the binary representation of number x
Parameters
----------
x: Integer
The number to be counted
Returns
-------
cnt: Integer
The number of ones in the binary representation of number x
"""
cnt = 0
while x:
x -= x & -x
cnt += 1
return cnt
def sigmoid(x):
"""
Sigmoid function of x, aka 1/(1+exp(-x)).
Parameters
----------
x: a real number
Returns
-------
res: a real number
The result of sigmoid function
"""
return 1 / (1 + numpy.exp(-x))
HYBRID_GLOBALS = {
'unroll' : unroll,
'vectorize' : vectorize,
'parallel' : parallel,
'allocate' : allocate,
'bind' : bind,
'sqrt' : numpy.sqrt,
'log' : numpy.log,
'tanh' : numpy.tanh,
'power' : numpy.power,
'exp' : numpy.exp,
'sigmoid' : sigmoid,
'popcount' : popcount
}
LOOP_INTRIN = {
'range' : For.Serial,
'unroll' : For.Unrolled,
'parallel' : For.Parallel,
'vectorize': For.Vectorized,
'bind' : None
}
MATH_INTRIN = ['sqrt', 'log', 'exp', 'tanh', 'sigmoid', 'power', 'popcount']
"""Internal utilities for parsing Python subset to HalideIR"""
import inspect
import numpy
from .intrin import HYBRID_GLOBALS
from .._ffi.base import numeric_types
from .. import api as _api
from .. import make as _make
from .. import expr as _expr
from ..tensor import Tensor
#pylint: disable=invalid-name
np_arg_types = tuple(list(numeric_types) + [numpy.ndarray])
tvm_arg_types = (Tensor, _expr.Var)
halide_imm_types = (_expr.IntImm, _expr.FloatImm, _expr.UIntImm)
# Useful constants. To avoid runtime dependencies, we use function calls to return them.
def make_nop():
"""Returns a 'no operation' node in HalideIR."""
return _make.Evaluate(_api.const(0, dtype='int32'))
def make_range_one():
"""Returns a [0, 1] range node in HalideIR."""
return _make.range_by_min_extent(0, 1)
def make_const_true():
"""Returns a constant True node in HalideIR."""
return _api.convert(True)
def _pruned_source(func):
"""Prune source code's extra leading spaces"""
lines = inspect.getsource(func).split('\n')
leading_space = len(lines[0]) - len(lines[0].lstrip(' '))
lines = [line[leading_space:] for line in lines]
return '\n'.join(lines)
def _is_tvm_arg_types(args):
"""Determine a list of element is either a list of tvm arguments of a list of numpy arguments.
If neither is true, raise a value error."""
if isinstance(args[0], tvm_arg_types):
for elem in args[1:]:
if not isinstance(elem, tvm_arg_types):
raise ValueError("Expect a Var or Tensor instance but % get!" % str(type(elem)))
return True
if not isinstance(args[0], np_arg_types):
raise ValueError("Expect a numpy type but % get!" % str(type(args[0])))
for elem in args[1:]:
if not isinstance(elem, np_arg_types):
raise ValueError("Expect a numpy type but % get!" % str(type(elem)))
return False
def _enter_hybrid_runtime(func):
"""Put hybrid runtime variables into the global scope"""
_globals = func.__globals__
intersect = []
for elem in list(HYBRID_GLOBALS.keys()):
if elem in _globals.keys():
intersect.append((elem, _globals[elem]))
_globals[elem] = HYBRID_GLOBALS[elem]
return intersect
def _restore_runtime(func, intersect):
"""Rollback the modification caused by hybrid runtime"""
_globals = func.__globals__
for elem in list(HYBRID_GLOBALS.keys()):
_globals.pop(elem)
for k, v in intersect:
_globals[k] = v
"""Determines the declaration, r/w status, and last use of each variable"""
import ast
import sys
from .intrin import HYBRID_GLOBALS
class PyVariableUsage(ast.NodeVisitor):
"""The vistor class to determine the declaration, r/w status, and last use of each variable"""
#pylint: disable=invalid-name
#pylint: disable=missing-docstring
def __init__(self, args):
self.status = {}
self.scope_level = []
self._args = {}
self.args = args
def visit_FunctionDef(self, node):
self.scope_level.append(node)
if len(node.args.args) != len(self.args):
raise ValueError('#arguments passed should be the same as #arguments defined')
for idx, arg in enumerate(node.args.args):
_attr = 'id' if sys.version_info[0] < 3 else 'arg' # To make py2 and 3 compatible
self._args[getattr(arg, _attr)] = self.args[idx]
for i in node.body:
self.visit(i)
def visit_For(self, node):
if not isinstance(node.target, ast.Name):
raise ValueError("For's iterator should be an id")
self.visit(node.iter)
self.scope_level.append(node)
for i in node.body:
self.visit(i)
self.scope_level.pop()
def visit_Call(self, node):
#No function pointer supported so far
if not isinstance(node.func, ast.Name):
raise ValueError("Function call should be an id")
if (node.func.id not in HYBRID_GLOBALS.keys()) and node.func.id != 'range':
raise ValueError("Function call id not in intrinsics' list")
for elem in node.args:
self.visit(elem)
def visit_Name(self, node):
# If it is from the argument list, we do not worry about it!
if node.id in self._args.keys():
return
fors = [loop.target.id for loop in self.scope_level if isinstance(loop, ast.For)]
# The loop variable cannot be overwritten during iteration,
# so check stores before skipping loop variables
if isinstance(node.ctx, ast.Store) and node.id in fors:
raise ValueError("Iter var cannot be overwritten")
if node.id in fors:
return
if node.id not in self.status.keys():
if not isinstance(node.ctx, ast.Store):
raise ValueError('In Python, "first store" indicates "declaration"')
self.status[node.id] = (node, self.scope_level[-1], set())
else:
decl, loop, usage = self.status[node.id]
loop = self.scope_level[-1]
usage.add(type(node.ctx))
self.status[node.id] = (decl, loop, usage)
def determine_variable_usage(root, args):
"""The helper function for calling the dedicated visitor."""
visitor = PyVariableUsage(args)
visitor.visit(root)
return visitor.status
import tvm, inspect, sys, traceback, numpy
from tvm.hybrid import script
from tvm.hybrid.intrin import HYBRID_GLOBALS
@script
def outer_product(n, m, a, b, c):
for i in range(n):
for j in range(m):
c[i, j] = a[i] * b[j]
#Test global function
#Test bridge between frontend and backend
def test_outer_product():
n = tvm.var('n')
m = tvm.var('m')
a = tvm.placeholder((n, ), name='a')
b = tvm.placeholder((m, ), name='b')
c = tvm.placeholder((n, m), name='c')
ir = outer_product(n, m, a, b, c)
#Check for i in (0, n)
assert isinstance(ir, tvm.stmt.For)
assert ir.loop_var.name == 'i'
assert ir.min.value == 0
assert ir.extent.name == 'n'
ibody = ir.body
assert isinstance(ibody, tvm.stmt.For)
#Check for j in (0, m)
assert ibody.loop_var.name == 'j'
assert ibody.min.value == 0
assert ibody.extent.name == 'm'
#Check loop body
jbody = ibody.body
assert isinstance(jbody, tvm.stmt.Provide)
assert jbody.func.name == 'c'
assert len(jbody.args) == 2
assert jbody.args[0].name == 'i'
assert jbody.args[1].name == 'j'
assert isinstance(jbody.value, tvm.expr.Mul)
mul = jbody.value
assert isinstance(mul.a, tvm.expr.Call)
assert mul.a.name == 'a'
assert mul.b.name == 'b'
func = tvm.lower(ir, [n, m, a, b, c])
func = tvm.build(func)
_n = 999
_m = 1001
_a = numpy.random.rand(_n).astype('float32')
_b = numpy.random.rand(_m).astype('float32')
c_python = numpy.zeros((_n, _m), dtype='float32')
outer_product(_n, _m, _a, _b, c_python)
tvm_a = tvm.ndarray.array(_a)
tvm_b = tvm.ndarray.array(_b)
tvm_c = tvm.ndarray.array(numpy.zeros((_n, _m), dtype='float32'))
func(_n, _m, tvm_a, tvm_b, tvm_c)
numpy.testing.assert_allclose(tvm_c.asnumpy(), c_python, rtol=1e-5)
for key, _ in HYBRID_GLOBALS.items():
assert key not in globals().keys()
assert key not in outer_product.__globals__.keys()
#Test local function
#Test allocation of local variable
def test_fanout():
@script
def fanout(n, a, b):
three = 3.0
for i in range(a.shape[0] - 3):
sigma = 0.0
for j in range(3):
sigma = sigma + a[i + j]
sigma = sigma / three
b[i] = sigma
n = tvm.var('n')
a = tvm.placeholder((n, ), name='a')
b = tvm.placeholder((n-3, ), name='b')
ir = fanout(n, a, b)
#Check for i in (0, n-3)
assert isinstance(ir, tvm.stmt.For)
assert ir.loop_var.name == 'i'
assert ir.min.value == 0
assert tvm.ir_pass.Equal(ir.extent, n - 3)
#Check loopbody
ibody = ir.body
assert isinstance(ibody, tvm.stmt.Realize)
assert ibody.bounds[0].min.value == 0
assert ibody.bounds[0].extent.value == 1
assert ibody.func.name == 'sigma'
#Check i loop body
rbody = ibody.body
assert isinstance(rbody.first, tvm.stmt.Provide)
assert rbody.first.func.name == 'sigma'
assert len(rbody.first.args) == 1
assert rbody.first.args[0].value == 0
#Check fanout loop
jloop = rbody.rest.first
assert jloop.loop_var.name == 'j'
assert jloop.min.value == 0
assert jloop.extent.value == 3
jbody = jloop.body
assert isinstance(jbody, tvm.stmt.Provide)
assert len(jbody.args) == 1
assert jbody.args[0].value == 0
assert jbody.func.name == 'sigma'
assert isinstance(jbody.value, tvm.expr.Add)
value = jbody.value
assert isinstance(value.a, tvm.expr.Call)
assert value.a.name == 'sigma'
assert len(value.a.args) == 1
assert value.a.args[0].value == 0
assert value.b.name == 'a'
assert len(value.b.args) == 1
assert tvm.ir_pass.Equal(value.b.args[0], ir.loop_var + jloop.loop_var)
divide = rbody.rest.rest.first
assert isinstance(divide, tvm.stmt.Provide)
assert len(divide.args) == 1
assert divide.args[0].value == 0
value = divide.value
assert isinstance(value, tvm.expr.Mul)
assert value.a.name == 'sigma'
assert len(value.a.args) == 1
assert value.a.args[0].value == 0
assert abs(value.b.value - (1 / 3.0)) < 1e-5
write = rbody.rest.rest.rest
assert isinstance(write, tvm.stmt.Provide)
assert write.func.name == 'b'
assert write.value.name == 'sigma'
assert len(write.value.args) == 1
assert write.value.args[0].value == 0
@script
def failure():
for i in range(1, 100):
i = 0
def test_failure():
try:
tvm.hybrid.parse(failure, [])
except IOError as err:
assert sys.version_info[0] == 2
print('[Warning] Python2 cannot do the failure case because "%s"' % str(err))
except Exception as err:
assert str(err) == 'You CAN NEVER overwrite a loop variable!'
def test_looptype():
@script
def looptype(a):
for i in parallel(6):
a[i] = i
for j in vectorize(6):
a[j] = j
for k in unroll(6):
a[k] = k
a = tvm.placeholder((6, ), name='a')
ir = looptype(a)
iloop = ir.first
jloop = ir.rest.first
kloop = ir.rest.rest
assert iloop.for_type == tvm.stmt.For.Parallel
assert jloop.for_type == tvm.stmt.For.Vectorized
assert kloop.for_type == tvm.stmt.For.Unrolled
def test_if():
@script
def if_then_else(a, b):
for i in range(10):
if i % 2 == 0:
a[i] = -1
else:
a[i] = 1
for i in unroll(10):
b[i] = -1 if i % 2 == 0 else 1
a = tvm.placeholder((10, ), dtype='int32', name='a')
b = tvm.placeholder((10, ), dtype='int32', name='b')
ir = if_then_else(a, b)
func = tvm.lower(ir, [a, b])
func = tvm.build(func)
assert func
_a = numpy.zeros((10, ), dtype = 'int32')
_b = numpy.zeros((10, ), dtype = 'int32')
if_then_else(_a, _b)
tvm_a = tvm.ndarray.array(numpy.zeros((10, ), dtype='int32'))
tvm_b = tvm.ndarray.array(numpy.zeros((10, ), dtype='int32'))
func(tvm_a, tvm_b)
numpy.testing.assert_allclose(tvm_a.asnumpy(), _a, rtol=1e-5)
numpy.testing.assert_allclose(tvm_b.asnumpy(), _b, rtol=1e-5)
numpy.testing.assert_allclose(tvm_a.asnumpy(), tvm_b.asnumpy(), rtol=1e-5)
def test_bind():
if not tvm.gpu(0).exist:
print('No GPU found! Skip this test!')
return
@script
def vec_add(a, b, c):
for tx in bind('threadIdx.x', 1000):
c[tx] = b[tx] + c[tx]
a = tvm.placeholder((1000, ), dtype='float32', name='a')
b = tvm.placeholder((1000, ), dtype='float32', name='b')
c = tvm.placeholder((1000, ), dtype='float32', name='c')
ir = vec_add(a, b, c)
func = tvm.lower(ir, [a, b, c])
func = tvm.build(func, target = 'cuda')
_a = numpy.random.rand(1000).astype('float32')
_b = numpy.random.rand(1000).astype('float32')
_c = numpy.zeros((1000, ), dtype = 'float32')
tvm_a = tvm.ndarray.array(_a, tvm.gpu(0))
tvm_b = tvm.ndarray.array(_b, tvm.gpu(0))
tvm_c = tvm.ndarray.array(_c, tvm.gpu(0))
func(tvm_a, tvm_b, tvm_c)
vec_add(_a, _b, _c)
numpy.testing.assert_allclose(_c, tvm_c.asnumpy(), rtol=1e-5)
def test_math_intrin():
@script
def intrin_real(a):
a[0] = sqrt(a[0])
a[1] = log(a[1])
a[2] = exp(a[2])
a[3] = sigmoid(a[3])
a[4] = power(a[4], a[5])
a[5] = tanh(a[5])
a6 = tvm.placeholder((6, ), dtype='float32', name='a')
ir = intrin_real(a6)
func = tvm.build(tvm.lower(ir, [a6]))
assert func
a = numpy.arange(2, 8).astype('float32')
tvm_a = tvm.ndarray.array(a)
func(tvm_a)
intrin_real(a)
numpy.testing.assert_allclose(a, tvm_a.asnumpy(), rtol=1e-5)
@script
def intrin_int(a):
a[0] = popcount(a[0])
a1 = tvm.placeholder((1, ), dtype='int32')
ir = intrin_int(a1)
func = tvm.build(tvm.lower(ir, [a1]))
assert func
a = numpy.array([1234567890]).astype('int32')
tvm_a = tvm.ndarray.array(a)
intrin_int(a)
func(tvm_a)
assert tvm_a.asnumpy()[0] == a[0]
def test_allocate_buffer():
def blur(a):
for i in serial(32):
h_blur = allocate((4, 36))
for j in serial(4):
for k in serial(36):
s = allocate((1, ), 'float32')
for dj in serial(4):
s[0] = s[0] + a[i, j + dj]
h_blur[j, k] = s[0] / 4.
for j in serial(32):
s = 0.
for di in serial(4):
s = s + h_blur[di, j]
h_blur[i, j] = s / 4.
if __name__ == "__main__":
test_outer_product()
test_fanout()
test_failure()
test_looptype()
test_if()
test_bind()
test_math_intrin()