Commit 0edf87e8 by Sergei Grechanik, committed by Yizhi Liu

[NNVM][TEST] Test against numerical grad (#1505)

* [NNVM][TEST] Numerical gradient testing

* [NNVM][TEST] Make some tests a little faster

* Fix the failing test_top_level3

* Target exclusion for the check_function

* Try to ignore singularities

* grad_input_vars now can't contain shapes

* Don't pass unnecessary grad_input_vars to check_function

* Multiple outputs; fixes; testing of check_function

* Use numerical_grads_params to pass parameters to numgrad checker

* Fail when no action is requested explicitly

* Pass additional params to functions

* Silence the linter issue

* Simplified numgrad checking

* Improved docs for check_function

* Fixed the error message when no dtype is provided

* Several fixes

* Tests with shape/dtype inference for inputs

* Don't check dense's grads on cuda

* Raise an error if output dtypes haven't been inferred

* Moved shape/dtype inference into a separate function; use float32 as fallback

* Remove redundant dtype=float32

* Fix multiple outputs

* Use check_function in the rest of the test_top_level1
parent 6eecec92
......@@ -11,3 +11,4 @@ This document contains the python API to NNVM compiler toolchain.
symbol
graph
top
testing
nnvm.testing
------------
.. automodule:: nnvm.testing
.. autofunction:: nnvm.testing.ctx_list
nnvm.testing.check_computation
------------------------------
.. automodule:: nnvm.testing.check_computation
:members:
.. include:: testing_new_ops.rst
Testing new operations
----------------------
When adding new operations, it is a good idea to test them. Testing
should be done with the function ``nnvm.testing.check_function``. You
should provide it with the symbol representing the result of a
computation and a reference numpy implementation. By default, it will
also check analytical gradients against numerical gradients if
analytical gradients are implemented for your operation. You can also
pass a reference implementation for the gradients, but numerical
gradients will still be checked. Numerical gradient checking may be
switched off explicitly, but this is generally not a good idea.
Here is an example testing the logarithm operation:
.. code:: python
import numpy as np
import nnvm
import nnvm.symbol as sym
from nnvm.testing.check_computation import check_function
x = sym.Variable("x")
y = sym.log(x)
def forward(x):
return np.log(x)
def backward(head_grads, x):
return [1. / x * head_grads]
dtype = "float32"
shape = {'x': (1, 3, 32, 32)}
check_function(y, forward, backward, in_range=(0.001, 2.0), dtype=dtype, shape=shape)
If you run the code above, you might get an ``AssertionError`` in rare
cases. That’s why it is recommended to run new tests many times.
.. code:: python
for _ in range(10000):
check_function(y, forward, backward, in_range=(0.001, 2.0), dtype=dtype, shape=shape)
If you run the code above, then sooner or later you will get an exception
that may look like this:
.. code-block:: text
AssertionError: Analytical and numerical grads wrt x differ too much
analytical grad = [
...
]
numerical grad = [
...
]
distance > atol*sqrt(n) + rtol*grad_norm
distance 308.50885009765625 > 0.01*55.42562584220407 + 0.1*2167.70703125
It means that either you have a mistake in the ``FGradient`` function or
the numerical error is too high. Generally, if the printed gradients
differ only slightly or just in a single position, it is a numerical
error. But if the gradients look completely different, especially if many
corresponding positions have different signs, then something is probably
wrong with the analytical gradient implementation.
Then try to make the error reproducible, and also try to reduce the shape
of the inputs, but not too much; a vector of 10 elements is a reasonable
choice. You also won’t need the reference functions ``forward`` and
``backward``, and restricting the set of targets is a good idea as well.
Since the error may manifest itself only in rare cases, you might want to
run the check in a loop.
.. code:: python
shape = {'x': (10,)}
np.random.seed(42)
for _ in range(1000):
check_function(y, in_range=(0.001, 2.0), dtype=dtype, shape=shape,
numerical_grads=True, only_targets=['llvm'])
Running this code will result in the following:
.. code-block:: text
check_function failed while checking gradients numerically, here is the main graph
Graph(%x, %head_grads_0) {
%x, shape=[10], dtype=0
%head_grads_0, shape=[10], dtype=0
%1 = log(%x), shape=[10], dtype=0
%3 = elemwise_div(%head_grads_0, %x), shape=[10], dtype=0
ret %1, %3, %head_grads_0
}
graph_attr_keys = [layout_inputs, dtype_num_unknown_nodes, dtype, shape_num_unknown_nodes, shape]
Generated inputs:
{'x': array([2.5660574e-01, 1.5313280e+00, 1.0232578e-03, 8.3371508e-01,
1.0454979e+00, 1.1021420e-01, 1.9461832e+00, 4.5302454e-01,
6.0909325e-01, 6.0858107e-01], dtype=float32), 'head_grads_0': array([0.4616029 , 0.00394617, 1.4589603 , 1.9337242 , 0.44936267,
1.3264314 , 1.4840508 , 1.6970023 , 0.84583575, 0.60655886],
dtype=float32)}
...
AssertionError: Analytical and numerical grads wrt x differ too much
analytical grad = [1.7988799e+00 2.5769596e-03 1.4257993e+03 2.3194065e+00 4.2980734e-01
1.2035031e+01 7.6254421e-01 3.7459390e+00 1.3886802e+00 9.9667716e-01]
numerical grad = [1.7948151e+00 1.9073486e-03 9.9268610e+02 2.3174286e+00 4.2915344e-01
1.1980057e+01 7.6198578e-01 3.7412643e+00 1.3866425e+00 9.9563599e-01]
distance > atol*sqrt(n) + rtol*grad_norm
distance 433.11322021484375 > 0.01*3.1622776601683795 + 0.1*992.7716674804688
In this case the largest difference is in the 2nd position (counting from
0), which corresponds to the input value ``1.0232578e-03``. This value is
too close to the singularity of the logarithm, so the numerical derivative
gets too imprecise. The solution is to shrink the range for ``x``; here,
for example, ``(0.002, 2.0)`` turned out to be enough. Don’t forget to run
lots of tests, so that other people don’t get false positives.
.. code:: python
for _ in range(100):
check_function(y, in_range={x: (0.002, 2.0)}, dtype=dtype, shape=(1, 3, 32, 32),
numerical_grads=True, only_targets=['llvm'])
If you need more precise control over which values are passed to the
function being checked, you can use ``values={x: ...}``:
.. code:: python
x_val = np.array([1.2594858e+00, 1.0960974e-01, 1.4975418e+00, 6.3585603e-01,
1.2692513e-03, 1.0227472e+00, 9.4656967e-02, 5.5306298e-01,
1.4142460e+00, 1.2631655e-01], dtype=np.float32)
check_function(y, values={x: x_val}, dtype=dtype, shape=shape,
numerical_grads=True, only_targets=['llvm'])
......@@ -13,3 +13,4 @@ from . import inception_v3
from . import dcgan
from . import dqn
from . import yolo2_detection
from . import check_computation
# pylint: disable=cell-var-from-loop,no-else-return
"""Helper utilities to check functions and their gradients."""
from __future__ import absolute_import as _abs
import logging
import numpy as np
import tvm
from tvm.contrib import graph_runtime
import nnvm
from nnvm.compiler import graph_util
from nnvm.compiler.graph_attr import TCODE_TO_DTYPE, DTYPE_TO_TCODE
from .config import ctx_list
def infer_shapes_dtypes(graph, shape=None, dtype=None, fallback_dtype=None):
"""Runs dtype and shape inference passes on a graph and returns the resulting graph
along with the inferred information.
Parameters
----------
graph : nnvm.graph.Graph
A graph we want to run inference on.
shape : Dict[str, Tuple[int]] or Tuple[int], optional
A dict mapping input variable names to shapes.
By default shapes will be inferred from variables' attributes.
Note that this parameter takes precedence over variables' attributes.
dtype : Dict[str, str] or str, optional
A dict mapping input variable names to dtypes, or just a single dtype.
By default dtypes will be inferred from variables' attributes.
Note that this parameter takes precedence over variables' attributes.
fallback_dtype : str, optional
A dtype that will be used for variables whose dtype can't be inferred from other
variables' dtypes.
Returns
-------
graph : nnvm.graph.Graph
The resulting graph with dtype and shape information on its nodes.
input_shapes : Dict[str, Tuple[int]]
The inferred shapes of input variables merged with the `shape` dictionary.
input_dtypes : Dict[str, str]
The inferred dtypes of input variables merged with the `dtype` dictionary.
output_shapes : List[Tuple[int]]
The inferred shapes of outputs.
output_dtypes : List[str]
The inferred dtypes of outputs.
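Example
-------
A minimal usage sketch (the variables and the expected results below are illustrative,
not taken from a real test):

.. code-block:: python

    x = nnvm.symbol.Variable("x", shape=(2, 3), dtype=0)
    y = nnvm.symbol.Variable("y")
    graph = nnvm.graph.create(x + y)
    graph, in_shapes, in_dtypes, out_shapes, out_dtypes = \
        infer_shapes_dtypes(graph, fallback_dtype='float32')
    # The shape and dtype of y and of the output should be inferred from x,
    # e.g. in_shapes['y'] == (2, 3) and out_dtypes == ['float32']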
"""
# Preprocess input parameters
if shape is None:
shape = {}
if dtype is None:
dtype = {}
if not isinstance(shape, dict):
shape = {x: shape for x in graph.symbol.list_input_variables()}
if not isinstance(dtype, dict):
dtype = {x: dtype for x in graph.symbol.list_input_variables()}
shape = _dict_var_to_dict_str(shape)
dtype = _dict_var_to_dict_str(dtype)
# The graph may already contain shape and dtype info, so extract it and merge with
# the user-specified shapes and dtypes (use the user-specified one on contradiction)
all_initial_shapes = graph.json_attr('shape')
all_initial_dtypes = graph.json_attr('dtype')
if all_initial_shapes:
for x in graph.index.input_names:
if x not in shape:
x_shape = tuple(all_initial_shapes[graph.index.entry_id(x)])
shape[x] = x_shape
if all_initial_dtypes:
for x in graph.index.input_names:
if x not in dtype:
x_dtype = TCODE_TO_DTYPE[all_initial_dtypes[graph.index.entry_id(x)]]
dtype[x] = x_dtype
# Perform inference
nnvm.compiler.graph_attr.set_shape_inputs(graph, shape)
nnvm.compiler.graph_attr.set_dtype_inputs(graph, dtype)
graph = graph.apply('InferShape').apply('InferType')
shapes = graph.json_attr('shape')
dtypes = graph.json_attr('dtype')
out_len = len(graph.symbol.list_output_names())
index = graph.index
output_shapes = \
[tuple(shapes[index.entry_id(index.output_entries[i])]) for i in range(out_len)]
output_dtypes = \
[TCODE_TO_DTYPE[dtypes[index.entry_id(index.output_entries[i])]] for i in range(out_len)]
# Postprocess the results
input_shapes = shape.copy()
input_dtypes = dtype.copy()
for x in graph.symbol.list_input_variables():
x_name = x.attr('name')
x_node_id = graph.index.node_id(x_name)
input_shapes[x_name] = tuple(shapes[x_node_id])
input_dtypes[x_name] = TCODE_TO_DTYPE[dtypes[x_node_id]]
# Merge the original user-specified shapes in case some of them are specified for non-existing
# variables
for x_name, x_shape in shape.items():
x_shape = tuple(x_shape)
if input_shapes.get(x_name, x_shape) != x_shape:
raise RuntimeError("Inferred shape differs from the provided shape.\n"
"Provided shapes: {}\nInferred shapes: {}"
.format(shapes, input_shapes))
else:
input_shapes[x_name] = x_shape
# Merge the original user-specified dtypes
for x_name, x_dtype in dtype.items():
if not isinstance(x_dtype, str):
x_dtype = TCODE_TO_DTYPE[x_dtype]
if input_dtypes.get(x_name, x_dtype) != x_dtype:
raise RuntimeError("Inferred dtype differs from the provided dtype.\n"
"Provided dtypes: {}\nInferred dtypes: {}"
.format(dtypes, input_dtypes))
else:
input_dtypes[x_name] = x_dtype
# If some dtypes weren't inferred and there is a fallback dtype, assign it to those variables
# and repeat the inference
if fallback_dtype is not None and not all(input_dtypes.values()):
input_dtypes = {x: input_dtypes[x] if input_dtypes[x] else fallback_dtype
for x in input_dtypes}
return infer_shapes_dtypes(graph, input_shapes, input_dtypes, fallback_dtype=None)
return graph, input_shapes, input_dtypes, output_shapes, output_dtypes
def graph_to_function(graph, target, ctx, shape=None, dtype=None):
"""Convert a graph to a function taking a keyword args and returning a list of results
(both args and results are numpy arrays).
Example::
fun = graph_to_function(graph, "llvm", tvm.cpu(0))
[res1, res2] = fun(x=np.zeros((1,2)), y=np.zeros((1,)))
Parameters
----------
graph : nnvm.graph.Graph
A graph we want to convert to a function.
target : str or :any:`tvm.target.Target`
The build target
ctx : TVMContext
The context to deploy the module.
shape : Dict[str, Tuple[int]], optional
A dict mapping input variable names to shapes.
By default shapes will be inferred from variables' attributes.
Note that this parameter takes precedence over variables' attributes.
dtype : Dict[str, str] or str, optional
A dict mapping input variable names to dtypes, or just a single dtype.
By default dtypes will be inferred from variables' attributes.
Note that this parameter takes precedence over variables' attributes.
Returns
-------
function : Callable[..., List[numpy.ndarray]]
"""
# Infer missing shapes and dtypes
graph, shape, dtype, output_shapes, output_dtypes = \
infer_shapes_dtypes(graph, shape=shape, dtype=dtype)
if None in dtype.values():
raise ValueError("Input variables with no type: {}".format(dtype))
if not all(shape.values()):
raise ValueError("Input variables with no shape: {}".format(shape))
compute_graph, lib, params = nnvm.compiler.build(graph, target, shape=shape, dtype=dtype)
module = graph_runtime.create(compute_graph, lib, ctx)
if params:
module.set_input(**params)
def run(**kwargs):
module.run(**kwargs)
res = []
for i, (o_shape, o_dtype) in enumerate(zip(output_shapes, output_dtypes)):
res.append(module.get_output(i, tvm.nd.empty(o_shape, o_dtype)).asnumpy())
return res
return run
def _dict_var_to_dict_str(dictionary):
"""Convert a Dict[nnvm.Symbol, T] to Dict[str, T]"""
if isinstance(dictionary, dict):
return {s.attr('name') if isinstance(s, nnvm.symbol.Symbol) else s:
dictionary[s] for s in dictionary}
else:
return dictionary
def check_function(symbol, forward=None, backward=None, grad_input_vars=None,
shape=None, dtype=None, in_range=None, values=None,
exclude_targets=None, only_targets=None,
additional_params=None,
numerical_grads=None, numerical_grads_params=None,
atol=1e-5, rtol=1e-5, quiet=False):
"""Compute the function and/or its gradients on a random input and raise
an exception if the result doesn't match the reference implementation.
Parameters
----------
symbol : nnvm.Symbol
A symbol representing the output.
forward : Callable[..., List[numpy.ndarray]], optional
A reference implementation to compare with.
backward : Callable[..., List[numpy.ndarray] or Dict[str, numpy.ndarray]], optional
A reference implementation of gradients. Besides the normal inputs, it should also
accept head_grads, which is a list of gradients of some scalar wrt the outputs
(or just a single gradient if there is only one output).
Should return either a dict mapping input variable names to the respective
gradients or a list of gradients wrt variables from grad_input_vars in
exactly the same order (in alphabetical order by default).
grad_input_vars : List[nnvm.Symbol or str], optional
A list of variables with respect to which the gradients will be computed.
None (default) means that all input variables will be used in alphabetical order.
shape : Dict[nnvm.Symbol or str, Tuple[int]] or Tuple[int], optional
A dict mapping input variable names to shapes, or just a single shape.
By default shapes will be inferred from variables' attributes (see the Examples).
Note that this parameter takes precedence over variables' attributes.
dtype : Dict[nnvm.Symbol or str, str] or str, optional
A dict mapping input variable names to dtypes, or just a single dtype.
By default dtypes will be inferred from variables' attributes (see the Examples).
If dtypes cannot be inferred for some variables then float32 will be used as a fallback.
Note that this parameter takes precedence over variables' attributes.
in_range : Dict[nnvm.Symbol or str, (float, float)] or (float, float), optional
A dict mapping input variable names to ranges or just a single range
(the same for all variables). Input values will be generated from
uniform distributions on these ranges. `head_grads` can also be
assigned a range this way.
values : Dict[nnvm.Symbol or str, numpy.ndarray], optional
A dict explicitly providing values for some variables instead of random generation.
exclude_targets : Set[str], optional
Skip compiling and running anything for these targets.
only_targets : Set[str], optional
Test only for those targets from `ctx_list()` that are also in this set.
additional_params : dict, optional
A dict of additional parameters which will be passed to forward and backward.
numerical_grads : bool or 'if_possible', optional
Whether to additionally check against numerically computed gradients. If 'if_possible' or
None (the default) is passed, gradients will be checked numerically only if the gradient
computation graph can be created (if some operations have unimplemented gradients, only a
warning will be issued).
Checking against numerical gradients is done via the `check_numerical_grads` function.
numerical_grads_params : dict, optional
Additional parameters for `check_numerical_grads`.
atol : float, optional
Absolute tolerance for `np.testing.assert_allclose`. NOT used for numerical gradients.
rtol : float, optional
Relative tolerance for `np.testing.assert_allclose`. NOT used for numerical gradients.
quiet : bool, optional
Don't dump additional information to stdout on failure.
Examples
--------
.. code-block:: python
x = sym.Variable("x", shape=(1, 2))
y = sym.Variable("y", shape=(1, 2))
# check the function and its gradients both numerically and using a reference function
check_function(x + 2*y,
lambda x, y: x + 2*y,
lambda x, y, head_grads: {'x': head_grads, 'y': 2*head_grads})
# just check gradients numerically
check_function(x + 2*y, numerical_grads=True)
# just check the forward computation
check_function(x + 2*y, lambda x, y: x + 2*y, numerical_grads=False)
# specifying dtype
check_function(x + 2*y, lambda x, y: x + 2*y, dtype='float64')
# dtypes can also be specified during variable creation with dtype codes
x = sym.Variable("x", dtype=0)
check_function(x + 1, shape=(2, 2), numerical_grads=True)
"""
# validate and preprocess the input params
if numerical_grads is None and forward is None and backward is None:
raise ValueError("No reference function was passed to check_function. If you only want to "
"check gradients numerically, pass numerical_grads=True explicitly.")
if numerical_grads is None:
numerical_grads = 'if_possible'
if numerical_grads not in [False, True, 'if_possible']:
raise ValueError("numerical_grads must be a bool or 'if_possible', not {}"
.format(numerical_grads))
if additional_params is None:
additional_params = {}
input_vars = symbol.list_input_variables()
input_dict = {x.attr('name'): x for x in input_vars}
if grad_input_vars is None:
grad_input_vars = sorted(input_vars, key=lambda x: x.attr('name'))
else:
grad_input_vars = [input_dict[x] if isinstance(x, str) else x for x in grad_input_vars]
in_range = _dict_var_to_dict_str(in_range)
values = _dict_var_to_dict_str(values)
out_len = len(symbol.list_output_names())
# Infer the output shapes and dtypes, and preprocess the shape and dtype params
forward_graph, shape, dtype, out_shapes, out_dtypes = \
infer_shapes_dtypes(nnvm.graph.create(symbol), shape=shape, dtype=dtype,
fallback_dtype='float32')
if not all(out_shapes) or not all(out_dtypes):
if not quiet:
print(forward_graph.ir(join_node_attrs=['shape', 'dtype']))
raise ValueError("Could not infer shapes or dtypes for outputs.\n"
"out_shapes = {}\nout_dtypes = {}".format(out_shapes, out_dtypes))
backward_graph = None
# If we want gradients, we have to recreate the graph, but now with gradient computations
# Note that here we need out_shapes for defining the shape of head grads, so we have to
# create the graph twice
if backward is not None or numerical_grads:
try:
head_grads_symbols = [nnvm.symbol.Variable("head_grads_" + str(i),
shape=out_shapes[i],
dtype=DTYPE_TO_TCODE[out_dtypes[i]])
for i in range(out_len)]
grad_symbols = graph_util.gradients([symbol], grad_input_vars,
grad_ys=head_grads_symbols)
# Sometimes grads do not depend on head_grads, so head_grads does not appear
# in the variable list; adding it manually prevents this, making things a bit easier
backward_graph = \
nnvm.graph.create(nnvm.symbol.Group([symbol] + grad_symbols + head_grads_symbols))
backward_graph, shape, dtype, out_shapes, out_dtypes = \
infer_shapes_dtypes(backward_graph, shape=shape, dtype=dtype,
fallback_dtype='float32')
except nnvm._base.NNVMError as err:
if backward is None and numerical_grads == "if_possible":
logging.warning("Won't check gradients because: %s", str(err).split('\n', 1)[0])
numerical_grads = False
backward_graph = None
else:
raise
main_graph = backward_graph if backward_graph is not None else forward_graph
# Generate random data for inputs (including head_grads)
np_inputs = {}
for x in main_graph.symbol.list_input_variables():
x_name = x.attr('name')
x_shape = shape[x_name]
x_dtype = dtype[x_name]
if values is not None and x_name in values:
np_inputs[x_name] = values[x_name].astype(x_dtype)
continue
low = -1.0
high = 1.0
if in_range is not None:
if isinstance(in_range, dict):
if x_name in in_range:
low = in_range[x_name][0]
high = in_range[x_name][1]
else:
low = in_range[0]
high = in_range[1]
np_inputs[x_name] = np.random.uniform(size=x_shape, low=low, high=high).astype(x_dtype)
np_inputs_without_head_grads = {k: np_inputs[k] for k in np_inputs
if not k.startswith('head_grads_')}
nothing_was_done = True
# Compute and compare the results
for target, ctx in ctx_list():
if exclude_targets is not None:
if target in exclude_targets or str(target) in exclude_targets:
logging.info("Skipping target = %s, ctx = %s", target, ctx)
continue
if only_targets is not None:
if target not in only_targets and str(target) not in only_targets:
logging.info("Skipping target = %s, ctx = %s", target, ctx)
continue
logging.info("Checking computation on target = %s, ctx = %s", target, ctx)
debug_stage = None
try:
nnvm_res = None
debug_stage = "compiling"
main_function = graph_to_function(main_graph, target, ctx)
# nnvm_res contains the output and gradients (if they are needed)
debug_stage = "running"
nnvm_res = main_function(**np_inputs)
if backward_graph is not None:
grad_var_names = [x.attr('name') for x in grad_input_vars]
nnvm_grads = {x: v for x, v in zip(grad_var_names, nnvm_res[out_len:])}
if forward is not None:
nothing_was_done = False
debug_stage = "checking forward computation"
logging.debug(debug_stage)
params = {}
params.update(np_inputs_without_head_grads)
params.update(additional_params)
numpy_res = forward(**params)
if isinstance(numpy_res, tuple):
numpy_res = list(numpy_res)
if not isinstance(numpy_res, list):
numpy_res = [numpy_res]
if len(numpy_res) != out_len:
raise ValueError("Forward function returned {} values, but "
"the nnvm graph returns {} values"
.format(len(numpy_res), out_len))
for i in range(out_len):
np.testing.assert_allclose(nnvm_res[i], numpy_res[i], atol=atol, rtol=rtol)
if backward is not None:
nothing_was_done = False
debug_stage = "checking gradients"
logging.debug(debug_stage)
np_head_grads = [np_inputs["head_grads_" + str(i)] for i in range(out_len)]
if out_len == 1:
np_head_grads = np_head_grads[0]
params = {'head_grads': np_head_grads}
params.update(np_inputs_without_head_grads)
params.update(additional_params)
numpy_grads = backward(**params)
if not isinstance(numpy_grads, dict):
if isinstance(numpy_grads, tuple):
numpy_grads = list(numpy_grads)
if not isinstance(numpy_grads, list):
numpy_grads = [numpy_grads]
numpy_grads = {x: v for x, v in zip(grad_var_names, numpy_grads)}
if len(numpy_grads) != len(grad_var_names):
raise ValueError("The backward function returns a list of gradients which "
"does not contain gradients for these variables: {}"
.format(set(grad_var_names) - set(numpy_grads)))
for x_name in numpy_grads:
np.testing.assert_allclose(nnvm_grads[x_name], numpy_grads[x_name],
atol=atol, rtol=rtol)
if numerical_grads:
nothing_was_done = False
debug_stage = "checking gradients numerically"
logging.debug(debug_stage)
forward_function = graph_to_function(forward_graph, target, ctx)
# Since the result may be non-scalar, we have to put another operation on top,
# so we just multiply by the randomly generated head_grads and then sum everything.
# This way we can reuse the gradient values that have already been computed.
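# The gradient of sum_i dot(head_grads_i, output_i) wrt an input is the sum of the
# corresponding vector-Jacobian products, which is exactly what the gradient graph
# computes for the same head_grads values, so the numerical gradients of this scalar
# function can be compared with nnvm_grads directly.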
def scalar_function(**kwargs):
res = forward_function(**kwargs)
return np.sum([np.dot(np_inputs['head_grads_' + str(i)].ravel(), res[i].ravel())
for i in range(out_len)])
if numerical_grads_params is None:
numerical_grads_params = {}
check_numerical_grads(
scalar_function,
input_values=np_inputs_without_head_grads,
grad_values=nnvm_grads,
**numerical_grads_params)
except:
if not quiet:
print("\ncheck_function failed while {}, here is the main graph"
.format(debug_stage))
print(main_graph.ir(join_node_attrs=['shape', 'dtype']))
if nnvm_res is not None:
print("Generated inputs:")
print(np_inputs)
print()
raise
if nothing_was_done:
logging.warning("Nothing was done in check_function. Check ctx_list().")
def check_numerical_grads(function, input_values, grad_values, function_value=None,
delta=1e-3, atol=1e-2, rtol=0.1):
"""A helper function that checks that numerical gradients of a function are equal to
gradients computed in some different way (analytical gradients).
Numerical gradients are computed using finite difference approximation. To reduce the number of
function evaluations, the number of points used is gradually increased if the error value is
too high (up to 5 points).
Parameters
----------
function
A function that takes inputs as keyword arguments (like `function(**input_values)`) and
returns a scalar result. Should accept numpy ndarrays.
input_values : Dict[str, numpy.ndarray]
A dict assigning values to variables. Represents the point at which gradients should be
computed.
grad_values : Dict[str, numpy.ndarray]
Gradients computed using a different method.
function_value : float, optional
Should be equal to `function(**input_values)`.
delta : float, optional
A small number used for numerical computation of partial derivatives. The default 1e-3 is a
good choice for float32.
atol : float, optional
Absolute tolerance.
rtol : float, optional
Relative tolerance.
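Example
-------
A minimal sketch checking the gradient of a simple quadratic function (the function and
values below are illustrative):

.. code-block:: python

    def func(x):
        return float(np.sum(x*x))

    x_val = np.array([1.0, 2.0, 3.0], dtype='float32')
    # the analytical gradient of sum(x*x) wrt x is 2*x
    check_numerical_grads(func, input_values={'x': x_val}, grad_values={'x': 2*x_val})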
"""
if function_value is None:
function_value = function(**input_values)
# a helper to modify j-th element of val by a_delta
def modify(val, j, a_delta):
val = val.copy()
val.reshape(-1)[j] = val.reshape(-1)[j] + a_delta
return val
# numerically compute a partial derivative with respect to the j-th element of the variable `x_name`
def derivative(x_name, j, a_delta):
modified_values = {n: modify(val, j, a_delta) if n == x_name else val
for n, val in input_values.items()}
return (function(**modified_values) - function_value)/a_delta
def compare_derivative(j, n_der, grad):
der = grad.reshape(-1)[j]
return np.abs(n_der - der) < atol + rtol*np.abs(n_der)
for x_name, grad in grad_values.items():
if grad.shape != input_values[x_name].shape:
raise AssertionError(
"Gradient wrt '{}' has unexpected shape {}, expected {} "
.format(x_name, grad.shape, input_values[x_name].shape))
ngrad = np.zeros_like(grad)
# compute partial derivatives for each position in this variable
for j in range(np.prod(grad.shape)):
# forward difference approximation
nder = derivative(x_name, j, delta)
# if the derivative is not equal to the analytical one, try to use more
# precise and expensive methods
if not compare_derivative(j, nder, grad):
# central difference approximation
nder = (derivative(x_name, j, -delta) + nder)/2
if not compare_derivative(j, nder, grad):
# central difference approximation using h = delta/2
cnder2 = (derivative(x_name, j, delta/2) + derivative(x_name, j, -delta/2))/2
# five-point derivative
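# (Richardson extrapolation of the two central differences, which cancels
# the O(delta**2) error term of the central difference approximation)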
nder = (4*cnder2 - nder)/3
ngrad.reshape(-1)[j] = nder
dist = np.sqrt(np.sum((ngrad - grad)**2))
grad_norm = np.sqrt(np.sum(ngrad**2))
# we multiply atol by this number to make it more universal for different sizes
sqrt_n = np.sqrt(float(np.prod(grad.shape)))
if dist > atol*sqrt_n + rtol*grad_norm:
raise AssertionError(
"Analytical and numerical grads wrt {} differ too much\n"
"analytical grad = {}\n numerical grad = {}\n"
"distance > atol*sqrt(n) + rtol*grad_norm\n"
"distance {} > {}*{} + {}*{}"
.format(x_name, grad, ngrad,
dist, atol, sqrt_n, rtol, grad_norm))
max_diff = np.max(np.abs(ngrad - grad))
avg_diff = np.mean(np.abs(ngrad - grad))
logging.info("Numerical grad test wrt %s of shape %s passes, "
"dist = %f, max_diff = %f, avg_diff = %f",
x_name, grad.shape, dist, max_diff, avg_diff)
......@@ -5,49 +5,162 @@ import topi.testing
import nnvm.symbol as sym
import nnvm.compiler
from nnvm.testing.config import ctx_list
from nnvm.testing.check_computation import check_function
def helper(symbol, inputs, dtype,
np_forward, np_backward=None,
need_input=True, need_head_grads=True,
rnd_min=-1, rnd_max=1):
ishapes = {}
itypes = {}
input_syms = []
np_inputs = {}
for (name, shape, s) in inputs:
ishapes.update({name: shape})
itypes.update({name: dtype})
np_inputs.update({name: np.random.uniform(rnd_min, rnd_max, size=shape).astype(dtype)})
input_syms.append(s)
for target, ctx in ctx_list():
graph, lib, _ = nnvm.compiler.build(symbol, target, ishapes, itypes)
m = graph_runtime.create(graph, lib, ctx)
m.run(**np_inputs)
y_np = np_forward(**np_inputs)
out = m.get_output(0, tvm.nd.empty(y_np.shape, dtype))
np.testing.assert_allclose(out.asnumpy(), y_np, atol=1e-5, rtol=1e-5)
# backward
if np_backward:
graph._set_symbol_list_attr("grad_ys", symbol)
graph._set_symbol_list_attr("grad_xs", input_syms)
graph._set_symbol_list_attr("grad_ys_out_grad", sym.Variable("head_grads", shape=y_np.shape))
graph = graph.apply("Gradient")
ishapes.update({"head_grads": y_np.shape})
graph, lib, _ = nnvm.compiler.build(graph, target, ishapes)
m = graph_runtime.create(graph, lib, ctx)
head_grads = np.random.uniform(size=y_np.shape).astype(dtype)
y_np = np_backward(head_grads=head_grads, **np_inputs)
b_inputs = {}
if need_input:
b_inputs.update(np_inputs)
if need_head_grads:
b_inputs.update({"head_grads":head_grads})
m.run(**b_inputs)
for i in range(len(y_np)):
out = m.get_output(i, tvm.nd.empty(y_np[i].shape, dtype))
np.testing.assert_allclose(out.asnumpy(), y_np[i], atol=1e-5, rtol=1e-5)
def test_check_function():
# test the testing function
x = sym.Variable("x")
y = sym.Variable("y")
# different styles of returning gradients from the backward function
check_function(x + 2*y, lambda x, y: x + 2*y,
lambda x, y, head_grads: [head_grads, 2*head_grads],
shape={'x': (1, 2), y: (1, 2)}, dtype='float32')
check_function(x + 2*y, lambda x, y: x + 2*y,
lambda x, y, head_grads: (head_grads, 2*head_grads),
shape={'x': (1, 2), y: (1, 2)}, dtype='float32')
check_function(x + 2*y, lambda x, y: x + 2*y,
lambda x, y, head_grads: {'x': head_grads, 'y': 2*head_grads},
shape={'x': (1, 2), y: (1, 2)}, dtype='float32')
check_function(x + 2*y, lambda x, y: x + 2*y,
lambda x, y, head_grads: {'y': 2*head_grads},
shape={'x': (1, 2), y: (1, 2)}, dtype='float32')
check_function(x + 2*y, lambda x, y: x + 2*y,
lambda x, y, head_grads: [2*head_grads],
grad_input_vars=[y],
shape={'x': (1, 2), y: (1, 2)}, dtype='float32')
check_function(x + 2*y, lambda x, y: x + 2*y,
lambda x, y, head_grads: 2*head_grads,
grad_input_vars=[y],
shape={'x': (1, 2), y: (1, 2)}, dtype='float32')
check_function(x + 2*y, lambda x, y: x + 2*y,
lambda x, y, head_grads: 2*head_grads,
grad_input_vars=[y],
shape={'x': (1, 2), y: (1, 2)}, dtype='float64')
# test just numerical gradients
# different styles of shape and dtype passing
check_function(x + 2*y, shape={'x': (1, 2), y: (1, 2)},
numerical_grads=True)
check_function(x + 2*y, shape={'x': (1, 2), y: (1, 2)}, dtype='float32',
numerical_grads=True)
check_function(x + 2*y, shape={'x': (1, 2), y: (1, 2)}, dtype={x: 'float32', 'y': 'float32'},
numerical_grads=True)
check_function(x + 2*y, shape=(1, 2), dtype='float32',
numerical_grads=True)
# specifying variable attributes on variable creation
# (in this case type codes must be used)
x = sym.Variable("x", dtype=0, shape=(1, 2))
check_function(x + 2*y, shape={y: (1, 2)}, dtype={'y': 'float32'}, numerical_grads=True)
y = sym.Variable("y", dtype=0, shape=(1, 2))
# shape overriding
def _fwd1(x, y):
assert x.shape == (1, 1)
assert y.shape == (1, 2)
return x + 2*y
check_function(x + 2*y, _fwd1, shape={x: (1, 1)})
# in_range
def _fwd2(x, y):
assert x.shape == (100,)
assert (x <= 0.9).all()
assert (x >= 0.8).all()
return x + 2*y
check_function(x + 2*y, _fwd2, shape=(100,), in_range=(0.8, 0.9), numerical_grads=False)
check_function(x + 2*y, _fwd2, shape=(100,), in_range={'x': (0.8, 0.9)}, numerical_grads=False)
check_function(x + 2*y, backward=lambda x, y, head_grads: [1.0, 2.0],
in_range={'head_grads_0': (1.0, 1.0)})
# explicit passing of values
check_function(x + 2*y, backward=lambda x, y, head_grads: [1.0, 2.0],
values={'head_grads_0': np.full((1, 2), 1.0)})
# check that the function reports errors
def _check_function_must_fail(*args, **kwargs):
error = AssertionError
if 'error' in kwargs:
error = kwargs['error']
del kwargs['error']
try:
check_function(*args, quiet=True, **kwargs)
except error:
pass
else:
raise AssertionError("check_function didn't raise an exception")
_check_function_must_fail(x + 2*y, error=ValueError)
_check_function_must_fail(x + 2*y, lambda x, y: x + y)
_check_function_must_fail(x + 2*y, backward=lambda x, y, head_grads: [1.0, 2.0])
_check_function_must_fail(sym.block_grad(x + 2*y), numerical_grads=True)
_check_function_must_fail(x*x, numerical_grads=True,
numerical_grads_params={'atol': 0.0, 'rtol': 0.0})
# different styles of returning results from the forward function
check_function(x + 2*y, lambda x, y: [x + 2*y], numerical_grads=False)
_check_function_must_fail(x + 2*y, lambda x, y: [x + 2*y, x], numerical_grads=False,
error=ValueError)
_check_function_must_fail(x + 2*y, lambda x, y: [], numerical_grads=False,
error=ValueError)
# multiple outputs
z = sym.Group([2*x + y, x + 2*y])
check_function(z, lambda x, y: [2*x + y, x + 2*y])
check_function(z, lambda x, y: (2*x + y, x + 2*y))
check_function(z, backward=lambda x, y, head_grads: [2*head_grads[0] + head_grads[1],
head_grads[0] + 2*head_grads[1]])
_check_function_must_fail(z, backward=lambda x, y, head_grads: [2*head_grads[0],
2*head_grads[1]])
check_function(z, backward=lambda x, y, head_grads: [head_grads[1], 2*head_grads[1]],
in_range={'head_grads_0': (0, 0)})
check_function(z, numerical_grads=True)
z = sym.Group([sym.block_grad(2*x + y), x + 2*y])
check_function(z, lambda x, y: [2*x + y, x + 2*y], numerical_grads=False)
_check_function_must_fail(z, lambda x, y: [2*x + y, x + 2*y])
_check_function_must_fail(z, numerical_grads=True)
z = sym.Group([2*x + y, sym.block_grad(x + 2*y)])
_check_function_must_fail(z, numerical_grads=True)
z = sym.Group([2*x + y, x + 2*y, x, y, sym.sum(x)])
check_function(z, lambda x, y: [2*x + y, x + 2*y, x, y, np.sum(x)])
# passing additional parameters to forward and backward
def _fwd3(x, p):
assert p == 'v'
return x + 1
def _bwd3(x, p, head_grads):
assert p == 'v'
return head_grads
check_function(x + 1, _fwd3, _bwd3, additional_params={'p': 'v'})
# implicitly created variables and shape/dtype inference for inputs
x = sym.Variable("x", shape=(2, 3), dtype=0)
b = sym.Variable("b")
y = sym.dense(data=x, bias=b, units=4)
# Don't check gradients on cuda because it doesn't yet support ewise after reduce
check_function(y, exclude_targets={'cuda'}, numerical_grads=True)
check_function(y, shape={'x': (3, 4)}, exclude_targets={'cuda'}, numerical_grads=True)
check_function(y, dtype={'x': 'float64'}, exclude_targets={'cuda'}, numerical_grads=True)
x = sym.Variable("x")
b = sym.Variable("b")
w = sym.Variable("w")
y = sym.dense(data=x, bias=b, weight=w, units=4)
def _fwd_dense(x, w, b):
return np.dot(x, w.T) + b
check_function(y, _fwd_dense, shape={'x': (1,2)}, dtype={'x': 'float32'}, numerical_grads=False)
check_function(y, _fwd_dense, shape={'x': (1,2)}, dtype={'w': 'float64'}, numerical_grads=False)
_check_function_must_fail(y, _fwd_dense, shape={'x': (1,2)},
dtype={'w': 'float64', 'b': 'float32'},
numerical_grads=False,
error=nnvm._base.NNVMError)
# fails because no shape
_check_function_must_fail(y, _fwd_dense, numerical_grads=False, error=ValueError)
# ok because type is float32 by default
check_function(y, _fwd_dense, shape={'x': (1,2)}, numerical_grads=False)
def test_relu():
x = sym.Variable("x")
......@@ -62,10 +175,8 @@ def test_relu():
return [(sub > 0).astype("float") * \
((x > 0).astype("float") + 0.3 * (x < 0).astype("float")) * head_grads]
dtype = "float32"
dshape = (1, 3, 32, 32)
inputs = [('x', dshape, x)]
helper(y, inputs, dtype, forward, backward)
shape = {'x': (1, 3, 32, 32)}
check_function(y, forward, backward, shape=shape)
def test_prelu_nchw():
x = sym.Variable("x")
......@@ -75,15 +186,8 @@ def test_prelu_nchw():
def forward(x, a):
return (x < 0) * (x * a.reshape(3, 1, 1)) + (x>=0) * x
dtype = "float32"
dshape_x = (1, 3, 32, 32)
dshape_w = (3,)
inputs = [
('x', dshape_x, x),
('a', dshape_w, a)
]
helper(y, inputs, dtype, forward)
shape = {'x': (1, 3, 32, 32), 'a': (3,)}
check_function(y, forward, shape=shape)
def test_prelu_nhwc():
x = sym.Variable("x")
......@@ -93,17 +197,8 @@ def test_prelu_nhwc():
def forward(x, a):
return (x < 0) * (x * a.reshape(1, 1, 3)) + (x>=0) * x
dtype = "float32"
dshape_x = (1, 32, 32, 3)
dshape_w = (3,)
inputs = [
('x', dshape_x, x),
('a', dshape_w, a)
]
helper(y, inputs, dtype, forward)
shape = {'x': (1, 32, 32, 3), 'a': (3,)}
check_function(y, forward, shape=shape)
def test_sym_scalar_pow():
scalar = 3
......@@ -116,10 +211,8 @@ def test_sym_scalar_pow():
def backward(head_grads, x):
return [scalar * x**(scalar - 1) * head_grads]
dtype = "float32"
dshape = (1, 3, 32, 32)
inputs = [('x', dshape, x)]
helper(y, inputs, dtype, forward, backward)
shape = {'x': (1, 3, 32, 32)}
check_function(y, forward, backward, shape=shape)
def test_scalar_sym_pow():
......@@ -133,10 +226,8 @@ def test_scalar_sym_pow():
def backward(head_grads, x):
return [np.log(scalar) * scalar**x * head_grads]
dtype = "float32"
dshape = (1, 3, 32, 32)
inputs = [('x', dshape, x)]
helper(y, inputs, dtype, forward, backward)
shape = {'x': (1, 3, 32, 32)}
check_function(y, forward, backward, shape=shape)
def test_exp():
......@@ -149,10 +240,8 @@ def test_exp():
def backward(head_grads, x):
return [np.exp(x) * head_grads]
dtype = "float32"
dshape = (1, 3, 32, 32)
inputs = [('x', dshape, x)]
helper(y, inputs, dtype, forward, backward)
shape = {'x': (1, 3, 32, 32)}
check_function(y, forward, backward, shape=shape)
def test_log():
......@@ -165,10 +254,8 @@ def test_log():
def backward(head_grads, x):
return [1. / x * head_grads]
dtype = "float32"
dshape = (1, 3, 32, 32)
inputs = [('x', dshape, x)]
helper(y, inputs, dtype, forward, backward, rnd_min=0.001)
shape = {'x': (1, 3, 32, 32)}
check_function(y, forward, backward, in_range=(0.002, 2.0), shape=shape)
def test_tanh():
......@@ -182,10 +269,8 @@ def test_tanh():
y_np = forward(x)
return [(1 - y_np**2) * head_grads]
dtype = "float32"
dshape = (1, 3, 32, 32)
inputs = [('x', dshape, x)]
helper(y, inputs, dtype, forward, backward)
shape = {'x': (1, 3, 32, 32)}
check_function(y, forward, backward, shape=shape)
def test_sigmoid():
......@@ -199,10 +284,8 @@ def test_sigmoid():
y_np = forward(x)
return [y_np *(1 - y_np) * head_grads]
dtype = "float32"
dshape = (1, 3, 32, 32)
inputs = [('x', dshape, x)]
helper(y, inputs, dtype, forward, backward)
shape = {'x': (1, 3, 32, 32)}
check_function(y, forward, backward, shape=shape)
def test_softmax():
......@@ -217,10 +300,10 @@ def test_softmax():
grad = y * (head_grads - np.sum(y * head_grads, axis=1, keepdims=True))
return [grad]
dtype = "float32"
dshape = (10, 1000)
inputs = [('x', dshape, x)]
helper(y, inputs, dtype, forward, backward)
check_function(y, forward, backward,
shape={'x': (10, 1000)}, numerical_grads=False)
check_function(y, forward, backward,
shape={'x': (2, 10)})
def test_log_softmax():
......@@ -235,10 +318,10 @@ def test_log_softmax():
grad = head_grads - np.exp(y) * np.sum(head_grads, axis=1, keepdims=True)
return [grad]
dtype = "float32"
dshape = (10, 1000)
inputs = [('x', dshape, x)]
helper(y, inputs, dtype, forward, backward)
check_function(y, forward, backward,
shape={'x': (10, 1000)}, numerical_grads=False)
check_function(y, forward, backward,
shape={'x': (2, 10)})
def test_dense():
......@@ -250,13 +333,16 @@ def test_dense():
def forward(x, dense_weight, dense_bias):
return np.dot(x, dense_weight.T) + dense_bias
dtype = "float32"
inputs = [
('x', (10, 100), x),
('dense_weight', (3, 100), w),
('dense_bias', (3,), b)
]
helper(y, inputs, dtype, forward)
shape = {
'x': (10, 100),
'w': (3, 100),
'b': (3,)
}
# Don't check gradients on cuda because it doesn't yet support ewise after reduce
check_function(y, forward, shape=shape,
exclude_targets={'cuda'}, numerical_grads=True)
check_function(y, forward, shape=shape,
only_targets={'cuda'}, numerical_grads=False)
def test_batchnorm():
......@@ -272,35 +358,25 @@ def test_batchnorm():
def forward(x, gamma, beta, moving_mean, moving_var):
return (x - moving_mean) / np.sqrt(moving_var + eps) * gamma + beta
dtype = "float32"
inputs = [
('x', (10, 20), x),
('gamma', (20,), gamma),
('beta', (20,), beta),
('moving_mean', (20,), moving_var),
('moving_var', (20,), moving_mean)
]
shape = {
'x': (10, 20),
'gamma': (20,),
'beta': (20,),
'moving_mean': (20,),
'moving_var': (20,)
}
helper(y, inputs, dtype, forward, rnd_min=0.001)
check_function(y, forward, in_range=(0.001, 1.0), shape=shape)
def verify_concatenate(ishape, axis):
x = [sym.Variable("x%d" % i) for i in range(len(ishape))]
x = [sym.Variable("x%d" % i, shape=ishape[i]) for i in range(len(ishape))]
y = sym.concatenate(*x, axis=axis) + 1
dtype = "float32"
for target, ctx in ctx_list():
# set input
data = []
for i, shape in enumerate(ishape):
data.append(np.random.uniform(size=shape).astype(dtype))
pdict = {"x%d" % i : v for i, v in enumerate(data)}
shape = {"x%d" % i : v.shape for i, v in enumerate(data)}
graph, lib, _ = nnvm.compiler.build(y, target, shape)
m = graph_runtime.create(graph, lib, ctx)
m.run(**pdict)
out_np = np.concatenate(data, axis=axis) + 1
out = m.get_output(0, tvm.nd.empty(out_np.shape))
np.testing.assert_allclose(out.asnumpy(), out_np, atol=1e-5, rtol=1e-5)
def forward(**kwargs):
return np.concatenate(list(kwargs.values()), axis=axis) + 1
check_function(y, forward)
def test_concatenate():
......@@ -309,19 +385,13 @@ def test_concatenate():
def verify_split(ishape, indices_or_sections, axis):
x = sym.Variable("x")
x = sym.Variable("x", shape=ishape)
y = sym.split(x, indices_or_sections=indices_or_sections, axis=axis)
dtype = "float32"
x_np = np.random.uniform(size=ishape).astype(dtype)
res = np.split(x_np, indices_or_sections, axis=axis)
for target, ctx in ctx_list():
# set input
graph, lib, _ = nnvm.compiler.build(y, target, {"x": ishape})
m = graph_runtime.create(graph, lib, ctx)
m.run(x=x_np)
for i, arr in enumerate(res):
out = m.get_output(i, tvm.nd.empty(arr.shape))
np.testing.assert_allclose(out.asnumpy(), arr, atol=1e-5, rtol=1e-5)
def forward(x):
return np.split(x, indices_or_sections, axis=axis)
check_function(y, forward)
def test_split():
......@@ -331,28 +401,22 @@ def test_split():
def verify_strided_slice(ishape, begin, end, strideinp=None):
stride = strideinp if strideinp else [1, 1, 1]
x = sym.Variable("x")
x = sym.Variable("x", shape=ishape)
if strideinp:
y = sym.strided_slice(x, begin = begin, end = end, stride = stride) + 1
else:
y = sym.strided_slice(x, begin = begin, end = end) + 1
x_np = np.random.uniform(size=ishape).astype("float32")
for i in range(len(begin), 3):
begin.append(0)
for i in range(len(end), 3):
end.append(ishape[i])
def test_forward(x, begin, end, stride):
def test_forward(x):
return x[begin[0]:end[0]:stride[0],
begin[1]:end[1]:stride[1], begin[2]:end[2]:stride[2]] + 1
for target, ctx in ctx_list():
# set input
graph, lib, _ = nnvm.compiler.build(y, target, {"x": ishape})
m = graph_runtime.create(graph, lib, ctx)
m.run(x=x_np)
res = test_forward(x_np, begin, end, stride)
out = m.get_output(0, tvm.nd.empty(res.shape))
np.testing.assert_allclose(out.asnumpy(), res, atol=1e-5, rtol=1e-5)
check_function(y, test_forward)
def test_strided_slice():
verify_strided_slice((3, 4, 3), [0, 0, 0], [4, -5, 4], [1, -1, 2])
......@@ -369,24 +433,18 @@ def verify_take(src_shape, indices_src, axis=None):
src_dtype = "float32"
indices_dtype = "int32"
indices_src = np.array(indices_src, dtype=indices_dtype)
a = sym.Variable("a")
indices = sym.Variable("indices")
a = sym.Variable("a", shape=src_shape)
indices = sym.Variable("indices", shape=indices_src.shape)
y = sym.take(a, indices, axis=axis)
for target, ctx in ctx_list():
# set input
shape_dict = {"a":src_shape, "indices":indices_src.shape}
type_dict = {"a":src_dtype, "indices":indices_dtype}
graph, lib, _ = nnvm.compiler.build(y, target, shape=shape_dict, dtype=type_dict)
m = graph_runtime.create(graph, lib, ctx)
shape_size = 1
for i in range(len(src_shape)):
shape_size = shape_size * src_shape[i]
a_src = np.arange(shape_size, dtype=src_dtype).reshape((src_shape))
out_np = np.take(a_src, indices_src, axis=axis)
m.run(a=a_src, indices=indices_src)
out = m.get_output(0, tvm.nd.empty(out_np.shape, dtype=src_dtype))
np.testing.assert_allclose(out.asnumpy(), out_np, atol=1e-5, rtol=1e-5)
def forward(a, indices):
return np.take(a, indices=indices, axis=axis)
a_src = np.arange(np.prod(src_shape), dtype=src_dtype).reshape(src_shape)
check_function(y, forward,
dtype={'a': src_dtype, 'indices': indices_dtype},
values={'a': a_src, 'indices': indices_src})
def test_take():
verify_take((4,), [1])
......@@ -399,9 +457,9 @@ def test_take():
verify_take((4,3,5,6), [[2,1,0,0]], -2)
def verify_squeeze(dshape, axis):
def verify_squeeze(shape, axis):
x = sym.Variable("x")
if axis:
if axis is not None:
y = sym.squeeze(x, axis=axis)
else:
y = sym.squeeze(x)
......@@ -413,9 +471,7 @@ def verify_squeeze(dshape, axis):
def backward(head_grads, x):
return [np.reshape(head_grads, x.shape)]
dtype = "float32"
inputs = [('x', dshape, x)]
helper(y, inputs, dtype, forward, backward)
check_function(y, forward, backward, shape=shape)
def test_squeeze():
......@@ -433,61 +489,40 @@ def test_pad():
pad_width=((0, 0), (0, 0), (0, 1), (2, 3)),
mode='constant', constant_values=1.)
dtype = "float32"
inputs = [('x', (1, 3, 28, 28), x)]
helper(y, inputs, dtype, forward)
shape = {'x': (1, 3, 28, 28)}
check_function(y, forward, shape=shape)
def verify_lrn(ishape, size, axis, bias, alpha, beta):
x = sym.Variable("x")
x = sym.Variable("x", shape=ishape)
y = sym.lrn(x, size=size, axis=axis, bias=bias, alpha=alpha, beta=beta)
dtype = "float32"
x_np = np.random.uniform(size=ishape).astype(dtype)
for target, ctx in ctx_list():
graph, lib, _ = nnvm.compiler.build(y, target, {"x": ishape})
m = graph_runtime.create(graph, lib, ctx)
m.run(x=x_np)
out = m.get_output(0, tvm.nd.empty(ishape))
out_np = topi.testing.lrn_python(x_np, size, axis, bias, alpha, beta)
np.testing.assert_allclose(out.asnumpy(), out_np, atol=1e-5, rtol=1e-5)
def forward1(x):
return topi.testing.lrn_python(x, size, axis, bias, alpha, beta)
check_function(y, forward1)
def forward2(x):
y = forward1(x)
return (y > 0)*y
# Checking LRN op followed by elementwise op relu
z = sym.relu(y)
x_np = np.random.uniform(low=-10.0, high=10.0, size=ishape).astype(dtype)
for target, ctx in ctx_list():
graph, lib, _ = nnvm.compiler.build(z, target, {"x": ishape})
m = graph_runtime.create(graph, lib, ctx)
m.run(x=x_np)
out = m.get_output(0, tvm.nd.empty(ishape))
out_np = topi.testing.lrn_python(x_np, size, axis, bias, alpha, beta)
out_np = (out_np > 0) * out_np
np.testing.assert_allclose(out.asnumpy(), out_np, atol=1e-5, rtol=1e-5)
check_function(sym.relu(y), forward2, in_range={'x': (-10.0, 10.0)})
def verify_l2_normalize(ishape, eps, axis):
x = sym.Variable("x")
x = sym.Variable("x", shape=ishape)
y = sym.l2_normalize(x, eps=eps, axis=axis)
dtype = "float32"
x_np = np.random.uniform(size=ishape).astype(dtype)
for target, ctx in ctx_list():
graph, lib, _ = nnvm.compiler.build(y, target, {"x": ishape})
m = graph_runtime.create(graph, lib, ctx)
m.run(x=x_np)
out = m.get_output(0, tvm.nd.empty(ishape))
out_np = topi.testing.l2_normalize_python(x_np, eps, axis)
np.testing.assert_allclose(out.asnumpy(), out_np, atol=1e-5, rtol=1e-5)
def forward1(x):
return topi.testing.l2_normalize_python(x, eps, axis)
check_function(y, forward1)
def forward2(x):
y = forward1(x)
return (y > 0)*y
# Checking L2 normalization op followed by elementwise op relu
z = sym.relu(y)
x_np = np.random.uniform(low=-10.0, high=10.0, size=ishape).astype(dtype)
for target, ctx in ctx_list():
graph, lib, _ = nnvm.compiler.build(z, target, {"x": ishape})
m = graph_runtime.create(graph, lib, ctx)
m.run(x=x_np)
out = m.get_output(0, tvm.nd.empty(ishape))
out_np = topi.testing.l2_normalize_python(x_np, eps, axis)
out_np = (out_np > 0) * out_np
np.testing.assert_allclose(out.asnumpy(), out_np, atol=1e-5, rtol=1e-5)
check_function(sym.relu(y), forward2, in_range={'x': (-10.0, 10.0)})
def test_lrn():
verify_lrn((1, 3, 20, 20), 3, 1, 1.0, 1.0, 0.5)
......@@ -498,6 +533,7 @@ def test_l2_normalize():
verify_l2_normalize((1, 3, 20, 20), 0.001, (1, 2))
if __name__ == "__main__":
test_check_function()
test_split()
test_concatenate()
test_log_softmax()
......
......@@ -5,15 +5,14 @@ import topi.testing
import nnvm.symbol as sym
import nnvm.compiler
from nnvm.testing.config import ctx_list
from test_top_level1 import helper
from nnvm.testing.check_computation import check_function
def check_map(symfunc, np_func, np_backward=None, dtype="float32", rnd_min=-1, rnd_max=1):
x = sym.Variable("x")
y = symfunc(x)
dshape = (1, 3, 32, 32)
inputs = [('x', dshape, x)]
helper(y, inputs, dtype, lambda x: np_func(x), np_backward,
rnd_min=rnd_min, rnd_max=rnd_max)
shape = {'x': (1, 3, 32, 32)}
check_function(y, lambda x: np_func(x), np_backward,
dtype=dtype, shape=shape, in_range=(rnd_min, rnd_max))
def test_floor():
......
......@@ -6,52 +6,7 @@ import topi
import nnvm.symbol as sym
import nnvm.compiler
from nnvm.testing.config import ctx_list
def helper(symbol, inputs, dtype,
np_forward, np_backward=None,
need_input=True, need_head_grads=True, in_range={}):
ishapes = {}
input_syms = []
np_inputs = {}
for (name, shape, s) in inputs:
ishapes.update({name: shape})
if name in in_range:
np_inputs.update({name: np.random.uniform(size=shape,
low=in_range[name][0],
high=in_range[name][1]).astype(dtype)})
else:
np_inputs.update({name: np.random.uniform(size=shape).astype(dtype)})
input_syms.append(s)
for target, ctx in ctx_list():
graph, lib, _ = nnvm.compiler.build(symbol, target, ishapes, dtype=dtype)
m = graph_runtime.create(graph, lib, ctx)
m.run(**np_inputs)
y_np = np_forward(**np_inputs)
out = m.get_output(0, tvm.nd.empty(y_np.shape, dtype))
np.testing.assert_allclose(out.asnumpy(), y_np, atol=1e-5, rtol=1e-5)
# backward
if np_backward:
graph._set_symbol_list_attr("grad_ys", symbol)
graph._set_symbol_list_attr("grad_xs", input_syms)
graph._set_symbol_list_attr("grad_ys_out_grad", sym.Variable("head_grads", shape=y_np.shape))
graph = graph.apply("Gradient")
ishapes.update({"head_grads": y_np.shape})
graph, lib, _ = nnvm.compiler.build(graph, target, ishapes)
m = graph_runtime.create(graph, lib, ctx)
head_grads = np.random.uniform(size=y_np.shape).astype(dtype)
y_np = np_backward(head_grads=head_grads, **np_inputs)
b_inputs = {}
if need_input:
b_inputs.update(np_inputs)
if need_head_grads:
b_inputs.update({"head_grads":head_grads})
m.run(**b_inputs)
for i in range(len(y_np)):
out = m.get_output(i, tvm.nd.empty(y_np[i].shape, dtype))
np.testing.assert_allclose(out.asnumpy(), y_np[i], atol=1e-5, rtol=1e-5)
from nnvm.testing.check_computation import check_function
def verify_transpose(dshape, axes):
x = sym.Variable("x")
......@@ -228,93 +183,92 @@ def test_clip():
mask2 = np.less_equal(x, a_max).astype("float")
return [head_grads * mask1 * mask2]
dtype = "float32"
inputs = [('x', (3, 4, 5), x)]
helper(y, inputs, dtype, forward, backward)
shape = {'x': (3, 4, 5)}
check_function(y, forward, backward, shape=shape)
def test_broadcast():
a = sym.Variable("a")
b = sym.Variable("b")
inputs = [('a', (3, 4, 5), a),
('b', (1, 5), b)]
dtype = "float32"
shape = {'a': (3, 4, 5), 'b': (1, 5)}
def _collapse(g):
return g.reshape(-1, inputs[-1][1][-1]).sum(0, keepdims=True)
return g.reshape(-1, shape['b'][-1]).sum(0, keepdims=True)
y = sym.broadcast_add(a, b)
def _backward_add(head_grads, a, b):
da = head_grads
db = _collapse(head_grads)
return da, db
helper(y, inputs, dtype, lambda a, b: a + b, _backward_add)
check_function(y, lambda a, b: a + b, _backward_add, shape=shape)
y = sym.broadcast_sub(a, b)
def _backward_sub(head_grads, a, b):
da = head_grads
db = -_collapse(head_grads)
return da, db
helper(y, inputs, dtype, lambda a, b: a - b, _backward_sub)
check_function(y, lambda a, b: a - b, _backward_sub, shape=shape)
y = sym.broadcast_mul(a, b)
def _backward_mul(head_grads, a, b):
da = head_grads * b
db = _collapse(head_grads * a)
return da, db
helper(y, inputs, dtype, lambda a, b: a * b, _backward_mul)
check_function(y, lambda a, b: a * b, _backward_mul, shape=shape)
y = sym.broadcast_div(a, b)
def _backward_div(head_grads, a, b):
da = head_grads / b
db = _collapse(- head_grads * a / b**2)
return da, db
helper(y, inputs, dtype, lambda a, b: a / b, _backward_div)
# We avoid computing numerical derivatives too close to zero here
check_function(y, lambda a, b: a / b, _backward_div, shape=shape, numerical_grads=False)
check_function(y, lambda a, b: a / b, _backward_div, shape=shape,
in_range={'b': (0.1, 20)})
y = sym.broadcast_mod(a, b)
helper(y, inputs, 'int32',
check_function(y,
lambda a, b: np.mod(a, b),
in_range={'a': (0.001, 100), 'b': (1, 100)})
in_range={'a': (0.001, 100), 'b': (1, 100)}, dtype='int32', shape=shape)
y = sym.broadcast_max(a, b)
helper(y, inputs, dtype, lambda a, b: np.maximum(a, b))
check_function(y, lambda a, b: np.maximum(a, b), shape=shape)
y = sym.broadcast_min(a, b)
helper(y, inputs, dtype, lambda a, b: np.minimum(a, b))
check_function(y, lambda a, b: np.minimum(a, b), shape=shape)
y = sym.broadcast_pow(a, b)
helper(y, inputs, dtype,
check_function(y,
lambda a, b: np.power(a, b),
in_range={'a': (0.001, 100), 'b': (0.001, 2)})
in_range={'a': (0.001, 100), 'b': (0.001, 2)}, shape=shape)
y = sym.broadcast_left_shift(a, b)
helper(y, inputs, 'int32', lambda a, b: a << b)
check_function(y, lambda a, b: a << b, dtype='int32', shape=shape)
y = sym.broadcast_right_shift(a, b)
helper(y, inputs, 'int32', lambda a, b: a >> b)
check_function(y, lambda a, b: a >> b, dtype='int32', shape=shape)
y = sym.broadcast_greater(a, b)
helper(y, inputs, dtype, lambda a, b: np.greater(a, b))
check_function(y, lambda a, b: np.greater(a, b), shape=shape)
y = sym.broadcast_less(a, b)
helper(y, inputs, dtype, lambda a, b: np.less(a, b))
check_function(y, lambda a, b: np.less(a, b), shape=shape)
y = sym.broadcast_equal(a, b)
helper(y, inputs, 'int32', lambda a, b: np.equal(a, b),
in_range={'a': (-2, 2), 'b': (-2, 2)})
check_function(y, lambda a, b: np.equal(a, b),
in_range={'a': (-2, 2), 'b': (-2, 2)}, dtype='int32', shape=shape)
y = sym.broadcast_not_equal(a, b)
helper(y, inputs, 'int32', lambda a, b: np.not_equal(a, b),
in_range={'a': (-2, 2), 'b': (-2, 2)})
check_function(y, lambda a, b: np.not_equal(a, b),
in_range={'a': (-2, 2), 'b': (-2, 2)}, dtype='int32', shape=shape)
y = sym.broadcast_greater_equal(a, b)
helper(y, inputs, 'int32', lambda a, b: np.greater_equal(a, b),
in_range={'a': (-3, 3), 'b': (-3, 3)})
check_function(y, lambda a, b: np.greater_equal(a, b),
in_range={'a': (-3, 3), 'b': (-3, 3)}, dtype='int32', shape=shape)
y = sym.broadcast_less_equal(a, b)
helper(y, inputs, 'int32', lambda a, b: np.less_equal(a, b),
in_range={'a': (-3, 3), 'b': (-3, 3)})
check_function(y, lambda a, b: np.less_equal(a, b),
in_range={'a': (-3, 3), 'b': (-3, 3)}, dtype='int32', shape=shape)
def test_greater():
l = sym.Variable("l")
......@@ -325,13 +279,10 @@ def test_greater():
return np.greater(l, r).astype("float32")
def backward(head_grads, l, r):
return [np.zeros_like(l)]
return {'l': np.zeros_like(l)}
dtype = "float32"
inputs = [('l', (3, 4, 5), l),
('r', (3, 4, 5), r)]
helper(y, inputs, dtype, forward, backward, need_head_grads=False)
shape = {'l': (3, 4, 5), 'r': (3, 4, 5)}
check_function(y, forward, backward, shape=shape)
def test_less():
......@@ -343,13 +294,10 @@ def test_less():
return np.less(l, r).astype("float32")
def backward(head_grads, l, r):
return [np.zeros_like(l)]
return {'l': np.zeros_like(l)}
dtype = "float32"
inputs = [('l', (3, 4, 5), l),
('r', (3, 4, 5), r)]
helper(y, inputs, dtype, forward, backward, need_head_grads=False)
shape = {'l': (3, 4, 5), 'r': (3, 4, 5)}
check_function(y, forward, backward, shape=shape)
def test_reshape_like():
......@@ -364,11 +312,8 @@ def test_reshape_like():
return [np.reshape(head_grads, x.shape),
np.zeros_like(y)]
dtype = "float32"
inputs = [('x', (3, 4, 5), x),
('y', (5, 4, 3), y)]
helper(z, inputs, dtype, forward, backward)
shape = {'x': (3, 4, 5), 'y': (5, 4, 3)}
check_function(z, forward, backward, shape=shape)
def verify_expand_like(in_shape, out_shape, axis, exclude):
......@@ -412,10 +357,8 @@ def verify_expand_like(in_shape, out_shape, axis, exclude):
np.zeros_like(y)]
dtype = "float32"
inputs = [('x', in_shape, x),
('y', out_shape, y)]
helper(z, inputs, dtype, forward, backward, need_input=False)
shape = {'x': in_shape, 'y': out_shape}
check_function(z, forward, backward, shape=shape)
def test_expand_like():
......@@ -440,10 +383,8 @@ def verify_elemwise_sum(num_args):
def backward(head_grads, **inputs):
return [head_grads] * num_args
dtype = "float32"
inputs = [("input" + str(i), (3, 4, 5), s[i])
for i in range(num_args)]
helper(y, inputs, dtype, forward, backward, need_input=False)
shape = {s[i]: (3, 4, 5) for i in range(num_args)}
check_function(y, forward, backward, shape=shape)
def test_elemwise_sum():
......@@ -463,9 +404,9 @@ def test_block_grad():
return [np.zeros_like(head_grads)]
dtype = "float32"
inputs = [('x', (3, 4, 5), x)]
helper(y, inputs, dtype, forward, backward, need_head_grads=False)
shape = {'x': (3, 4, 5)}
# Numerical grad checking would fail for this function
check_function(y, forward, backward, shape=shape, numerical_grads=False)
def test_full():
......