Commit 12fa9148 by Tianqi Chen

[DOCS] Revamp the docs (#34)

parent cba957e0
# NNVM: Build deep learning system by parts
# NNVM: Graph IR Stack for Deep Learning Systems
[![Build Status](https://travis-ci.org/dmlc/nnvm.svg?branch=master)](https://travis-ci.org/dmlc/nnvm)
[![GitHub license](http://dmlc.github.io/img/apache2.svg)](./LICENSE)
NNVM is not a deep learning library. It is a modular,
decentralized and lightweight part to help build deep learning libraries.
NNVM is a reusable computational graph optimization and compilation stack for deep learning systems.
NNVM provides modules to:
## What is it
- Represent deep learning workloads from front-end frameworks via a graph IR.
- Optimize computation graphs to improve performance.
- Compile into executable modules and deploy to different hardware backends with minimum dependency.
While most deep learning systems offer end to end solutions,
it is interesting to assemble a deep learning system by parts.
The goal is to enable users to customize the optimizations, target platforms and set of operators they care about.
We believe that the decentralized modular system is an interesting direction.
The hope is that effective parts can be assembled together just like you assemble your own desktops.
So the customized deep learning solution can be minimax, minimum in terms of dependencies,
while maximizing the users' need.
NNVM offers one such part, it provides a generic way to do
computation graph optimization such as memory reduction, device allocation and more
while being agnostic to the operator interface definition and how operators are executed.
NNVM is inspired by LLVM, aiming to be a high level intermediate representation library
for neural nets and computation graphs generation and optimizations.
See [Overview](docs/overview.md) for an introduction on what it provides.
## Example
See [TinyFlow](https://github.com/tqchen/tinyflow) on how you can build a TensorFlow API with NNVM and Torch.
## Why build learning system by parts
This is essentially the ***Unix philosophy*** applied to machine learning systems.
- Essential parts can be assembled in a minimal way for embedded systems.
- Developers can hack the parts they need and compose them with other well-defined parts.
- Decentralized modules enable creators of new extensions to own their projects
without creating a monolithic version.
A deep learning system itself is not necessarily one part. For example,
here are some relatively independent parts that can be isolated:
- Computation graph definition, manipulation.
- Computation graph intermediate optimization.
- Computation graph execution.
- Operator kernel libraries.
- Imperative task scheduling and parallel task coordination.
We hope that there will be more modular parts in the future,
so system building can be fun and rewarding.
NNVM is designed so that new frontends, operators and graph optimizations can be added in a decentralized fashion without changing the core interface. NNVM is part of the [TVM stack](https://github.com/dmlc/tvm), which provides an end-to-end IR compilation stack for deploying deep learning workloads to different hardware backends.
## Links
- [TinyFlow](https://github.com/tqchen/tinyflow) shows how you can use NNVM to build a TensorFlow-like API.
- [Apache MXNet](http://mxnet.io/) uses NNVM as a backend.
[MXNet](https://github.com/dmlc/mxnet) is moving to NNVM as its intermediate
representation layer for symbolic graphs.
NNVM Design Note
================
In this part of documentation, we share the rationale for the specific choices made when designing NNVM.
.. toctree::
:maxdepth: 2
overview
# NNVM Design Overview
NNVM is a reusable graph IR stack for deep learning systems. It provides useful APIs to construct, represent and transform computation graphs, covering most of the high-level optimizations needed in deep learning.
As a part of the TVM stack for deep learning, NNVM also provides a shared compiler that deep learning frameworks can use to optimize, compile and deploy to different hardware backends via [TVM](https://github.com/dmlc/tvm).
## Key Requirements and Design Choices
- Have minimum dependency in the deployment module.
- Be able to add new operators to the IR in a decentralized fashion.
- Be able to add new optimization passes to the IR and apply them to existing graphs.
Items 2 and 3 are particularly interesting when compared with a typical compiler IR. A compiler IR usually contains a fixed set of primitives (instructions) and uses them as a contract between optimization pass designers. This design makes it easy to add new optimization passes, but not new operators (instructions), because every time we add a new instruction we need to modify the passes to accommodate it.
Deep learning frameworks usually have a fixed operator interface (schema). These interfaces can contain properties such as a shape inference function or whether in-place computation can happen. The operator interface is again a contract, one that makes it easy to add a new operator. But it is hard to add new passes in a decentralized fashion: a new optimization pass usually requires additional information, and this results in frequent changes to the centralized operator interface when we are exploring new optimizations. A centralized interface also works against modularization. For example, a graph compiler for FPGA devices may not need the GPU-specific attributes.
During our explorations in graph optimization and compilation, we found that it is important to be able to quickly add both operators and passes to the framework without changing the core library.
Here is a list of key elements in NNVM's design:
- An operator registry system to register and add new operators.
- An operator attribute system that provides properties of operators in a decentralized fashion.
- A reusable IR data structure for optimization passes.
The above list is the generic language part of NNVM. Besides that, we also provide a collection of core operator primitives and graph optimization passes. The core tensor operator primitives and optimizations already cover common deep learning workloads. This design allows the NNVM compiler to be used directly as an optimization and compilation stack for frameworks. The extensible nature of NNVM makes new adjustments easy without constraining the backend providers.
## Minimum Registration for a Symbolic Front-End
To use NNVM to build a language front end, a developer only needs to register minimum information about each operator.
```c++
NNVM_REGISTER_OP(add)
.describe("add two data together")
.set_num_inputs(2);
NNVM_REGISTER_OP(conv2d)
.describe("take 2d convolution of input")
.set_num_inputs(2);
NNVM_REGISTER_OP(assign)
.describe("assign second input argument to the first one")
.set_num_inputs(2);
```
After compiling the code with the NNVM library, users can compose the computation graph in Python, as in the following code.
```python
import nnvm.symbol as nn
# symbolic variable
x = nn.Variable('x')
y = nn.Variable('y')
w = nn.Variable('w')
z = nn.conv2d(nn.elemwise_add(x, y), w, kernel_size=(2,2), name='conv1')
```
The graph structure is interchangeable between the frontend and the backend. A Python interface is currently supported. Support for more languages can easily be
added in the future.
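To make the composition step concrete, here is a minimal pure-Python sketch of how a symbolic front-end could record a graph. It is not NNVM's implementation and does not import NNVM: `Node`, `variable`, `make_op` and `topo_order` are hypothetical names introduced only for illustration. The sketch assumes that positional arguments become inputs and keyword arguments become attributes.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One graph node (hypothetical structure, not NNVM's)."""
    op: str                                      # operator name, "null" for a variable
    name: str
    inputs: list = field(default_factory=list)   # positional input nodes
    attrs: dict = field(default_factory=dict)    # keyword attributes


def variable(name):
    return Node(op="null", name=name)


def make_op(op_name):
    """Build a creator: positional args become inputs, keyword args become attrs."""
    def creator(*inputs, name=None, **attrs):
        return Node(op=op_name, name=name or op_name,
                    inputs=list(inputs), attrs=attrs)
    return creator


def topo_order(root):
    """Post-order traversal: every node is listed after its inputs."""
    seen, order = set(), []

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for inp in node.inputs:
            visit(inp)
        order.append(node)

    visit(root)
    return order


elemwise_add = make_op("elemwise_add")
conv2d = make_op("conv2d")

x, y, w = variable("x"), variable("y"), variable("w")
z = conv2d(elemwise_add(x, y), w, kernel_size=(2, 2), name="conv1")
names = [n.name for n in topo_order(z)]
```

Because composition only records operator names, inputs and attributes, the front-end needs no knowledge of how any operator is executed.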
## Operator Attribute for More Extensions
The minimum information provided by the operator is enough to get a front-end. However, we need more knowledge about each operator to transform and execute the graph.
A typical difference between a neural net's computation graph and a traditional compiler IR is that there are a lot more high-level operators. We cannot fix the set of operators in the IR.
NNVM allows developers to register attributes of each operator. The attributes can include a shape inference function, whether the operator can perform in-place calculation, etc.
This design of having an operator attribute registry is not uncommon in deep learning systems.
For example, MXNet has an ```OpProperty``` class, TensorFlow has an ```OpDef``` and Caffe2 has an ```OperatorSchema``` class.
However, the operator attribute interfaces in these frameworks only support a fixed number of defined attributes of interest to the system. If we want to extend the framework to add a new attribute to each operator, we need to change the operator registry.
Eventually, the operator interface grows very big and has to evolve in the centralized repo.
In NNVM, we decided to change the design and support arbitrary types of operator attributes without changing the interface registry. The minimum interface also makes it easier to share across multiple projects.
Users can register a new attribute, such as an in-place property checking function, as follows.
```c++
using FInplaceOption = std::function<
std::vector<std::pair<int, int> > (const NodeAttrs& attrs)>;
// we can register attributes from multiple places.
NNVM_REGISTER_OP(add)
.set_num_inputs(2);
// register to tell that the first input can be computed in place with the first output
NNVM_REGISTER_OP(add)
.set_attr<FInplaceOption>("FInplaceOption", [](const NodeAttrs& attrs) {
return std::vector<std::pair<int, int> >{{0, 0}};
});
NNVM_REGISTER_OP(exp)
.set_num_inputs(1)
.set_attr<FInplaceOption>("FInplaceOption", [](const NodeAttrs& attrs) {
return std::vector<std::pair<int, int> >{{0, 0}};
});
```
We can query these attributes from arbitrary parts of the code, as in the following example. Under the hood, each attribute is stored in a columnar store that can easily be retrieved as a typed table for quick lookups.
```c++
void MyFunction() {
  const Op* add = Op::Get("add");
  // if we need quick queries, we can use a static variable
  // the attribute map contains attributes of all operators.
  static auto& finplace_option_map = Op::GetAttr<FInplaceOption>("FInplaceOption");
  // quick lookup of the attribute of add, O(1) time, vector index lookup internally.
  auto add_inplace = finplace_option_map[add];
}
```
Besides keeping the code minimal, this attribute store enables decentralization of projects.
Before, all the attributes of an operator had to sit in a centralized interface class.
Now, everyone can register attributes of their own and take the attributes they need from another project, without changing the operator interface or the core library.
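The columnar attribute store described above can be illustrated with a small pure-Python sketch. This is a toy model, not NNVM's C++ implementation: the classes `Op` and `AttrMap`, their methods, and the `FToyCost` attribute are hypothetical. The sketch assumes each operator gets an integer index at registration, and each attribute name owns one list ("column") indexed by that integer, so a lookup is a plain list index.

```python
class Op:
    """Toy operator entry; `index` is its row in every attribute column."""
    _registry = {}   # operator name -> Op
    _columns = {}    # attribute name -> list indexed by Op.index

    def __init__(self, name, index):
        self.name = name
        self.index = index

    @classmethod
    def register(cls, name):
        # Return the existing entry if present, so that several modules
        # can add information to the same operator.
        if name not in cls._registry:
            cls._registry[name] = cls(name, len(cls._registry))
        return cls._registry[name]

    @classmethod
    def get(cls, name):
        return cls._registry[name]

    def set_attr(self, attr_name, value):
        column = Op._columns.setdefault(attr_name, [])
        # Grow the column so that this operator's row exists.
        column.extend([None] * (self.index + 1 - len(column)))
        column[self.index] = value
        return self  # allow chained registration

    @classmethod
    def get_attr(cls, attr_name):
        return AttrMap(cls._columns.setdefault(attr_name, []))


class AttrMap:
    """Read-only view over one attribute column."""

    def __init__(self, column):
        self._column = column

    def __getitem__(self, op):
        value = self._column[op.index] if op.index < len(self._column) else None
        if value is None:
            raise KeyError(op.name)
        return value


# Core registration, as one project might do it.
Op.register("add")
Op.register("exp")

# Another project attaches attributes later without touching the Op class.
Op.register("add").set_attr("FInplaceOption", lambda attrs: [(0, 0)])
Op.register("exp").set_attr("FInplaceOption", lambda attrs: [(0, 0)])
Op.register("exp").set_attr("FToyCost", 4)  # hypothetical attribute

inplace_map = Op.get_attr("FInplaceOption")
add_inplace = inplace_map[Op.get("add")]({})
```

Adding `FToyCost` required no change to `Op` itself, which is the property the attribute registry is meant to provide.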
## Graph and Pass
We can use the additional information in the attribute registry to do optimizations and get more information about the graph. Graph is the unit we manipulate in these steps. A Graph in NNVM contains
two parts:
- The computation graph structure
- An attribute map from string to any type ```map<string, shared_ptr<any> >```
The second part, the attribute map, is quite important, as we may need different kinds
of information about the graph during the transformation process, such as the
shapes of each tensor, the types of each tensor or the storage allocation plans.
A ```Pass``` can take a graph with existing attribute information
and transform it to the same graph structure with more graph attributes, or to another graph.
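As a rough illustration of the Graph and Pass idea, the following pure-Python sketch models a graph as a node list plus a string-keyed attribute map, and a pass as a function from graph to graph. The names `Graph`, `infer_shape`, `count_elements` and `apply_passes` are hypothetical and not NNVM APIs. For simplicity the sketch assumes every operator is elementwise, so each output shape equals its input shape.

```python
import math


class Graph:
    def __init__(self, nodes, attrs=None):
        # Structure: (name, op, input names), in topological order.
        self.nodes = nodes
        # Attribute map: string -> any value.
        self.attrs = dict(attrs or {})


def infer_shape(graph):
    """Pass: keeps the structure and adds a 'shape' graph attribute."""
    shapes = dict(graph.attrs["input_shapes"])
    for name, op, inputs in graph.nodes:
        if op == "null":          # variable, shape comes from input_shapes
            continue
        in_shapes = [shapes[i] for i in inputs]
        if any(s != in_shapes[0] for s in in_shapes):
            raise ValueError(f"shape mismatch at {name}")
        shapes[name] = in_shapes[0]   # elementwise assumption
    return Graph(graph.nodes, {**graph.attrs, "shape": shapes})


def count_elements(graph):
    """Pass: consumes 'shape' and adds 'num_elements'."""
    sizes = {n: math.prod(s) for n, s in graph.attrs["shape"].items()}
    return Graph(graph.nodes, {**graph.attrs, "num_elements": sizes})


def apply_passes(graph, passes):
    for p in passes:
        graph = p(graph)
    return graph


nodes = [
    ("x", "null", []),
    ("y", "null", []),
    ("s", "elemwise_add", ["x", "y"]),
    ("out", "exp", ["s"]),
]
g = Graph(nodes, {"input_shapes": {"x": (2, 3), "y": (2, 3)}})
g2 = apply_passes(g, [infer_shape, count_elements])
```

The second pass depends only on the `shape` attribute left by the first one, so passes communicate through the attribute map rather than through a shared interface.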
NNVM Documentation
==================
Welcome to NNVM documentation.
Contents
......@@ -9,4 +8,6 @@ Contents
.. toctree::
:maxdepth: 1
self
top
dev/index
# NNVM Overview
NNVM is a C++ library that helps developers build deep learning systems.
It provides ways to construct, represent and transform computation graphs
independently of how they are executed.
To begin with, let us tell a few stories that illustrate the design goals.
## Stories and Design Goals
X has built a new deep learning framework for image classification for fun.
With modular tools like CuDNN and CUDA, it is not hard to assemble a C++ API.
However, most users like to use python/R/scala or other languages.
By registering the operators to NNVM, X can now quickly get a graph composition
language front-end in these languages without coding it up for
each language.
Y wants to build a deep learning serving system on embedded devices.
To do that, we need to cut things off, as opposed to adding new parts,
because code such as gradient calculation and multi-GPU scheduling is NOT relevant.
It is hard to build things from scratch as well, because we want to
reuse components such as memory optimization and kernel execution.
It is hard to do so in current frameworks because all this information
is tied to the operator interface. We want to be able to keep
the parts of the system we need and throw away the other parts
to get the minimum system we need.
Z wants to extend an existing deep learning system by adding a new feature,
say FPGA execution of some operators. To do so, Z needs to add an interface like ```FPGAKernel```
to the operators. E wants to add another new feature that generates code for
a certain subset of operations. Then an interface like ```GenLLVMCode``` needs to be added
to the operator. Eventually the system ends up with a fat operator interface
in order to support everything (while everyone only wants some part of it).
We can think of more stories as the deep learning landscape shifts to more devices,
applications and scenarios. It is desirable to have different specialized
learning systems, each solving some problem well.
Here is a list of things we want:
- Minimum dependency
- Being able to assemble some parts together while discarding other parts
- No centralized operator interface, while still allowing users to provide various information about operators.
## Minimum Registration for a Symbolic Front-End
To use NNVM to build a language front end, developers only need to register
minimum information about each operator.
```c++
NNVM_REGISTER_OP(add)
.describe("add two data together")
.set_num_inputs(2);
NNVM_REGISTER_OP(conv2d)
.describe("take 2d convolution of input")
.set_num_inputs(2);
NNVM_REGISTER_OP(assign)
.describe("assign second input argument to the first one")
.set_num_inputs(2);
```
After compiling the code with the nnvm library, users can use the following interface
to compose the computation graph in Python.
```python
import nnvm.symbol as nn
# symbolic variable
x = nn.Variable('x')
y = nn.Variable('y')
w = nn.Variable('w')
z = nn.conv2d(nn.add(x, y), w, filter_size=(2,2), name='conv1')
```
The graph structure can be accessed in the backend. Currently a Python interface is supported.
But as NNVM follows the same C bridge API design as [MXNet](https://github.com/dmlc/mxnet),
which supports many languages such as R/Julia/Scala/C++, support for more languages can easily be
added in the future.
## Operator Attribute for More Extensions
The minimum information provided by the operator is enough to get a front-end.
However, in order to transform and execute the graph, we need more information from each operator.
A typical difference between a neural net's computation graph and traditional LLVM IR is that
there are a lot more high-level operators. We cannot fix the set of operators in the graph.
Instead, developers are allowed to register attributes of operators. The attributes can include a shape
inference function, whether the operator can be carried out in place, etc.
This design of having an operator attribute registry is not uncommon in deep learning systems.
For example, MXNet has an ```OpProperty``` class, TensorFlow has an ```OpDef``` and Caffe2 has an ```OperatorSchema``` class.
However, the operator attribute interfaces in these frameworks only support a fixed number of defined attributes of interest to the system.
For example, MXNet supports in-place optimization decisions and shape and type inference functions.
If we want to extend the framework to add a new type of attribute to each operator, we need to change the operator registry.
Eventually the operator interface becomes big and has to evolve in the centralized repo.
In NNVM, we decided to change the design and support arbitrary types of operator attributes
without needing to change the operator registry. This also echoes the need for a minimum interface,
so that the code is easier to share across multiple projects.
Users can register a new attribute, such as an in-place property checking function, as follows.
```c++
using FInplaceOption = std::function<
std::vector<std::pair<int, int> > (const NodeAttrs& attrs)>;
// attributes can be registered from multiple places.
NNVM_REGISTER_OP(add)
.set_num_inputs(2);
// register to tell that the first input can be computed in place with the first output
NNVM_REGISTER_OP(add)
.set_attr<FInplaceOption>("FInplaceOption", [](const NodeAttrs& attrs) {
return std::vector<std::pair<int, int> >{{0, 0}};
});
NNVM_REGISTER_OP(exp)
.set_num_inputs(1)
.set_attr<FInplaceOption>("FInplaceOption", [](const NodeAttrs& attrs) {
return std::vector<std::pair<int, int> >{{0, 0}};
});
```
These attributes can be queried from arbitrary parts of the code, as in the following example.
Under the hood, each attribute is stored in an any-type columnar store
that can easily be retrieved and cast back to a typed table for quick lookups.
```c++
void MyFunction() {
  const Op* add = Op::Get("add");
  // if we need quick queries, we can use a static variable
  // the attribute map contains attributes of all operators.
  static auto& finplace_option_map = Op::GetAttr<FInplaceOption>("FInplaceOption");
  // quick lookup of the attribute of add, O(1) time, vector index lookup internally.
  auto add_inplace = finplace_option_map[add];
}
```
Besides keeping the code minimal, this attribute store enables decentralization of projects.
Before, all the attributes of an operator had to sit in a centralized interface class.
Now, everyone can register their own attributes and take the attributes they need from another project
without needing to change the operator interface.
See [example code](../example/src/operator.cc) on how operators can be registered.
## Graph and Pass
Once we have more information about the operators,
we can use it to do optimizations and get more information about the graph.
Graph is the unit we manipulate in these steps. A Graph in NNVM contains
two parts:
- The computation graph structure
- An attribute map from string to any type ```map<string, shared_ptr<any> >```
The second part, the attribute map, is quite important, as we may need different kinds
of information about the graph during the transformation process, such as the
shapes of each tensor, the types of each tensor or the storage allocation plans.
A ```Pass``` can take a graph with existing attribute information
and transform it to the same graph with more attributes, or to another graph.
We have a bunch of passes implemented in NNVM, including symbolic differentiation,
memory planning and shape/type inference, and we can support more.
## Executing the Graph
Currently the library defines nothing about how the graph is executed.
Execution is intentionally excluded from this module because we believe
it can be another module, and there can be many ways to execute one graph.
We can target different runtime platforms, or even write our own.
More importantly, information such as the memory allocation plan and the
shape and type of each tensor can be used during the execution phase
to enhance execution.
We can also register more runtime-related information in the operator registry,
and define pass functions to do runtime-related optimization of the graph.
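Since execution is left to a separate module, the following is only a hedged sketch of what one very simple executor could look like. It is a pure-Python toy that walks a topologically ordered node list. The function `run_graph`, the `kernels` table and the node tuple layout are assumptions made for illustration, not part of NNVM.

```python
import math


def run_graph(nodes, kernels, feed):
    """Evaluate nodes in order.

    nodes:   list of (name, op, input names), topologically ordered
    kernels: operator name -> Python callable (hypothetical kernel table)
    feed:    variable name -> value
    """
    values = dict(feed)
    for name, op, inputs in nodes:
        if op == "null":
            continue  # variables are supplied through `feed`
        values[name] = kernels[op](*(values[i] for i in inputs))
    return values


kernels = {
    "elemwise_add": lambda a, b: a + b,
    "exp": math.exp,
}
nodes = [
    ("x", "null", []),
    ("y", "null", []),
    ("s", "elemwise_add", ["x", "y"]),
    ("out", "exp", ["s"]),
]
result = run_graph(nodes, kernels, {"x": 1.0, "y": -1.0})
```

A different backend could reuse the same node list with another kernel table, which is why the graph representation stays independent of execution.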
## Relation to LLVM
NNVM is inspired by LLVM. It sits at a higher level, in the sense that there are a lot of optimization
opportunities we gain by knowing high-level information about the operators.
On the other hand, we do believe that code generation to LLVM can be a natural extension and can benefit some use cases.
## Unix Philosophy in Learning Systems
There are a few existing computation graph based deep learning frameworks (e.g. Theano, TensorFlow, Caffe2, MXNet, etc.).
NNVM does not intend to become another one. Instead, NNVM provides a module with the following properties:
- The graph representation is minimal, with no code dependency.
- Operator attributes allow arbitrary information to be registered in a unified way.
- It is independent of the execution layer, so it can be re-targeted to multiple frontends and backends.
We believe this is the correct way to build learning systems.
By having more such modules, we can pick the ones we need and remove the ones we do not want in our use cases.
Hopefully these efforts can make deep learning system research and building easy, fun and rewarding.
NNVM Core Tensor Operators
==========================
This page contains the list of core tensor operator primitives pre-defined in NNVM.
The core tensor operator primitives (``nnvm.top``) cover typical workloads in deep learning.
They can represent workloads in front-end frameworks, and provide basic building blocks for optimization.
Since deep learning is a fast-evolving field, it is possible to have operators that are not listed here.
NNVM is designed for this problem and can easily add new operators without changing the core library.
.. note::
Each operator node in the graph IR contains the following two kinds of parameters.
- inputs: positional list of input tensors
- attrs: attributes about the operator (e.g. kernel_size in conv2d)
This document lists both inputs and attributes in the parameter field. You can distinguish them by the marked type: the inputs are of type Tensor, while the remaining parameters are attributes.
To construct the graph with the NNVM Python API, a user can pass in the input Tensors as positional arguments and the attributes as keyword arguments.
Overview of Operators
---------------------
**Level 1: Basic Operators**
This level enables fully connected multi-layer perceptron.
.. autosummary::
......@@ -76,7 +96,8 @@ This level enables typical convnet models.
nnvm.symbol.broadcast_mul
nnvm.symbol.broadcast_div
Detailed Definitions
--------------------
.. autofunction:: nnvm.symbol.dense
.. autofunction:: nnvm.symbol.relu
.. autofunction:: nnvm.symbol.tanh
......
......@@ -138,12 +138,10 @@ def _make_atomic_symbol_function(handle, name):
doc_str = ('%s\n\n' +
'%s\n' +
'name : string, optional.\n' +
' Name of the resulting symbol.\n\n' +
'Returns\n' +
'-------\n' +
'symbol: Symbol\n' +
' The result symbol.')
'result: Tensor\n' +
' The result Tensor.')
doc_str = doc_str % (desc, param_str)
def creator(*args, **kwargs):
......
......@@ -122,12 +122,10 @@ cdef _make_atomic_symbol_function(OpHandle handle, string name):
func_name = py_str(name.c_str())
doc_str = ('%s\n\n' +
'%s\n' +
'name : string, optional.\n' +
' Name of the resulting symbol.\n\n' +
'Returns\n' +
'-------\n' +
'symbol: Symbol\n' +
' The result symbol.')
'result: Tensor\n' +
' The result Tensor.')
doc_str = doc_str % (desc, param_str)
func_hint = func_name.lower()
......
......@@ -59,4 +59,4 @@ def find_lib_path():
# current version
__version__ = "0.7.0"
__version__ = "0.8.0"