Commit 14acb80a by Steven S. Lyubomirsky Committed by Tianqi Chen

[Relay][docs] Details on comp. graphs in Relay dev intro (#2324)

parent 1b2e553b
@@ -6,48 +6,51 @@ framework developers who are familiar with the computational graph representatio
We briefly summarize the design goals here and return to each of them later in the article.

- Support traditional dataflow-style programming and transformations.
- Support functional-style scoping and let binding, making Relay a fully featured differentiable language.
- Allow the user to mix the two programming styles.

Build a Computational Graph with Relay
--------------------------------------

Traditional deep learning frameworks use computational graphs as their intermediate representation.
A computational graph (or dataflow graph) is a directed acyclic graph (DAG) that represents the computation.
Though dataflow graphs are limited in the computations they can express because they
lack control flow, their simplicity makes it easier to implement automatic differentiation and
to compile for heterogeneous execution environments (e.g., executing parts of the graph on specialized hardware).

.. image:: https://raw.githubusercontent.com/tvmai/tvmai.github.io/master/images/relay/dataflow.png
   :align: center
   :scale: 70%

You can use Relay to build a computational (dataflow) graph. Specifically, the above code shows how to
construct a simple two-node graph. The syntax of the example is not that different from existing
computational graph IRs such as NNVMv1; the only difference is in terminology:

- Existing frameworks usually use the terms graph and subgraph.
- Relay uses the term function, e.g. ``fn (%x)``, to indicate the graph.

Each dataflow node is a CallNode in Relay. The Relay Python DSL allows you to construct a dataflow graph quickly.
One thing we want to highlight in the above code is that we explicitly constructed an Add node with
both inputs pointing to ``%1``. When a deep learning framework evaluates the above program, it will compute
the nodes in topological order, and ``%1`` will only be computed once.
While this fact is very natural to deep learning framework builders, it is something that might
surprise a PL researcher at first. If we implement a simple visitor to print out the result and
treat the result as a nested Call expression, it becomes ``log(%x) + log(%x)``.

Such ambiguity is caused by different interpretations of program semantics when there is a shared node in the DAG.
In a normal functional programming IR, nested expressions are treated as expression trees, without considering the
fact that ``%1`` is actually used twice in ``%2``.

The Relay IR is mindful of this difference. Usually, deep learning framework users build the computational
graph in this fashion, where DAG node reuse often occurs. As a result, when we print out a Relay program in
the text format, we print one CallNode per line and assign a temporary id (``%1``, ``%2``) to each CallNode so that each common
node can be referenced in later parts of the program.

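The difference between the two printing strategies can be sketched in plain Python. This is only an illustration under stated assumptions: ``Var``, ``Call``, ``print_as_tree`` and ``print_as_dag`` are hypothetical toy definitions, not Relay's actual classes or text printer.

```python
from dataclasses import dataclass


# Toy stand-ins for IR nodes (hypothetical, not Relay's classes).
# eq=False keeps identity-based hashing, so a shared node is recognised
# as "the same node" rather than "a structurally equal node".
@dataclass(eq=False)
class Var:
    name: str


@dataclass(eq=False)
class Call:
    op: str
    args: tuple


def print_as_tree(e):
    """Naive visitor: treats the DAG as an expression tree."""
    if isinstance(e, Var):
        return "%" + e.name
    if e.op == "add":
        return " + ".join(print_as_tree(a) for a in e.args)
    return f"{e.op}({', '.join(print_as_tree(a) for a in e.args)})"


def print_as_dag(root):
    """Emit one Call per line and give each Call node a temporary id."""
    ids, lines = {}, []

    def visit(n):
        if isinstance(n, Var):
            return "%" + n.name
        if n not in ids:
            args = ", ".join(visit(a) for a in n.args)
            ids[n] = f"%{len(ids) + 1}"
            lines.append(f"{ids[n]} = {n.op}({args})")
        return ids[n]

    visit(root)
    return "\n".join(lines)


x = Var("x")
v1 = Call("log", (x,))
v2 = Call("add", (v1, v1))  # both inputs point to the same node

tree_text = print_as_tree(v2)  # the shared node is duplicated in the text
dag_text = print_as_dag(v2)    # the shared node is named once and reused
```

In this sketch the tree printer writes the shared ``log`` node twice, while the per-line printer names it once and refers to it by its temporary id.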
Module: Support Multiple Functions (Graphs)
-------------------------------------------

So far we have introduced how to build a dataflow graph as a function. One might naturally ask: Can we support multiple
functions and enable them to call each other? Relay allows grouping multiple functions together in a module; the code below
shows an example of a function calling another function.

.. code::

@@ -90,7 +93,7 @@ At this point, we have introduced the basic concepts in Relay. Notably, Relay ha
- Succinct text format that eases debugging when writing passes.
- First-class support for subgraphs (functions) in a joint module, which enables further joint optimizations such as inlining and calling convention specification.
- Native front-end language interop; for example, all the data structures can be visited in Python, which allows quick prototyping of optimizations in Python and mixing them with C++ code.

Let Binding and Scopes
@@ -99,11 +102,11 @@ Let Binding and Scopes
So far, we have introduced how to build a computational graph in the good old way used in deep learning frameworks.
This section will talk about a new important construct introduced by Relay -- let bindings.

Let binding is used in nearly every high-level programming language. In Relay, it is a data structure with three
fields ``Let(var, value, body)``. When we evaluate a let expression, we first evaluate the value part, assign
it to the var, then return the evaluated result of the body expression.

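The evaluation rule for a let expression can be sketched as a small interpreter. This is a minimal illustration, not Relay's interpreter: the node classes, the ``OPS`` table and ``evaluate`` are hypothetical names introduced here.

```python
import math
from dataclasses import dataclass


# Toy node classes (hypothetical, for illustration only).
@dataclass
class Var:
    name: str


@dataclass
class Call:
    op: str
    args: tuple


@dataclass
class Let:
    var: Var
    value: object
    body: object


# Assumed operator table for this sketch.
OPS = {"log": math.log, "add": lambda a, b: a + b}


def evaluate(e, env):
    """Evaluate expression `e` in an environment mapping names to values."""
    if isinstance(e, Var):
        return env[e.name]
    if isinstance(e, Call):
        return OPS[e.op](*[evaluate(a, env) for a in e.args])
    if isinstance(e, Let):
        value = evaluate(e.value, env)           # 1. evaluate the value part
        inner = {**env, e.var.name: value}       # 2. assign it to the var
        return evaluate(e.body, inner)           # 3. evaluate the body
    raise TypeError(f"unsupported node: {e!r}")


# let %v1 = log(%x); let %v2 = add(%v1, %v1); %v2
program = Let(
    Var("v1"), Call("log", (Var("x"),)),
    Let(Var("v2"), Call("add", (Var("v1"), Var("v1"))), Var("v2")),
)
result = evaluate(program, {"x": math.e})
```

Because ``%v1`` is bound once and then looked up, ``log`` runs a single time here even though its result is used twice.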
You can use a sequence of let bindings to construct a program that is logically equivalent to a dataflow program.
The code example below shows one program with two forms side by side.

.. image:: https://raw.githubusercontent.com/tvmai/tvmai.github.io/master/images/relay/dataflow_vs_func.png
@@ -111,26 +114,25 @@ The code example below shows one program with two forms side by side.
   :scale: 70%

The nested let binding form is called A-normal form, and it is commonly used as an IR in functional programming languages.
Now, please take a close look at the AST structure. While the two programs are semantically identical
(so are their textual representations, except that the A-normal form has a let prefix), their AST structures are different.

Since program optimizations take these AST data structures and transform them, the two different structures will
affect the compiler code we are going to write. For example, if we want to detect a pattern ``add(log(x), y)``:

- In the dataflow form, we can first access the add node, then directly look at its first argument to see if it is a log.
- In the A-normal form, we cannot directly do the check anymore, because the first input to add is ``%v1`` -- we need to keep a map from variables to their bound values and look up that map in order to know that ``%v1`` is a log.

Different data structures will impact how you might write transformations, and we need to keep that in mind.

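The two matching strategies for ``add(log(x), y)`` can be sketched as follows. The node classes and the helpers ``match_dataflow`` and ``match_anf`` are hypothetical toy code written for this illustration, not part of Relay.

```python
from dataclasses import dataclass


# Toy node classes (hypothetical, for illustration only).
@dataclass
class Var:
    name: str


@dataclass
class Call:
    op: str
    args: tuple


@dataclass
class Let:
    var: Var
    value: object
    body: object


def is_log(e):
    return isinstance(e, Call) and e.op == "log"


def match_dataflow(e):
    """Dataflow form: inspect the first argument of add directly."""
    return isinstance(e, Call) and e.op == "add" and is_log(e.args[0])


def match_anf(e):
    """A-normal form: remember what each var is bound to, then look it up."""
    bound = {}
    while isinstance(e, Let):
        value = e.value
        if isinstance(value, Call) and value.op == "add":
            first = value.args[0]
            if isinstance(first, Var):
                first = bound.get(first.name, first)
            if is_log(first):
                return True
        bound[e.var.name] = value
        e = e.body
    return False


x, y = Var("x"), Var("y")

# Dataflow form: add(log(%x), %y)
dataflow = Call("add", (Call("log", (x,)), y))

# A-normal form: let %v1 = log(%x); let %v2 = add(%v1, %y); %v2
anf_add = Call("add", (Var("v1"), y))
anf = Let(Var("v1"), Call("log", (x,)), Let(Var("v2"), anf_add, Var("v2")))
```

Applying ``match_dataflow`` to the ``add`` call inside the A-normal form fails, because its first argument is the variable ``%v1`` rather than a ``log`` call; ``match_anf`` succeeds only because it consults the binding map.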
So now, as a deep learning framework developer, you might ask: Why do we need let bindings?
Your PL friends will always tell you that let is important -- as PL is quite an established field,
there must be some wisdom behind that.

Why We Might Need Let Binding
-----------------------------

One key usage of let binding is that it specifies the scope of computation. Let us take a look at the following example,
which does not use let bindings.

.. image:: https://raw.githubusercontent.com/tvmai/tvmai.github.io/master/images/relay/let_scope.png
   :align: center
@@ -141,7 +143,7 @@ to suggest that we should evaluate node ``%1`` outside the if scope, the AST(as
Actually, a dataflow graph never defines its scope of evaluation. This introduces some ambiguity in the semantics.

This ambiguity becomes more interesting when we have closures. Consider the following program, which returns a closure.
We don’t know where we should compute ``%1``; it can be either inside or outside the closure.

.. code::

@@ -153,18 +155,18 @@ We don’t know where should we compute ``%1``. It can either be outside the clo
    %2
  }

A let binding solves this problem, as the computation of the value happens at the let node. In both programs,
if we change ``%1 = log(%x)`` to ``let %v1 = log(%x)``, we clearly specify the computation location to
be outside of the if scope and the closure. As you can see, let binding gives a more precise specification of the computation site
and could be useful when we generate backend code (as such specification is in the IR).

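As a loose analogy in plain Python (not Relay code), the two possible evaluation sites for the closure example behave differently at runtime. ``CountingLog``, ``bound_outside`` and ``computed_inside`` are hypothetical names used only for this sketch.

```python
import math


class CountingLog:
    """Hypothetical stand-in for the log operator that counts evaluations."""

    def __init__(self):
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return math.log(x)


def bound_outside(log, x):
    # Analogous to `let %v1 = log(%x)` placed before the closure:
    # the value is computed once, when the closure is created.
    v1 = log(x)
    return lambda y: y + v1


def computed_inside(log, x):
    # Analogous to evaluating log(%x) inside the closure body:
    # the value is recomputed on every call.
    return lambda y: y + log(x)


log_a, log_b = CountingLog(), CountingLog()
f = bound_outside(log_a, 2.0)
g = computed_inside(log_b, 2.0)

results_f = [f(y) for y in (1.0, 2.0, 3.0)]
results_g = [g(y) for y in (1.0, 2.0, 3.0)]
```

Both closures return the same values, but the number of ``log`` evaluations differs, which is the kind of decision a let binding makes explicit in the IR.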
On the other hand, the dataflow form, which does not specify the scope of computation, does have its own advantages
-- namely, we don’t need to worry about where to put the let when we generate the code. The dataflow form also gives more freedom
to the later passes to decide where to put the evaluation point. As a result, it might not be a bad idea to use the dataflow
form of the program in the initial phases of optimization when you find it convenient.
Many optimizations in Relay today are written to optimize dataflow programs.

However, when we lower the IR to an actual runtime program, we need to be precise about the scope of computation.
In particular, we want to explicitly specify where the computation should happen when we are using
sub-functions and closures. Let binding can be used to solve this problem in later-stage, execution-specific optimizations.

@@ -176,13 +178,13 @@ Hopefully, by now you are familiar with the two kinds of representations.
Most functional programming languages do their analysis in A-normal form,
where the analyzer does not need to be mindful that the expressions are DAGs.

Relay chooses to support both the dataflow form and let bindings. We believe that it is important to let the
framework developer choose the representation they are familiar with.
This does, however, have some implications for how we write passes:

- If you come from a dataflow background and want to handle lets, keep a map from vars to expressions so you can perform a lookup when encountering a var. This likely means a minimal change, as we already need a map from expressions to transformed expressions anyway. Note that this will effectively remove all the lets in the program.
- If you come from a PL background and like A-normal form, we will provide a dataflow to A-normal form pass.
- For PL folks, when you are implementing something (like a dataflow-to-ANF transformation), be mindful that expressions can be DAGs, and this usually means that we should visit expressions with a ``Map<Expr, Result>`` and only compute the transformed result once, so the resulting expression keeps the common structure.

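The memoized visit described in the last point can be sketched as a toy dataflow-to-ANF conversion. This is a minimal illustration under stated assumptions, not the pass Relay provides: the node classes, ``to_anf`` and ``show`` are hypothetical, and the ``memo`` dictionary plays the role of the ``Map<Expr, Result>``.

```python
from dataclasses import dataclass


# Toy node classes (hypothetical). eq=False keeps identity-based hashing,
# so the memo table is keyed by node identity, as a DAG visitor needs.
@dataclass(eq=False)
class Var:
    name: str


@dataclass(eq=False)
class Call:
    op: str
    args: tuple


@dataclass(eq=False)
class Let:
    var: Var
    value: object
    body: object


def to_anf(root):
    """Convert a dataflow DAG into nested lets, visiting each node once."""
    memo = {}      # Call node -> Var bound to its result
    bindings = []  # (var, value) pairs in evaluation order

    def visit(e):
        if isinstance(e, Var):
            return e
        if e not in memo:  # compute the transformed result only once
            args = tuple(visit(a) for a in e.args)
            var = Var(f"v{len(bindings) + 1}")
            bindings.append((var, Call(e.op, args)))
            memo[e] = var
        return memo[e]

    body = visit(root)
    for var, value in reversed(bindings):
        body = Let(var, value, body)
    return body


def show(e):
    """Render an expression as text (helper for inspecting the result)."""
    if isinstance(e, Var):
        return "%" + e.name
    if isinstance(e, Call):
        return f"{e.op}({', '.join(show(a) for a in e.args)})"
    return f"let {show(e.var)} = {show(e.value)}; {show(e.body)}"


x = Var("x")
shared = Call("log", (x,))
graph = Call("add", (shared, shared))  # the log node is shared

anf_text = show(to_anf(graph))
```

Without the ``memo`` lookup, the shared ``log`` node would be visited twice and bound to two different variables, so the output would lose the common structure.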
There are additional advanced concepts, such as symbolic shape inference and polymorphic functions,
that are not covered by this material; you are more than welcome to look at other materials.