Commit ec4ea380 by Xing

Update theory.tex

parent 66b6f484
@@ -78,7 +78,7 @@ symbols $\hat{t}$ with given input sequence $s$, i.e, $o^{L}=P(\hat{t}|s_0,s_1)$
To remove all the handcrafted induction as well as for a more realistic
scenario, agents for this referential game are independent of each other,
-with no sharing model parameters or architectural connections. As shown in
+with no shared model parameters or architectural connections. As shown in
Algorithm~\ref{al:learning}, we train the separate Speaker $S$ and Listener $L$ with
Stochastic Policy Gradient methodology in a tick-tock manner, i.e, training one
agent while keeping the other one. Roughly, when training the Speaker, the
......