Merge branch 'master' of http://62.234.201.16/hao/AAAI21_Emergent_language

61b69bcc · haoyifan · 4f5ef212 · 3c299656 · 61b69bcc
Commit 61b69bcc authored Sep 10, 2020 by haoyifan

Showing with 2 additions and 2 deletions

AAAI2021/tex/theory.tex
+2 -2

--- a/AAAI2021/tex/theory.tex
+++ b/AAAI2021/tex/theory.tex
@@ -122,9 +122,9 @@ use the predicted result $\hat{t}$ of the listener agent as the
 evidence of whether giving positive rewards. Then, the gradients of the
 expected reward $ J(\theta_S, \theta_L)$ can be calculated as follows:
 \begin{align}
-  \nabla_{\theta^S} J &= \mathbb{E}_{\pi^S_{old}, \pi^L} \left[ r(\hat{t}, t) \cdot
+  \nabla_{\theta^S} J &= \mathbb{E}_{\pi^S, \pi^L} \left[ r(\hat{t}, t) \cdot
     \frac{\nabla_{\theta^S}\pi^S(s_0, s_1 | t)}{\pi^S_{old}(s_0, s_1 | t)} \right] \\
-  \nabla_{\theta^L} J &= \mathbb{E}_{\pi^S, \pi^L_{old}} \left[ r(\hat{t}, t) \cdot
+  \nabla_{\theta^L} J &= \mathbb{E}_{\pi^S, \pi^L} \left[ r(\hat{t}, t) \cdot
    \frac{\nabla_{\theta^L} \pi^L(\hat{t} | s_0, s_1)}{\pi^L_{old}(\hat{t} | s_0, s_1)} \right]
 \end{align}
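
Review note: a minimal numerical sketch of the ratio-form gradient in the hunk above. It is not the authors' implementation. It assumes a single softmax policy over three discrete actions (standing in for either $\pi^S$ or $\pi^L$), takes $\pi_{old}$ to be a detached copy with the same values as $\pi$, and computes the expectation exactly by enumerating actions instead of sampling. The function names `softmax`, `grad_pi`, `ratio_gradient`, and `exact_gradient` are hypothetical. Under these assumptions the ratio $\nabla\pi / \pi_{old}$ equals $\nabla \log \pi$, so the estimator should match the analytic gradient of the expected reward.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def grad_pi(probs, a):
    """Gradient of pi(a) w.r.t. every logit: pi(a) * (1[a == k] - pi(k))."""
    return [probs[a] * ((1.0 if a == k else 0.0) - probs[k])
            for k in range(len(probs))]


def ratio_gradient(logits, rewards):
    """E_{a ~ pi}[ r(a) * grad pi(a) / pi_old(a) ], with pi_old a detached copy of pi.

    Assumption: pi_old has the same values as pi but carries no gradient.
    The expectation is enumerated exactly over all actions.
    """
    probs = softmax(logits)
    old = list(probs)  # detached copy: same numbers, treated as constants
    grad = [0.0] * len(logits)
    for a, p in enumerate(probs):
        g = grad_pi(probs, a)
        for k in range(len(logits)):
            grad[k] += p * rewards[a] * g[k] / old[a]
    return grad


def exact_gradient(logits, rewards):
    """Analytic gradient of J = sum_a pi(a) r(a): dJ/dtheta_k = pi(k) * (r(k) - J)."""
    probs = softmax(logits)
    j = sum(p * r for p, r in zip(probs, rewards))
    return [probs[k] * (rewards[k] - j) for k in range(len(logits))]


if __name__ == "__main__":
    logits = [0.2, -1.0, 0.5]   # illustrative values only
    rewards = [1.0, 0.0, 0.0]   # reward 1 when the "correct" action is chosen
    print(ratio_gradient(logits, rewards))
    print(exact_gradient(logits, rewards))
```

If $\pi_{old}$ differed numerically from $\pi$, an expectation taken under $\pi$ would no longer reduce to the plain policy gradient, so the sketch only covers the case where the two coincide.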