Commit 8b694c7a by Zidong Du
parents db11311b 153da1e2
@@ -83,7 +83,7 @@ Algorithm~\ref{al:learning}, we train the separate Speaker $S$ and Listener $L$
Stochastic Policy Gradient methodology in a tick-tock manner, i.e., training one
agent while keeping the other one fixed. Roughly, when training the Speaker, the
objective is to maximize the expected reward
-$J(\theta_S, \theta_L)=E_{\pi_S,\pi_L}[R(t, t^)]$ by adjusting the parameter
+$J(\theta_S, \theta_L)=E_{\pi_S,\pi_L}[R(t, \hat{t})]$ by adjusting the parameter
$\theta_S$, where $\theta_S$ are the neural network parameters of Speaker $S$
with learned output probability distribution $\pi_S$, and $\theta_L$ are the
neural network parameters of Listener $L$ with learned probability distribution $\pi_L$.
......
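Below is a minimal sketch of the tick-tock Stochastic Policy Gradient loop described in the hunk above, written as plain REINFORCE in PyTorch. Everything in it is an illustrative assumption rather than the repository's code: the network shapes, the vocabulary and target-set sizes, the alternation period, and the 0/1 reward $R(t, \hat{t})$ that scores whether the Listener's reconstruction $\hat{t}$ matches the Speaker's target $t$.

```python
# Sketch of tick-tock policy-gradient training of a Speaker and a Listener.
# All names and sizes here are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn

N_TARGETS, VOCAB = 10, 10  # assumed sizes of the target set and message vocabulary

class Agent(nn.Module):
    """One-layer policy network mapping an input id to a categorical policy."""
    def __init__(self, n_in, n_out):
        super().__init__()
        self.net = nn.Sequential(nn.Embedding(n_in, 32), nn.Linear(32, n_out))
    def forward(self, x):
        return torch.distributions.Categorical(logits=self.net(x))

speaker, listener = Agent(N_TARGETS, VOCAB), Agent(VOCAB, N_TARGETS)
opt_s = torch.optim.Adam(speaker.parameters(), lr=1e-2)
opt_l = torch.optim.Adam(listener.parameters(), lr=1e-2)

def play(t):
    """One game: Speaker emits a symbol for target t, Listener guesses t_hat."""
    pi_s = speaker(t)
    m = pi_s.sample()                 # message ~ pi_S
    pi_l = listener(m)
    t_hat = pi_l.sample()             # reconstruction ~ pi_L
    r = (t_hat == t).float()          # assumed reward R(t, t_hat): 1 if correct
    return r, pi_s.log_prob(m), pi_l.log_prob(t_hat)

for step in range(2000):
    t = torch.randint(N_TARGETS, (32,))     # batch of random targets
    train_speaker = (step // 100) % 2 == 0  # tick-tock: alternate every 100 steps
    r, logp_s, logp_l = play(t)
    # REINFORCE: maximize J = E[R] via the surrogate loss -R * log pi
    if train_speaker:                       # adjust theta_S, hold theta_L fixed
        loss = -(r.detach() * logp_s).mean()
        opt_s.zero_grad(); loss.backward(); opt_s.step()
    else:                                   # adjust theta_L, hold theta_S fixed
        loss = -(r.detach() * logp_l).mean()
        opt_l.zero_grad(); loss.backward(); opt_l.step()
```

Stepping only the active agent's optimizer in each phase is what realizes the tick-tock schedule: the other agent's parameters are simply never updated during that phase.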