Update theory.tex

Commit 153da1e2 authored Sep 10, 2020 by Xing

Showing 1 changed file with 1 addition and 1 deletion

AAAI2021/tex/theory.tex
+1 -1

--- a/AAAI2021/tex/theory.tex
+++ b/AAAI2021/tex/theory.tex
@@ -83,7 +83,7 @@ Algorithm~\ref{al:learning}, we train the separate Speaker $S$ and Listener $L$ 
 Stochastic Policy Gradient methodology in a tick-tock manner, i.e, training one
 agent while keeping the other one. Roughly, when training the Speaker, the
 target is set to maximize the expected reward
-$J(\theta_S, \theta_L)=E_{\pi_S,\pi_L}[R(t, t^)]$ by adjusting the parameter
+$J(\theta_S, \theta_L)=E_{\pi_S,\pi_L}[R(t, \hat{t})]$ by adjusting the parameter
 $\theta_S$, where $\theta_S$ is the neural network parameters of Speaker $S$
 with learned output probability distribution $\pi_S$, and $\theta_L$ is the
 neural network parameters of Listener with learned probability distribution $\pi_L$.
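
For reference, the Speaker update described in this hunk can be written with the standard score-function (REINFORCE) estimator. This is only a sketch under stated assumptions, not text from the paper: it assumes the Speaker samples a message $m \sim \pi_S(\cdot \mid t)$ and the Listener then produces $\hat{t} \sim \pi_L(\cdot \mid m)$, with $\theta_L$ held fixed during the Speaker's turn. The message symbol $m$ and the step size $\alpha$ are hypothetical names that do not appear in the original.

% Sketch only: $m$ (message) and $\alpha$ (step size) are assumed symbols.
\begin{align}
\nabla_{\theta_S} J(\theta_S, \theta_L)
  &= E_{\pi_S,\pi_L}\left[ R(t, \hat{t}) \, \nabla_{\theta_S} \log \pi_S(m \mid t) \right], \\
\theta_S &\leftarrow \theta_S + \alpha \, \nabla_{\theta_S} J(\theta_S, \theta_L).
\end{align}

The Listener's turn would be the mirror image under the same assumptions: $\theta_S$ is held fixed and the gradient is taken with respect to $\theta_L$ through $\log \pi_L(\hat{t} \mid m)$.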