haoyifan / AAAI21_Emergent_language
Commit 153da1e2, authored Sep 10, 2020 by Xing
Update theory.tex
parent 56daeae5
Showing 1 changed file with 1 addition and 1 deletion
AAAI2021/tex/theory.tex @ 153da1e2
...
...
@@ -83,7 +83,7 @@ Algorithm~\ref{al:learning}, we train the separate Speaker $S$ and Listener $L$
Stochastic Policy Gradient methodology in a tick-tock manner, i.e, training one
agent while keeping the other one. Roughly, when training the Speaker, the
target is set to maximize the expected reward
-$J(\theta_S, \theta_L)=E_{\pi_S,\pi_L}[R(t, t^)]$ by adjusting the parameter
+$J(\theta_S, \theta_L)=E_{\pi_S,\pi_L}[R(t, \hat{t})]$ by adjusting the parameter
$\theta_S$, where $\theta_S$ is the neural network parameters of Speaker $S$
with learned output probability distribution $\pi_S$, and $\theta_L$ is the
neural network parameters of Listener with learned probability distribution $\pi_L$.
...
...