haoyifan / AAAI21_Emergent_language
Commit 2ac2b97f authored Sep 10, 2020 by Ruizhi Chen
Modify equations 1 and 2
Parent: 916cb6a8
Showing 1 changed file with 4 additions and 5 deletions:
AAAI2021/tex/theory.tex (+4, -5)
AAAI2021/tex/theory.tex
 \section{Symbolic Language Producing}
 \label{sec:thory}
...
@@ -121,10 +120,10 @@ use the predicted result $\hat{t}$ of the listener agent as the
 evidence of whether giving positive rewards. Then, the gradients of the
 expected reward $J(\theta_S, \theta_L)$ can be calculated as follows:
 \begin{align}
-\nabla_{\theta^S} J &= \mathbb{E}_{\pi^S, \pi^L} \left[ r(\hat{t}, t) \cdot \nabla_{\theta^S} \log{\pi^S(s_0, s_1 | t)} \right] \\
-\nabla_{\theta^L} J &= \mathbb{E}_{\pi^S, \pi^L} \left[ r(\hat{t}, t) \cdot \nabla_{\theta^L} \log{\pi^S(\hat{t} | s_0, s_1)} \right]
+\nabla_{\theta^S} J &= \mathbb{E}_{\pi^S_{old}, \pi^L} \left[ r(\hat{t}, t) \cdot \frac{\nabla_{\theta^S} \pi^S(s_0, s_1 | t)}{\pi^S_{old}(s_0, s_1 | t)} \right] \\
+\nabla_{\theta^L} J &= \mathbb{E}_{\pi^S, \pi^L_{old}} \left[ r(\hat{t}, t) \cdot \frac{\nabla_{\theta^L} \pi^L(\hat{t} | s_0, s_1)}{\pi^L_{old}(\hat{t} | s_0, s_1)} \right]
 \end{align}
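This commit replaces the on-policy log-derivative (REINFORCE) estimator with an importance-weighted one. Samples are drawn from a frozen copy pi_old of the policy being updated, and each term uses grad pi / pi_old = (pi / pi_old) * grad log pi, which reduces to the removed form whenever pi = pi_old. The commit message does not say why the change was made; reusing samples collected under an older policy, as in PPO-style updates, is one plausible reading. The sketch below compares the two estimators under stated assumptions: a single-step softmax policy over three toy messages stands in for pi^S(s_0, s_1 | t) at one fixed target t, and a hypothetical reward that pays 1 only for message 0 stands in for r(\hat{t}, t). All function names are illustrative and do not come from the repository.

```python
import math
import random

# Toy assumption: a single-step softmax policy over a few messages stands in for
# pi^S(s_0, s_1 | t) at one fixed target t; nothing here comes from the repository.


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def grad_log_prob(probs, a):
    """For a softmax policy: d log pi(a) / d theta_k = 1[a == k] - pi(k)."""
    return [(1.0 if k == a else 0.0) - p for k, p in enumerate(probs)]


def toy_reward(a):
    """Hypothetical stand-in for r(t_hat, t): 1 if message 0 is sent, else 0."""
    return 1.0 if a == 0 else 0.0


def exact_grad(logits, reward):
    """Exact gradient of J = sum_a pi(a) r(a), i.e. sum_a r(a) * grad pi(a)."""
    probs = softmax(logits)
    g = [0.0] * len(logits)
    for a, p in enumerate(probs):
        gl = grad_log_prob(probs, a)
        for k in range(len(g)):
            g[k] += reward(a) * p * gl[k]  # grad pi(a) = pi(a) * grad log pi(a)
    return g


def log_derivative_grad(logits, reward, n, rng):
    """Removed form: Monte Carlo estimate of E_{a ~ pi}[r(a) * grad log pi(a)]."""
    probs = softmax(logits)
    g = [0.0] * len(logits)
    for _ in range(n):
        a = rng.choices(range(len(probs)), weights=probs)[0]
        gl = grad_log_prob(probs, a)
        for k in range(len(g)):
            g[k] += reward(a) * gl[k] / n
    return g


def ratio_grad(logits, old_logits, reward, n, rng):
    """Added form: Monte Carlo estimate of E_{a ~ pi_old}[r(a) * grad pi(a) / pi_old(a)]."""
    probs = softmax(logits)
    old_probs = softmax(old_logits)
    g = [0.0] * len(logits)
    for _ in range(n):
        # Sample from the frozen old policy, not the one being differentiated.
        a = rng.choices(range(len(old_probs)), weights=old_probs)[0]
        ratio = probs[a] / old_probs[a]  # grad pi / pi_old = ratio * grad log pi
        gl = grad_log_prob(probs, a)
        for k in range(len(g)):
            g[k] += reward(a) * ratio * gl[k] / n
    return g


if __name__ == "__main__":
    logits = [0.5, -0.2, 0.1]
    print(exact_grad(logits, toy_reward))
    print(ratio_grad(logits, [0.0, 0.0, 0.0], toy_reward, 50_000, random.Random(0)))
```

With identical old and current logits the ratio is exactly 1, so `ratio_grad` matches `log_derivative_grad` sample for sample under the same seed. With a uniform old policy it still agrees with `exact_grad` up to Monte Carlo noise, which typically grows as pi drifts further from pi_old.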