Commit 65d62699 by Zidong Du


parent 4e45bed3
@@ -173,11 +173,11 @@
 inductions (e.g., small vocabulary sizes, carefully constructed distractors,
 and ease-of-teaching) in multi-agent learning, which are unnatural.
 Yet, few studies investigate the emergence of symbolic language with high
-compositionality \emph{naturally}, i.e., without any deliberately handcrafted
+compositionality \emph{naturally}, i.e., without deliberately handcrafted
 inductions.
 In this paper, we are the first to successfully achieve high compositional symbolic
-language in a purely \emph{natural} manner.
+language in a \emph{natural} manner.
 Initially, by thoroughly investigating the compositionality of emerged symbolic
 language after removing the \emph{deliberately handcrafted}
 inductions, we observe that the agent capacity plays a key role in
...
\section{Experiments}
\label{sec:exp}
\section{Experimental Setup}
\label{sec:thory}
In this section, we introduce the experimental setup used in this paper,
including the environment setup, agent architecture, and training algorithm.
\begin{figure}[t]
\centering
\includegraphics[width=0.9\columnwidth]{fig/occupy}
\caption{\rmk{The entire environment used in this paper.}}
\label{fig:game}
\end{figure}
\subsection{Environment setup}
\label{ssec:env}
Figure~\ref{fig:game} shows the entire environment used in this study,
i.e., a commonly used referential game. Roughly, the referential game requires the speaker and
the listener to work cooperatively to accomplish a certain task.
In this paper, the task is for the listener to reconstruct the input object observed by the speaker.
\textbf{Game rules.} In our referential game, the agents follow the rules below
to finish the game in a cooperative manner. In each round, upon receiving an
input object $t$, the speaker $S$ speaks a symbol sequence $s$ to the listener $L$;
the listener $L$ reconstructs the prediction $\hat{t}$ based on the received
sequence $s$; if $t=\hat{t}$, the agents win the game and receive a positive reward
($R(t,\hat{t})=1$); otherwise, they fail the game and receive a negative reward
($R(t,\hat{t})=-1$).
Precisely, an input object $t$ is a concept sequence of fixed length, denoted
$t=(c_0,c_1)$.
The concepts $c_0$ (shape) and $c_1$ (color) are each represented as a
one-hot vector, whose length ranges from 3 to 6.
The two one-hot vectors are concatenated to represent the input object $t$.
Each symbol sequence $s$ contains two words, denoted $(s_0,s_1)$, where each word $s_i$
is chosen from the vocabulary set $V$. In this game, the cardinality $|V|$ ranges from
4 to 10, and the inequality $|V|^2\geq|M_0||M_1|$ is satisfied, where $|M_0|$ and $|M_1|$
denote the numbers of possible values of the concepts $c_0$ and $c_1$, respectively,
so that the symbol sequence $(s_0,s_1)$ can represent every input object $t$.
Each word $s_0$ and $s_1$ is represented as a one-hot vector of length $|V|$, and the
two one-hot vectors are concatenated to represent the symbol sequence $s$.
The prediction $\hat{t}$ is represented as a one-hot vector of length
$|M_0||M_1|$, where each bit corresponds to one input object. If the
prediction satisfies $\hat{t}[i\cdot|M_1|+j]=1$, then the one-hot vectors of the predicted
concepts $\hat{c}_0$ and $\hat{c}_1$ satisfy $\hat{c}_0[i]=1$ and
$\hat{c}_1[j]=1$, respectively.
If $(c_0,c_1)$ equals $(\hat{c}_0,\hat{c}_1)$, the input object and the
prediction refer to the same object.
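To make these encodings concrete, the following is a minimal sketch in Python (assuming NumPy) of the object, message, and prediction representations and the reward $R(t,\hat{t})$; the concrete sizes ($|M_0|=|M_1|=3$, $|V|=4$) and all function names are hypothetical choices within the stated ranges, not the actual implementation used in this paper.
\begin{verbatim}
import numpy as np

M0, M1, V = 3, 3, 4   # sizes of the two concept value sets and the vocabulary

def one_hot(index, length):
    v = np.zeros(length)
    v[index] = 1.0
    return v

def encode_object(c0, c1):
    # Input object t = (c0, c1): concatenation of two concept one-hot vectors.
    return np.concatenate([one_hot(c0, M0), one_hot(c1, M1)])

def encode_message(s0, s1):
    # Symbol sequence s = (s0, s1): concatenation of two one-hot vectors of length |V|.
    return np.concatenate([one_hot(s0, V), one_hot(s1, V)])

def decode_prediction(index):
    # Bit i*|M1| + j of the prediction corresponds to the object with concepts (i, j).
    return index // M1, index % M1

def reward(c0, c1, prediction_index):
    # R(t, t_hat) = 1 if the listener reconstructs the input object, otherwise -1.
    return 1.0 if decode_prediction(prediction_index) == (c0, c1) else -1.0
\end{verbatim}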
\subsection{Agent architecture}
\label{ssec:agent}
\begin{figure}[t]
\centering
\includegraphics[width=0.9\columnwidth]{fig/occupy}
\caption{\rmk{The architecture of agents. \emph{Left:} speaker. \emph{Right:} listener.}}
\label{fig:agents}
\end{figure}
The agents apply their own policies to play the referential game. Denote the
policies of the speaker $S$ and the listener $L$ as $\pi_S$ and $\pi_L$, respectively.
$\pi_S$ specifies the conditional probabilities $P(s_0|t)$ and $P(s_1|t)$, and $\pi_L$
specifies the conditional probability $P(\hat{t}|s_0,s_1)$. The listener
outputs the prediction $\hat{t}$ by randomly sampling from the conditional
probability $P(\hat{t}|s_0,s_1)$. Neural networks are used to implement the
agent policies, whose architecture is shown in Figure~\ref{fig:agents}.
For the speaker, the input object $t$ is first passed through an MLP to obtain a hidden
vector $h^S$. The hidden vector is then split into two feature
vectors $h_0^S$ and $h_1^S$ of length h\_size. Through an MLP and a softmax layer,
these feature vectors are transformed into the outputs $o_0$ and $o_1$, each of length
$|V|$. Finally, the words $s_0$ and $s_1$ are sampled from the
outputs $o_0$ and $o_1$, respectively.
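The following is a minimal sketch of such a speaker, assuming PyTorch (the framework is an assumption, and the class and layer names are hypothetical); it only illustrates the MLP--split--softmax--sampling structure described above.
\begin{verbatim}
import torch
import torch.nn as nn

class Speaker(nn.Module):
    def __init__(self, obj_dim, h_size, vocab_size):
        super().__init__()
        # MLP mapping the input object t to the hidden vector h^S (length 2 * h_size).
        self.encoder = nn.Sequential(nn.Linear(obj_dim, 2 * h_size), nn.ReLU())
        # MLPs mapping each half of h^S to a distribution over the |V| words.
        self.word0 = nn.Linear(h_size, vocab_size)
        self.word1 = nn.Linear(h_size, vocab_size)

    def forward(self, t):
        h = self.encoder(t)                      # hidden vector h^S
        h0, h1 = h.chunk(2, dim=-1)              # split into h_0^S and h_1^S
        o0 = torch.softmax(self.word0(h0), -1)   # output o_0, i.e., P(s_0 | t)
        o1 = torch.softmax(self.word1(h1), -1)   # output o_1, i.e., P(s_1 | t)
        s0 = torch.distributions.Categorical(o0).sample()   # sample word s_0
        s1 = torch.distributions.Categorical(o1).sample()   # sample word s_1
        return s0, s1, o0, o1
\end{verbatim}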
For the listener, the input words $s_0$ and $s_1$ are each passed through an MLP
to obtain the hidden vectors $h_0$ and $h_1$, each of length h\_size.
These vectors are concatenated and passed through an MLP and a softmax layer,
yielding the output $o^L$ of length $|M_0||M_1|$, which represents
$P(\hat{t}|s_0,s_1)$. Finally, the prediction $\hat{t}$ is sampled from the
output $o^L$.
In the experiments, h\_size is used to denote the model capacity of
the agents.
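A matching sketch of the listener, under the same assumptions (PyTorch, hypothetical names), mirrors the structure above: each word is embedded by its own MLP into a vector of length h\_size, the two vectors are concatenated, and an MLP with a softmax produces $o^L$ over the $|M_0||M_1|$ candidate objects, from which $\hat{t}$ is sampled.
\begin{verbatim}
import torch
import torch.nn as nn

class Listener(nn.Module):
    def __init__(self, vocab_size, h_size, num_objects):
        super().__init__()
        # One MLP per word, mapping its one-hot vector to a hidden vector of length h_size.
        self.embed0 = nn.Sequential(nn.Linear(vocab_size, h_size), nn.ReLU())
        self.embed1 = nn.Sequential(nn.Linear(vocab_size, h_size), nn.ReLU())
        # MLP over the concatenated hidden vectors, followed by a softmax over all objects.
        self.head = nn.Linear(2 * h_size, num_objects)   # num_objects = |M_0| * |M_1|

    def forward(self, s0_onehot, s1_onehot):
        h0 = self.embed0(s0_onehot)
        h1 = self.embed1(s1_onehot)
        o_L = torch.softmax(self.head(torch.cat([h0, h1], dim=-1)), -1)  # P(t_hat | s_0, s_1)
        t_hat = torch.distributions.Categorical(o_L).sample()            # sample prediction
        return t_hat, o_L
\end{verbatim}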
\subsection{Training algorithm}
\label{ssec:training}