inductions (e.g., small vocabulary sizes, carefully constructed distractors,
and ease-of-teaching) in multi-agent learning, which are unnatural.
Yet, few studies investigate the emergence of symbolic language with high
compositionality \emph{naturally}, i.e., without deliberately handcrafted
inductions.
In this paper, we are the first to successfully achieve a highly compositional symbolic
language in a \emph{natural} manner, i.e., without handcrafted inductions.
Initially, by thoroughly investigating the compositionality of emerged symbolic
language after removing the \emph{deliberately handcrafted}
inductions, we observe that the agent capacity plays a key role in
\section{Experiments}
\label{sec:exp}
\section{Experimental Setup}
\label{sec:thory}
In this section, we introduce the experimental setup used in this paper,
including the environment setup, agent architecture, and training algorithm.
\begin{figure}[t]
\centering
\includegraphics[width=0.9\columnwidth]{fig/occupy}
\caption{\rmk{The entire environment used in this paper.}}
\label{fig:game}
\end{figure}
\subsection{Environment setup}
\label{ssec:env}
Figure~\ref{fig:game} shows the entire environment used in this study,
i.e., a commonly used referential game. Roughly, the referential game requires the speaker and
the listener to work cooperatively to accomplish a certain task.
In this paper, the task is xxxx.
\textbf{Game rules} In our referential game, agents follow the rules below
to finish the game in a cooperative manner. In each round, once it receives an
input object $t$, the speaker $S$ speaks a symbol sequence $s$ to the listener $L$;
the listener $L$ reconstructs the prediction $\hat{t}$ based on the received
sequence $s$; if $t=\hat{t}$, the agents win the game and receive a positive reward
($R(t,\hat{t})=1$); otherwise, the agents fail the game and receive a negative reward
($R(t,\hat{t})=-1$).
Precisely, an input object $t$ is a concept sequence with fixed length, denoted
$t=(c_0,c_1)$. The concepts $c_0$ (shape) and $c_1$ (color) are drawn from the
concept sets $M_0$ and $M_1$ and are each represented as a one-hot vector.
The length of each one-hot vector ranges from 3 to 6.
These two vectors are concatenated to represent the input object $t$.
Each symbol sequence $s$ contains two words, denoted $(s_0,s_1)$. Each word $s_i$
is chosen from the vocabulary set $V$. In this game, the cardinality $|V|$ ranges from
4 to 10, and the inequality $|V|^2\geq|M_0||M_1|$ is satisfied to ensure that the
symbol sequence $(s_0,s_1)$ can represent every input object $t$. A
one-hot vector of length $|V|$ is used to represent each word $s_0$ and
$s_1$; the two one-hot vectors are then concatenated to represent the
symbol sequence $s$.
The prediction $\hat{t}$ is represented as a one-hot vector of length
$|M_0||M_1|$, where each bit corresponds to one input object. If the
prediction satisfies $\hat{t}[i\cdot|M_1|+j]=1$, the one-hot vectors of the predicted
concepts $\hat{c}_0$ and $\hat{c}_1$ satisfy $\hat{c}_0[i]=1$ and
$\hat{c}_1[j]=1$, respectively.
If $(c_0,c_1)$ equals $(\hat{c}_0,\hat{c}_1)$, the input object and the
prediction refer to the same object.
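For concreteness, the following Python sketch illustrates one possible implementation
of the encodings and the reward described above; the helper names (e.g.,
\texttt{encode\_object}) are illustrative and not part of our implementation.
\begin{verbatim}
import numpy as np

def one_hot(index, length):
    """Return a one-hot vector of the given length with a 1 at `index`."""
    v = np.zeros(length)
    v[index] = 1.0
    return v

def encode_object(c0, c1, m0, m1):
    """Encode t = (c0, c1) as the concatenation of two one-hot vectors
    of lengths |M_0| = m0 and |M_1| = m1."""
    return np.concatenate([one_hot(c0, m0), one_hot(c1, m1)])

def encode_message(s0, s1, vocab_size):
    """Encode s = (s0, s1) as the concatenation of two one-hot vectors
    of length |V| = vocab_size."""
    return np.concatenate([one_hot(s0, vocab_size), one_hot(s1, vocab_size)])

def decode_prediction(t_hat_index, m1):
    """Map the index i*|M_1| + j of the predicted one-hot vector back to
    the predicted concepts (c0_hat, c1_hat)."""
    return t_hat_index // m1, t_hat_index % m1

def reward(c0, c1, t_hat_index, m1):
    """R(t, t_hat) = 1 if the prediction matches the input object, else -1."""
    return 1.0 if decode_prediction(t_hat_index, m1) == (c0, c1) else -1.0

# Example: |M_0| = |M_1| = 3, |V| = 4, so |V|^2 >= |M_0||M_1| holds.
t_vec = encode_object(c0=2, c1=1, m0=3, m1=3)     # speaker input, length 6
s_vec = encode_message(s0=0, s1=3, vocab_size=4)  # listener input, length 8
print(reward(c0=2, c1=1, t_hat_index=2 * 3 + 1, m1=3))  # correct guess -> 1.0
\end{verbatim}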
\subsection{Agent architecture}
\label{ssec:agent}
\begin{figure}[t]
\centering
\includegraphics[width=0.9\columnwidth]{fig/occupy}
\caption{\rmk{The architecture of agents. \emph{Left:} speaker. \emph{Right:} listener.}}
\label{fig:agents}
\end{figure}
Each agent applies its own policy to play the referential game. Denote the
policies of the speaker agent $S$ and the listener agent $L$ as $\pi_S$ and $\pi_L$,
respectively. $\pi_S$ specifies the conditional probabilities $P(s_0|t)$ and $P(s_1|t)$;
$\pi_L$ specifies the conditional probability $P(\hat{t}|s_0,s_1)$. The listener agent
outputs the prediction $\hat{t}$ by randomly sampling from the conditional
probability $P(\hat{t}|s_0,s_1)$. Neural networks are used to model the
agent policies. The agent architecture is shown in Figure~\ref{fig:agents}.
For the speaker, the input object $t$ is first passed through an MLP to obtain a hidden
vector $h^S$. Then, the hidden vector is split into two feature
vectors $h_0^S$ and $h_1^S$ of length h\_size. Through an MLP and a softmax layer,
these feature vectors are transformed into the outputs $o_0$ and $o_1$, each of length
$|V|$. Lastly, the symbols $s_0$ and $s_1$ are sampled from the
outputs $o_0$ and $o_1$, respectively.
For the listener, the input symbols $s_0$ and $s_1$ are each passed through an MLP
to obtain the hidden vectors $h_0$ and $h_1$, each of length h\_size. These vectors
are concatenated, and the concatenated vector is passed through an MLP and a softmax
layer to produce the output $o^L$ of length $|M_0||M_1|$, which denotes
$P(\hat{t}|s_0,s_1)$. Lastly, the prediction is sampled from the output $o^L$.
In the experiments, the symbol h\_size is used to denote the model capacity of
the agents.
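As an illustration, the following PyTorch sketch shows one possible instantiation of
the speaker and listener architectures described above, assuming single-layer MLPs
with ReLU activations; the class names and layer sizes are illustrative rather than
the exact configuration used in our experiments.
\begin{verbatim}
import torch
import torch.nn as nn

class Speaker(nn.Module):
    """Maps an input object t to distributions over the two symbols (s_0, s_1)."""
    def __init__(self, obj_dim, vocab_size, h_size):
        super().__init__()
        # MLP producing the hidden vector h^S, later split into h_0^S and h_1^S.
        self.encoder = nn.Sequential(nn.Linear(obj_dim, 2 * h_size), nn.ReLU())
        # One output MLP per symbol position, followed by a softmax.
        self.heads = nn.ModuleList(
            [nn.Linear(h_size, vocab_size) for _ in range(2)])

    def forward(self, t):
        h = self.encoder(t)
        h0, h1 = h.chunk(2, dim=-1)                    # h_0^S and h_1^S
        o0 = torch.softmax(self.heads[0](h0), dim=-1)  # o_0 = P(s_0 | t)
        o1 = torch.softmax(self.heads[1](h1), dim=-1)  # o_1 = P(s_1 | t)
        return o0, o1

class Listener(nn.Module):
    """Maps the symbols (s_0, s_1) to a distribution over the |M_0||M_1| objects."""
    def __init__(self, vocab_size, num_objects, h_size):
        super().__init__()
        self.enc0 = nn.Sequential(nn.Linear(vocab_size, h_size), nn.ReLU())
        self.enc1 = nn.Sequential(nn.Linear(vocab_size, h_size), nn.ReLU())
        self.out = nn.Linear(2 * h_size, num_objects)

    def forward(self, s0, s1):
        h = torch.cat([self.enc0(s0), self.enc1(s1)], dim=-1)  # [h_0; h_1]
        return torch.softmax(self.out(h), dim=-1)  # o^L = P(t_hat | s_0, s_1)

# Example with |M_0| = |M_1| = 3, |V| = 4, h_size = 16.
speaker = Speaker(obj_dim=6, vocab_size=4, h_size=16)
listener = Listener(vocab_size=4, num_objects=9, h_size=16)
o0, o1 = speaker(torch.rand(1, 6))
s0 = torch.nn.functional.one_hot(torch.multinomial(o0, 1).squeeze(-1), 4).float()
s1 = torch.nn.functional.one_hot(torch.multinomial(o1, 1).squeeze(-1), 4).float()
p_t_hat = listener(s0, s1)  # the prediction t_hat is sampled from this distribution
\end{verbatim}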
\subsection{Training algorithm}
\label{ssec:training}