Commit 65d62699 authored Sep 09, 2020 by Zidong Du

Showing with 99 additions and 2 deletions

AAAI2021/paper.tex
+2 -2

AAAI2021/tex/experiments.tex
+2 -0

AAAI2021/tex/theory.tex
+95 -0

--- a/AAAI2021/paper.tex
+++ b/AAAI2021/paper.tex
@@ -173,11 +173,11 @@
  inductions (e.g., small vocabulary sizes, carefully constructed distractors, 
  and ease-of-teaching) in multi-agent learning, which are unnatural. 
  Yet, few studies investigate the emergence of symbolic language with high
-  compositionality \emph{naturally}, i.e., without any deliberately handcrafted
+  compositionality \emph{naturally}, i.e., without deliberately handcrafted
  inductions. 
  In this paper, we are the first to successfully achieve high compositional symbolic
-  language in a purely \emph{natural} manner.
+  language in a \emph{natural} manner.
  Initially, by thoroughly investigating the compositionality of emerged symbolic
  language after removing the \emph{deliberately handcrafted}
  inductions, we observe that the agent capacity plays a key role in

--- a/AAAI2021/tex/experiments.tex
+++ b/AAAI2021/tex/experiments.tex
+\section{Experiments}
+\label{sec:exp}
--- a/AAAI2021/tex/theory.tex
+++ b/AAAI2021/tex/theory.tex
+\section{Experimental Setup}
+\label{sec:thory}
+In this section, we introduce the experimental setup used in this paper,
+including the environment setup, agent architecture, and training algorithm.
+\begin{figure}[t]
+  \centering
+  \includegraphics[width=0.9\columnwidth]{fig/occupy}
+  \caption{\rmk{The entire environment used in this paper.}}
+  \label{fig:game}
+\end{figure}
+\subsection{Environment setup}
+\label{ssec:env}
+Figure~\ref{fig:game} shows the entire environment used in this study,
+i.e., a commonly used referential game. Roughly, the referential game requires the speaker and
+listener to work cooperatively to accomplish a certain task. 
+In this paper, the task is xxxx.
+\textbf{Game rules} In our referential game, agents follow the rules below
+to finish the game in a cooperative manner. In each round, upon receiving an
+input object $t$, Speaker $S$ sends a symbol sequence $s$ to Listener $L$;
+Listener $L$ reconstructs the prediction $\hat{t}$ from the received
+sequence $s$; if $t=\hat{t}$, the agents win the game and receive a positive reward
+($R(t,\hat{t})=1$); otherwise the agents lose the game and receive a negative reward
+($R(t,\hat{t})=-1$).
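+Equivalently, the reward described above can be written compactly as
+\[
+  R(t,\hat{t}) = \left\{
+  \begin{array}{rl}
+    1,  & \mathrm{if}\ \hat{t}=t, \\
+    -1, & \mathrm{otherwise}.
+  \end{array}\right.
+\]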
+Precisely, 
+an input object $t$ is a concept sequence of fixed length, denoted
+$t=(c_0,c_1)$.
+The concepts $c_0$ (shape) and $c_1$ (color) are each represented as a
+one-hot vector.
+The length of each one-hot vector ranges from 3 to 6.
+These two vectors are concatenated to represent the input object $t$.
+Each symbol sequence $s$ contains two words, denoted $(s_0,s_1)$. Each word $s_i$
+is chosen from the vocabulary set $V$. In this game, the cardinality $|V|$ ranges from
+4 to 10, and the inequality $|V|^2\geq|M_0||M_1|$ is satisfied, where $|M_0|$ and
+$|M_1|$ are the numbers of possible values of $c_0$ and $c_1$, to ensure that the
+symbol sequence $(s_0,s_1)$ can denote every input object $t$. Each of the words
+$s_0$ and $s_1$ is represented as a one-hot vector of length $|V|$. The two
+one-hot vectors are then concatenated to represent the
+symbol sequence $s$.
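+As a worked example (the sizes are chosen here only for illustration), suppose
+$c_0$ and $c_1$ each take six values, so that there are $6\times 6=36$ distinct
+input objects. Two words can then name every object only if
+\[
+  |V|^2 \geq 36 \quad\Longleftrightarrow\quad |V| \geq 6,
+\]
+so vocabulary sizes of 4 or 5 would be excluded in this configuration. With
+three values per concept ($3\times 3=9$ objects), by contrast, the condition
+reduces to $|V|\geq 3$, which the whole range from 4 to 10 satisfies.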
+The prediction $\hat{t}$ is represented as a one-hot vector of length
+$|M_0||M_1|$. Each entry of the one-hot vector corresponds to one input object. If
+$\hat{t}[i\cdot|M_1|+j]=1$, the one-hot vectors of the predicted
+concepts $\hat{c}_0$ and $\hat{c}_1$ satisfy $\hat{c}_0[i]=1$ and
+$\hat{c}_1[j]=1$, respectively.
+If $(c_0,c_1)$ is equal to $(\hat{c}_0,\hat{c}_1)$, the input object and the
+prediction indicate the same object.
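+To illustrate this indexing with assumed sizes $|M_0|=|M_1|=3$: the prediction
+$\hat{t}$ has $9$ entries, and $\hat{t}[5]=1$ corresponds to $i=1$, $j=2$, since
+$5 = 1\cdot 3 + 2$. In general, assuming zero-based indices, an entry index $k$
+decodes as
+\[
+  i = \left\lfloor k / |M_1| \right\rfloor, \qquad j = k \bmod |M_1| .
+\]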
+\subsection{Agent architecture}
+\label{ssec:agent}
+\begin{figure}[t]
+  \centering
+  \includegraphics[width=0.9\columnwidth]{fig/occupy}
+  \caption{\rmk{The architecture of agents. \emph{Left:} speaker. \emph{Right:} listener.}}
+  \label{fig:agents}
+\end{figure}
+Each agent applies its own policy to play the referential game. Denote the
+policies of the speaker $S$ and the listener $L$ as $\pi_S$ and $\pi_L$, respectively. $\pi_S$
+specifies the conditional probabilities $P(s_0|t)$ and $P(s_1|t)$. $\pi_L$
+specifies the conditional probability $P(\hat{t}|s_0,s_1)$. The listener
+outputs the prediction $\hat{t}$ by sampling from the conditional
+probability $P(\hat{t}|s_0,s_1)$. Neural networks are used to model the
+agent policies. The agent architecture is shown in Figure~\ref{fig:agents}.
+For the speaker, the input object $t$ is first passed to an MLP to obtain a hidden
+layer vector $h^S$. Then, the hidden layer vector is split into two feature
+vectors $h_0^S$ and $h_1^S$, each of length h\_size. Through an MLP and a softmax layer,
+these feature vectors are transformed into the outputs $o_0$ and $o_1$, each of length
+$|V|$. Lastly, the words $s_0$ and $s_1$ are sampled from the
+outputs $o_0$ and $o_1$.
+For the listener, the input words $s_0$ and $s_1$ are each passed into an MLP
+to obtain the hidden layer vectors $h_0$ and $h_1$. The length of each
+vector is h\_size. These vectors are concatenated, and the concatenated
+vector is passed into an MLP and a softmax layer; the output $o^L$, with length $|M_0||M_1|$,
+denotes $P(\hat{t}|s_0,s_1)$. Lastly, the prediction is sampled from the
+output $o^L$.
+In the experiments, h\_size is used to denote the model capacity of
+the agents.
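+The two forward passes described above can be sketched as follows. The symbols
+$f$, $g_i$, $u_i$, and $v$ are placeholder names introduced here for the MLPs;
+whether any of them share parameters is not specified in this section, and
+$[\cdot\,;\cdot]$ denotes concatenation.
+\[
+  \begin{array}{ll}
+  \mathrm{Speaker:}  & h^S = f(t), \quad [h_0^S ; h_1^S] = h^S, \\
+                     & o_i = \mathrm{softmax}(g_i(h_i^S)), \quad s_i \sim o_i, \quad i\in\{0,1\}, \\
+  \mathrm{Listener:} & h_i = u_i(s_i), \quad i\in\{0,1\}, \\
+                     & o^L = \mathrm{softmax}(v([h_0 ; h_1])), \quad \hat{t} \sim o^L .
+  \end{array}
+\]
+Under this reading, $h^S$ has length twice h\_size, $o_0$ and $o_1$ are
+probability vectors over the vocabulary $V$, and $o^L$ is a probability vector
+over the $|M_0||M_1|$ candidate objects.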
+\subsection{Training algorithm}
+\label{ssec:training}