Commit 2c3e17b0 by YZhao
parents a0b98f35 130dd706
......@@ -2,14 +2,6 @@
%\section{Agent Capacity vs. Compositionality}
%\label{ssec:exp}
\begin{figure}[t]
\centering \includegraphics[width=0.99\columnwidth]{fig/Figure7_The_ratio_of_high_compositional_language.pdf}
......@@ -29,6 +21,7 @@
\begin{table}[b]
\centering
\small
\caption{Chi-square test of the relation between high compositionality and agent capacity.}
\label{tab:exp10}
\begin{tabular}{cccc}
......@@ -119,7 +112,6 @@ the high compositionality has statistical significance related to agent
capacity.
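For illustration, an independence test of this kind can be run with SciPy's \texttt{chi2\_contingency}; the sketch below is ours, and the contingency counts are hypothetical placeholders, not the paper's data.
\begin{verbatim}
# Minimal sketch of a chi-square independence test (SciPy).
# The counts below are hypothetical placeholders.
from scipy.stats import chi2_contingency

# Rows: low / high agent capacity;
# columns: low / high compositionality.
contingency = [[46, 4],
               [29, 21]]

chi2, p, dof, expected = chi2_contingency(contingency)
print(f"chi2={chi2:.3f}, p={p:.4f}, dof={dof}")
# p < 0.05 would indicate that high compositionality is not
# independent of agent capacity.
\end{verbatim}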
%\subsection{Breakdown}
%\label{ssec:language}
......
......@@ -48,10 +48,34 @@ vocabulary can express almost infinite concepts.}
\label{fig:induction}
\end{figure}
\begin{table*}[t]
\centering
\small
\caption{Handcrafted inductions in related works.}
\label{tab:rel}
\begin{tabular}{lll}
\toprule
Works & Handcrafted induction & Compositionality\\
\midrule
\cite{kirby2015compression}&Expressivity and compressibility&Qualitative, Speaker\\
\cite{kottur-etal-2017-natural}&Listener's memory&Qualitative, Speaker\\
\cite{choi2018compositional}&Maximum message length&Qualitative, Speaker+Listener\\
\cite{lazaridou2018emergence}&Structure of input data&Quantitative, Speaker\\
\cite{evtimova2018emergent}&Multi-modal scenarios&Quantitative, Speaker\\
\cite{li2019ease}&Population size, resetting all listeners&Quantitative, Speaker\\
\cite{chaabouni-etal-2019-word}&Word-order constraints&Qualitative, Speaker\\
\cite{chaabouni2020compositionality}&Easier to decode&Quantitative, Speaker\\
\textbf{Ours} & \textbf{None} & \textbf{Quantitative, Speaker+Listener} \\
\bottomrule
\end{tabular}
\end{table*}
Prior studies focus on achieving highly compositional symbolic language
through \emph{deliberately handcrafted} inductions, e.g., small vocabulary
sizes~\cite{}, memorylessness~\cite{}, additional rewards~\cite{}, constructed loss functions~\cite{}, and
ease of teaching~\cite{}. \note{Such optimization methodologies are driven by the challenge of generating highly compositional symbolic language without induction in an existing multi-agent environment.}
Figure~\ref{fig:induction} reports the compositionality when training two agents
in the widely used listener-speaker referential game to emerge 100 symbolic
languages, and it can be observed that \note{the compositionality
......@@ -180,6 +204,7 @@ can generate a higher compositional symbolic language with a higher probability.
%%\endsection
In this paper, we make the following contributions:
\begin{itemize}[topsep=0pt,itemsep=0cm]
\item To the best of our knowledge, this is the first work to successfully achieve
......
\section{Related Works}
\label{sec:relatedwork}
%external environmental factors
Previous works focus on the external environmental factors that impact the
......@@ -34,8 +12,8 @@ For example, ~\citet{kirby2015compression} explored how the pressures for expres
~\citet{li2019ease} studied how the pressure of ease of teaching impacts the emergent language in a population regime.
~\citet{evtimova2018emergent} designed a novel multi-modal scenario, in which the speaker and the listener access different modalities of the input object, to explore language emergence.
Such factors are deliberately designed and too idealized to hold in
the real world.
In this paper, all of these handcrafted inductions are removed, and highly compositional language is driven to emerge only by the agent capacity.
......
\section{Symbolic Language Producing}
\label{sec:thory}
\begin{figure}[t]
\centering \includegraphics[width=\columnwidth]{fig/Figure2_The_referential_game_environment.pdf}
\caption{The referential game in this paper.}
\label{fig:game}
\end{figure}
\begin{figure*}[t]
\centering
\includegraphics[width=1.8\columnwidth]{fig/Figure3_The_architecture_of_agents.pdf}
\caption{The architecture of agents. \emph{Left:} speaker. \emph{Right:} listener.}
\label{fig:agents}
\end{figure*}
Before going into the details of the training algorithm, we first introduce the environment, game rules, and agent architecture that enable the emergence of symbolic language.
\begin{algorithm}[t]
\caption{Learning Algorithm$(t,\hat{t})$}
\label{al:learning}
\small
\begin{algorithmic}[1]
\IF{Training the speaker agent S}
\FOR{Batch T randomly selected from $M_0\times M_1$}
\FOR{$t=(c_0,c_1)$ in T}
\STATE $P(s_0|t),P(s_1|t)=\pi_{old}^S(s=(s_0,s_1)|t)$
\STATE Sample $s_0$ with $P(s_0|t)$, $s_1$ with $P(s_1|t)$
\STATE $P(\hat{t}|s) = \pi^L(\hat{t}|s)$
\STATE Sample $\hat{t}$ with $P(\hat{t}|s)$
\STATE Get reward $r(\hat{t},t)$
\STATE $J(\theta^S,\theta^L)=E_{\pi_{old}^S,\pi^L}[r(\hat{t},t)\cdot\frac{\pi^S(s|t)}{\pi^S_{old}(s|t)}]$
\STATE Update $\theta^S$ by $\nabla_{\theta^S}J$
\ENDFOR
\STATE $\pi_{old}^S\leftarrow \pi^S$
\ENDFOR
\ENDIF
\IF{Training the listener agent L}
\FOR{Batch T randomly selected from $M_0\times M_1$}
\FOR{$t=(c_0,c_1)$ in T}
\STATE $P(s_0|t),P(s_1|t)=\pi^S(s=(s_0,s_1)|t)$
\STATE Sample $s_0$ with $P(s_0|t)$, $s_1$ with $P(s_1|t)$
\STATE $P(\hat{t}|s) = \pi^L_{old}(\hat{t}|s)$
\STATE Sample $\hat{t}$ with $P(\hat{t}|s)$
\STATE Get reward $r(\hat{t},t)$
\STATE $J(\theta^S,\theta^L)=E_{\pi^S,\pi_{old}^L}[r(\hat{t},t)\cdot\frac{\pi^L(\hat{t}|s)}{\pi^L_{old}(\hat{t}|s)}]$
\STATE Update $\theta^L$ by $\nabla_{\theta^L}J$
\ENDFOR
\STATE $\pi_{old}^L\leftarrow \pi^L$
\ENDFOR
\ENDIF
\end{algorithmic}
\end{algorithm}
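As a companion to Algorithm~\ref{al:learning}, the following is a minimal PyTorch-style sketch of one speaker training pass; it is our illustration rather than the exact implementation. The \texttt{speaker}/\texttt{listener} callables, the \texttt{encode} helper, and the hit-or-miss reward values are assumptions.
\begin{verbatim}
import torch

def train_speaker_batch(speaker, speaker_old, listener,
                        batch, optimizer, encode):
    # One speaker pass of the learning algorithm (sketch).
    # speaker(t) and speaker_old(t) return two categorical
    # distributions (one per symbol); listener(s) returns one
    # distribution over prediction results.
    for t in batch:                                  # t = (c0, c1)
        with torch.no_grad():
            p0_old, p1_old = speaker_old(t)          # pi_old^S(s|t)
            s0 = torch.multinomial(p0_old, 1).item() # sample s0
            s1 = torch.multinomial(p1_old, 1).item() # sample s1
            p_that = listener((s0, s1))              # pi^L(t_hat|s)
            t_hat = torch.multinomial(p_that, 1).item()
        # Hit-or-miss reward r(t_hat, t); encode() maps the concept
        # pair onto the listener's output index (hypothetical helper).
        r = 1.0 if t_hat == encode(t) else -1.0
        p0, p1 = speaker(t)                          # pi^S(s|t)
        ratio = (p0[s0] * p1[s1]) / (p0_old[s0] * p1_old[s1])
        loss = -r * ratio                            # ascend J
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    speaker_old.load_state_dict(speaker.state_dict())  # pi_old <- pi
\end{verbatim}
The listener pass is symmetric: symbols are sampled from the current speaker, predictions from $\pi^L_{old}$, and the importance ratio is taken over the listener policy instead.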
\subsection{Environment setup}
\label{ssec:env}
......@@ -43,12 +90,6 @@ Please note that since $t$ and $\hat{t}$ have different lengths, we say
$t=\hat{t}$ if $t$ expresses the same meaning as $\hat{t}$, e.g., ``red circle''.
......@@ -104,40 +145,3 @@ expected reward $ J(\theta_S, \theta_L)$ can be calculated as follows:
\end{align}
......@@ -11,6 +11,13 @@ MIS is the similarity between an identity matrix and the mutual information matr
\end{figure}
\begin{figure}[t]
\centering
\includegraphics[width=0.8\columnwidth]{fig/Figure5_An_emergent_language.pdf}
\caption{An emergent language that the unilateral metrics cannot measure its non-compositionality. Notice that given $s_1 = \mathrm{a}$, the listener can neither determine the shape nor the color without the knowledge about $s_0$.}
\label{fig:unilateral}
\end{figure}
Before giving the definition of MIS, we first model the agents in the referential game. As shown in Figure~\ref{fig:modeling}, the listener and speaker in the referential game are connected in tandem. The speaker agent can be regarded as a channel whose input is a concept $t = (c_0, c_1)$ and whose output is a symbol $s = (s_0, s_1)$. The listener agent can be regarded as another channel, whose input is a symbol $s = (s_0, s_1)$ and whose output is a predicted result $\hat{t} = (\hat{c}_0, \hat{c}_1)$. Since the output of the listener depends only on the symbol $s$, we can model the policies of the speaker agent and the listener agent by the probability distributions $P(s = (s_0, s_1) | t = (c_0, c_1))$ and $P(\hat{t} = (\hat{c}_0, \hat{c}_1) | s_0, s_1)$, respectively.
Now we can analyse the information about the concepts preserved in the transmission process given the symbol transmitted, i.e., the conditional mutual information $I\left(t,\hat{t}|s\right)$. Whenever a stable language emerges, the speaker and the listener consistently use a specific symbol $s$ to refer to a specific object $t$. Therefore, we can safely say $I\left(t,\hat{t}|s\right) = I\left(t,\hat{t}|s_{t,\hat{t}}\right)$, where $s_{t,\hat{t}}=\arg\max_s\left\{P\left(\hat{t}|s\right)P\left(s|t\right)\right\}$. This conditional mutual information can be obtained by Equation~\ref{eq:cmi}.
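To make this concrete, the sketch below computes $I\left(t,\hat{t}|s\right)$ directly from the two policy tables $P(s|t)$ and $P(\hat{t}|s)$; it is our illustration of the standard conditional-mutual-information formula, with a uniform prior over concepts assumed (the paper's Equation~\ref{eq:cmi} is the reference).
\begin{verbatim}
import numpy as np

def conditional_mi(p_s_given_t, p_that_given_s):
    # p_s_given_t:    (n_t, n_s)    speaker table P(s|t)
    # p_that_given_s: (n_s, n_that) listener table P(t_hat|s)
    n_t = p_s_given_t.shape[0]
    # Joint p(t, s, t_hat) = p(t) * P(s|t) * P(t_hat|s),
    # with p(t) uniform (assumption).
    joint = (p_s_given_t / n_t)[:, :, None] \
            * p_that_given_s[None, :, :]
    p_s = joint.sum(axis=(0, 2))      # p(s)
    p_ts = joint.sum(axis=2)          # p(t, s)
    p_sth = joint.sum(axis=0)         # p(s, t_hat)
    mi = 0.0
    for t, s, th in np.ndindex(*joint.shape):
        if joint[t, s, th] > 0:
            mi += joint[t, s, th] * np.log2(
                joint[t, s, th] * p_s[s] / (p_ts[t, s] * p_sth[s, th]))
    return mi
\end{verbatim}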
......@@ -33,15 +40,16 @@ R\left(c_0,s_0\right) & R\left(c_0,s_1\right)
\end{equation}
Each column of $M$ corresponds to the semantic information carried by one symbol. In a perfectly compositional language, each symbol exclusively represents one specific concept. Therefore, the similarity between the columns of $M$ and one-hot vectors aligns with the compositionality of the emergent language.
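For a non-negative column $v$ of $M$, the closest one-hot vector is the standard basis vector at $\arg\max_i v_i$, so their cosine similarity is simply $\max_i v_i / \lVert v \rVert_2$. The sketch below (ours, not the paper's code) averages this quantity over the columns of $M$:
\begin{verbatim}
import numpy as np

def mis_raw(M):
    # Average cosine similarity between each column of M and its
    # closest one-hot vector; for a non-negative column v this
    # equals max(v) / ||v||_2. Illustrative sketch only.
    sims = []
    for v in M.T:
        norm = np.linalg.norm(v)
        if norm > 0:
            sims.append(v.max() / norm)
    return float(np.mean(sims))
\end{verbatim}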
\begin{figure}[t]
\centering \includegraphics[width=0.99\columnwidth]{fig/Figure6_Compostionality_of_symbolic_language.pdf}
\caption{Compositionality of symbolic language under different parameters
($[\mu-\sigma,\mu+\sigma]$, where $\mu$ is the mean value and $\sigma$ is
the standard deviation).}
\label{fig:exp1}
\end{figure}
Finally, we define \emph{raw mutual information similarity} ($\mathit{MIS}_0$)
as the average cosine similarity between the columns of $M$ and one-hot vectors, as in
Equation~\ref{eq:mis2}. Furthermore, $\mathit{MIS}$ is the normalized mutual
......