\begin{table*}[h]
\centering
\small
\caption{Handcrafted inductions in related works.}
...
...
\end{tabular}
\end{table*}
\section{Related works}
\label{sec:relatedwork}
%external environmental factors
Previous works focus on external environmental factors that affect the
compositionality of emergent symbolic language.
For example, \citet{kirby2015compression} explored how the pressures for expressivity and compressibility lead to structured language.
\citet{kottur-etal-2017-natural} constrained the vocabulary size and controlled whether the listener has memory, in order to coax compositionality out of the emergent language.
\citet{lazaridou2018emergence} showed that the degree of structure in the input data affects the emergence of symbolic language.
\citet{li2019ease} studied how the ease-of-teaching pressure affects the emergent language under an iterated population regime.
\citet{evtimova2018emergent} designed a novel multi-modal scenario, in which the speaker and the listener access different modalities of the input object, to explore language emergence.
Such factors are deliberately designed and too idealized to hold in
the real world. None of these works recognizes the importance of the model capacity of the
agent itself. \rmk{this should be largely emphasized.}
%measure
To measure the compositionality of emergent symbolic language, many metrics are
Each column of $M$ corresponds to the semantic information carried by one symbol. In a perfectly compositional language, each symbol exclusively represents one specific concept. Therefore, the similarity between the columns of $M$ and one-hot vectors reflects the compositionality of the emergent language.
\caption{An emergent language whose non-compositionality the unilateral metrics cannot measure. Notice that when $s_1=\mathrm{a}$, the listener can determine neither the shape nor the color without knowledge of $s_0$.}
\label{fig:unilateral}
\end{figure}
Finally, we define \emph{raw mutual information similarity} ($MIS_0$)
as the average cosine similarity between the columns of $M$ and one-hot vectors, as
given in Equation~\ref{eq:mis2}. Furthermore, $MIS$ is the normalized raw mutual
\end{figure}
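The definition of $MIS_0$ can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions, not the implementation behind Equation~\ref{eq:mis2}: we assume $M$ is stored as a non-negative array of shape (number of concepts, number of symbols), so that each column describes one symbol, and we assume each column is compared with its closest one-hot vector, i.e.\ the one-hot vector at the position of the column's largest entry. The function name \texttt{raw\_mis} is hypothetical.

```python
import numpy as np


def raw_mis(M):
    """Raw mutual information similarity (MIS_0), a hedged sketch.

    Assumption: M has shape (n_concepts, n_symbols) with non-negative
    entries, and column j describes the information symbol j carries
    about each concept. Each column is compared with its closest one-hot
    vector (an assumption about which one-hot vector is meant).
    """
    M = np.asarray(M, dtype=float)
    norms = np.linalg.norm(M, axis=0)  # L2 norm of every column
    if np.any(norms == 0):
        raise ValueError("every column of M must be non-zero")
    # cos(m_j, e_i) = M[i, j] / ||m_j||, which is largest for the
    # one-hot vector e_i placed at the column's maximum entry
    cos_to_one_hot = M.max(axis=0) / norms
    # average over symbols (columns)
    return float(cos_to_one_hot.mean())


print(raw_mis(np.eye(2)))             # one symbol per concept
print(raw_mis(np.full((2, 2), 0.5)))  # every symbol mixes both concepts
```

For a one-to-one mapping such as the identity matrix the sketch returns 1, while a $2\times 2$ matrix with all entries equal gives $1/\sqrt{2}\approx 0.707$. Because the entries are non-negative, each per-column term lies between $1/\sqrt{n}$ and $1$, where $n$ is the number of concepts.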
MIS is a bilateral metric. Unilateral metrics, e.g., \emph{topographic similarity (topo)}~\cite{} and \emph{posdis}~\cite{}, take only the policy of the speaker into consideration. Figure~\ref{fig:unilateral} gives an example that illustrates the inadequacy of unilateral metrics. In this example, the speaker uses only $s_1$ to represent shape. From the speaker's perspective, the language is perfectly compositional (i.e., both topo and posdis are 1). However, the listener cannot determine the shape from $s_1$ alone, which reveals the non-compositionality of this language. The bilateral metric MIS addresses this defect by also taking the policy of the listener into account, and thus yields $MIS < 1$ for this language.
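To make the unilateral nature of topo concrete, the sketch below computes topographic similarity in its common form: the Spearman rank correlation between pairwise distances in meaning space and in message space. It is an illustration under assumptions, not the implementation used in this paper: meanings are attribute tuples such as (shape, color), messages are equal-length symbol strings, Hamming distance is used in both spaces, and the helper names (\texttt{topographic\_similarity}, \texttt{hamming}) and toy languages are hypothetical. The function receives only the speaker's mapping from meanings to messages; nothing about the listener's policy enters the computation.

```python
from itertools import combinations

import numpy as np


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def _average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    values = np.asarray(values, dtype=float)
    ranks = np.empty(len(values))
    ranks[np.argsort(values, kind="mergesort")] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def topographic_similarity(speaker):
    """Spearman correlation between meaning and message distances.

    `speaker` maps each meaning (a tuple of attribute values) to a message
    (an equal-length string). Only the speaker's mapping is used.
    """
    meaning_d, message_d = [], []
    for m1, m2 in combinations(list(speaker), 2):
        meaning_d.append(hamming(m1, m2))
        message_d.append(hamming(speaker[m1], speaker[m2]))
    # Spearman's rho is the Pearson correlation of tie-averaged ranks
    r1, r2 = _average_ranks(meaning_d), _average_ranks(message_d)
    return float(np.corrcoef(r1, r2)[0, 1])


# Hypothetical speaker: first symbol encodes shape, second encodes color
compositional = {(0, 0): "aa", (0, 1): "ab", (1, 0): "ba", (1, 1): "bb"}
# Hypothetical speaker: second symbol is an XOR-like mix of both attributes
entangled = {(0, 0): "aa", (0, 1): "ab", (1, 0): "bb", (1, 1): "ba"}

print(topographic_similarity(compositional))
print(topographic_similarity(entangled))
```

On these toy inputs the first call returns 1 (up to floating-point rounding), since message distances reproduce meaning distances exactly, and the second returns $-0.5$. Either value is obtained without consulting the listener.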