Commit f2359c95 by Zidong Du
parents 28b78e16 fa9c30d7
@@ -164,6 +164,9 @@
firstAuthor@affiliation1.com, secondAuthor@affiliation2.com, thirdAuthor@affiliation1.com
}
\fi
\DeclareMathOperator*{\argmax}{arg\,max}
\begin{document}
\maketitle
@@ -219,10 +222,10 @@
\end{abstract}
\input{tex/introduction.tex}
\input{tex/relatedwork.tex}
\input{tex/theory2.tex}
\input{tex/experiments.tex}
\input{tex/last.tex}
@@ -17,11 +17,17 @@ the mean value of MIS decreases as the value of $h_{size}$ increases. Taking the
configuration of vocabulary size $|V|=10$ as an example, the mean value of MIS
is around 0.8 when $h_{size}\le 20$; MIS significantly decreases to 0.75 when
$h_{size}$ increases from 20 to 40; MIS further reduces to 0.7 when $h_{size}$
increases from 40 to 100.
For different vocabulary sizes, MIS shows similar behaviour.
This is because symbols in low-compositional languages carry semantic information
about more concepts; as a result, higher capacity is required to characterize the
complex semantic information needed for a low-compositional language to emerge.
In summary, lower agent capacity increases the likelihood that a highly
compositional symbolic language emerges.
\begin{figure}[t]
\centering
\includegraphics[width=0.9\columnwidth]{fig/occupy}
@@ -3,38 +3,46 @@
In this section, we propose the \emph{Mutual Information Similarity (MIS)} as a metric of compositionality and give a thorough theoretical analysis.
MIS is the similarity between an identity matrix and the mutual information matrix of concepts and symbols.
\begin{figure}[t]
\centering
\includegraphics[width=0.9\columnwidth]{fig/occupy}
\caption{The information channel modeling of the agents in the referential game.}
\label{fig:modeling}
\end{figure}
Before giving the definition of MIS, we first model the agents in the referential game. As shown in Figure~\ref{fig:modeling}, the speaker and the listener in the referential game are connected in tandem. The speaker agent can be regarded as a channel whose input is an object $t = (c_0, c_1)$ and whose output is a symbol $s = (s_0, s_1)$. The listener agent can be regarded as another channel whose input is the symbol $s = (s_0, s_1)$ and whose output is a predicted result $\hat{t} = (\hat{c}_0, \hat{c}_1)$. Since the output of the listener depends only on the symbol $s$, we can model the policies of the speaker agent and the listener agent by the conditional distributions $P(s = (s_0, s_1) | t = (c_0, c_1))$ and $P(\hat{t} = (\hat{c}_0, \hat{c}_1) | s_0, s_1)$, respectively.
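To make this modeling concrete, the following is a minimal Python sketch (illustrative only; the attribute sizes, vocabulary sizes, and random policies are made-up assumptions, not our implementation) that represents the two policies as conditional probability tables:
\begin{verbatim}
# Illustrative sketch: speaker and listener policies as conditional
# probability tables for a toy setting (all sizes and values are made up).
import numpy as np

n_c0, n_c1 = 3, 3      # assumed number of values per concept attribute
n_s0, n_s1 = 3, 3      # assumed vocabulary size per symbol position
rng = np.random.default_rng(0)

# Speaker policy P(s_0, s_1 | c_0, c_1): one distribution over messages
# for every concept pair.
P_speaker = rng.random((n_c0, n_c1, n_s0, n_s1))
P_speaker /= P_speaker.sum(axis=(2, 3), keepdims=True)

# Listener policy P(c_0-hat, c_1-hat | s_0, s_1): one distribution over
# predictions for every message.
P_listener = rng.random((n_s0, n_s1, n_c0, n_c1))
P_listener /= P_listener.sum(axis=(2, 3), keepdims=True)
\end{verbatim}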
Now we can analyse the information about the concepts preserved in the transmission process given the transmitted symbol, i.e. the conditional mutual information $I\left(t,\hat{t}|s\right)$. Once a stable language has emerged, the speaker and the listener consistently use a specific symbol $s$ to refer to a specific object $t$. Therefore we can safely write $I\left(t,\hat{t}|s\right) = I\left(t,\hat{t}|s_{t,\hat{t}}\right)$, where $s_{t,\hat{t}}=\argmax_s\left\{P\left(\hat{t}|s\right)P\left(s|t\right)\right\}$. This conditional mutual information can be obtained by Equation~\ref{eq:cmi}.
\begin{equation}\label{eq:cmi}
I\left(t,\hat{t}|s_{t,\hat{t}}\right) = \sum_t\sum_{\hat{t}}P\left(t,\hat{t}|s_{t,\hat{t}}\right)\log\frac{P\left(t,\hat{t}|s_{t,\hat{t}}\right)}{P\left(t\right) P\left(\hat{t}|s_{t,\hat{t}}\right)}
\end{equation}
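A minimal Python sketch of Equation~\ref{eq:cmi}, for a generic discrete object $t$ and symbol $s$ (e.g. a single concept attribute and a single symbol position), is given below; the uniform prior over $t$ and the tandem-channel factorisation $P(t,\hat{t}|s)=P(t|s)P(\hat{t}|s)$ are assumptions of the sketch:
\begin{verbatim}
# Sketch of Equation eq:cmi: conditional mutual information at s_{t,t-hat},
# assuming a uniform prior over t and P(t, t-hat | s) = P(t | s) P(t-hat | s).
import numpy as np

def conditional_mi(P_s_given_t, P_that_given_s, eps=1e-12):
    # P_s_given_t: (T, S) speaker channel; P_that_given_s: (S, T) listener channel.
    T, S = P_s_given_t.shape
    P_t = np.full(T, 1.0 / T)              # assumed uniform prior over t
    joint_ts = P_s_given_t * P_t[:, None]  # P(t, s)
    P_t_given_s = joint_ts / (joint_ts.sum(axis=0, keepdims=True) + eps)
    mi = 0.0
    for t in range(T):
        for that in range(T):
            # s_{t, t-hat} = argmax_s P(t-hat | s) P(s | t)
            s_star = int(np.argmax(P_that_given_s[:, that] * P_s_given_t[t, :]))
            p_joint = P_t_given_s[t, s_star] * P_that_given_s[s_star, that]
            p_that = P_that_given_s[s_star, that]
            if p_joint > 0:
                mi += p_joint * np.log(p_joint / (P_t[t] * p_that + eps))
    return mi
\end{verbatim}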
We define the ratio of preserved information $R(t, s)$ as in Equation~\ref{eq:ri}, where $H(t)$ denotes the information entropy of $t$. $R(t,s)$ measures the degree of alignment between symbols and objects.
\begin{equation}\label{eq:ri}
R\left(t,s\right)=\frac{I\left(t,\hat{t}|s=s_{t,\hat{t}}\right)}{H\left(t\right)}
\end{equation}
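Continuing the sketch above, $R(t,s)$ can be computed from \texttt{conditional\_mi} by dividing by $H(t)$, which equals $\log T$ under the assumed uniform prior:
\begin{verbatim}
# Sketch of Equation eq:ri, reusing conditional_mi from the previous sketch;
# H(t) = log T because a uniform prior over t is assumed.
import numpy as np

def preserved_ratio(P_s_given_t, P_that_given_s):
    T = P_s_given_t.shape[0]
    return conditional_mi(P_s_given_t, P_that_given_s) / np.log(T)
\end{verbatim}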
Following Equation~\ref{eq:ri}, we can obtain the normalized mutual information matrix $M$ by collecting $R(c_i, s_j)$ for all $i, j$, as in Equation~\ref{eq:mri}.
\begin{equation}\label{eq:mri}
M =
\begin{pmatrix}
R\left(c_0,s_0\right) & R\left(c_0,s_1\right)\\
R\left(c_1,s_0\right) & R\left(c_1,s_1\right)
\end{pmatrix}
\end{equation}
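As a sketch (reusing \texttt{preserved\_ratio} from above), $M$ can be assembled once the pairwise channels $P(s_j|c_i)$ and $P(\hat{c}_i|s_j)$ have been extracted from the full policies, e.g. by marginalising under a uniform concept prior; the marginalisation itself is left implicit here:
\begin{verbatim}
# Sketch of Equation eq:mri: assemble M from assumed pairwise channels.
# speaker_pairs[i][j] ~ P(s_j | c_i), listener_pairs[j][i] ~ P(c_i-hat | s_j);
# obtaining these marginals from the full policies is not shown here.
import numpy as np

def build_M(speaker_pairs, listener_pairs):
    n, m = len(speaker_pairs), len(speaker_pairs[0])   # n concepts, m symbols
    M = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            M[i, j] = preserved_ratio(speaker_pairs[i][j], listener_pairs[j][i])
    return M
\end{verbatim}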
Each column of $M$ corresponds to the semantic information carried by one symbol. In a perfectly compositional language, each symbol represents one specific concept exclusively. Therefore, the similarity between the columns of $M$ and one-hot vectors aligns with the compositionality of the emergent language.
Finally, we define the \emph{raw mutual information similarity} (denoted as $S_0$) as the average cosine similarity between the columns of $M$ and one-hot vectors, as in Equation~\ref{eq:mis2}. MIS (denoted as $S$) is the raw mutual information similarity normalized into the $[0,1]$ value range.
\begin{equation}\label{eq:mis2}\begin{aligned}
S_0 &= \frac{1}{2}\sum_{j=0}^1\frac{\max_{i=0,1}R\left(c_i,s_j\right)}{\epsilon + \sqrt{\sum_{i=0}^{1}R^2\left(c_i,s_j\right)}}, \epsilon > 0\\
S &= 2S_0 - 1
\end{aligned}\end{equation}
Generalized to $m$ symbols and $n$ objects, $S$ is given by Equation~\ref{eq:mis}.
\begin{equation}\label{eq:mis}\begin{aligned}
S_0 &= \frac{1}{m}\sum_{j=0}^{m-1}\frac{\max_{i\in[0,n-1]}R\left(c_i,s_j\right)}{\epsilon + \sqrt{\sum_{i=0}^{n-1}R^2\left(c_i,s_j\right)}}, \epsilon > 0\\
S &= \frac{n\cdot S_0 - 1}{n-1}
\end{aligned}\end{equation}
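For concreteness, a minimal Python sketch of Equations~\ref{eq:mis2} and~\ref{eq:mis} is given below (the matrix values in the example are made up); it takes an $n\times m$ matrix of $R(c_i,s_j)$ values and returns $S$:
\begin{verbatim}
# Sketch of Equations eq:mis2 / eq:mis: from the matrix M of R(c_i, s_j)
# values (n concepts as rows, m symbols as columns) to the normalised MIS.
import numpy as np

def mis(M, eps=1e-12):
    n = M.shape[0]                                  # n concepts (rows)
    col_norm = np.sqrt((M ** 2).sum(axis=0)) + eps  # column L2 norms
    S0 = np.mean(M.max(axis=0) / col_norm)          # mean cosine similarity
    return (n * S0 - 1) / (n - 1)                   # normalise into [0, 1]

# A perfectly compositional 2x2 case (made-up R values) gives MIS close to 1.
print(mis(np.array([[0.9, 0.0],
                    [0.0, 0.8]])))                  # ~1.0
\end{verbatim}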
\begin{figure}[t]
\centering
@@ -43,5 +51,5 @@ MIS = \frac{N\cdot MIS_0 - 1}{N-1}
\label{fig:unilateral}
\end{figure}
MIS is a bilateral metric. Unilateral metrics, e.g. \emph{topographic similarity (topo)}\cite{} and \emph{posdis}\cite{}, only take the policy of the speaker into consideration. We provide an example to illustrate the inadequacy of unilateral metrics, shown in Figure~\ref{fig:unilateral}. In this example, the speaker only uses $s_1$ to represent shape. From the perspective of the speaker, the language is perfectly compositional (i.e. both topo and posdis are 1). However, the listener cannot distinguish the shape based only on $s_1$, revealing the non-compositionality of this language. The bilateral metric MIS addresses this defect by taking the policy of the listener into account, thus $MIS < 1$.
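As a purely hypothetical illustration of this scenario (the $R$ values below are made up, not taken from Figure~\ref{fig:unilateral}), reusing the \texttt{mis} sketch above: the $s_0$ column of $M$ is one-hot, but the $s_1$ column is nearly flat because the listener recovers little shape information from $s_1$, so MIS drops below 1 even though topo and posdis would report perfect compositionality.
\begin{verbatim}
# Hypothetical R values: the listener decodes colour from s_0 perfectly,
# but recovers almost no shape information from s_1.
import numpy as np

M = np.array([[0.90, 0.05],   # row 0: colour
              [0.00, 0.10]])  # row 1: shape
print(mis(M))                 # ~0.89 < 1, exposing the listener-side defect
\end{verbatim}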