Merge branch 'master' of http://62.234.201.16/hao/AAAI21_Emergent_language

f2359c95 · Zidong Du · 28b78e16 · fa9c30d7 · f2359c95 · f2359c95
Commit f2359c95 authored Sep 10, 2020 by Zidong Du

Showing with 42 additions and 25 deletions

AAAI2021/paper.tex
+4 -1

AAAI2021/tex/experiments.tex
+8 -2

AAAI2021/tex/theory2.tex
+30 -22

--- a/AAAI2021/paper.tex
+++ b/AAAI2021/paper.tex
@@ -164,6 +164,9 @@
    firstAuthor@affiliation1.com, secondAuthor@affilation2.com, thirdAuthor@affiliation1.com
 }
 \fi
+
+\DeclareMathOperator*{\argmax}{arg\,max}
+
 \begin{document}

 \maketitle
@@ -219,10 +222,10 @@
  
 \end{abstract}

-
 \input{tex/introduction.tex}
 \input{tex/relatedwork.tex}
 \input{tex/theory.tex}
+\input{tex/theory2.tex}
 \input{tex/experiments.tex}
 \input{tex/last.tex}


--- a/AAAI2021/tex/experiments.tex
+++ b/AAAI2021/tex/experiments.tex
@@ -17,11 +17,17 @@ the mean value of MIS decreases as the value of $h_{size}$ increases. Taking the
 configuration of vocabulary size $|V|=10$ as an example, the mean value of MIS
 is around 0.8 when $h_{size}\le 20$; MIS significantly decreases to 0.75 when
 $h_{size}$ increases from 20 to 40; MIS further reduces to 0.7 when $h_{size}$
-increases from 40 to 100. For different vocabulary sizes, the MIS shares the
-similar behaviour. In summary, lower agent capacity improves the possibility of
+increases from 40 to 100.
+For different vocabulary sizes, MIS shows
+similar behaviour.
+This is because symbols in low-compositional languages carry semantic information
+about more concepts. As a result, higher capacity is required to characterize the
+complex semantic information for a low-compositional language to emerge.
+In summary, lower agent capacity improves the possibility of
 emerging high compositional symbolic language.


+
 \begin{figure}[t]
  \centering
  \includegraphics[width=0.9\columnwidth]{fig/occupy}

--- a/AAAI2021/tex/theory2.tex
+++ b/AAAI2021/tex/theory2.tex
@@ -3,38 +3,46 @@
 In this section, we propose the \emph{Mutual Information Similarity (MIS)} as a metric of compositionality, and give a thorough theoretical analysis. 
 MIS is the similarity between an identity matrix and the mutual information matrix of concepts and symbols.

-Before giving the definition of MIS, we first model the agents in the referential games. As shown in Figure~\ref{}, the listener and speaker in the referential game are connected in tandem. The speaker agent $S$ can be regard as a channel, whose input is a concept $c = ($c_0, c_1$) and output is a symbol $s = ($s_0, s_1$). The listener agent $L$ can be regard as another channel, whose input is a symbol $s = ($s_0, s_1$) and output is a predict result $\hat{t} = (\hat{c}_0, \hat{c}_1)$. Since the output of $L$ only depends on the symbol $s$, we can model the policy of the speaker agent and the listener agent by the probability distribution $P(s = (s_0, s_1) | t = (c_0, c_1))$ and $P(\hat{t} = (\hat{c}_0, \hat{c}_1) | s_0, s_1)$, respectively.
+\begin{figure}[t]
+  \centering
+  \includegraphics[width=0.9\columnwidth]{fig/occupy}
+  \caption{The information channel modeling of the agents in the referential game.}
+  \label{fig:modeling}
+\end{figure}
+

-Now we can analyse the information of the concepts preserved in the transmission process given the symbol transmitted, i.e. the conditional mutual information $MI\left(t,\hat{t}|s\right)$. Whenever a stable language emerged, the speaker and the listener consistently use a specific symbol $s$ to refer to a specific object $t$. Therefore we can safely say $MI\left(t,\hat{t}|s\right) = MI\left(t,\hat{t}|s=s_{t,\hat{t}}\right)$ where $s_{t,\hat{t}}=\argmax_s\left\{P\left(\hat{t}|s\right)P\left(s|t\right)\right\}$. This conditional mutual information can be obtained by Equation~\ref{eq:cmi}.
+Before giving the definition of MIS, we first model the agents in the referential games. As shown in Figure~\ref{fig:modeling}, the listener and speaker in the referential game are connected in tandem. The speaker agent can be regarded as a channel, whose input is an object $t = (c_0, c_1)$ composed of two concepts and whose output is a symbol $s = (s_0, s_1)$. The listener agent can be regarded as another channel, whose input is a symbol $s = (s_0, s_1)$ and whose output is a predicted result $\hat{t} = (\hat{c}_0, \hat{c}_1)$. Since the output of the listener depends only on the symbol $s$, we can model the policies of the speaker agent and the listener agent by the probability distributions $P(s = (s_0, s_1) | t = (c_0, c_1))$ and $P(\hat{t} = (\hat{c}_0, \hat{c}_1) | s_0, s_1)$, respectively.
+
+Now we can analyse the information about the concepts that is preserved in the transmission process given the symbol transmitted, i.e. the conditional mutual information $I\left(t,\hat{t}|s\right)$. Whenever a stable language has emerged, the speaker and the listener consistently use a specific symbol $s$ to refer to a specific object $t$. Therefore we can safely say $I\left(t,\hat{t}|s\right) = I\left(t,\hat{t}|s_{t,\hat{t}}\right)$ where $s_{t,\hat{t}}=\argmax_s\left\{P\left(\hat{t}|s\right)P\left(s|t\right)\right\}$. This conditional mutual information can be obtained by Equation~\ref{eq:cmi}.

 \begin{equation}\label{eq:cmi}
-MI\left(t,\hat{t}|s=s_{t,\hat{t}}\right) = \sum_t\sum_{\hat{t}}P\left(t,\hat{t}|s=s_{t,\hat{t}}\right)\log\frac{P\left(t,\hat{t}|s=s_{t,\hat{t}}\right)}{P\left(t\right) P\left(\hat{t}|s=s_{t,\hat{t}}\right)}
+I\left(t,\hat{t}|s_{t,\hat{t}}\right) = \sum_t\sum_{\hat{t}}P\left(t,\hat{t}|s_{t,\hat{t}}\right)\log\frac{P\left(t,\hat{t}|s_{t,\hat{t}}\right)}{P\left(t\right) P\left(\hat{t}|s_{t,\hat{t}}\right)}
 \end{equation}

-We define the ratio of preserved information $RI(t, s)$ as Equation~\ref{eq:ri}, where $H(t)$ denotes the information entropy of $t$. $RI(t,s)$ measures the degree of alignment between symbols and objects.
+We define the ratio of preserved information $R(t, s)$ as Equation~\ref{eq:ri}, where $H(t)$ denotes the information entropy of $t$. $R(t,s)$ measures the degree of alignment between symbols and objects.
 \begin{equation}\label{eq:ri}
-RI\left(t,s\right)=\frac{MI\left(t,\hat{t}|s=s_{t,\hat{t}}\right)}{H\left(t\right)}
+R\left(t,s\right)=\frac{I\left(t,\hat{t}|s_{t,\hat{t}}\right)}{H\left(t\right)}
 \end{equation} 
-Following the Equation~\ref{eq:ri} we can obtain the normalized mutual information matrix $MRI^B$ by collecting $RI(c_i, s_j)$ for all $i, j$, as Equation~\ref{eq:mri}.
+Following Equation~\ref{eq:ri}, we can obtain the normalized mutual information matrix $M$ by collecting $R(c_i, s_j)$ for all $i, j$, as in Equation~\ref{eq:mri}.
 \begin{equation}\label{eq:mri}
-MRI^B = 
+M = 
 \begin{pmatrix}
-RI\left(c_0,s_0\right) & RI\left(c_0,s_0\right)\\
-RI\left(c_0,s_0\right) & RI\left(c_0,s_0\right)
+R\left(c_0,s_0\right) & R\left(c_0,s_1\right)\\
+R\left(c_1,s_0\right) & R\left(c_1,s_1\right)
 \end{pmatrix}
 \end{equation}
-Each column of $MRI^B$ correspond to the semantic information carried by one symbol. In a perfectly compositional language, each symbol represents one specific concept exclusively. Therefore, the similarity between the columns of $MRI^B$ and a one-hot vector is align with the compositionality of the emergent language.
-
-Finally, we define $MIS_0$ as the average cosine similarity of $MRI^B$ columns and one-hot vectors, as Equation~\ref{eq:mis2}. Furthermore, $MIS$ is the normalized $MIS_0$ into the $[0,1]$ value range.
-\begin{equation}\label{eq:mis2}
-MIS_0 = \frac{1}{2}\sum_{j=0}^1\frac{\max_{i=0,1}RI\left(c_i,s_j\right)}{\epsilon + \sqrt{\sum_{i=0}^{1}RI^2\left(c_i,s_j\right)}}, \epsilon > 0\\
-MIS = 2MIS_0 - 1
-\end{equation}
-Generalized to $M$ symbols and $N$ objects, $MIS$ is as Equation~\ref{eq:mis}
-\begin{equation}\label{eq:mis2}
-MIS_0 = \frac{1}{M}\sum_{j=0}^{M-1}\frac{\max_{i\in[0,N-1]}RI\left(c_i,s_j\right)}{\epsilon + \sqrt{\sum_{i=0}^{N-1}RI^2\left(c_i,s_j\right)}}, \epsilon > 0\\
-MIS = \frac{N\cdot MIS_0 - 1}{N-1}
-\end{equation}
+Each column of $M$ corresponds to the semantic information carried by one symbol. In a perfectly compositional language, each symbol represents one specific concept exclusively. Therefore, the similarity between the columns of $M$ and one-hot vectors aligns with the compositionality of the emergent language.
+
+Finally, we define the \emph{raw mutual information similarity} (denoted as $S_0$) as the average cosine similarity between the columns of $M$ and one-hot vectors, as in Equation~\ref{eq:mis2}. Furthermore, MIS (denoted as $S$) is the raw mutual information similarity normalized into the $[0,1]$ value range.
+\begin{equation}\label{eq:mis2}\begin{aligned}
+S_0 &= \frac{1}{2}\sum_{j=0}^1\frac{\max_{i=0,1}R\left(c_i,s_j\right)}{\epsilon + \sqrt{\sum_{i=0}^{1}R^2\left(c_i,s_j\right)}}, \epsilon > 0\\
+S &= 2S_0 - 1
+\end{aligned}\end{equation}
+Generalized to $m$ symbols and $n$ objects, $S$ is given by Equation~\ref{eq:mis}.
+\begin{equation}\label{eq:mis}\begin{aligned}
+S_0 &= \frac{1}{m}\sum_{j=0}^{m-1}\frac{\max_{i\in[0,n-1]}R\left(c_i,s_j\right)}{\epsilon + \sqrt{\sum_{i=0}^{n-1}R^2\left(c_i,s_j\right)}}, \epsilon > 0\\
+S &= \frac{n\cdot S_0 - 1}{n-1}
+\end{aligned}\end{equation}

 \begin{figure}[t]
  \centering
@@ -43,5 +51,5 @@ MIS = \frac{N\cdot MIS_0 - 1}{N-1}
  \label{fig:unilateral}
 \end{figure}

-MIS is a bilateral metric. Unilateral metrics, e.g. \emph{topographic similarity (topo)}\cite{} and \emph{posdis}\cite{}, only take the policy of the speaker into consideration. We provide an example to illustrate the inadequacy of unilateral metrics, as shown in Figure~\ref{fig:unilateral}. In this example, the speaker only uses $s_1$ to represent shape. From the perspective of speaker, the language is perfectly compositional (i.e. both topo and posdis are 1). However, the listener cannot distinguish the shape depend only on $s_1$, showing the non-compositionality in this language. The bilateral metric MIS addresses such defect by taking the policy of the listener into account, thus MIS < 1.
+MIS is a bilateral metric. Unilateral metrics, e.g. \emph{topographic similarity (topo)}\cite{} and \emph{posdis}\cite{}, only take the policy of the speaker into consideration. We provide an example to illustrate the inadequacy of unilateral metrics, shown in Figure~\ref{fig:unilateral}. In this example, the speaker only uses $s_1$ to represent shape. From the perspective of the speaker, the language is perfectly compositional (i.e. both topo and posdis are 1). However, the listener cannot distinguish the shape depending only on $s_1$, which shows the non-compositionality of this language. The bilateral metric MIS addresses this defect by taking the policy of the listener into account, thus $MIS < 1$.
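
Note on the generalized $S_0$ / $S$ equation in AAAI2021/tex/theory2.tex: the formula can be checked numerically with a short script. The following is a minimal sketch, not the authors' implementation. It assumes the matrix $M$ of $R(c_i, s_j)$ values has already been computed (rows index concepts, columns index symbols); the names `raw_mis`, `mis` and the default `eps` are hypothetical and chosen only for illustration.

```python
import math


def raw_mis(M, eps=1e-12):
    """Raw similarity S_0: the mean, over columns of M, of
    max_i R(c_i, s_j) / (eps + sqrt(sum_i R(c_i, s_j)^2)).

    M is a list of rows; M[i][j] is assumed to hold R(c_i, s_j).
    """
    n = len(M)      # number of rows (concepts)
    m = len(M[0])   # number of columns (symbols)
    total = 0.0
    for j in range(m):
        column = [M[i][j] for i in range(n)]
        norm = math.sqrt(sum(x * x for x in column))
        total += max(column) / (eps + norm)
    return total / m


def mis(M, eps=1e-12):
    """Normalized similarity S = (n * S_0 - 1) / (n - 1)."""
    n = len(M)
    return (n * raw_mis(M, eps) - 1) / (n - 1)


# Hypothetical inputs: one symbol per concept, and a fully mixed case.
one_hot = [[1.0, 0.0], [0.0, 1.0]]
mixed = [[0.5, 0.5], [0.5, 0.5]]
print(mis(one_hot), mis(mixed))
```

For the one-hot matrix the script gives a value within rounding of 1, and for the fully mixed 2x2 matrix it gives about 0.414 (that is, $\sqrt{2} - 1$), so under this formula the 2x2 case does not reach 0 at its lower end.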