\section{Mutual Information Similarity (MIS)}\label{sec:mis}
In this section, we propose \emph{Mutual Information Similarity (MIS)} as a metric of compositionality and give a thorough theoretical analysis.
MIS is the similarity between an identity matrix and the mutual information matrix of concepts and symbols.
Before giving the definition of MIS, we first model the agents in the referential game. As shown in Figure~\ref{}, the speaker and the listener are connected in tandem. The speaker agent $S$ can be regarded as a channel whose input is an object $t = (c_0, c_1)$ composed of two concepts and whose output is a symbol $s = (s_0, s_1)$. The listener agent $L$ can be regarded as another channel whose input is the symbol $s = (s_0, s_1)$ and whose output is a prediction $\hat{t} = (\hat{c}_0, \hat{c}_1)$. Since the output of $L$ depends only on the symbol $s$, we can model the policies of the speaker and the listener by the probability distributions $P(s = (s_0, s_1) | t = (c_0, c_1))$ and $P(\hat{t} = (\hat{c}_0, \hat{c}_1) | s_0, s_1)$, respectively.
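To make the channel view concrete, consider the following minimal simulation of one round of the game (illustrative only; the array names and the \texttt{numpy}-based setup are ours, and both channels are taken to be deterministic):
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical binary setting: t = (c_0, c_1) and s = (s_0, s_1), with the
# four combinations of two bits flattened to indices 0..3.
# speaker[t, s] = P(s | t); listener[s, t_hat] = P(t_hat | s).
speaker = np.eye(4)    # a deterministic, noiseless speaker policy
listener = np.eye(4)   # a listener policy that inverts it exactly

t = rng.integers(4)                       # draw an object
s = rng.choice(4, p=speaker[t])           # speaker emits a symbol
t_hat = rng.choice(4, p=listener[s])      # listener predicts the object
assert t_hat == t   # noiseless tandem channels preserve the object
\end{verbatim}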
Now we can analyse how much information about the object is preserved through the transmission process given the transmitted symbol, i.e., the conditional mutual information $MI\left(t,\hat{t}|s\right)$. Once a stable language has emerged, the speaker and the listener consistently use a specific symbol $s$ to refer to a specific object $t$. Therefore we can write $MI\left(t,\hat{t}|s\right) = MI\left(t,\hat{t}|s=s_{t,\hat{t}}\right)$, where $s_{t,\hat{t}}=\argmax_s\left\{P\left(\hat{t}|s\right)P\left(s|t\right)\right\}$. This conditional mutual information is given by Equation~\ref{eq:cmi}.
\begin{equation}\label{eq:cmi}
MI\left(t,\hat{t}|s=s_{t,\hat{t}}\right) = \sum_t\sum_{\hat{t}}P\left(t,\hat{t}|s=s_{t,\hat{t}}\right)\log\frac{P\left(t,\hat{t}|s=s_{t,\hat{t}}\right)}{P\left(t\right) P\left(\hat{t}|s=s_{t,\hat{t}}\right)}
\end{equation}
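For concreteness, the following sketch transcribes Equation~\ref{eq:cmi} directly (our illustration, not part of the model: it assumes a uniform prior $P(t)$ passed in as \texttt{p\_t}, the Markov chain $t \to s \to \hat{t}$ so that $P(t,\hat{t}|s) = P(t|s)P(\hat{t}|s)$, and base-2 logarithms):
\begin{verbatim}
import numpy as np

def conditional_mi(speaker, listener, p_t):
    # speaker[t, s] = P(s | t); listener[s, t_hat] = P(t_hat | s);
    # p_t[t] = prior over objects.
    n = len(p_t)
    total = 0.0
    for t in range(n):
        for th in range(n):
            # s_{t, t_hat} = argmax_s P(t_hat | s) P(s | t)
            s = int(np.argmax(listener[:, th] * speaker[t, :]))
            p_s = speaker[:, s] @ p_t              # P(s)
            if p_s == 0:
                continue
            p_t_given_s = speaker[t, s] * p_t[t] / p_s   # Bayes' rule
            p_joint = p_t_given_s * listener[s, th]      # P(t, t_hat | s)
            if p_joint > 0:
                total += p_joint * np.log2(
                    p_joint / (p_t[t] * listener[s, th]))
    return total
\end{verbatim}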
We define the ratio of preserved information $RI(t, s)$ as in Equation~\ref{eq:ri}, where $H(t)$ denotes the information entropy of $t$. $RI(t,s)$ measures the degree of alignment between symbols and objects.
\begin{equation}\label{eq:ri}
RI\left(t,s\right)=\frac{MI\left(t,\hat{t}|s=s_{t,\hat{t}}\right)}{H\left(t\right)}
\end{equation}
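Computing $RI$ then amounts to dividing by the entropy of $t$; a short helper (again our sketch; the logarithm base must match the one used for the mutual information):
\begin{verbatim}
import numpy as np

def entropy(p):
    # Shannon entropy H(t) in bits, for a prior p over objects.
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def ri(mi_value, p_t):
    # Eq. (ri): ratio of the information about t preserved end to end,
    # e.g. ri(conditional_mi(speaker, listener, p_t), p_t).
    return mi_value / entropy(p_t)
\end{verbatim}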
Following Equation~\ref{eq:ri}, we can obtain the normalized mutual information matrix $MRI^B$ by collecting $RI(c_i, s_j)$ for all $i, j$, where $RI(c_i, s_j)$ is computed by restricting Equation~\ref{eq:ri} to the $i$-th concept and the $j$-th symbol, as in Equation~\ref{eq:mri}.
\begin{equation}\label{eq:mri}
MRI^B =
\begin{pmatrix}
RI\left(c_0,s_0\right) & RI\left(c_0,s_1\right)\\
RI\left(c_1,s_0\right) & RI\left(c_1,s_1\right)
\end{pmatrix}
\end{equation}
Each column of $MRI^B$ corresponds to the semantic information carried by one symbol. In a perfectly compositional language, each symbol exclusively represents one specific concept. Therefore, the similarity between the columns of $MRI^B$ and one-hot vectors aligns with the compositionality of the emergent language.
Finally, we define $MIS_0$ as the average cosine similarity between the columns of $MRI^B$ and the corresponding one-hot vectors, as in Equation~\ref{eq:mis2}. $MIS$ is then $MIS_0$ normalized to the $[0,1]$ range.
\begin{equation}\label{eq:mis2}
\begin{gathered}
MIS_0 = \frac{1}{2}\sum_{j=0}^{1}\frac{\max_{i=0,1}RI\left(c_i,s_j\right)}{\epsilon + \sqrt{\sum_{i=0}^{1}RI^2\left(c_i,s_j\right)}}, \quad \epsilon > 0\\
MIS = 2\,MIS_0 - 1
\end{gathered}
\end{equation}
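As a sanity check (our worked example), consider a perfectly one-to-one $MRI^B = \operatorname{diag}(r, r)$ with $r > 0$: each column is already proportional to a one-hot vector, so
\[
MIS_0 = \frac{1}{2}\left(\frac{r}{\epsilon + r} + \frac{r}{\epsilon + r}\right) \approx 1,
\qquad MIS = 2\,MIS_0 - 1 \approx 1,
\]
with equality in the limit $\epsilon \to 0$.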
Generalized to $M$ symbols and $N$ concepts, $MIS$ is given by Equation~\ref{eq:mis}.
\begin{equation}\label{eq:mis}
\begin{gathered}
MIS_0 = \frac{1}{M}\sum_{j=0}^{M-1}\frac{\max_{i\in[0,N-1]}RI\left(c_i,s_j\right)}{\epsilon + \sqrt{\sum_{i=0}^{N-1}RI^2\left(c_i,s_j\right)}}, \quad \epsilon > 0\\
MIS = \frac{N\cdot MIS_0 - 1}{N-1}
\end{gathered}
\end{equation}
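In code, the generalized metric is a few lines over the $N \times M$ matrix of $RI$ values (our sketch; \texttt{mri[i, j]} holds $RI(c_i, s_j)$ and \texttt{eps} plays the role of $\epsilon$):
\begin{verbatim}
import numpy as np

def mis(mri, eps=1e-8):
    # Eq. (mis): mri is the N x M matrix with mri[i, j] = RI(c_i, s_j).
    n, _ = mri.shape
    # cosine similarity of each column with its closest one-hot vector
    col_sims = mri.max(axis=0) / (eps + np.sqrt((mri ** 2).sum(axis=0)))
    mis0 = float(col_sims.mean())
    return (n * mis0 - 1) / (n - 1)
\end{verbatim}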
MIS is a bilateral metric. Unilateral metrics, e.g. \emph{topographic similarity (topo)}~\cite{} and \emph{posdis}~\cite{}, take only the policy of the speaker into consideration. We provide an example to illustrate the inadequacy of unilateral metrics, as shown in Figure~\ref{fig:unilateral}. In this example, the speaker uses only $s_1$ to represent shape. From the perspective of the speaker, the language is perfectly compositional (i.e., both topo and posdis equal 1). However, the listener cannot distinguish the shape from $s_1$ alone, which reveals the non-compositionality of this language. The bilateral metric MIS addresses this defect by taking the policy of the listener into account, yielding $MIS < 1$.
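The effect is easy to reproduce numerically. With a hypothetical $MRI^B$ for this scenario, in which $s_0$ cleanly carries one concept but the listener extracts no usable shape information from $s_1$ (so the second column is split evenly across both concepts), the computation above yields a value strictly below 1:
\begin{verbatim}
import numpy as np

# Hypothetical MRI for the scenario of Figure (unilateral).
mri = np.array([[1.0, 0.5],
                [0.0, 0.5]])
col_sims = mri.max(axis=0) / (1e-8 + np.sqrt((mri ** 2).sum(axis=0)))
mis0 = col_sims.mean()    # (1 + 1/sqrt(2)) / 2, about 0.854
mis = 2 * mis0 - 1        # about 0.707 < 1, flagging the defect
\end{verbatim}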