Commit 95bf81d2 authored Sep 10, 2020 by YZhao

Showing with 22 additions and 39 deletions

AAAI2021/tex/relatedwork.tex
+13 -13

AAAI2021/tex/theory.tex
+6 -23

AAAI2021/tex/theory2.tex
+3 -3

--- a/AAAI2021/tex/relatedwork.tex
+++ b/AAAI2021/tex/relatedwork.tex
-\section{Related works}
+\section{Related Works}
 \label{sec:relatedwork}
 %external environmental factors
 Previous works focus on the external environmental factors that impact the
-compositionality of emerged symbolic language. 
+compositionality of emerged symbolic language.
-Some significant works on studying the external environmental factor on the compositionality of emergent language are summarized on Table~\ref{tab:rel}.
+Some significant works studying the effect of external environmental factors on the compositionality of emergent language are summarized in Table~\ref{tab:rel}.
 For example, ~\citet{kirby2015compression} explored how the pressures for expressivity and compressibility lead to structured language.
 ~\citet{kottur-etal-2017-natural} constrained the vocabulary size and whether the listener has memory to coax the compositionality of the emergent language.
 ~\citet{lazaridou2018emergence} showed that the degree of structure found in the input data affects the emergence of the symbolic language.
 ~\citet{li2019ease} studied how one pressure, ease of teaching, impacts the iterative language of the population regime.
-~\citet{evtimova2018emergent} designed a novel multi-modal scenarios, which the speaker and the listener should access to different modalities of the input object, to explore the language emergence.
+~\citet{evtimova2018emergent} designed novel multi-modal scenarios, in which the speaker and the listener have access to different modalities of the input object, to explore language emergence.
 Such factors are deliberately designed and are too idealized to hold in
-the real world. 
+the real world.
-In this paper, these handcrafted inductions above are all removed, and the high compostional language is leaded only by the agent capacity. 
+In this paper, the handcrafted inductions above are all removed, and the highly compositional language is induced only by the agent capacity.
@@ -33,18 +33,18 @@ proposed~\cite{kottur-etal-2017-natural,choi2018compositional,lazaridou2018emerg
 %either speakers or listeners. They can not measure the degree of \emph{bilateral}
 %understanding between speakers and listeners, i.e., the concept-symbol mapping
 %consistency between speakers and listeners.
-At the initial stage, many researches only analyzed the language compositionality qualitatively.
+At the initial stage, many studies only analyzed the language compositionality qualitatively.
 For example, ~\citet{choi2018compositional} printed the agent messages using the letters `abcd' at certain training rounds and directly analyzed the compositionality of these messages.
-~\citet{kottur-etal-2017-natural} introduced the dialog tree to show the evolution of language compositionality during the trianing process.
+~\citet{kottur-etal-2017-natural} introduced the dialog tree to show the evolution of language compositionality during the training process.
 Later, some quantitative metrics were explored.
 The topographic similarity~\cite{lazaridou2018emergence} was introduced to measure the correlation between the distances of all possible pairs of meanings and the distances of the corresponding pairs of signals.
-\citet{chaabouni2020compositionality} proposed the positional disentanglement, which measures whether symbols in specific postion clearly relate to the specific attribute of the input object. 
+\citet{chaabouni2020compositionality} proposed positional disentanglement, which measures whether symbols in a specific position relate to a specific attribute of the input object.
-From Table~\ref{tab:rel}, most metrics are proposed on the sight of the speaker. In our view, human begings developed the language based on both the speakers and the listener. Only one research of \cite{choi2018compositional} in Table~\ref{tab:rel} qualitatively considered from the sight of the speaker and the listener. In this paper, we propose a novel quatitative metric from both the speaker's sight and the listener's sight.
+As Table~\ref{tab:rel} shows, most metrics are proposed from the perspective of the speaker. In our view, human beings developed language based on both speakers and listeners. Only one study in Table~\ref{tab:rel}, \cite{choi2018compositional}, considered the perspectives of both the speaker and the listener, and only qualitatively. In this paper, we propose a novel quantitative metric from both the speaker's and the listener's perspectives.
 In conclusion, previous works coaxed compositional language based on carefully designed handcrafted inductions,
-and the metric from the sight of both the speaker and the listener is still lacking. 
+and a metric from the perspective of both the speaker and the listener is still lacking.
-In this paper, we remove all the handcrafted inductions in Table~\ref{tab:rel}, 
+In this paper, we remove all the handcrafted inductions in Table~\ref{tab:rel},
-and use the minimized induction based on theoretical analysis. 
+and use the minimized induction based on theoretical analysis.
 Moreover, we propose a novel quantitative metric, which is more appropriate than previous metrics based only on the speaker's perspective.
--- a/AAAI2021/tex/theory.tex
+++ b/AAAI2021/tex/theory.tex
@@ -62,32 +62,15 @@ Before going to the detail of the training algorithms, we first introduce the en
 \subsection{Environment setup}
 \label{ssec:env}
 Figure~\ref{fig:game} shows the entire environment used in this study,
-i.e., a commonly used referential game. Roughly, the referential game requires
+i.e., a commonly used referential game. Roughly, the referential game requires the speaker and listener to work cooperatively to accomplish a certain task.
-the speaker and listener working cooperatively to accomplish a certain task. 
 In this paper, the task is to have the listener agent reconstruct the object
-what the speaker claims it has seen, only through their emerged communication
+that the speaker claims it has seen, only through their emerged communication protocol. Success in this game indicates that a symbolic language has emerged between the speaker and the listener.
-protocol. The success in this game indicates that symbolic language has emerged
-between speaker and listener. 
-\textbf{Game rules} In our referential game, agents follow the following rules
+\textbf{Game rules} In our referential game, agents follow the rules below to finish the game in a cooperative manner. In each round, once it receives an input object $t$, Speaker $S$ speaks a symbol sequence $s$ to Listener $L$; Listener $L$ reconstructs the predicted result $\hat{t}$ based on the received sequence $s$; if $t=\hat{t}$, the agents win this game and receive a positive reward ($r(t,\hat{t})=1$); otherwise the agents fail this game and receive a negative reward ($r(t,\hat{t})=-1$).
-to finish the game in a cooperative manner. In each round, once received an
-input object $t$, Speaker $S$ speaks a symbol sequence $s$ to Listener $L$ ;
-Listener $L$ reconstruct the predicted result $\hat{t}$ based on the listened
-sequence $s$; if $t=\hat{t}$, agents win this game and receive positive rewards
-($r(t,\hat{t})=1$); otherwise agents fail this game and receive negative rewards
-($r(t,\hat{t})=-1$).
-Precisely, during the game, Speaker $S$ receives an input object $t$, which is
+Precisely, during the game, Speaker $S$ receives an input object $t$, which is an expression with two words from the vocabulary set $V$, i.e., two one-hot vectors representing shape and color, respectively. Based on $t$, Speaker $S$ speaks a symbol sequence $s$, which similarly contains two words from $V$. Listener $L$ receives $s$ and outputs the predicted result $\hat{t}$, a single word (one-hot vector) selected from the Cartesian product of two copies of $V$ ($V\times V$), which represents all the meanings of two combined words from $V$. Please note that since $t$ and $\hat{t}$ have different lengths, we say $t=\hat{t}$ if $t$ expresses the same meaning as $\hat{t}$, e.g., ``red circle''.
-an expression with two words from the vocabulary set $V$, i.e., two
-one-hot vector representing shape and color, respectively. Based on the $t$,
-Speaker $S$ speaks a symbol sequence $s$, which similarly contains two words
-from $V$. The Listener $L$ receives $s$ and output predicted result $\hat{t}$,
-a single word (one-hot vector) selected from the Cartesian product of set two $V$s
-($V\times V$), which representing all the meanings of two combined words from $V$.
-Please note that since $t$ and $\hat{t}$ have different length, we say
-$t=\hat{t}$ if $t$ expresses the same meaning as $\hat{t}$, e.g., ``red circle''. 
@@ -134,8 +117,8 @@ expected reward$ J(\theta_S, \theta_L)$ by fixing the parameter $\theta_S$ and
 adjusting the parameter $\theta_L$.
 Additionally, to avoid the handcrafted induction on emergent language, we only
-use the predict result $\hat{t}$ of the listener agent as the 
+use the predicted result $\hat{t}$ of the listener agent as the 
-evidence of whether giving the positive rewards. Then, the gradients of the
+evidence for whether to give a positive reward. Then, the gradients of the
 expected reward $J(\theta_S, \theta_L)$ can be calculated as follows:
 \begin{align}
  \nabla_{\theta^S} J &= \mathbb{E}_{\pi^S, \pi^L} \left[ r(\hat{t}, t) \cdot

--- a/AAAI2021/tex/theory2.tex
+++ b/AAAI2021/tex/theory2.tex
 \section{Mutual Information Similarity (MIS)}\label{sec:mis}
-In this section, we propose the \emph{Mutual Information Similarity (MIS)} as a metric of compositionality, and give a thorough theoretical analyse. 
+In this section, we propose the \emph{Mutual Information Similarity (MIS)} as a metric of compositionality and give a thorough theoretical analysis.
 MIS is the similarity between an identity matrix and the mutual information matrix of concepts and symbols.
 \begin{figure}[t]
@@ -38,7 +38,7 @@ R\left(c_0,s_0\right) & R\left(c_0,s_0\right)\\
 R\left(c_0,s_0\right) & R\left(c_0,s_0\right)
 \end{pmatrix}
 \end{equation}
-Each column of $M$ correspond to the semantic information carried by one symbol. In a perfectly compositional language, each symbol represents one specific concept exclusively. Therefore, the similarity between the columns of $M$ and a one-hot vector is align with the compositionality of the emergent language.
+Each column of $M$ corresponds to the semantic information carried by one symbol. In a perfectly compositional language, each symbol represents one specific concept exclusively. Therefore, the similarity between each column of $M$ and a one-hot vector reflects the compositionality of the emergent language.
 \begin{figure}[t]
  \centering \includegraphics[width=0.99\columnwidth]{fig/Figure6_Compostionality_of_symbolic_language.pdf}
@@ -67,5 +67,5 @@ following formula:
 \end{aligned}\end{equation}
-MIS is a bilateral metric. Unilateral metrics, e.g. \emph{topographic similarity (topo)}\cite{} and \emph{posdis}\cite{}, only take the policy of the speaker into consideration. We provide an example to illustrate the inadequacy of unilateral metrics, shown in Figure~\ref{fig:unilateral}. In this example, the speaker only uses $s_1$ to represent shape. From the perspective of speaker, the language is perfectly compositional (i.e. both topo and posdis are 1). However, the listener cannot distinguish the shape depend only on $s_1$, showing the non-compositionality in this language. The bilateral metric MIS addresses such defect by taking the policy of the listener into account, thus $\mathit{MIS} < 1$.
+MIS is a bilateral metric. Unilateral metrics, e.g., \emph{topographic similarity (topo)}\cite{} and \emph{posdis}\cite{}, only take the policy of the speaker into consideration. We provide an example to illustrate the inadequacy of unilateral metrics, shown in Figure~\ref{fig:unilateral}. In this example, the speaker only uses $s_1$ to represent the shape. From the perspective of the speaker, the language is perfectly compositional (i.e., both topo and posdis are 1). However, the listener cannot distinguish the shape based only on $s_1$, which shows that this language is not compositional. The bilateral metric MIS addresses this defect by taking the policy of the listener into account, so that $\mathit{MIS} < 1$.
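
As a reading aid for the game rules in the theory.tex hunk, the comparison $t=\hat{t}$ and the reward $r(t,\hat{t})$ might be sketched as follows. This is only an illustration, not the authors' implementation: the vocabulary contents and all helper names (`VOCAB`, `PAIRS`, `encode_input`, `encode_prediction`, `same_meaning`, `reward`) are assumptions introduced here. The sketch covers only the reward rule and leaves out the speaker and listener policies.

```python
from itertools import product

# Hypothetical vocabulary V; the source does not list the actual words.
VOCAB = ["red", "blue", "circle", "square"]
# V x V: every ordered pair of words, indexed by the listener's one-hot output.
PAIRS = list(product(VOCAB, VOCAB))


def one_hot(index, size):
    """Return a one-hot list of length `size` with a 1 at `index`."""
    return [1 if i == index else 0 for i in range(size)]


def encode_input(first, second):
    """Input object t: two one-hot vectors over V."""
    return (one_hot(VOCAB.index(first), len(VOCAB)),
            one_hot(VOCAB.index(second), len(VOCAB)))


def encode_prediction(first, second):
    """Listener output t_hat: a single one-hot vector over V x V."""
    return one_hot(PAIRS.index((first, second)), len(PAIRS))


def same_meaning(t, t_hat):
    """t = t_hat holds when both encodings name the same word pair."""
    words = (VOCAB[t[0].index(1)], VOCAB[t[1].index(1)])
    return PAIRS[t_hat.index(1)] == words


def reward(t, t_hat):
    """r(t, t_hat) is 1 when the meanings match and -1 otherwise."""
    return 1 if same_meaning(t, t_hat) else -1


t = encode_input("red", "circle")
print(reward(t, encode_prediction("red", "circle")))   # matching meaning
print(reward(t, encode_prediction("blue", "circle")))  # different meaning
```

Because $t$ has length $2|V|$ and $\hat{t}$ has length $|V|^2$, the two encodings cannot be compared elementwise, so the sketch decodes both to a word pair before comparing.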