@@ -63,14 +63,16 @@ Before going to the detail of the training algorithms, we first introduce the en
Figure~\ref{fig:game} shows the entire environment used in this study,
i.e., a commonly used referential game. Roughly, the referential game requires the speaker and listener to work cooperatively to accomplish a certain task.
In this paper, the task is to have the listener agent reconstruct the object
what the speaker claims it has seen, only through their emerged communication protocol. The success in this game indicates that symbolic language has emerged between speaker and listener.
that the speaker claims to have seen, using only their emergent communication protocol. Consistent success in this game indicates that a language has emerged between the speaker and the listener.
\textbf{Game rules} In our referential game, agents follow the following rules to finish the game in a cooperative manner. In each round, once received an input object $t$, Speaker $S$ speaks a symbol sequence $s$ to Listener $L$ ; Listener $L$ reconstruct the predicted result $\hat{t}$ based on the listened sequence $s$; if $t=\hat{t}$, agents win this game and receive positive rewards ($r(t,\hat{t})=1$); otherwise agents fail this game and receive negative rewards ($r(t,\hat{t})=-1$).
Precisely, during the game, Speaker $S$ receives an input object $t$, which is an expression with two words from the vocabulary set $V$, i.e., two one-hot vectors representing shape and color, respectively. Based on the $t$, Speaker $S$ speaks a symbol sequence $s$, which similarly contains two words from $V$. The Listener $L$ receives $s$ and output predicted result $\hat{t}$, a single word (one-hot vector) selected from the Cartesian product of set two $V$s ($V\times V$), which represents all the meanings of two combined words from $V$. Please note that since $t$ and $\hat{t}$ have different length, we say $t=\hat{t}$ if $t$ expresses the same meaning as $\hat{t}$, e.g., ``red circle''.
\textbf{Game rules} In our referential game, the agents follow the rules below to complete the game cooperatively. In each round, upon receiving an input object $t$, Speaker $S$ sends symbols $s$ to Listener $L$; Listener $L$ reconstructs a prediction $\hat{t}$ from the received symbols $s$; if $t=\hat{t}$, the agents win the round and receive a positive reward ($r(t,\hat{t})=1$); otherwise, they lose the round and receive a negative reward ($r(t,\hat{t})=-1$).
More precisely, Speaker $S$ receives an input object $t$, which is a concept-pair consisting of one concept from each of
the concept sets $M_0$ and $M_1$, i.e., two one-hot vectors representing shape and color, respectively. Based on $t$, Speaker $S$ produces a symbol sequence $s$, which likewise contains two words from the vocabulary $V$.
Listener $L$ receives $s$ and outputs a prediction $\hat{t}$, a single word (one-hot vector) corresponding to a concept-pair from the Cartesian product $M_0\times M_1$, which enumerates all combinations of one concept
from $M_0$ and one from $M_1$. Note that since $t$ and $\hat{t}$ have different lengths, we say $t=\hat{t}$ if $t$ expresses the same concept-pair as $\hat{t}$, e.g., ``red circle''.
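
For concreteness, the reward and the equality test described above can be sketched as follows. This is an illustrative formalization only: the index map $\phi$ and the zero-based concept indices $m_0$, $m_1$ are notation assumed here for the example, and any fixed bijection between concept-pairs and output positions would serve equally well.
\[
r(t,\hat{t}) =
\left\{
\begin{array}{rl}
1, & \mbox{if } t=\hat{t},\\
-1, & \mbox{otherwise,}
\end{array}
\right.
\qquad
\phi(m_0,m_1) = m_0\,|M_1| + m_1 ,
\]
where $t=(m_0,m_1)$ with $m_0\in\{0,\dots,|M_0|-1\}$ and $m_1\in\{0,\dots,|M_1|-1\}$ indexing the concepts in $M_0$ and $M_1$. Under this assumed convention, $t=\hat{t}$ holds exactly when the one-hot vector $\hat{t}$ has its nonzero entry at (zero-based) position $\phi(m_0,m_1)$. For example, with $|M_0|=|M_1|=3$, the concept-pair $(1,2)$ maps to position $1\cdot 3+2=5$ among the $9$ possible outputs.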