Emergent Communication with World Models
This work addresses communication challenges in multi-agent AI systems, though the contribution is incremental because it builds on existing world model frameworks.
The paper tackles the problem of emergent communication in multi-agent navigation by introducing Language World Models, which ground natural language messages in visual predictions of future observations, leading to improved communication and task success in 2D gridworld tasks.
We introduce Language World Models, a class of language-conditional generative models that interpret natural language messages by predicting latent codes of future observations. This provides a visual grounding of the message, similar to an enhanced observation of the world, which may include objects outside of the listening agent's field-of-view. We incorporate this "observation" into a persistent memory state, and allow the listening agent's policy to condition on it, akin to the relationship between memory and controller in a World Model. We show this improves effective communication and task success in 2D gridworld speaker-listener navigation tasks. In addition, we develop two losses framed specifically for our model-based formulation to promote positive signalling and positive listening. Finally, because messages are interpreted in a generative model, we can visualize the model's beliefs to gain insight into how the communication channel is utilized.
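The listener-side data flow described above (message, predicted latent code, persistent memory, policy) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the layer sizes, the random untrained weights, the mean-pooled message embedding, the tanh memory update, and the function names `interpret_message`, `update_memory`, and `policy` are all hypothetical choices made only to show how the pieces could connect.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
VOCAB, EMBED, LATENT, MEMORY, OBS, ACTIONS = 10, 8, 4, 6, 5, 3

# Randomly initialised, untrained weights (assumption: a real model would learn these).
W_embed = rng.normal(scale=0.1, size=(VOCAB, EMBED))
W_latent = rng.normal(scale=0.1, size=(EMBED, LATENT))
W_mem = rng.normal(scale=0.1, size=(MEMORY + LATENT, MEMORY))
W_policy = rng.normal(scale=0.1, size=(OBS + MEMORY, ACTIONS))


def interpret_message(tokens):
    """Map a message (array of token ids) to a predicted latent code."""
    message_embedding = W_embed[tokens].mean(axis=0)  # mean-pool token embeddings
    return np.tanh(message_embedding @ W_latent)


def update_memory(memory, latent):
    """Fold the predicted latent 'observation' into the persistent memory state."""
    return np.tanh(np.concatenate([memory, latent]) @ W_mem)


def policy(observation, memory):
    """Return action probabilities conditioned on the observation and the memory."""
    logits = np.concatenate([observation, memory]) @ W_policy
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()


# One listener step: interpret a message, update memory, then act.
memory = np.zeros(MEMORY)
latent = interpret_message(np.array([1, 4, 7]))
memory = update_memory(memory, latent)
action_probs = policy(rng.normal(size=OBS), memory)
```

In this sketch the policy never reads the message directly. It sees only the memory state that absorbed the predicted latent code, which mirrors the memory/controller split the abstract refers to.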