Translating Neuralese
This addresses the challenge of making AI agent communication interpretable for researchers and practitioners, though it is incremental as it builds on existing multiagent communication methods.
The paper tackles the problem of interpreting communication strategies in decentralized multiagent policies by translating agent messages into natural language without parallel data, achieving results where players using translation do not suffer substantial reward loss compared to those with a common language.
Several approaches have recently been proposed for learning decentralized deep multiagent policies that coordinate via a differentiable communication channel. While these policies are effective for many tasks, interpretation of their induced communication strategies has remained a challenge. Here we propose to interpret agents' messages by translating them. Unlike in typical machine translation problems, we have no parallel data to learn from. Instead we develop a translation model based on the insight that agent messages and natural language strings mean the same thing if they induce the same belief about the world in a listener. We present theoretical guarantees and empirical evidence that our approach preserves both the semantics and pragmatics of messages by ensuring that players communicating through a translation layer do not suffer a substantial loss in reward relative to players with a common language.