Emergent Translation in Multi-Agent Communication
This work addresses data-efficient translation for AI systems by mimicking human-like learning, though its contribution is incremental: it combines existing ideas from visual grounding and multi-agent communication.
The paper tackles machine translation without parallel corpora. It proposes a multi-agent communication game in which agents learn to translate through visual grounding and interaction. The resulting model outperforms baselines on word-level and sentence-level translation tasks, and agents in multilingual settings learn to translate better and faster than agents in bilingual settings.
While most machine translation systems to date are trained on large parallel corpora, humans learn language in a different way: by being grounded in an environment and interacting with other humans. In this work, we propose a communication game where two agents, native speakers of their own respective languages, jointly learn to solve a visual referential task. We find that the ability to understand and translate a foreign language emerges as a means to achieve shared goals. The emergent translation is interactive and multimodal, and crucially does not require parallel corpora, but only monolingual, independent text and corresponding images. Our proposed translation model achieves this by grounding the source and target languages into a shared visual modality, and outperforms several baselines on both word-level and sentence-level translation tasks. Furthermore, we show that agents in a multilingual community learn to translate better and faster than in a bilingual communication setting.
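To make the idea of grounding two languages in a shared visual modality concrete, here is a minimal sketch of word-level translation by nearest-neighbour lookup in image-feature space. It is not the communication-game model proposed in the paper, only a toy illustration of why monolingual text paired with images can suffice. The function names (`ground`, `translate`), the 3-dimensional image features, and the English and German captions are all hypothetical and invented for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def ground(captioned_images):
    """Map each word to the mean feature vector of the images it describes."""
    images_by_word = {}
    for words, image_vec in captioned_images:
        for word in words:
            images_by_word.setdefault(word, []).append(image_vec)
    return {word: mean_vector(vecs) for word, vecs in images_by_word.items()}


def translate(word, src_grounded, tgt_grounded):
    """Return the target word whose visual grounding is closest to the source word's."""
    query = src_grounded[word]
    return max(tgt_grounded, key=lambda t: cosine(query, tgt_grounded[t]))


# Hypothetical toy data: each language has its own captioned images.
# No image is shared between the two sets, so there is no parallel corpus.
en_data = [
    (["dog"], [0.9, 0.1, 0.0]),
    (["dog"], [0.8, 0.2, 0.1]),
    (["cat"], [0.1, 0.9, 0.0]),
    (["car"], [0.0, 0.1, 0.9]),
]
de_data = [
    (["hund"], [0.85, 0.15, 0.05]),
    (["katze"], [0.15, 0.85, 0.05]),
    (["auto"], [0.05, 0.1, 0.95]),
]

en = ground(en_data)
de = ground(de_data)

for source_word in ("dog", "cat", "car"):
    print(source_word, "->", translate(source_word, en, de))
```

On this toy data, each English word is matched to the German word that describes visually similar images. In the paper's setting the grounding is instead learned jointly by two agents solving a referential task, which this lookup does not attempt to reproduce.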