Compositional Obverter Communication Learning From Raw Visual Input
This work addresses the challenge of enabling AI agents to learn human-like compositional language from raw data. The contribution is incremental, building on prior work that used disentangled inputs.
The paper tackles the problem of training neural agents to develop compositional communication from raw visual inputs, showing that agents can learn a structured language through an image description game and obverter introspection, as demonstrated by qualitative analysis and zero-shot tests.
One of the distinguishing aspects of human language is its compositionality, which allows us to describe complex environments with limited vocabulary. Previously, it has been shown that neural network agents can learn to communicate in a highly structured, possibly compositional language based on disentangled input (e.g. hand-engineered features). Humans, however, do not learn to communicate based on well-summarized features. In this work, we train neural agents to simultaneously develop visual perception from raw image pixels, and learn to communicate with a sequence of discrete symbols. The agents play an image description game where the image contains factors such as colors and shapes. We train the agents using the obverter technique where an agent introspects to generate messages that maximize its own understanding. Through qualitative analysis, visualization and a zero-shot test, we show that the agents can develop, out of raw image pixels, a language with compositional properties, given a proper pressure from the environment.
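To make the obverter idea concrete, the following is a minimal sketch of how a speaker might build a message by consulting its own listener. It is not the paper's implementation. The greedy symbol-by-symbol search, the `threshold` and `max_len` parameters, the four-symbol vocabulary, and the hand-written `toy_listener` (which stands in for a trained neural network that would see raw pixels) are all assumptions made for illustration.

```python
from typing import Callable, Sequence, Tuple

Message = Tuple[str, ...]
Target = Tuple[str, str]  # (color, shape); a stand-in for a raw image

# Hypothetical fixed "language" used only by the toy listener below.
COLOR_SYMBOL = {"red": "a", "blue": "b"}
SHAPE_SYMBOL = {"box": "c", "sphere": "d"}
VOCAB = ("a", "b", "c", "d")


def toy_listener(message: Message, target: Target) -> float:
    """Return a confidence in [0, 1] that `message` describes `target`.

    Assumption: a real agent would compute this with a learned model.
    Here it is the fraction of expected symbols matched in position.
    """
    color, shape = target
    expected = (COLOR_SYMBOL[color], SHAPE_SYMBOL[shape])
    hits = sum(1 for got, want in zip(message, expected) if got == want)
    return hits / len(expected)


def obverter_speak(
    target: Target,
    listener: Callable[[Message, Target], float],
    vocab: Sequence[str],
    max_len: int = 5,
    threshold: float = 0.95,
) -> Message:
    """Greedily extend a message so the speaker's own listener understands it."""
    message: Message = ()
    for _ in range(max_len):
        # Introspection step: score every candidate next symbol with the
        # speaker's own listener and keep the best one.
        best = max(vocab, key=lambda s: listener(message + (s,), target))
        message = message + (best,)
        # Stop once the speaker itself is confident in the message.
        if listener(message, target) >= threshold:
            break
    return message


if __name__ == "__main__":
    print(obverter_speak(("blue", "box"), toy_listener, VOCAB))
```

Note that in this sketch the speaker has no separate generation model: whatever it "says" is produced entirely by searching over what it would itself understand, which is the sense of introspection described in the abstract.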