The red one!: On learning to refer to things based on their discriminative properties
This work addresses how AI agents can learn to communicate about visual scenes. The contribution is incremental: it is presented as a first step, and its only validation is a preliminary experiment.
The paper aims to let agents refer to objects in visual environments by identifying discriminative attributes, i.e., properties that set a referent apart from its context (for example, has_tail for a cat versus a sofa). It reports referential success in a preliminary experiment.
As a first step towards agents learning to communicate about their visual environment, we propose a system that, given visual representations of a referent (cat) and a context (sofa), identifies their discriminative attributes, i.e., properties that distinguish them (has_tail). Moreover, despite the lack of direct supervision at the attribute level, the model learns to assign plausible attributes to objects (sofa-has_cushion). Finally, we present a preliminary experiment confirming the referential success of the predicted discriminative attributes.
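To make the notion of a discriminative attribute concrete, the following is a minimal sketch and not the proposed model. It assumes each object already comes with per-attribute scores in [0, 1], which the actual system would have to infer from visual representations without attribute-level supervision. The function name, the 0.5 threshold, and the hand-set scores are all hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: the attribute scores are hand-set assumptions,
# not outputs of the model described in the abstract.

def discriminative_attributes(referent_scores, context_scores, threshold=0.5):
    """Return the attributes that hold for exactly one of the two objects.

    Each argument maps attribute names to scores in [0, 1]. An attribute
    counts as "active" when its score reaches the threshold (an assumed
    cut-off). The result maps each discriminative attribute to the object
    that has it: "referent" or "context".
    """
    names = sorted(set(referent_scores) | set(context_scores))
    result = {}
    for name in names:
        # Missing attributes are treated as inactive (score 0.0).
        in_referent = referent_scores.get(name, 0.0) >= threshold
        in_context = context_scores.get(name, 0.0) >= threshold
        if in_referent != in_context:
            result[name] = "referent" if in_referent else "context"
    return result


# Hypothetical scores for the running example (cat as referent, sofa as context).
cat = {"has_tail": 0.9, "has_cushion": 0.1, "is_soft": 0.8}
sofa = {"has_tail": 0.05, "has_cushion": 0.95, "is_soft": 0.7}

distinguishing = discriminative_attributes(cat, sofa)
```

Under these assumed scores, has_tail separates the cat from the sofa and has_cushion separates the sofa from the cat, while is_soft is shared by both and is therefore not discriminative.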