Reference-Centric Models for Grounded Collaborative Dialogue
This work addresses the challenge of grounded collaborative dialogue for AI agents in tasks that require shared reference resolution. It reports a strong, task-specific gain that is incremental over prior work.
The paper tackles the problem of enabling agents to collaborate in a partially-observable reference game by pooling information and communicating pragmatically to identify shared objects. It reports a 20% relative improvement in successful task completion in self-play evaluations and a 50% relative improvement in human evaluations over the previous state of the art.
We present a grounded neural dialogue model that successfully collaborates with people in a partially-observable reference game. We focus on a setting where two agents each observe an overlapping part of a world context and need to identify and agree on some object they share. Therefore, the agents should pool their information and communicate pragmatically to solve the task. Our dialogue agent accurately grounds referents from the partner's utterances using a structured reference resolver, conditions on these referents using a recurrent memory, and uses a pragmatic generation procedure to ensure the partner can resolve the references the agent produces. We evaluate on the OneCommon spatial grounding dialogue task (Udagawa and Aizawa 2019), involving a number of dots arranged on a board with continuously varying positions, sizes, and shades. Our agent substantially outperforms the previous state of the art for the task, obtaining a 20% relative improvement in successful task completion in self-play evaluations and a 50% relative improvement in success in human evaluations.
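The pragmatic generation step described above can be illustrated with a small sketch. This is not the paper's implementation: it assumes one common formulation, in which candidate utterances are reranked by combining a speaker score with the probability that a simulated listener resolves the utterance to the intended dot. The toy context, the candidate utterances, the speaker probabilities, and the names `listener_probs` and `pragmatic_choice` are all hypothetical.

```python
import math

# Hypothetical toy context: three dots with coarse attributes.
# (The real task uses continuously varying positions, sizes, and shades.)
DOTS = {
    "a": {"size": "large", "shade": "dark"},
    "b": {"size": "large", "shade": "light"},
    "c": {"size": "small", "shade": "dark"},
}

# Hypothetical candidate utterances and the attributes each one asserts.
CANDIDATES = {
    "the large dot": {"size": "large"},
    "the dark dot": {"shade": "dark"},
    "the large dark dot": {"size": "large", "shade": "dark"},
}

# Hypothetical speaker log-probabilities: shorter utterances are more likely.
SPEAKER_LOGPROB = {
    "the large dot": math.log(0.35),
    "the dark dot": math.log(0.35),
    "the large dark dot": math.log(0.30),
}


def listener_probs(utterance):
    """Simulated listener: uniform over the dots consistent with the utterance."""
    attrs = CANDIDATES[utterance]
    matches = [
        dot for dot, props in DOTS.items()
        if all(props[key] == value for key, value in attrs.items())
    ]
    return {dot: 1.0 / len(matches) for dot in matches}


def pragmatic_choice(target, weight=0.5):
    """Return the candidate with the best weighted speaker/listener score.

    weight=0 uses the speaker score alone; larger weights favour utterances
    that the simulated listener resolves to `target`.
    """
    best_utt, best_score = None, -math.inf
    for utt in CANDIDATES:
        p_target = listener_probs(utt).get(target, 0.0)
        if p_target == 0.0:
            continue  # the listener could never pick the target from this utterance
        score = (1 - weight) * SPEAKER_LOGPROB[utt] + weight * math.log(p_target)
        if score > best_score:
            best_utt, best_score = utt, score
    return best_utt


print(pragmatic_choice("a"))              # listener-aware choice
print(pragmatic_choice("a", weight=0.0))  # speaker-only choice
```

In this toy setting the speaker-only choice for dot `a` is an ambiguous description that also fits dot `b`, while the listener-aware choice is the longer description that only dot `a` satisfies. The `weight` parameter is an assumed knob for trading fluency against resolvability.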