Learning to Speak and Act in a Fantasy Text Adventure Game
This work addresses the challenge of creating agents that can communicate and act effectively in interactive environments, though its contribution is incremental: it applies existing models to a new setting.
The authors tackle grounded dialogue by introducing a large-scale crowdsourced text adventure game as a research platform, and show that models conditioned on world-state details such as location and objects better predict agent behavior and dialogue.
We introduce a large-scale crowdsourced text adventure game as a research platform for studying grounded dialogue. In it, agents can perceive, emote, and act whilst conducting dialogue with other agents. Models and humans can both act as characters within the game. We describe the results of training state-of-the-art generative and retrieval models in this setting. We show that in addition to using past dialogue, these models are able to effectively use the state of the underlying world to condition their predictions. In particular, we show that grounding on the details of the local environment, including location descriptions, and the objects (and their affordances) and characters (and their previous actions) present within it, allows better predictions of agent behavior and dialogue. We analyze the ingredients necessary for successful grounding in this setting, and how each of these factors relates to agents that can talk and act successfully.
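As a rough illustration of what conditioning on world state in addition to past dialogue can look like, the sketch below flattens a location description, objects, characters, and previous actions into one tagged text context and ranks candidate utterances against it. Everything here is an assumption made for illustration: the `WorldState` class, its field names, the tag prefixes, and the token-overlap scorer are hypothetical and are not taken from the paper, whose models are learned generative and retrieval models rather than this toy scorer.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WorldState:
    # Hypothetical container; field names are illustrative, not from the paper.
    location: str
    objects: List[str] = field(default_factory=list)
    characters: List[str] = field(default_factory=list)
    past_actions: List[str] = field(default_factory=list)


def build_context(state: WorldState, dialogue: List[str],
                  use_grounding: bool = True) -> str:
    """Flatten world state and dialogue history into one tagged text sequence."""
    parts: List[str] = []
    if use_grounding:
        # Tag prefixes are an assumed convention for marking each input type.
        parts.append("_location " + state.location)
        parts.extend("_object " + o for o in state.objects)
        parts.extend("_character " + c for c in state.characters)
        parts.extend("_action " + a for a in state.past_actions)
    parts.extend("_utterance " + u for u in dialogue)
    return "\n".join(parts)


def overlap_score(context: str, candidate: str) -> int:
    """Toy stand-in for a learned scorer: count shared lowercase tokens."""
    context_tokens = set(context.lower().split())
    candidate_tokens = set(candidate.lower().split())
    return len(context_tokens & candidate_tokens)


def rank_candidates(context: str, candidates: List[str]) -> List[str]:
    """Return candidates sorted from highest to lowest overlap score."""
    return sorted(candidates, key=lambda c: overlap_score(context, c),
                  reverse=True)
```

Setting `use_grounding=False` keeps only the dialogue history, which gives a simple way to compare a dialogue-only context against a grounded one, in the spirit of the ablations the abstract alludes to.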