A Spoken Dialogue System for Spatial Question Answering in a Physical Blocks World
This work addresses spatial reasoning for human-computer interaction in a limited domain, representing an incremental integration of existing techniques.
The paper tackles spatial question answering in a blocks world by integrating vision, speech, dialogue, and constraint solving, resulting in a system that maps spatial queries into logical forms and provides answers consistent with human perception.
The blocks world is a classic toy domain that has long been used to build and test spatial reasoning systems. Despite its relative simplicity, tackling this domain in its full complexity requires an agent to exhibit a rich set of functional capabilities, ranging from vision to natural language understanding. There is currently a resurgence of interest in solving problems in such limited domains using modern techniques. In this work we tackle spatial question answering in a holistic way, using a vision system, speech input and output mediated by an animated avatar, a dialogue system that robustly interprets spatial queries, and a constraint solver that derives answers based on 3-D spatial modeling. The contributions of this work include a semantic parser that maps spatial questions into logical forms consistent with a general approach to meaning representation, a dialogue manager based on a schema representation, and a constraint solver for spatial questions that provides answers in agreement with human perception. These and other components are integrated into a multi-modal human-computer interaction pipeline.
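To make the last two stages of the pipeline concrete, the following minimal sketch shows how a spatial question, once reduced to a logical form, might be evaluated against a 3-D model of the blocks. It is an illustration only and not the system's actual implementation: the `Block` class, the `(relation, subject, object)` tuple used as a logical form, the two predicates, and the tolerance values are all assumptions introduced here. In particular, the crisp geometric thresholds stand in for whatever perceptually grounded constraints the real solver uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical block model: a named cube with a 3-D centre point."""
    name: str
    x: float
    y: float
    z: float            # height of the block's centre above the table
    size: float = 1.0   # edge length


def left_of(a, b, margin=0.25):
    # Assumed rule: a is left of b if its centre lies clearly to the left.
    return a.x < b.x - margin * b.size


def on_top_of(a, b, tol=0.25):
    # Assumed rule: centres roughly aligned horizontally, and a's centre
    # sits about one block height above b's centre.
    aligned = (abs(a.x - b.x) <= tol * b.size
               and abs(a.y - b.y) <= tol * b.size)
    stacked = abs((a.z - b.z) - (a.size + b.size) / 2) <= tol * b.size
    return aligned and stacked


RELATIONS = {"left_of": left_of, "on_top_of": on_top_of}


def answer(query, scene):
    """Evaluate a toy logical form (relation, subject, object).

    A subject of None stands for a "which block ...?" question and
    returns the sorted names of all blocks satisfying the relation.
    """
    relation, subject, obj = query
    predicate = RELATIONS[relation]
    blocks = {b.name: b for b in scene}
    target = blocks[obj]
    if subject is None:
        return sorted(b.name for b in scene
                      if b.name != obj and predicate(b, target))
    return predicate(blocks[subject], target)


# Example scene: blue is stacked on red, green stands apart to the right.
scene = [
    Block("red", 0.0, 0.0, 0.5),
    Block("blue", 0.1, 0.0, 1.5),
    Block("green", 2.0, 0.0, 0.5),
]

# "Is the blue block on top of the red block?"
print(answer(("on_top_of", "blue", "red"), scene))
# "Which blocks are to the left of the green block?"
print(answer(("left_of", None, "green"), scene))
```

In this sketch the semantic parser's job would be to produce tuples such as `("on_top_of", "blue", "red")` from the spoken question, and the vision system's job would be to supply the `scene` list. Both of those stages are outside the scope of the example.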