ROCLMay 21, 2021

Language Understanding for Field and Service Robots in a Priori Unknown Environments

arXiv:2105.10396v212 citations
Originality Incremental advance
AI Analysis

This addresses the challenge of human-robot collaboration in unstructured settings, offering a novel solution for field and service robots, though it builds incrementally on existing language grounding and imitation learning methods.

The paper tackles the problem of enabling robots to understand and execute natural-language commands in previously unknown environments, achieving successful navigation and mobile-manipulation tasks without requiring prior spatial-semantic maps.

Contemporary approaches to perception, planning, estimation, and control have allowed robots to operate robustly as our remote surrogates in uncertain, unstructured environments. This progress now creates an opportunity for robots to operate not only in isolation, but also with and alongside humans in our complex environments. Realizing this opportunity requires an efficient and flexible medium through which humans can communicate with collaborative robots. Natural language provides one such medium, and through significant progress in statistical methods for natural-language understanding, robots are now able to interpret a diverse array of free-form commands. However, most contemporary approaches require a detailed, prior spatial-semantic map of the robot's environment that models the space of possible referents of an utterance. Consequently, these methods fail when robots are deployed in new, previously unknown, or partially-observed environments, particularly when mental models of the environment differ between the human operator and the robot. This paper provides a comprehensive description of a novel learning framework that allows field and service robots to interpret and correctly execute natural-language instructions in a priori unknown, unstructured environments. Integral to our approach is its use of language as a "sensor" -- inferring spatial, topological, and semantic information implicit in the utterance and then exploiting this information to learn a distribution over a latent environment model. We incorporate this distribution in a probabilistic, language grounding model and infer a distribution over a symbolic representation of the robot's action space. We use imitation learning to identify a belief-space policy that reasons over the environment and behavior distributions. We evaluate our framework through a variety navigation and mobile-manipulation experiments.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes