AI | Apr 21, 2017

Accurately and Efficiently Interpreting Human-Robot Instructions of Varying Granularities

Dilip Arumugam, Siddharth Karamcheti, Nakul Gopalan, Lawson L. S. Wong, Stefanie Tellex

arXiv:1704.06616v2 | 63 citations

Originality: Incremental advance

AI Analysis

This work addresses inefficient and inaccurate command interpretation in human-robot interaction, where instructions vary in their level of abstraction. It is a domain-specific advance.

The paper tackles the problem of robots interpreting human instructions at varying levels of abstraction, from high-level commands like 'grab a pallet' to low-level ones like 'tilt back a little bit'. It grounds language to tasks within a hierarchical planning framework, which improves both accuracy and efficiency: the robot responds within one second on 90% of tasks, whereas baselines take over twenty seconds on half of them.

Humans can ground natural language commands to tasks at both abstract and fine-grained levels of specificity. For instance, a human forklift operator can be instructed to perform a high-level action, like "grab a pallet" or a low-level action like "tilt back a little bit." While robots are also capable of grounding language commands to tasks, previous methods implicitly assume that all commands and tasks reside at a single, fixed level of abstraction. Additionally, methods that do not use multiple levels of abstraction encounter inefficient planning and execution times as they solve tasks at a single level of abstraction with large, intractable state-action spaces closely resembling real world complexity. In this work, by grounding commands to all the tasks or subtasks available in a hierarchical planning framework, we arrive at a model capable of interpreting language at multiple levels of specificity ranging from coarse to more granular. We show that the accuracy of the grounding procedure is improved when simultaneously inferring the degree of abstraction in language used to communicate the task. Leveraging hierarchy also improves efficiency: our proposed approach enables a robot to respond to a command within one second on 90% of our tasks, while baselines take over twenty seconds on half the tasks. Finally, we demonstrate that a real, physical robot can ground commands at multiple levels of abstraction allowing it to efficiently plan different subtasks within the same planning hierarchy.
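The idea of inferring the level of abstraction together with the task can be illustrated with a toy sketch. This is not the authors' model: the hierarchy, the task names, the keyword lists, and the word-overlap scorer below are all hypothetical stand-ins chosen for illustration. The sketch only shows why searching over every (level, task) pair can succeed where a grounder locked to one level fails.

```python
from collections import Counter

# Hypothetical two-level hierarchy: level -> {task name: descriptive keywords}.
# The levels, tasks, and keywords are illustrative assumptions, not from the paper.
HIERARCHY = {
    "high": {
        "grab_pallet": "grab pick up pallet lift load",
        "go_to_room": "go drive room move to",
    },
    "low": {
        "tilt_back": "tilt back little bit lean",
        "move_forward": "forward ahead drive straight step",
    },
}


def score(command, keywords):
    """Count how many words of the command appear in a task's keyword list."""
    words = Counter(command.lower().split())
    vocabulary = set(keywords.split())
    return sum(count for word, count in words.items() if word in vocabulary)


def ground(command, hierarchy=HIERARCHY, level=None):
    """Return the best (level, task) pair for a command.

    With level=None the search covers every level, so the level of
    abstraction is inferred together with the task. Passing a level
    mimics a grounder that assumes a single, fixed level.
    """
    levels = [level] if level is not None else list(hierarchy)
    candidates = [
        (score(command, keywords), lvl, task)
        for lvl in levels
        for task, keywords in hierarchy[lvl].items()
    ]
    # max() with a key returns the first best candidate, so ties are
    # resolved by the hierarchy's declaration order.
    _, best_level, best_task = max(candidates, key=lambda c: c[0])
    return best_level, best_task


high = ground("grab a pallet")
low = ground("tilt back a little bit")
forced = ground("tilt back a little bit", level="high")
```

In this toy setup, the two example commands from the abstract are matched to tasks at different levels when the level is left free, while forcing the low-level command onto the high level can only return a high-level task, whatever the command actually meant.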
