Learning Compositional Behaviors from Demonstration and Language
This work addresses the problem of enabling robots to perform complex, multi-step tasks in varied environments, proposing a novel method for a known bottleneck in robotics.
The paper tackles long-horizon robotic manipulation by introducing BLADE, a framework that integrates imitation learning and model-based planning, using language-annotated demonstrations and large language models to extract abstract action knowledge. The resulting system automatically constructs structured action representations and generalizes to novel situations; it is validated in simulation and on real robots with diverse objects and constraints.
We introduce Behavior from Language and Demonstration (BLADE), a framework for long-horizon robotic manipulation that integrates imitation learning and model-based planning. BLADE leverages language-annotated demonstrations, extracts abstract action knowledge from large language models (LLMs), and constructs a library of structured, high-level action representations. For each high-level action, these representations include preconditions and effects grounded in visual perception, along with a corresponding controller implemented as a neural network-based policy. BLADE recovers such structured representations automatically, without manually labeled states or symbolic definitions. BLADE generalizes well to novel situations, including novel initial states, external state perturbations, and novel goals. We validate the effectiveness of our approach both in simulation and on real robots, across a diverse set of objects with articulated parts and in settings with partial observability and geometric constraints.
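To illustrate what a library of high-level actions with preconditions and effects makes possible, the following is a minimal sketch of planning over such a library. It is not the authors' implementation: it assumes the perceptual grounding has already produced a set of true symbolic predicates, represents each action as hypothetical precondition/add/delete sets, and uses plain breadth-first search as a stand-in planner. The action names and predicates (`open-drawer`, `cup-in-drawer`, etc.) are illustrative assumptions, and the learned controllers that would execute each step are omitted.

```python
from collections import deque
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

# Assumption: a state is the set of predicates that currently hold.
# In BLADE these would be predicted from visual perception; here they are given.
State = FrozenSet[str]


@dataclass(frozen=True)
class HighLevelAction:
    """Hypothetical STRIPS-style operator: preconditions plus add/delete effects."""
    name: str
    preconditions: FrozenSet[str]
    add_effects: FrozenSet[str]
    del_effects: FrozenSet[str]

    def applicable(self, state: State) -> bool:
        # All preconditions must hold in the current state.
        return self.preconditions <= state

    def apply(self, state: State) -> State:
        # Remove deleted predicates, then add new ones.
        return (state - self.del_effects) | self.add_effects


def plan(init: State, goal: FrozenSet[str],
         actions: List[HighLevelAction]) -> Optional[List[str]]:
    """Breadth-first search for a shortest action sequence reaching the goal."""
    frontier = deque([(init, [])])
    visited = {init}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:
            return steps
        for action in actions:
            if action.applicable(state):
                nxt = action.apply(state)
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, steps + [action.name]))
    return None  # goal unreachable with this action library


# Hypothetical action library for a drawer-and-cup task.
ACTIONS = [
    HighLevelAction("open-drawer",
                    frozenset({"hand-empty", "drawer-closed"}),
                    frozenset({"drawer-open"}),
                    frozenset({"drawer-closed"})),
    HighLevelAction("pick-cup",
                    frozenset({"hand-empty", "drawer-open", "cup-in-drawer"}),
                    frozenset({"holding-cup"}),
                    frozenset({"hand-empty", "cup-in-drawer"})),
    HighLevelAction("place-cup-on-table",
                    frozenset({"holding-cup"}),
                    frozenset({"cup-on-table", "hand-empty"}),
                    frozenset({"holding-cup"})),
]

INIT = frozenset({"hand-empty", "drawer-closed", "cup-in-drawer"})
GOAL = frozenset({"cup-on-table"})

if __name__ == "__main__":
    print(plan(INIT, GOAL, ACTIONS))
```

Because the planner reasons only over preconditions and effects, a new initial state or a new goal requires re-planning rather than new demonstrations, which is the kind of generalization the abstract describes; in a full system, each returned action name would be dispatched to its learned policy.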