RO · AI · LG · Sep 2, 2024

Grounding Language Models in Autonomous Loco-manipulation Tasks

arXiv:2409.01326v1 · 1 citation · h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses whole-body coordination and task planning for humanoid robots, a problem central to advancing embodied intelligence and to enabling robots that assist in daily life. Its contribution appears incremental, since it integrates existing methods such as RL and LLMs.

The authors tackled the challenge of enabling humanoid robots to perform long-horizon loco-manipulation tasks under open-ended verbal instructions by proposing a framework that combines reinforcement learning with whole-body optimization and leverages large language models for planning. Their approach demonstrated high autonomy in adapting to new tasks in unstructured scenes, as validated through simulation and real-world experiments with the CENTAURO robot.

Humanoid robots with behavioral autonomy have consistently been regarded as ideal collaborators in our daily lives and promising representations of embodied intelligence. Compared to fixed-base robotic arms, humanoid robots offer a larger operational space while significantly increasing the difficulty of control and planning. Despite the rapid progress towards general-purpose humanoid robots, most studies remain focused on locomotion ability, with few investigations into whole-body coordination and task planning, thus limiting the potential to demonstrate long-horizon tasks involving both mobility and manipulation under open-ended verbal instructions. In this work, we propose a novel framework that learns, selects, and plans behaviors based on tasks in different scenarios. We combine reinforcement learning (RL) with whole-body optimization to generate robot motions and store them in a motion library. We further leverage the planning and reasoning features of the large language model (LLM), constructing a hierarchical task graph that comprises a series of motion primitives to bridge lower-level execution with higher-level planning. Experiments in simulation and in the real world using the CENTAURO robot show that the language-model-based planner can efficiently adapt to new loco-manipulation tasks, demonstrating high autonomy from free-text commands in unstructured scenes.
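The abstract describes a motion library of stored primitives and a task graph that sequences them. The Python sketch below is only an illustration of that idea, not the authors' implementation. Every name in it (MotionPrimitive, TaskNode, MotionLibrary, execution_order, and primitive names such as "walk_to") is hypothetical, and the plan is written by hand where the paper would have an LLM produce it. It uses a flat dependency graph and leaves out the hierarchy, the RL policies, and the whole-body optimizer.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class MotionPrimitive:
    """A named, reusable motion stored in the library (hypothetical structure)."""
    name: str
    kind: str  # assumed labels, e.g. "locomotion" or "manipulation"


@dataclass
class TaskNode:
    """One step of a task graph: a primitive plus the steps it depends on."""
    node_id: str
    primitive: str
    depends_on: list[str] = field(default_factory=list)


class MotionLibrary:
    """Lookup table from primitive name to primitive."""

    def __init__(self):
        self._primitives = {}

    def add(self, primitive):
        self._primitives[primitive.name] = primitive

    def __contains__(self, name):
        return name in self._primitives


def execution_order(nodes, library):
    """Check a plan against the library and return primitive names in dependency order."""
    by_id = {node.node_id: node for node in nodes}
    for node in nodes:
        if node.primitive not in library:
            raise KeyError(f"unknown primitive: {node.primitive}")
        for dep in node.depends_on:
            if dep not in by_id:
                raise KeyError(f"unknown dependency: {dep}")
    graph = {node.node_id: set(node.depends_on) for node in nodes}
    # static_order() raises graphlib.CycleError if the plan contains a cycle.
    return [by_id[i].primitive for i in TopologicalSorter(graph).static_order()]


# Hypothetical library contents and a hand-written plan standing in for planner output.
library = MotionLibrary()
for name, kind in [("walk_to", "locomotion"), ("grasp", "manipulation"), ("place", "manipulation")]:
    library.add(MotionPrimitive(name, kind))

plan = [
    TaskNode("n1", "walk_to"),
    TaskNode("n2", "grasp", depends_on=["n1"]),
    TaskNode("n3", "walk_to", depends_on=["n2"]),
    TaskNode("n4", "place", depends_on=["n3"]),
]

print(execution_order(plan, library))
```

Checking the plan against the library before ordering it is one way to reject a step that names a motion the robot does not have. Whether the paper's planner performs such a check is not stated in the abstract.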

