RO | AI | Oct 16, 2024

Harmon: Whole-Body Motion Generation of Humanoid Robots from Language Descriptions

arXiv:2410.12773v1 | 32 citations | h-index: 11 | CoRL
Originality: Incremental advance
AI Analysis

This work addresses the problem of enabling humanoid robots to understand language and exhibit human-like behaviors so that they integrate better into human environments. It represents an incremental advance.

The paper tackles generating diverse whole-body motions for humanoid robots from language descriptions by leveraging human motion priors and Vision Language Models, resulting in natural and text-aligned motions validated in simulated and real-world experiments.

Humanoid robots, with their human-like embodiment, have the potential to integrate seamlessly into human environments. Critical to their coexistence and cooperation with humans is the ability to understand natural language communications and exhibit human-like behaviors. This work focuses on generating diverse whole-body motions for humanoid robots from language descriptions. We leverage human motion priors from extensive human motion datasets to initialize humanoid motions and employ the commonsense reasoning capabilities of Vision Language Models (VLMs) to edit and refine these motions. Our approach demonstrates the capability to produce natural, expressive, and text-aligned humanoid motions, validated through both simulated and real-world experiments. More videos can be found at https://ut-austin-rpl.github.io/Harmon/.

