RO · Apr 17

CLAW: Composable Language-Annotated Whole-body Motion Generation

arXiv:2604.11251 · 24.6 · h-index: 5 · Has Code
Predicted impact: top 25% in RO · last 90 days
Originality: Incremental advance
AI Analysis

Provides a scalable pipeline for creating motion-language paired data for humanoid robot learning, addressing the bottleneck of data scarcity in this domain.

CLAW generates large-scale, physically feasible, language-annotated whole-body motion data for humanoid robots by composing motion primitives from a kinematic planner and tracking them in simulation, enabling scalable data generation without costly motion capture.

Training language-conditioned whole-body controllers for humanoid robots demands large-scale motion-language datasets. Existing approaches based on motion capture are costly and limited in diversity, while text-to-motion generative models produce purely kinematic outputs that are not guaranteed to be physically feasible. We present CLAW, a pipeline for scalable generation of language-annotated whole-body motion data for the Unitree G1 humanoid robot. CLAW composes motion primitives from a kinematic planner, parameterized by movement, heading, speed, pelvis height, and duration, and provides two browser-based interfaces--a real-time keyboard mode and a timeline-based sequence editor--for exploratory and batch data collection. A low-level controller tracks these references in MuJoCo simulation, yielding physically grounded trajectories. In parallel, a template-based engine generates diverse natural-language annotations at both segment and trajectory levels. To support scalable generation of motion-language paired data for humanoid robot learning, we make our system publicly available at: https://github.com/JianuoCao/CLAW
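The primitive composition and template-based annotation described above can be sketched roughly as follows. This is a minimal illustration and not the CLAW implementation: the `Primitive` class, the template strings, the helper functions, and the numeric values are all hypothetical, and only the five parameter names (movement, heading, speed, pelvis height, duration) come from the abstract. The simulation tracking step in MuJoCo is deliberately left out.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Primitive:
    """One motion segment; fields mirror the parameters named in the abstract."""
    movement: str           # e.g. "walk" or "stand" (hypothetical vocabulary)
    heading_deg: float      # heading for the segment, in degrees
    speed_mps: float        # commanded speed, in m/s
    pelvis_height_m: float  # target pelvis height, in m
    duration_s: float       # segment length, in s


# Hypothetical templates; a real engine would likely have many more
# and would also verbalize heading and pelvis height.
SEGMENT_TEMPLATES = [
    "{movement} at {speed:.1f} m/s for {duration:.1f} seconds",
    "for {duration:.1f} seconds, {movement} at {speed:.1f} m/s",
]


def describe_segment(p: Primitive, rng: random.Random) -> str:
    """Segment-level annotation: fill one randomly chosen template."""
    template = rng.choice(SEGMENT_TEMPLATES)
    return template.format(
        movement=p.movement, speed=p.speed_mps, duration=p.duration_s
    )


def describe_trajectory(segments: list[Primitive], rng: random.Random) -> str:
    """Trajectory-level annotation: join the segment descriptions in order."""
    parts = [describe_segment(p, rng) for p in segments]
    return ", then ".join(parts) + "."


def total_duration(segments: list[Primitive]) -> float:
    """Length of the composed reference motion, in seconds."""
    return sum(p.duration_s for p in segments)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so annotations are reproducible
    plan = [
        Primitive("walk", 0.0, 0.8, 0.75, 2.0),   # illustrative values
        Primitive("stand", 0.0, 0.0, 0.75, 1.0),
    ]
    print(describe_trajectory(plan, rng))
    print(total_duration(plan))
```

Passing an explicit `random.Random` instance keeps template selection reproducible, which matters when the same motion plan must be paired with the same text across dataset rebuilds.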

