ROMay 22

Language Movement Primitives: Grounding Language Models in Robot Motion

Yinlong Dai, Benjamin A. Christie, Daniel J. Evans, Dylan P. Losey, Simon Stepputtis

arXiv:2602.0283980.93 citationsh-index: 6

AI Analysis

For robotics researchers, this work provides a zero-shot method to connect abstract language understanding with continuous motion control, addressing a key bottleneck in robot manipulation.

The paper proposes Language Movement Primitives (LMPs), a framework that bridges high-level VLM reasoning with low-level robot motion control by grounding natural language instructions into Dynamic Movement Primitive parameters. In 31 real-world manipulation tasks, LMP achieves 65% task success, outperforming the best baseline at 35%.

Enabling robots to perform novel manipulation tasks from natural language instructions remains a fundamental challenge in robotics, despite significant progress in generalized problem solving with foundational models. Large vision and language models (VLMs) are capable of processing high-dimensional input data for visual scene and language understanding, as well as decomposing tasks into a sequence of logical steps; however, they struggle to ground those steps in embodied robot motion. On the other hand, robotics foundation models output action commands, but require in-domain fine-tuning or experience before they are able to perform novel tasks successfully. At its core, there still remains the fundamental challenge of connecting abstract task reasoning with low-level motion control. To address this disconnect, we propose Language Movement Primitives (LMPs), a framework that grounds VLM reasoning in Dynamic Movement Primitive (DMP) parameterization. Our key insight is that DMPs provide a small number of interpretable parameters, and VLMs can set these parameters to specify diverse, continuous, and stable trajectories. Put another way: VLMs can reason over free-form natural language task descriptions, and semantically ground their desired motions into DMPs -- bridging the gap between high-level task reasoning and low-level position and velocity control. Building on this combination of VLMs and DMPs, we formulate our LMP pipeline for zero-shot robot manipulation that effectively completes tabletop manipulation problems by generating a sequence of DMP motions. Across 31 real-world manipulation tasks, we show that LMP achieves 65% task success as compared to 35% for the best performing baseline. See videos at our website: https://collab.me.vt.edu/lmp

View on arXiv PDF

Similar