CV AI CL RO | May 23, 2025

How Much Do Large Language Models Know about Human Motion? A Case Study in 3D Avatar Control

Kunhang Li, Jason Naradowsky, Yansong Feng, Yusuke Miyao

arXiv:2505.21531v28.42 citations | h-index: 15 | EMNLP

Originality: Incremental advance

AI Analysis

This work evaluates how much LLMs know about human motion, a question relevant to researchers in AI and robotics, though its contribution is incremental because it explores a single application, 3D avatar control.

The study investigated how well large language models (LLMs) understand human motion by using them to control 3D avatars based on motion instructions, finding that LLMs are strong at interpreting high-level movements but struggle with precise body part positioning.

We explore the human motion knowledge of Large Language Models (LLMs) through 3D avatar control. Given a motion instruction, we prompt LLMs to first generate a high-level movement plan with consecutive steps (High-level Planning), then specify body part positions in each step (Low-level Planning), which we linearly interpolate into avatar animations. Using 20 representative motion instructions that cover fundamental movements and balance body part usage, we conduct comprehensive evaluations, including human and automatic scoring of both high-level movement plans and generated animations, as well as automatic comparison with oracle positions in low-level planning. Our findings show that LLMs are strong at interpreting high-level body movements but struggle with precise body part positioning. While decomposing motion queries into atomic components improves planning, LLMs face challenges in multi-step movements involving high-degree-of-freedom body parts. Furthermore, LLMs provide reasonable approximations for general spatial descriptions, but fall short in handling precise spatial specifications. Notably, LLMs demonstrate promise in conceptualizing creative motions and distinguishing culturally specific motion patterns.
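The interpolation step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the pose representation (a dictionary mapping a body part name to a 3D position), the function names `lerp` and `interpolate_steps`, and the `frames_per_step` parameter are all assumptions made for illustration, and the sketch assumes every step specifies the same set of body parts.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: body part name -> (x, y, z) position.
Vec3 = Tuple[float, float, float]
Pose = Dict[str, Vec3]


def lerp(a: Vec3, b: Vec3, t: float) -> Vec3:
    """Linearly interpolate between two 3D points for t in [0, 1]."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def interpolate_steps(steps: List[Pose], frames_per_step: int) -> List[Pose]:
    """Expand per-step poses into animation frames.

    Each consecutive pair of steps contributes `frames_per_step` frames,
    and the final pose is appended once at the end.
    """
    if not steps:
        return []
    frames: List[Pose] = []
    for start, end in zip(steps, steps[1:]):
        for i in range(frames_per_step):
            t = i / frames_per_step
            frames.append({part: lerp(start[part], end[part], t) for part in start})
    frames.append(dict(steps[-1]))
    return frames


# Two hypothetical low-level planning steps for a single body part.
steps = [{"left_hand": (0.0, 0.0, 0.0)}, {"left_hand": (1.0, 2.0, 0.0)}]
frames = interpolate_steps(steps, frames_per_step=4)
```

With two steps and four frames per step, this yields five frames, with the hand halfway between the two positions at the middle frame. A real avatar controller would more likely interpolate joint rotations than raw positions; positions are used here only to keep the sketch short.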
