LingoMotion: An Interpretable and Unambiguous Symbolic Representation for Human Motion
This work addresses interpretability issues in human motion modeling for applications such as animation and robotics, but the contribution appears incremental because it builds on hierarchical, language-inspired concepts.
The paper tackles the limited interpretability and ambiguity of existing human motion representations by proposing LingoMotion, a symbolic motion language based on joint angles. Preliminary results on the Motion-X dataset show high-fidelity motion representation.
Existing representations for human motion, such as MotionGPT, often operate as black-box latent vectors with limited interpretability and build on joint positions, which can cause ambiguity. Inspired by the hierarchical structure of natural languages (from letters to words, phrases, and sentences), we propose LingoMotion, a motion language that facilitates interpretable and unambiguous symbolic representation of both simple and complex human motion. In this paper, we introduce the concept design of LingoMotion, including the definition of a motion alphabet based on joint angles, the morphology for forming words and phrases that describe simple actions such as walking and their attributes such as speed and scale, and the syntax for describing more complex human activities with sequences of words and phrases. The preliminary results, including the implementation and evaluation of the motion alphabet on the large-scale motion dataset Motion-X, demonstrate the high fidelity of the motion representation.
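To make the idea of a joint-angle alphabet concrete, the following is a minimal sketch of one way such an alphabet could work. It is not the LingoMotion definition: the 15-degree bin width, the use of the letters A to X, and the function names `encode_angle`, `decode_letter`, `encode_pose`, and `angular_error` are all assumptions chosen for illustration. The sketch maps each joint angle to a letter by uniform quantization, decodes a letter back to its bin center, and measures the resulting angular error, which is one simple notion of representation fidelity.

```python
import string

# Assumption for illustration only: 24 uniform bins of 15 degrees,
# one uppercase letter per bin. The paper's actual alphabet may differ.
BIN_DEG = 15.0
LETTERS = string.ascii_uppercase[:24]  # 'A' .. 'X'


def encode_angle(angle_deg: float) -> str:
    """Map a joint angle (degrees) to a letter by uniform quantization."""
    wrapped = angle_deg % 360.0  # wrap into [0, 360)
    # The final modulo guards against floating-point wrap-around at 360.
    index = int(wrapped // BIN_DEG) % len(LETTERS)
    return LETTERS[index]


def decode_letter(letter: str) -> float:
    """Return the bin-center angle (degrees) that a letter stands for."""
    return (LETTERS.index(letter) + 0.5) * BIN_DEG


def encode_pose(joint_angles_deg) -> str:
    """Encode one pose (a sequence of joint angles) as a string of letters."""
    return "".join(encode_angle(a) for a in joint_angles_deg)


def angular_error(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


if __name__ == "__main__":
    pose = [0.0, 90.0, 180.0]  # hypothetical joint angles for one frame
    symbols = encode_pose(pose)
    errors = [angular_error(a, decode_letter(s)) for a, s in zip(pose, symbols)]
    print(symbols, errors)
```

Under these assumptions every symbol is directly readable as an angle range, and the round-trip error per joint is bounded by half the bin width (7.5 degrees here). A finer alphabet would trade a larger symbol set for higher fidelity.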