MonoMSK: Monocular 3D Musculoskeletal Dynamics Estimation
This work addresses accurate 3D human motion estimation for applications such as biomechanics and healthcare. The authors present it as a new capability (kinetics estimation from monocular video) rather than an incremental improvement.
The paper tackles the problem of reconstructing biomechanically realistic 3D human motion from monocular video. It introduces MonoMSK, a hybrid framework that jointly recovers kinematics and kinetics; the authors report that it significantly outperforms state-of-the-art methods in kinematic accuracy and enables precise monocular kinetics estimation for the first time.
Reconstructing biomechanically realistic 3D human motion, which means recovering both kinematics (motion) and kinetics (forces and torques), is a critical challenge. While marker-based systems are lab-bound and slow, popular monocular methods rely on oversimplified, anatomically inaccurate models (e.g., SMPL) and ignore physics, which fundamentally limits their biomechanical fidelity. In this work, we introduce MonoMSK, a hybrid framework that bridges data-driven learning and physics-based simulation for biomechanically realistic 3D human motion estimation from monocular video. MonoMSK jointly recovers kinematics and kinetics through an anatomically accurate musculoskeletal model. By integrating transformer-based inverse dynamics with differentiable forward kinematics and dynamics layers governed by ODE-based simulation, MonoMSK establishes a physics-regulated inverse-forward loop that enforces biomechanical causality and physical plausibility. A novel forward-inverse consistency loss further aligns motion reconstruction with the underlying kinetic reasoning. Experiments on BML-MoVi, BEDLAM, and OpenCap show that MonoMSK significantly outperforms state-of-the-art methods in kinematic accuracy while, for the first time, enabling precise monocular kinetics estimation.
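
As a rough illustration of the inverse-forward loop and the consistency loss described above, the following sketch uses a single-joint pendulum in plain Python. Everything in it is an assumption made for illustration only: the pendulum model, the constants, the semi-implicit Euler integrator, and the function names (`inverse_dynamics`, `forward_step`, `rollout`, `consistency_loss`) are hypothetical and are not taken from MonoMSK, which relies on a transformer to predict kinetics and on a full musculoskeletal model. The analytic inverse dynamics here merely stands in for a learned predictor.

```python
import math

# Hypothetical single-joint "limb": a point mass on a rigid rod (assumed values).
MASS, LENGTH, GRAVITY, DT = 1.0, 0.5, 9.81, 0.01
INERTIA = MASS * LENGTH ** 2


def inverse_dynamics(q, qdd):
    """Torque required to produce angular acceleration qdd at joint angle q."""
    return INERTIA * qdd + MASS * GRAVITY * LENGTH * math.sin(q)


def torques_from_motion(q_traj):
    """Inverse pass: finite-difference accelerations, then per-frame torques.

    Stand-in for a learned kinetics predictor; covers frames 1 .. len-2.
    """
    taus = []
    for t in range(1, len(q_traj) - 1):
        qdd = (q_traj[t + 1] - 2.0 * q_traj[t] + q_traj[t - 1]) / DT ** 2
        taus.append(inverse_dynamics(q_traj[t], qdd))
    return taus


def forward_step(q, qd, tau):
    """Forward pass: one semi-implicit Euler step of the dynamics ODE."""
    qdd = (tau - MASS * GRAVITY * LENGTH * math.sin(q)) / INERTIA
    qd_next = qd + DT * qdd
    q_next = q + DT * qd_next
    return q_next, qd_next


def rollout(q0, qd0, taus):
    """Simulate the joint angle forward under a sequence of torques."""
    q, qd = q0, qd0
    traj = [q]
    for tau in taus:
        q, qd = forward_step(q, qd, tau)
        traj.append(q)
    return traj


def consistency_loss(q_est, taus):
    """Mean squared gap between estimated angles and the re-simulated angles."""
    qd1 = (q_est[1] - q_est[0]) / DT  # backward-difference velocity at frame 1
    simulated = rollout(q_est[1], qd1, taus)
    target = q_est[1:]
    return sum((s - g) ** 2 for s, g in zip(simulated, target)) / len(target)


# Estimated kinematics (synthetic here) and two candidate torque sequences.
q_est = [0.3 * math.sin(2.0 * t * DT) for t in range(100)]
taus_consistent = torques_from_motion(q_est)
taus_perturbed = [tau + 0.5 for tau in taus_consistent]

loss_consistent = consistency_loss(q_est, taus_consistent)
loss_perturbed = consistency_loss(q_est, taus_perturbed)
```

In this toy setting, torques that agree with the estimated motion reproduce it almost exactly when simulated forward, so the loss stays near zero, while perturbed torques drive the simulation away from the estimate and the loss grows. A differentiable version of the same gap is the kind of signal a forward-inverse consistency term could feed back to a kinetics predictor.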