How Noisy Poses Break Inverse Dynamics: Analysis and Mitigation for Video-Based Joint Torque Estimation
For researchers estimating joint torques from video, this work offers a systematic analysis of noise amplification through inverse dynamics and a practical method to mitigate it, though the approach is incremental.
The paper analyzes how noise in monocular 3D human pose estimation propagates through inverse dynamics. It finds that pose noise is amplified approximately 1,000x when joint torques are computed via numerical differentiation, and that proximal joints (spine, hips) are up to 10x more sensitive than distal joints (wrists, hands). The authors introduce SMPL-Dynamics, a differentiable inverse dynamics module for the SMPL body model, and show that differentiable pose refinement reduces torque error by 93% with negligible change in pose.
Recent advances in monocular 3D human pose estimation enable accurate body tracking from video. However, translating these kinematic estimates into physical quantities, such as joint torques, remains challenging due to noise amplification through inverse dynamics. In this work, we provide a systematic analysis of how pose estimation noise propagates through the inverse dynamics pipeline. We present three key findings: (1) pose noise is amplified by approximately 1,000x when computing joint torques via numerical differentiation, (2) proximal joints (spine, hips) are up to 10x more sensitive to noise than distal joints (wrists, hands), and (3) low-pass filtering before differentiation substantially reduces this amplification. To enable this analysis, we develop SMPL-Dynamics, a fully differentiable inverse dynamics module for the SMPL body model that requires no external physics simulators. Our module supports end-to-end gradient computation, and we demonstrate this through differentiable pose refinement, which reduces torque error by 93% with negligible change in pose.
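The amplification mechanism and the effect of low-pass filtering can be illustrated with a minimal NumPy sketch. It is not the paper's SMPL-Dynamics module and does not reproduce the reported figures. It works on a single hypothetical joint angle and measures error in angular acceleration, the quantity that inverse dynamics scales by inertia to obtain torque. The frame rate, noise level, test trajectory, and moving-average filter are all assumptions chosen for illustration. For independent per-frame noise with standard deviation sigma, a central second difference has error standard deviation sqrt(6) * sigma / dt^2, so the gain grows with the square of the frame rate.

```python
import numpy as np

def second_derivative(x, dt):
    """Central finite-difference acceleration: (x[t+1] - 2*x[t] + x[t-1]) / dt^2."""
    return (x[2:] - 2.0 * x[1:-1] + x[:-2]) / dt**2

def moving_average(x, width):
    """Simple low-pass filter: centered moving average (odd window width)."""
    kernel = np.ones(width) / width
    return np.convolve(x, kernel, mode="same")

# All values below are illustrative assumptions, not taken from the paper.
rng = np.random.default_rng(0)
fps = 30.0                                   # assumed video frame rate
dt = 1.0 / fps
t = np.arange(0.0, 10.0, dt)                 # 10 seconds of motion
clean = 0.5 * np.sin(2 * np.pi * 0.5 * t)    # hypothetical joint angle (rad)
sigma = 0.01                                 # assumed pose noise std (rad)
noisy = clean + rng.normal(0.0, sigma, size=t.shape)

acc_clean = second_derivative(clean, dt)
acc_noisy = second_derivative(noisy, dt)
acc_filtered = second_derivative(moving_average(noisy, 9), dt)

# Drop samples near the ends, where the zero-padded convolution distorts the signal.
trim = 10
err_raw = np.std((acc_noisy - acc_clean)[trim:-trim])
err_filt = np.std((acc_filtered - acc_clean)[trim:-trim])

# Gain: acceleration error (rad/s^2) per unit of pose noise (rad).
gain_raw = err_raw / sigma
gain_filt = err_filt / sigma
print(f"gain without filtering: {gain_raw:.0f}")
print(f"gain with moving-average filter: {gain_filt:.0f}")
```

The gain has units of 1/s^2, so its numerical value depends on the frame rate and is not directly comparable to the amplification factor reported for torques. A moving average is used here only because it is simple; it also attenuates the true motion slightly, a trade-off any low-pass filter makes to some degree.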