PAD-Hand: Physics-Aware Diffusion for Hand Motion Recovery
This work addresses the need for physically consistent hand motion recovery in applications such as robotics and VR, although it is incremental, building on existing diffusion and physics-integration methods.
The paper tackles the problem of reconstructing physically plausible hand motion from images. It proposes a physics-aware diffusion framework that refines noisy pose sequences and estimates a physics variance, and it reports consistent gains over image-based initializations and competitive video-based methods on two hand datasets.
Significant advances in image-based hand reconstruction have delivered accurate single-frame estimates, yet these estimates often lack physical consistency and provide no notion of how confidently the motion satisfies physics. In this paper, we propose a novel physics-aware conditional diffusion framework that refines noisy pose sequences into physically plausible hand motion while estimating the physics variance of the motion estimates. Building on a MeshCNN-Transformer backbone, we formulate Euler-Lagrange dynamics for articulated hands. Unlike prior work that enforces zero residuals, we treat the resulting dynamics residuals as virtual observables to integrate physics more effectively. Through a last-layer Laplace approximation, our method produces per-joint, per-time variances that measure physics consistency and yields interpretable variance maps indicating where physical consistency weakens. Experiments on two well-known hand datasets show consistent gains over strong image-based initializations and competitive video-based methods. Qualitative results confirm that our variance estimates align with the physical plausibility of the motion in image-based estimates.
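The two technical ideas in the abstract, dynamics residuals treated as virtual observables and a last-layer Laplace approximation for variances, can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a single 1-DOF hinge with linear dynamics I*qdd + b*qd + k*q = tau in place of the full articulated-hand Euler-Lagrange equations M(q)*qdd + C(q, qd)*qd + g(q) = tau, known (here zero) torques, and a fixed cosine basis standing in for the backbone's last-layer features. The residual D q - tau is "observed" as zero with Gaussian noise sigma_r instead of being forced to zero; letting sigma_r shrink toward zero approaches a hard zero-residual constraint. The function names residual_operator and last_layer_laplace, and all parameters and constants, are hypothetical.

```python
import numpy as np


def residual_operator(T, dt, inertia, damping, stiffness):
    """Hypothetical toy 1-DOF hinge (not the paper's hand model).

    Returns D of shape (T-2, T) such that D @ q = I*qdd + b*qd + k*q on the
    interior frames, using central finite differences; the dynamics
    residual is then r = D @ q - tau.
    """
    D = np.zeros((T - 2, T))
    for i, t in enumerate(range(1, T - 1)):
        D[i, t - 1] += inertia / dt**2 - damping / (2 * dt)
        D[i, t] += -2 * inertia / dt**2 + stiffness
        D[i, t + 1] += inertia / dt**2 + damping / (2 * dt)
    return D


def last_layer_laplace(Phi, y, D, tau, sigma_y, sigma_r=None, prior_prec=1.0):
    """Gaussian posterior over last-layer weights w, with trajectory q = Phi @ w.

    Negative log posterior (every term is quadratic in w):
      data:    ||Phi w - y||^2 / (2 sigma_y^2)        noisy image-based poses
      physics: ||D Phi w - tau||^2 / (2 sigma_r^2)    residual observed as 0
      prior:   prior_prec * ||w||^2 / 2
    With purely quadratic terms the Laplace approximation is exact.
    Returns the MAP trajectory and its per-frame posterior variance.
    """
    K = Phi.shape[1]
    H = prior_prec * np.eye(K) + Phi.T @ Phi / sigma_y**2   # Hessian (precision)
    g = Phi.T @ y / sigma_y**2
    if sigma_r is not None:                                  # add virtual observable
        A = D @ Phi
        H += A.T @ A / sigma_r**2
        g += A.T @ tau / sigma_r**2
    Sigma = np.linalg.inv(H)          # posterior covariance of w
    w_map = Sigma @ g                 # MAP weights
    q_mean = Phi @ w_map
    q_var = np.einsum("tk,kl,tl->t", Phi, Sigma, Phi)  # diag(Phi Sigma Phi^T)
    return q_mean, q_var


# Usage on synthetic data (all values are illustrative assumptions).
rng = np.random.default_rng(0)
T, K, dt = 30, 8, 1.0 / 30
t = np.arange(T) * dt
Phi = np.stack([np.cos(np.pi * k * t / t[-1]) for k in range(K)], axis=1)
y = 0.5 * np.sin(2 * np.pi * t) + 0.05 * rng.standard_normal(T)
D = residual_operator(T, dt, inertia=1e-3, damping=1e-2, stiffness=0.1)
tau = np.zeros(T - 2)

q_img, var_img = last_layer_laplace(Phi, y, D, tau, sigma_y=0.05)
q_phys, var_phys = last_layer_laplace(Phi, y, D, tau, sigma_y=0.05, sigma_r=0.01)
```

Because the physics term adds a positive semidefinite block to the Hessian, it can only shrink the posterior variance, and the physics-regularized trajectory never has a larger residual norm than the image-only one. In this linear toy the variance does not depend on the observed poses; in a nonlinear model the Hessian at the MAP estimate depends on the data, which is what would allow a variance map to vary with physical plausibility. Computing one such variance vector per joint and stacking them gives a per-joint, per-time map.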