DiffMesh: A Motion-aware Diffusion Framework for Human Mesh Recovery from Videos
This work addresses the challenge of recovering temporally consistent human motion in dynamic scenarios, for applications such as animation and virtual reality. It is an incremental improvement over existing video-based methods.
The paper tackles the problem of temporal inconsistencies in human mesh recovery from videos by proposing DiffMesh, a motion-aware diffusion framework that generates accurate and smooth 3D mesh sequences, achieving state-of-the-art results on Human3.6M and 3DPW datasets.
Human mesh recovery (HMR) provides rich human body information for various real-world applications. While image-based HMR methods have achieved impressive results, they often struggle to recover humans in dynamic scenarios: because they process each frame independently and have no access to human motion, their 3D predictions are temporally inconsistent and non-smooth. In contrast, video-based approaches leverage temporal information to mitigate this issue. In this paper, we present DiffMesh, an innovative motion-aware Diffusion-like framework for video-based HMR. DiffMesh establishes a bridge between diffusion models and human motion, efficiently generating accurate and smooth output mesh sequences by incorporating human motion into both the forward and reverse processes of the diffusion model. Extensive experiments on two widely used datasets (Human3.6M \cite{h36m_pami} and 3DPW \cite{pw3d2018}) demonstrate the effectiveness and efficiency of DiffMesh. Visual comparisons in real-world scenarios further highlight DiffMesh's suitability for practical applications.
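As a rough illustration of the temporal inconsistency described above, the sketch below scores a joint trajectory by the mean magnitude of its second finite difference, a simple smoothness proxy. It is not DiffMesh's method or the paper's evaluation protocol. The function name `mean_acceleration`, the toy trajectory, and the noise level are all assumptions made for illustration.

```python
import math
import random


def mean_acceleration(seq):
    """Mean magnitude of the second finite difference of joint positions.

    seq: list of frames; each frame is a list of (x, y, z) joint positions.
    Lower values indicate smoother motion. (Hypothetical helper for illustration.)
    """
    if len(seq) < 3:
        raise ValueError("need at least 3 frames")
    total, count = 0.0, 0
    for t in range(1, len(seq) - 1):
        for prev, cur, nxt in zip(seq[t - 1], seq[t], seq[t + 1]):
            # Discrete acceleration per axis: p[t-1] - 2*p[t] + p[t+1]
            acc = [p - 2.0 * c + n for p, c, n in zip(prev, cur, nxt)]
            total += math.sqrt(sum(a * a for a in acc))
            count += 1
    return total / count


# Toy data (assumed): a single joint moving at constant velocity.
smooth = [[(0.01 * t, 0.0, 0.0)] for t in range(20)]

# The same trajectory with independent per-frame noise, mimicking
# estimates made frame by frame without any temporal information.
rng = random.Random(0)
jittery = [
    [(x + rng.gauss(0, 0.005), y + rng.gauss(0, 0.005), z + rng.gauss(0, 0.005))
     for (x, y, z) in frame]
    for frame in smooth
]

smooth_score = mean_acceleration(smooth)
jitter_score = mean_acceleration(jittery)
print(f"smooth: {smooth_score:.6f}  jittery: {jitter_score:.6f}")
```

In this toy setting the constant-velocity trajectory scores essentially zero, while the independently perturbed one scores clearly higher. This is the kind of frame-to-frame jitter that video-based methods aim to reduce.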