CV · Oct 18, 2021

Leveraging MoCap Data for Human Mesh Recovery

arXiv:2110.09243v1 · 16 citations
Originality: Highly original
AI Analysis

This work addresses the data scarcity issue in human pose and shape estimation for computer vision applications, offering a practical solution with broad applicability.

Annotations for human mesh recovery are expensive and hard to obtain. The paper tackles this by using 3D Motion Capture (MoCap) data to improve both image-based and video-based methods, and reports state-of-the-art performance on multiple datasets, including 3DPW and MPI-INF-3DHP.

Training state-of-the-art models for human body pose and shape recovery from images or videos requires datasets with corresponding annotations that are hard and expensive to obtain. Our goal in this paper is to study whether poses from 3D Motion Capture (MoCap) data can be used to improve image-based and video-based human mesh recovery methods. We find that fine-tuning image-based models with synthetic renderings from MoCap data can increase their performance by providing them with a wider variety of poses, textures and backgrounds. In fact, we show that simply fine-tuning the batch normalization layers of the model is enough to achieve large gains. We further study the use of MoCap data for video, and introduce PoseBERT, a transformer module that directly regresses the pose parameters and is trained via masked modeling. It is simple, generic and can be plugged on top of any state-of-the-art image-based model to transform it into a video-based model that leverages temporal information. Our experimental results show that the proposed approaches reach state-of-the-art performance on various datasets including 3DPW, MPI-INF-3DHP, MuPoTS-3D, MCB and AIST. Test code and models will be available soon.
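To make the "trained via masked modeling" idea concrete, here is a minimal sketch of how input frames of a pose sequence might be masked before being fed to a temporal model. It is an illustration only, not the authors' implementation: the function name `mask_pose_sequence`, the `mask_ratio` of 0.15, and the use of a constant `mask_value` in place of a learned mask token are all assumptions that do not come from the abstract.

```python
import random


def mask_pose_sequence(poses, mask_ratio=0.15, mask_value=0.0, seed=None):
    """Hypothetical helper: hide a random subset of frames in a pose sequence.

    poses: list of frames, each frame a list of floats (pose parameters).
    Returns (masked_copy, masked_indices); the input list is not modified.
    """
    rng = random.Random(seed)
    num_frames = len(poses)
    if num_frames == 0:
        return [], []
    # Assumption: mask at least one frame, at most all of them.
    num_masked = min(num_frames, max(1, round(mask_ratio * num_frames)))
    masked_indices = sorted(rng.sample(range(num_frames), num_masked))
    masked_set = set(masked_indices)
    masked = [
        [mask_value] * len(frame) if t in masked_set else list(frame)
        for t, frame in enumerate(poses)
    ]
    return masked, masked_indices


# Toy sequence: 20 frames with 2 made-up pose parameters each.
sequence = [[float(t), float(t) + 0.5] for t in range(20)]
masked_sequence, hidden = mask_pose_sequence(sequence, mask_ratio=0.25, seed=0)
print(len(hidden), "frames masked out of", len(sequence))
```

During training, a model would receive `masked_sequence` and be asked to regress the original pose parameters, so it has to rely on the surrounding frames to fill in the hidden ones.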

Code Implementations: 1 repo