CV | Dec 17, 2020

End-to-End Human Pose and Mesh Reconstruction with Transformers

arXiv:2012.09760v3 | 772 citations | Has Code
AI Analysis

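METRO regresses mesh vertices directly instead of the pose and shape parameters of a body model such as SMPL. This lets it extend more readily to other objects such as hands, and its masked vertex modeling makes it more robust to partial occlusion. These properties benefit computer vision researchers and applications that need detailed 3D body representations.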

This paper introduces METRO, a transformer-based method for reconstructing 3D human pose and mesh vertices from a single image without relying on parametric mesh models. METRO achieves state-of-the-art results on Human3.6M and 3DPW datasets for human mesh reconstruction, and also outperforms existing methods on the FreiHAND dataset for 3D hand reconstruction.

We present a new method, called MEsh TRansfOrmer (METRO), to reconstruct 3D human pose and mesh vertices from a single image. Our method uses a transformer encoder to jointly model vertex-vertex and vertex-joint interactions, and outputs 3D joint coordinates and mesh vertices simultaneously. Compared to existing techniques that regress pose and shape parameters, METRO does not rely on any parametric mesh models like SMPL, so it can be easily extended to other objects such as hands. We further relax the mesh topology and allow the transformer self-attention mechanism to freely attend between any two vertices, making it possible to learn non-local relationships among mesh vertices and joints. With the proposed masked vertex modeling, our method is more robust and effective in handling challenging situations like partial occlusions. METRO achieves new state-of-the-art results for human mesh reconstruction on the public Human3.6M and 3DPW datasets. Moreover, we demonstrate the generalizability of METRO to 3D hand reconstruction in the wild, outperforming existing state-of-the-art methods on the FreiHAND dataset. Code and pre-trained models are available at https://github.com/microsoft/MeshTransformer.
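The two ideas in the abstract, unrestricted self-attention over one token per joint and per vertex and masked vertex modeling, can be illustrated with a minimal pure-Python sketch. It is not the METRO implementation from the linked repository. The token construction (a global image feature concatenated with each template 3D coordinate), the single head with identity Q/K/V projections, and the masking scheme (zeroing a random fraction of vertex tokens only) are simplifying assumptions, and `build_tokens`, `self_attention`, and `mask_vertex_tokens` are hypothetical names.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention (identity Q/K/V projections).

    Every token attends to every other token, so joints and vertices interact
    without any mesh-adjacency constraint. Returns (outputs, attention_matrix).
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    attn, out = [], []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        w = softmax(scores)
        attn.append(w)
        # Output token = attention-weighted sum of all (value) tokens.
        out.append([sum(w[j] * tokens[j][c] for j in range(len(tokens)))
                    for c in range(d)])
    return out, attn


def build_tokens(image_feature, template_joints, template_vertices):
    # Assumption: one query token per joint and per vertex, formed by
    # concatenating a global image feature with the template 3D coordinate.
    return [list(image_feature) + list(p)
            for p in template_joints + template_vertices]


def mask_vertex_tokens(tokens, num_joints, ratio, rng):
    # Hypothetical masked vertex modeling: zero out a random fraction of the
    # vertex tokens (joint tokens untouched) so the encoder has to infer
    # them from the remaining tokens, mimicking partial occlusion.
    masked = [list(t) for t in tokens]
    vertex_ids = list(range(num_joints, len(tokens)))
    k = int(round(ratio * len(vertex_ids)))
    chosen = rng.sample(vertex_ids, k)
    for i in chosen:
        masked[i] = [0.0] * len(masked[i])
    return masked, sorted(chosen)


rng = random.Random(0)
image_feature = [0.5, -0.2]                   # stand-in for a CNN global feature
joints = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]   # 2 toy template joints
vertices = [(0.1, 0.0, 0.0), (0.0, 0.9, 0.1), (0.2, 0.5, 0.0), (-0.1, 0.2, 0.0)]

tokens = build_tokens(image_feature, joints, vertices)
masked, ids = mask_vertex_tokens(tokens, num_joints=len(joints), ratio=0.5, rng=rng)
out, attn = self_attention(masked)
print(len(out), len(out[0]))  # 6 5  (2 joints + 4 vertices, each 2 + 3 dims)
```

Because the attention matrix is dense, each output vertex token mixes information from every joint and vertex, which is what allows non-local relationships to be learned. A trained model would add learned projections, multiple heads and layers, and a regression head mapping each output token to (x, y, z) coordinates.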
