Mesh Graphormer
Mesh Graphormer addresses accurate 3D human modeling for computer vision applications. It is an incremental improvement that hybridizes two existing families of methods: transformers and graph convolutional neural networks.
The paper tackles 3D human pose and mesh reconstruction from a single image by combining graph convolutions and self-attention in a transformer to model both local and global interactions, and it reports state-of-the-art results on the Human3.6M, 3DPW, and FreiHAND benchmarks.
We present a graph-convolution-reinforced transformer, named Mesh Graphormer, for 3D human pose and mesh reconstruction from a single image. Recently, both transformers and graph convolutional neural networks (GCNNs) have shown promising progress in human mesh reconstruction. Transformer-based approaches are effective in modeling non-local interactions among 3D mesh vertices and body joints, whereas GCNNs are good at exploiting neighborhood vertex interactions based on a pre-specified mesh topology. In this paper, we study how to combine graph convolutions and self-attention in a transformer to model both local and global interactions. Experimental results show that our proposed method, Mesh Graphormer, significantly outperforms the previous state-of-the-art methods on multiple benchmarks, including the Human3.6M, 3DPW, and FreiHAND datasets. Code and pre-trained models are available at https://github.com/microsoft/MeshGraphormer
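To make the local versus global distinction concrete, the following is a minimal NumPy sketch of one block that applies self-attention over all vertices and then a graph convolution restricted to a fixed adjacency. It is an illustration under stated assumptions, not the released Mesh Graphormer implementation: the function names (`self_attention`, `graph_conv`, `hybrid_block`), the single attention head, the symmetric adjacency normalization, the ordering of the two steps, and the toy 5-vertex chain graph are all hypothetical choices made for this example.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def normalize_adjacency(adj):
    # Symmetric normalization D^{-1/2} (A + I) D^{-1/2} (an assumed, common choice).
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def self_attention(x, w_q, w_k, w_v):
    # Global interactions: every vertex attends to every other vertex.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def graph_conv(x, a_norm, w_g):
    # Local interactions: each vertex aggregates only its mesh neighbors.
    return np.maximum(a_norm @ x @ w_g, 0.0)


def hybrid_block(x, a_norm, params):
    # Hypothetical block: attention with a residual connection,
    # followed by a graph convolution with a residual connection.
    x = x + self_attention(x, params["w_q"], params["w_k"], params["w_v"])
    x = x + graph_conv(x, a_norm, params["w_g"])
    return x


# Toy setup (assumed for illustration): 5 vertices on a chain 0-1-2-3-4.
rng = np.random.default_rng(0)
num_vertices, dim = 5, 8
adj = np.zeros((num_vertices, num_vertices))
for i in range(num_vertices - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
a_norm = normalize_adjacency(adj)

params = {
    name: rng.normal(scale=0.1, size=(dim, dim))
    for name in ("w_q", "w_k", "w_v", "w_g")
}
x = rng.normal(size=(num_vertices, dim))

out = hybrid_block(x, a_norm, params)
print(out.shape)  # (5, 8)
```

In this sketch, changing the features of vertex 4 leaves the graph-convolution output at vertex 0 untouched, because the two vertices are not neighbors in the chain, while the self-attention output at vertex 0 does change. That is the complementary behavior the abstract attributes to GCNNs and transformers respectively.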