D3L: Decomposition of 3D Rotation and Lift from 2D Joint to 3D for Human Mesh Recovery
This work addresses accurate 3D human pose and shape estimation for computer vision applications, offering an incremental improvement that builds on advances in human pose estimation.
The paper tackles 3D human mesh recovery by addressing rotation ambiguity and shape overfitting in existing methods. It proposes D3L, which decomposes joint rotations and lifts 2D joints to 3D, and reports improved performance on the Human3.6M dataset.
Existing methods for 3D human mesh recovery typically estimate SMPL parameters directly, which involves both joint rotations and shape parameters. However, these methods suffer from rotation semantic ambiguity, rotation error accumulation, and shape estimation overfitting, the last of which also leads to errors in the estimated pose. Additionally, they have not efficiently leveraged advances in the closely related field of human pose estimation. To address these issues, we propose a novel approach, Decomposition of 3D Rotation and Lift from 2D Joint to 3D mesh (D3L). We disentangle 3D joint rotation into bone direction and bone twist direction, so that the human mesh recovery task is broken down into the estimation of pose, twist, and shape, which can be handled independently. We then design a 2D-to-3D lifting network that estimates twist direction and 3D joint positions from 2D joint position sequences, and introduce a nonlinear optimization method for fitting shape parameters and bone directions. Our approach can leverage human pose estimation methods and avoids the pose errors introduced by shape estimation overfitting. Experiments on the Human3.6M dataset demonstrate that our approach outperforms existing methods by a large margin.
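To make the rotation decomposition concrete, the sketch below splits a joint rotation into a bone direction and a twist angle about the bone, using the standard swing-twist construction. This is an illustrative assumption, not necessarily the exact formulation used in D3L: the function names (`decompose_rotation`, `compose_rotation`), the choice of a scalar twist angle, and the rest-pose bone axis `rest_dir` are all hypothetical.

```python
import numpy as np

def rodrigues(axis, angle):
    """Rotation matrix for `angle` radians about the unit vector `axis`."""
    x, y, z = axis
    K = np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def any_perpendicular(a):
    """Some unit vector perpendicular to the unit vector `a`."""
    helper = np.array([1.0, 0.0, 0.0]) if abs(a[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    p = np.cross(a, helper)
    return p / np.linalg.norm(p)

def swing_from_to(a, d):
    """Minimal rotation taking unit vector `a` onto unit vector `d`."""
    axis = np.cross(a, d)
    s, c = np.linalg.norm(axis), float(np.dot(a, d))
    if s < 1e-8:
        # Parallel: no swing. Antiparallel: half turn about any perpendicular axis.
        return np.eye(3) if c > 0 else rodrigues(any_perpendicular(a), np.pi)
    return rodrigues(axis / s, np.arctan2(s, c))

def decompose_rotation(R, rest_dir):
    """Split R into (bone direction, twist angle about the rest bone axis)."""
    a = rest_dir / np.linalg.norm(rest_dir)
    d = R @ a                          # bone direction after rotation
    twist = swing_from_to(a, d).T @ R  # residual rotation; leaves `a` fixed
    u = any_perpendicular(a)
    v = twist @ u
    angle = np.arctan2(np.dot(np.cross(u, v), a), np.dot(u, v))
    return d, angle

def compose_rotation(d, angle, rest_dir):
    """Rebuild the rotation: twist about the rest axis, then swing onto `d`."""
    a = rest_dir / np.linalg.norm(rest_dir)
    return swing_from_to(a, d) @ rodrigues(a, angle)

# Usage: decompose a rotation and confirm that the two parts reproduce it.
rest = np.array([0.0, 1.0, 0.0])  # assumed rest-pose bone axis
R = rodrigues(np.array([1.0, 0.0, 0.0]), 0.5) @ rodrigues(rest, -1.2)
direction, twist_angle = decompose_rotation(R, rest)
assert np.allclose(compose_rotation(direction, twist_angle, rest), R)
```

In this construction the bone direction is fully determined by the 3D positions of a joint and its child, which is why it can be recovered from estimated 3D joints, while the twist angle carries the remaining degree of freedom that joint positions alone cannot observe.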