3D Human Pose Estimation with Siamese Equivariant Embedding
This work addresses overfitting to training-set camera positions in monocular 3D human pose estimation, and reports a consistent improvement in error metrics over the base networks it builds on.
The paper tackles overfitting to camera positions in monocular 3D human pose estimation by proposing a siamese architecture that learns a rotation-equivariant hidden representation. It reports a state-of-the-art cross-camera error rate among methods that use only estimated 2D joint coordinates.
In monocular 3D human pose estimation, a common setup is to first detect 2D joint positions and then lift the detections into 3D coordinates. Many algorithms suffer from overfitting to the camera positions in the training set. We propose a siamese architecture that learns a rotation-equivariant hidden representation to reduce the need for data augmentation. Our method is evaluated on multiple databases with different base networks and shows a consistent improvement in error metrics. It achieves a state-of-the-art cross-camera error rate among algorithms that use only estimated 2D joint coordinates.
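To make the idea of a rotation-equivariant hidden representation concrete, the following is a minimal NumPy sketch, not the paper's implementation. It assumes that the hidden embedding of each siamese branch can be viewed as a set of 3D points, and that the relative rotation between the two cameras is known; the names `rotation_about_y`, `equivariance_loss`, `z_a`, `z_b`, and `r_ab` are hypothetical and chosen only for illustration. The loss is zero when rotating the embedding of one view reproduces the embedding of the other view, which is the equivariance property.

```python
import numpy as np


def rotation_about_y(theta):
    """Return a 3x3 rotation matrix for an angle theta (radians) about the y axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])


def equivariance_loss(z_a, z_b, r_ab):
    """Mean squared distance between the rotated embedding of view A and
    the embedding of view B.

    z_a, z_b: (k, 3) arrays, hidden embeddings treated as k 3D points
              (an assumption of this sketch).
    r_ab:     (3, 3) rotation taking camera A's frame to camera B's frame.
    """
    rotated = z_a @ r_ab.T  # apply the rotation to every row vector
    return float(np.mean(np.sum((rotated - z_b) ** 2, axis=1)))


# Synthetic stand-ins for the outputs of the two siamese branches.
rng = np.random.default_rng(0)
z_a = rng.normal(size=(16, 3))
r_ab = rotation_about_y(np.pi / 3)

z_b_equivariant = z_a @ r_ab.T        # a perfectly equivariant second view
z_b_other = rng.normal(size=(16, 3))  # an unrelated embedding

loss_good = equivariance_loss(z_a, z_b_equivariant, r_ab)
loss_bad = equivariance_loss(z_a, z_b_other, r_ab)
print(loss_good, loss_bad)
```

In a training setup along these lines, a term of this kind would be added to the usual 3D pose regression loss, so that the network is penalized when its hidden representation changes with the camera in any way other than the known rotation.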