CV | Apr 5, 2023

Self-supervised 3D Human Pose Estimation from a Single Image

arXiv:2304.02349v1 | 10 citations | h-index: 45
Originality: Incremental advance
AI Analysis

This paper addresses the data annotation bottleneck facing researchers and practitioners in computer vision, though it is an incremental advance that builds on earlier self-supervision ideas.

The paper tackles 3D human pose estimation from a single image using a self-supervised method that reduces the need for annotated data. It outperforms state-of-the-art self-supervised methods on Human3.6M and matches their performance on MPI-INF-3DHP.

We propose a new self-supervised method for predicting 3D human body pose from a single image. The prediction network is trained from a dataset of unlabelled images depicting people in typical poses and a set of unpaired 2D poses. By minimising the need for annotated data, the method has the potential for rapid application to pose estimation of other articulated structures (e.g. animals). The self-supervision comes from an earlier idea exploiting consistency between predicted pose under 3D rotation. Our method is a substantial advance on state-of-the-art self-supervised methods in training a mapping directly from images, without limb articulation constraints or any 3D empirical pose prior. We compare performance with state-of-the-art self-supervised methods using benchmark datasets that provide images and ground-truth 3D pose (Human3.6M, MPI-INF-3DHP). Despite the reduced requirement for annotated data, we show that the method outperforms on Human3.6M and matches performance on MPI-INF-3DHP. Qualitative results on a dataset of human hands show the potential for rapidly learning to predict 3D pose for articulated structures other than the human body.
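The abstract mentions self-supervision from consistency of the predicted pose under 3D rotation but does not spell out the loss. The following is a minimal sketch of one common form of that idea, not the authors' implementation: lift a 2D pose to 3D, rotate it, project it back to 2D, lift again, undo the rotation, and penalise the difference. It assumes orthographic projection, rotation about the vertical axis only, and 2D keypoints as input (the paper itself maps directly from images). All names (`rotation_y`, `project`, `rotation_consistency_loss`, `toy_lifter`) are hypothetical.

```python
import numpy as np

def rotation_y(theta):
    """3x3 rotation matrix about the vertical (y) axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def project(pose3d):
    """Orthographic projection (an assumption): drop the depth coordinate."""
    return pose3d[:, :2]

def rotation_consistency_loss(lifter, pose2d, theta):
    """Mean squared error between a lifted pose and its rotate/re-lift/unrotate cycle."""
    R = rotation_y(theta)
    pose3d = lifter(pose2d)              # (J, 3) predicted 3D joints
    rotated = pose3d @ R.T               # apply R to each joint (row vectors)
    relifted = lifter(project(rotated))  # predict 3D again from the new view
    restored = relifted @ R              # apply the inverse rotation R^T
    return float(np.mean((restored - pose3d) ** 2))

def toy_lifter(pose2d):
    """Stand-in for a trained network: keeps x, y and guesses depth as 0.5 * x."""
    depth = 0.5 * pose2d[:, :1]
    return np.concatenate([pose2d, depth], axis=1)

pose2d = np.array([[0.0, 0.0], [1.0, 0.5], [-1.0, 2.0]])
loss_identity = rotation_consistency_loss(toy_lifter, pose2d, 0.0)
loss_rotated = rotation_consistency_loss(toy_lifter, pose2d, 0.7)
```

With no rotation the cycle reproduces the first prediction, so the loss vanishes. With a non-zero angle the toy lifter's depth guess is inconsistent across views and the loss is positive, which is the signal a trainable lifter would minimise.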

