CV | Nov 6, 2023

Efficient, Self-Supervised Human Pose Estimation with Inductive Prior Tuning

arXiv:2311.02815v1 | 1.51 citations | h-index: 4 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for more efficient and accurate self-supervised methods in human pose estimation, reducing reliance on labor-intensive labeled data, though it is incremental in nature.

The paper aims to improve self-supervised human pose estimation. It analyzes how reconstruction quality relates to pose estimation accuracy and develops a model pipeline that outperforms its baseline while using less than one-third of the training data.

The goal of 2D human pose estimation (HPE) is to localize anatomical landmarks, given an image of a person in a pose. SOTA techniques make use of thousands of labeled figures (finetuning transformers or training deep CNNs), acquired using labor-intensive crowdsourcing. On the other hand, self-supervised methods re-frame the HPE task as a reconstruction problem, enabling them to leverage the vast amount of unlabeled visual data, though at the present cost of accuracy. In this work, we explore ways to improve self-supervised HPE. We (1) analyze the relationship between reconstruction quality and pose estimation accuracy, (2) develop a model pipeline that outperforms the baseline which inspired our work, using less than one-third the amount of training data, and (3) offer a new metric suitable for self-supervised settings that measures the consistency of predicted body part length proportions. We show that a combination of well-engineered reconstruction losses and inductive priors can help coordinate pose learning alongside reconstruction in a self-supervised paradigm.
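
The abstract mentions a metric for the consistency of predicted body part length proportions but does not define it here. The sketch below is one plausible reading, not the authors' formulation: it divides each limb length by a reference segment so that image scale cancels out, then reports how much those ratios vary across predictions. The keypoint names, the limb list, the torso reference segment, and the function names are all illustrative assumptions.

```python
import math
from statistics import mean, pstdev

# Hypothetical skeleton: names and pairs are illustrative assumptions,
# not taken from the paper.
LIMBS = [("shoulder", "elbow"), ("elbow", "wrist"),
         ("hip", "knee"), ("knee", "ankle")]
REFERENCE = ("shoulder", "hip")  # assumed normalizing segment (torso)


def limb_length(pose, a, b):
    """Euclidean distance between two named 2D keypoints of one pose."""
    return math.dist(pose[a], pose[b])


def proportions(pose):
    """Limb lengths divided by the reference segment, so scale cancels out."""
    ref = limb_length(pose, *REFERENCE)
    if ref == 0:
        raise ValueError("reference segment has zero length")
    return [limb_length(pose, a, b) / ref for a, b in LIMBS]


def proportion_spread(poses):
    """Mean over limbs of the population std. dev. of each ratio across poses.

    0.0 means every pose has identical proportions; larger values mean
    the predicted skeletons are less consistent with one another.
    """
    per_pose = [proportions(p) for p in poses]
    per_limb = zip(*per_pose)  # regroup: one tuple of ratios per limb
    return mean(pstdev(values) for values in per_limb)
```

A score of this kind needs no ground-truth annotations, which is what makes it usable in a self-supervised setting: it only asks whether the same subject's predicted limbs keep the same relative lengths from frame to frame.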
