CV · Dec 27, 2021

Rethinking the Data Annotation Process for Multi-view 3D Pose Estimation with Active Learning and Self-Training

arXiv:2112.13709v2 · 10 citations
Originality: Incremental advance
AI Analysis

This work targets the high cost and long turn-around time of annotating data for 3D pose estimation, a burden for computer vision researchers and practitioners. It offers an incremental improvement over existing methods.

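The paper tackles the inefficiency of data annotation for multi-view 3D pose estimation. It proposes a framework that extends existing single-view active learning strategies to the multi-view setting, adds two strategies based on multi-view geometry, and incorporates self-training through pseudo-labels. On the CMU Panoptic Studio benchmark, it reports a 60% reduction in turn-around time and an 80% reduction in annotation cost compared with the conventional annotation process.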

Pose estimation of the human body and hands is a fundamental problem in computer vision, and learning-based solutions require a large amount of annotated data. In this work, we improve the efficiency of the data annotation process for 3D pose estimation problems with Active Learning (AL) in a multi-view setting. AL selects examples with the highest value to annotate under limited annotation budgets (time and cost), but choosing the selection strategy is often nontrivial. We present a framework to efficiently extend existing single-view AL strategies. We then propose two novel AL strategies that make full use of multi-view geometry. Moreover, we demonstrate additional performance gains by incorporating pseudo-labels computed during the AL process, which is a form of self-training. Our system significantly outperforms simulated annotation baselines in 3D body and hand pose estimation on two large-scale benchmarks: CMU Panoptic Studio and InterHand2.6M. Notably, on CMU Panoptic Studio, we are able to reduce the turn-around time by 60% and annotation cost by 80% when compared to the conventional annotation process.
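To make the selection idea in the abstract concrete, here is a minimal sketch in Python. It is not the paper's method: the disagreement score, the function names (`multiview_disagreement`, `split_by_budget`), and the `pseudo_threshold` parameter are all assumptions chosen for illustration. It assumes each camera view yields a 3D pose prediction in a shared world frame, sends the frames where views disagree most to annotators (up to the budget), and uses the cross-view mean as a pseudo-label where views agree closely.

```python
import numpy as np


def multiview_disagreement(preds):
    """Score each frame by how much its per-view 3D predictions disagree.

    preds: array of shape (frames, views, joints, 3), per-view 3D joint
    predictions expressed in a shared world coordinate frame (assumption).
    Returns an array of shape (frames,): the mean distance between each
    view's joint prediction and the cross-view mean of that joint.
    """
    consensus = preds.mean(axis=1, keepdims=True)        # (F, 1, J, 3)
    dist = np.linalg.norm(preds - consensus, axis=-1)    # (F, V, J)
    return dist.mean(axis=(1, 2))                        # (F,)


def split_by_budget(preds, budget, pseudo_threshold):
    """Pick frames to annotate and frames to pseudo-label.

    budget: number of frames that can be sent for manual annotation.
    pseudo_threshold: hypothetical cut-off; frames not selected for
    annotation whose disagreement is at or below it get a pseudo-label.
    """
    scores = multiview_disagreement(preds)
    order = np.argsort(-scores)                  # highest disagreement first
    to_annotate = order[:budget]
    rest = order[budget:]
    confident = rest[scores[rest] <= pseudo_threshold]
    pseudo_labels = preds[confident].mean(axis=1)        # (n, J, 3)
    return to_annotate, confident, pseudo_labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in data: 100 frames, 4 views, 15 joints.
    demo = rng.normal(size=(100, 1, 15, 3)) + 0.01 * rng.normal(size=(100, 4, 15, 3))
    demo[:5] += rng.normal(size=(5, 4, 15, 3))           # make 5 frames inconsistent
    picked, confident, labels = split_by_budget(demo, budget=5, pseudo_threshold=0.05)
    print(sorted(picked.tolist()), len(confident), labels.shape)
```

In a real active learning loop, the annotated frames and pseudo-labels would be fed back into training and the scores recomputed each round. The threshold and budget here are placeholders, not values from the paper.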

Code Implementations: 1 repo