CV
Dec 20, 2016

3D Human Pose Estimation = 2D Pose Estimation + Matching

arXiv:1612.06524v2
576 citations
Originality: Incremental advance
AI Analysis

This approach improves 3D human pose estimation for computer vision applications by leveraging existing 2D systems and 3D data, though it is incremental in its combination of established components.

The paper tackles 3D human pose estimation from a single RGB image by proposing a simple architecture that uses accurate 2D pose predictions and lifts them to 3D via matching with mocap data, outperforming most state-of-the-art methods.

We explore 3D human pose estimation from a single RGB image. While many approaches try to directly predict 3D pose from image measurements, we explore a simple architecture that reasons through intermediate 2D pose predictions. Our approach is based on two key observations: (1) Deep neural nets have revolutionized 2D pose estimation, producing accurate 2D predictions even for poses with self-occlusions. (2) Big datasets of 3D mocap data are now readily available, making it tempting to lift predicted 2D poses to 3D through simple memorization (e.g., nearest neighbors). The resulting architecture is trivial to implement with off-the-shelf 2D pose estimation systems and 3D mocap libraries. Importantly, we demonstrate that such methods outperform almost all state-of-the-art 3D pose estimation systems, most of which directly try to regress 3D pose from 2D measurements.
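The "lift by memorization" step described in the abstract can be illustrated with a minimal nearest-neighbor sketch. This is not the authors' implementation. It assumes the 2D joints have already been predicted by some off-the-shelf 2D pose estimator, that the mocap library is an array of 3D poses in camera coordinates, and that an orthographic projection (dropping depth) is an acceptable stand-in for the camera model. The function names `normalize_2d` and `lift_by_nearest_neighbor` are hypothetical.

```python
import numpy as np


def normalize_2d(pose_2d):
    """Center a (J, 2) pose on its mean joint and scale it to unit norm."""
    centered = pose_2d - pose_2d.mean(axis=0, keepdims=True)
    scale = np.linalg.norm(centered)
    return centered / scale if scale > 0 else centered


def lift_by_nearest_neighbor(query_2d, library_3d):
    """Pick the library pose whose 2D projection is closest to query_2d.

    query_2d:   (J, 2) predicted 2D joint locations.
    library_3d: (N, J, 3) mocap poses (assumed to be in camera coordinates).
    Returns (index of best exemplar, its (J, 3) pose).
    """
    # Assumption: orthographic projection, i.e. simply drop the depth axis.
    projections = library_3d[:, :, :2]

    # Normalize away translation and scale so that only pose shape is compared.
    normalized = np.stack([normalize_2d(p) for p in projections])
    query = normalize_2d(query_2d)

    # Euclidean distance between flattened joint vectors.
    diffs = (normalized - query).reshape(len(library_3d), -1)
    distances = np.linalg.norm(diffs, axis=1)

    best = int(np.argmin(distances))
    return best, library_3d[best]


if __name__ == "__main__":
    # Synthetic stand-in for a mocap library: 50 random poses with 14 joints.
    rng = np.random.default_rng(0)
    library = rng.normal(size=(50, 14, 3))

    # Query: exemplar 7 projected to 2D, then rescaled and shifted in the image.
    query = 2.5 * library[7, :, :2] + np.array([10.0, -4.0])

    index, pose_3d = lift_by_nearest_neighbor(query, library)
    print(index, pose_3d.shape)
```

The sketch compares poses only after removing translation and scale, so a query matches its source exemplar regardless of where or how large the person appears in the image. A fuller system would also need to handle camera viewpoint and perspective, which this example deliberately leaves out.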

