CV, LG · Aug 10, 2021

MetaPose: Fast 3D Pose from Multiple Views without 3D Supervision

arXiv:2108.04869v2 · 29 citations
Originality: Incremental advance
AI Analysis

The paper addresses the need for fast, accurate 3D pose estimation in multi-view settings without 3D ground truth. The advance is incremental but practical for applications such as sports analysis and surveillance.

The paper tackles 3D human pose estimation from multiple uncalibrated cameras without 3D supervision, achieving high precision and low latency while training only on 2D keypoint data. It outperforms classical bundle adjustment and weakly-supervised monocular 3D baselines on the Human3.6M and Ski-Pose PTZ datasets.

In the era of deep learning, human pose estimation from multiple cameras with unknown calibration has received little attention to date. We show how to train a neural model to perform this task with high precision and minimal latency overhead. The proposed model takes into account joint location uncertainty due to occlusion from multiple views, and requires only 2D keypoint data for training. Our method outperforms both classical bundle adjustment and weakly-supervised monocular 3D baselines on the well-established Human3.6M dataset, as well as the more challenging in-the-wild Ski-Pose PTZ dataset.
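The abstract states that training needs only 2D keypoints and that the model accounts for per-joint uncertainty caused by occlusion. One standard way to score a 3D pose using 2D data alone, and the quantity classical bundle adjustment minimizes, is a reprojection error. The sketch below is a minimal illustration under stated assumptions, not MetaPose's actual loss or architecture: it assumes weak-perspective cameras (scale, two rotation rows, 2D offset) and per-joint confidence weights (for example, inverse detection variance). `WeakPerspectiveCamera` and `weighted_reprojection_error` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point2 = Tuple[float, float]
Point3 = Tuple[float, float, float]


@dataclass
class WeakPerspectiveCamera:
    """Hypothetical weak-perspective camera: x = scale * R[:2] @ X + trans."""
    scale: float
    rot2: Tuple[Point3, Point3]  # first two rows of a 3x3 rotation matrix
    trans: Point2

    def project(self, p: Point3) -> Point2:
        u = self.scale * sum(r * c for r, c in zip(self.rot2[0], p)) + self.trans[0]
        v = self.scale * sum(r * c for r, c in zip(self.rot2[1], p)) + self.trans[1]
        return (u, v)


def weighted_reprojection_error(
    pose3d: Sequence[Point3],
    cameras: Sequence[WeakPerspectiveCamera],
    keypoints2d: Sequence[Sequence[Point2]],  # [view][joint]
    weights: Sequence[Sequence[float]],       # [view][joint], e.g. detector confidence
) -> float:
    """Sum over views and joints of w * ||project(X_j) - x_j||^2."""
    total = 0.0
    for cam, kps, ws in zip(cameras, keypoints2d, weights):
        for joint, obs, w in zip(pose3d, kps, ws):
            u, v = cam.project(joint)
            total += w * ((u - obs[0]) ** 2 + (v - obs[1]) ** 2)
    return total


# Usage: two views of a two-joint "pose".
front = WeakPerspectiveCamera(1.0, ((1, 0, 0), (0, 1, 0)), (0.0, 0.0))
side = WeakPerspectiveCamera(2.0, ((0, 0, 1), (0, 1, 0)), (1.0, 0.0))
cams: List[WeakPerspectiveCamera] = [front, side]
pose = [(0.0, 1.0, 0.5), (1.0, 0.0, -0.5)]

clean = [[(0.0, 1.0), (1.0, 0.0)], [(2.0, 2.0), (0.0, 0.0)]]
# Second joint is "occluded" in the side view, so its detection is wrong.
noisy = [[(0.0, 1.0), (1.0, 0.0)], [(2.0, 2.0), (3.0, 0.0)]]

print(weighted_reprojection_error(pose, cams, clean, [[1.0, 1.0], [1.0, 1.0]]))  # → 0.0
print(weighted_reprojection_error(pose, cams, noisy, [[1.0, 1.0], [1.0, 1.0]]))  # → 9.0
print(weighted_reprojection_error(pose, cams, noisy, [[1.0, 1.0], [1.0, 0.0]]))  # → 0.0
```

Down-weighting the occluded joint removes its bad detection from the objective, which illustrates why per-joint uncertainty matters when fusing views. How MetaPose itself estimates and uses that uncertainty is described in the paper, not here.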

Code implementations: 1 repo
