CV · Apr 8, 2020

Multi-Person Absolute 3D Human Pose Estimation with Weak Depth Supervision

arXiv:2004.03989v1 · 17 citations
AI Analysis

This work addresses the shortage of training data for multi-person 3D pose estimation. It enables more accurate models by drawing on data from widely available depth sensors, though the contribution is incremental in that it leverages existing depth data.

The paper tackles the lack of large, diverse datasets for multi-person 3D human pose estimation by introducing a network that can be trained with additional RGB-D images in a weakly supervised fashion. It reports state-of-the-art results on the MuPoTS-3D dataset by a considerable margin.

In 3D human pose estimation, one of the biggest problems is the lack of large, diverse datasets. This is especially true for multi-person 3D pose estimation, where, to our knowledge, only machine-generated annotations are available for training. To mitigate this issue, we introduce a network that can be trained with additional RGB-D images in a weakly supervised fashion. Due to the existence of cheap sensors, videos with depth maps are widely available, and our method can exploit a large, unannotated dataset. Our algorithm is a monocular, multi-person, absolute pose estimator. We evaluate the algorithm on several benchmarks, showing a consistent improvement in error rates. Also, our model achieves state-of-the-art results on the MuPoTS-3D dataset by a considerable margin.
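
The abstract does not spell out how the depth maps enter training, so the following is only an illustrative sketch of one plausible form of weak depth supervision, not the paper's actual loss. It assumes the network outputs, for each joint, a 2D pixel location and an absolute depth, and it penalizes the gap between that depth and the sensor reading at the same pixel. The names `sample_depth`, `weak_depth_loss`, and `INVALID_DEPTH` are hypothetical, as is the convention that a reading of 0 marks a missing measurement. Note that a depth sensor measures the visible body surface rather than the joint itself, which is one reason such a signal is only weak.

```python
from typing import List, Optional, Sequence, Tuple

# Assumption: the sensor reports 0.0 where it has no valid measurement.
INVALID_DEPTH = 0.0


def sample_depth(depth_map: Sequence[Sequence[float]],
                 x: float, y: float) -> Optional[float]:
    """Nearest-pixel lookup in a row-major depth map.

    Returns None if (x, y) falls outside the image or on an invalid pixel.
    """
    row, col = int(round(y)), int(round(x))
    if not (0 <= row < len(depth_map) and 0 <= col < len(depth_map[0])):
        return None
    depth = depth_map[row][col]
    return None if depth == INVALID_DEPTH else depth


def weak_depth_loss(joints_2d: List[Tuple[float, float]],
                    pred_depths: List[float],
                    depth_map: Sequence[Sequence[float]]) -> float:
    """Mean absolute difference between predicted and sensed joint depths.

    Joints without a valid sensor reading are skipped, so an image with no
    usable readings contributes a loss of 0.0.
    """
    errors = []
    for (x, y), z_pred in zip(joints_2d, pred_depths):
        z_sensor = sample_depth(depth_map, x, y)
        if z_sensor is not None:
            errors.append(abs(z_pred - z_sensor))
    return sum(errors) / len(errors) if errors else 0.0
```

Because the loss needs only a depth map and no 3D joint annotations, a term of this kind could in principle be evaluated on unannotated RGB-D footage, which is the property the abstract highlights.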

Code Implementations · 1 repo
