CV · LG · Mar 17, 2020

Weakly-Supervised 3D Human Pose Learning via Multi-view Images in the Wild

arXiv:2003.07581v1 · 125 citations
Originality: Highly original
AI Analysis

This work addresses the scarcity of 3D-annotated training data, a recurring problem for computer vision researchers and practitioners, by letting 3D pose estimators be trained from easily acquired in-the-wild multi-view images. It is somewhat incremental, however, since it builds on existing weakly-supervised methods.

The paper tackles the difficulty of acquiring training data for monocular 3D human pose estimation in the wild. It proposes a weakly-supervised approach that learns from unlabeled multi-view images without any 3D annotations and achieves state-of-the-art performance among semi- and weakly-supervised methods on Human3.6M and MPI-INF-3DHP.

One major challenge for monocular 3D human pose estimation in-the-wild is the acquisition of training data that contains unconstrained images annotated with accurate 3D poses. In this paper, we address this challenge by proposing a weakly-supervised approach that does not require 3D annotations and learns to estimate 3D poses from unlabeled multi-view data, which can be acquired easily in in-the-wild environments. We propose a novel end-to-end learning framework that enables weakly-supervised training using multi-view consistency. Since multi-view consistency is prone to degenerate solutions, we adopt a 2.5D pose representation and propose a novel objective function that can only be minimized when the predictions of the trained model are consistent and plausible across all camera views. We evaluate our proposed approach on two large-scale datasets (Human3.6M and MPI-INF-3DHP) where it achieves state-of-the-art performance among semi-/weakly-supervised methods.
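The abstract's two ingredients, a 2.5D pose representation and a multi-view consistency objective, can be illustrated with a minimal pure-Python sketch. It rests on stated assumptions. First, each joint is stored as normalized camera coordinates plus depth relative to a root joint, and the pose is scaled so that a chosen reference bone between joints `n` and `m` has unit length, which makes the absolute root depth the solution of a quadratic. Second, the relative rotation `R_ab` between two views is simply given. The abstract does not say how the paper obtains that rotation, and every helper name here (`recover_root_depth`, `lift_to_3d`, `multiview_consistency`) is hypothetical; this is not the authors' implementation.

```python
import math


def to_normalized_camera(u, v, fx, fy, cx, cy):
    """Back-project pixel (u, v) to normalized camera coordinates (no skew assumed)."""
    return (u - cx) / fx, (v - cy) / fy


def recover_root_depth(xy, rel_depth, n, m, bone_len=1.0):
    """Solve for the absolute root depth z of a scale-normalized 2.5D pose.

    xy[k] holds normalized camera coords of joint k and rel_depth[k] = Z_k - Z_root.
    Requiring ||P_n - P_m|| == bone_len, with P_k = (x_k Z_k, y_k Z_k, Z_k) and
    Z_k = z + rel_depth[k], gives a*z^2 + b*z + c = 0; the larger root is returned.
    """
    (xn, yn), (xm, ym) = xy[n], xy[m]
    rn, rm = rel_depth[n], rel_depth[m]
    a = (xn - xm) ** 2 + (yn - ym) ** 2
    b = 2.0 * ((xn - xm) * (xn * rn - xm * rm) + (yn - ym) * (yn * rn - ym * rm))
    c = (xn * rn - xm * rm) ** 2 + (yn * rn - ym * rm) ** 2 + (rn - rm) ** 2 - bone_len ** 2
    disc = b * b - 4.0 * a * c
    if a == 0.0 or disc < 0.0:
        raise ValueError("no valid root depth for this 2.5D pose")
    return (-b + math.sqrt(disc)) / (2.0 * a)


def lift_to_3d(xy, rel_depth, n, m, bone_len=1.0):
    """Reconstruct scale-normalized 3D joints (X, Y, Z) from a 2.5D pose."""
    z_root = recover_root_depth(xy, rel_depth, n, m, bone_len)
    pose = []
    for (x, y), r in zip(xy, rel_depth):
        z = z_root + r  # absolute depth of this joint
        pose.append((x * z, y * z, z))
    return pose


def root_relative(pose, root=0):
    """Subtract the root joint so that only the articulated shape remains."""
    rx, ry, rz = pose[root]
    return [(x - rx, y - ry, z - rz) for x, y, z in pose]


def rotate(R, p):
    """Apply a 3x3 rotation matrix (list of rows) to a 3D point."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) for i in range(3))


def multiview_consistency(pose_a, pose_b, R_ab, root=0):
    """Mean per-joint distance between view-a's pose rotated into view b and view-b's pose.

    Assumes R_ab (rotation from camera a to camera b) is known.
    """
    pa = root_relative(pose_a, root)
    pb = root_relative(pose_b, root)
    dists = [math.dist(rotate(R_ab, p), q) for p, q in zip(pa, pb)]
    return sum(dists) / len(dists)
```

A collapsed prediction (every joint at the root) scores a consistency loss of exactly zero under any rotation, which is the kind of degenerate solution the abstract warns about. In the 2.5D form the x, y coordinates stay tied to image evidence, so such a collapse would conflict with any 2D term in the objective; this is our reading of why the representation helps, not a statement from the abstract.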
