CV · LG · IV · Jun 14, 2020

Cascaded deep monocular 3D human pose estimation with evolutionary training data

arXiv:2006.07778v3 · 178 citations
AI Analysis

This paper addresses dataset bias and poor generalization in 3D human pose estimation for computer vision applications, but it is incremental in that it builds on existing data augmentation and representation methods.

The paper tackles failures of monocular 3D human pose estimation on unseen poses, which arise when the training data is limited and fixed. It proposes a scalable data augmentation method that synthesizes over 8 million 3D human poses, achieving state-of-the-art accuracy and better generalization to rare poses.

End-to-end deep representation learning has achieved remarkable accuracy for monocular 3D human pose estimation, yet these models may fail for unseen poses when trained on limited and fixed data. This paper proposes a novel data augmentation method that (1) is scalable, synthesizing a massive amount of training data (over 8 million valid 3D human poses with corresponding 2D projections) for training 2D-to-3D networks, and (2) can effectively reduce dataset bias. Our method evolves a limited dataset to synthesize unseen 3D human skeletons based on a hierarchical human representation and heuristics inspired by prior knowledge. Extensive experiments show that our approach not only achieves state-of-the-art accuracy on the largest public benchmark, but also generalizes significantly better to unseen and rare poses. Code, pre-trained models and tools are available at this HTTPS URL.
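To make the idea of evolving a small pose set concrete, here is a minimal sketch, not the authors' implementation. It assumes a toy 5-joint skeleton, a bone-vector encoding standing in for the hierarchical representation, generic crossover and mutation operators, and a single bend-angle limit standing in for the prior-knowledge heuristics. All names (`PARENTS`, `crossover`, `mutate`, `is_valid`, `evolve`, `project`) and all numeric values are hypothetical.

```python
import numpy as np

# Hypothetical toy skeleton: PARENTS[j] is the parent of joint j, -1 marks the root.
# Parents always come before their children in this ordering.
PARENTS = np.array([-1, 0, 1, 0, 3])

def to_bones(joints):
    """Joint positions (J, 3) -> bone vectors relative to each parent (root keeps its position)."""
    bones = joints.copy()
    for j, p in enumerate(PARENTS):
        if p >= 0:
            bones[j] = joints[j] - joints[p]
    return bones

def to_joints(bones):
    """Inverse of to_bones: accumulate bone vectors from the root outwards."""
    joints = bones.copy()
    for j, p in enumerate(PARENTS):
        if p >= 0:
            joints[j] = joints[p] + bones[j]
    return joints

def subtree(root):
    """Indices of a joint and all of its descendants."""
    keep = [root]
    for j in range(root + 1, len(PARENTS)):
        if PARENTS[j] in keep:
            keep.append(j)
    return keep

def crossover(bones_a, bones_b, rng):
    """Copy parent A, then replace one randomly chosen limb with the same limb from parent B."""
    child = bones_a.copy()
    idx = subtree(rng.integers(1, len(PARENTS)))
    child[idx] = bones_b[idx]
    return child

def mutate(bones, rng, sigma=0.2):
    """Perturb the direction of one random bone while keeping its length."""
    out = bones.copy()
    j = rng.integers(1, len(PARENTS))
    length = np.linalg.norm(out[j])
    noisy = out[j] + rng.normal(scale=sigma * length, size=3)
    out[j] = noisy / np.linalg.norm(noisy) * length
    return out

def is_valid(bones, max_bend_deg=120.0):
    """Placeholder heuristic (an assumption): limit the bend between a bone and its parent bone."""
    for j, p in enumerate(PARENTS):
        if p > 0:  # the parent joint is not the root, so it has a bone of its own
            cos = bones[j] @ bones[p] / (np.linalg.norm(bones[j]) * np.linalg.norm(bones[p]))
            if cos < np.cos(np.radians(max_bend_deg)):
                return False
    return True

def evolve(seed_poses, n_children, rng, max_tries=1000):
    """Grow the pose set by recombining and mutating existing poses, keeping only valid children."""
    population = [to_bones(p) for p in seed_poses]
    tries = 0
    while len(population) < len(seed_poses) + n_children and tries < max_tries:
        tries += 1
        i, k = rng.choice(len(population), size=2, replace=False)
        child = mutate(crossover(population[i], population[k], rng), rng)
        if is_valid(child):
            population.append(child)
    return [to_joints(b) for b in population]

def project(joints, focal=1000.0):
    """Pinhole projection to 2D, assuming camera coordinates with positive depth z."""
    return focal * joints[:, :2] / joints[:, 2:3]

# Two made-up seed poses with unit-length bones, placed 5 units in front of the camera.
SEED_A = np.array([[0, 0, 5], [0, 1, 5], [0, 2, 5], [1, 0, 5], [2, 0, 5]], dtype=float)
SEED_B = np.array([[0, 0, 5], [1, 0, 5], [1, 1, 5], [0, -1, 5], [0, -1, 6]], dtype=float)

rng = np.random.default_rng(0)
poses_3d = evolve([SEED_A, SEED_B], n_children=20, rng=rng)
poses_2d = [project(p) for p in poses_3d]  # paired 2D inputs for a 2D-to-3D network
```

Each synthesized skeleton comes with its 2D projection, which is the kind of pair the abstract describes for training 2D-to-3D networks. Operating on bone vectors rather than raw joint positions lets a limb be swapped or rotated without changing bone lengths.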

Code Implementations: 1 repo
