DensePose: Dense Human Pose Estimation In The Wild
This work addresses detailed human pose estimation in unconstrained environments. For computer vision applications it represents a strong, specific gain rather than a foundational breakthrough.
The paper tackles dense human pose estimation by establishing correspondences between RGB images and a surface-based representation of the human body. It introduces a new dataset of 50K annotated persons from COCO and trains CNN-based systems that handle real-world challenges such as background clutter, occlusions, and scale variations. An inpainting network that fills in missing ground-truth values makes the training set more effective, region-based models outperform fully-convolutional ones, and cascading improves accuracy further, yielding highly accurate results in real time.
In this work, we establish dense correspondences between an RGB image and a surface-based representation of the human body, a task we refer to as dense human pose estimation. We first gather dense correspondences for 50K persons appearing in the COCO dataset by introducing an efficient annotation pipeline. We then use our dataset to train CNN-based systems that deliver dense correspondence 'in the wild', namely in the presence of background, occlusions and scale variations. We improve our training set's effectiveness by training an 'inpainting' network that can fill in missing ground-truth values, and report clear improvements with respect to the best results that would have been achievable in the past. We experiment with fully-convolutional networks and region-based models and observe a superiority of the latter; we further improve accuracy through cascading, obtaining a system that delivers highly accurate results in real time. Supplementary materials and videos are provided on the project page: http://densepose.org
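To make the notion of a dense correspondence concrete, the following is a minimal sketch, not the authors' implementation. It assumes each pixel carries a body-part index plus two surface coordinates (u, v) in [0, 1]; the array layout, the part count `NUM_PARTS`, and all function names are hypothetical choices made for illustration. It also shows why sparse annotations leave most pixels unlabeled, which is the gap an inpainting network is meant to fill.

```python
import numpy as np

# Assumption for illustration: the body surface is split into 24 parts,
# and index 0 is reserved for background / unlabeled pixels.
NUM_PARTS = 24


def make_empty_correspondence(height, width):
    """Return an (H, W, 3) map of (part index, u, v); every pixel starts as background."""
    return np.zeros((height, width, 3), dtype=np.float32)


def add_annotations(dense, points):
    """Write sparse annotated points (x, y, part, u, v) into the dense map in place."""
    for x, y, part, u, v in points:
        if not 1 <= part <= NUM_PARTS:
            raise ValueError(f"part index {part} outside 1..{NUM_PARTS}")
        if not (0.0 <= u <= 1.0 and 0.0 <= v <= 1.0):
            raise ValueError("surface coordinates must lie in [0, 1]")
        dense[y, x] = (part, u, v)  # row index is y, column index is x
    return dense


def annotated_fraction(dense):
    """Fraction of pixels that carry a (non-background) correspondence."""
    return float((dense[..., 0] > 0).mean())


if __name__ == "__main__":
    dense = make_empty_correspondence(4, 4)
    add_annotations(dense, [(1, 2, 3, 0.25, 0.5)])
    print(dense[2, 1])                # part, u, v stored at pixel (x=1, y=2)
    print(annotated_fraction(dense))  # share of labeled pixels
```

With only a handful of annotated points per person, the labeled fraction stays small, so a supervision signal defined only at those pixels is sparse. Filling in the remaining values, as the 'inpainting' network does, is one way to densify it.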