CV | Dec 2, 2020

Holistic 3D Human and Scene Mesh Estimation from Single View Images

arXiv:2012.01591v2 | 67 citations
AI Analysis

This work addresses holistic 3D scene understanding by jointly estimating human and scene meshes from a single image, which the authors describe as the first approach to do so at the mesh level.

This paper introduces a model that reconstructs 3D human and object meshes, camera pose, and room layout from a single RGB image. The model jointly optimizes these elements, outperforming existing methods for human body mesh and indoor scene reconstruction.

The 3D world limits the human body pose and the human body pose conveys information about the surrounding objects. Indeed, from a single image of a person placed in an indoor scene, we as humans are adept at resolving ambiguities of the human pose and room layout through our knowledge of the physical laws and prior perception of the plausible object and human poses. However, few computer vision models fully leverage this fact. In this work, we propose an end-to-end trainable model that perceives the 3D scene from a single RGB image, estimates the camera pose and the room layout, and reconstructs both human body and object meshes. By imposing a set of comprehensive and sophisticated losses on all aspects of the estimations, we show that our model outperforms existing human body mesh methods and indoor scene reconstruction methods. To the best of our knowledge, this is the first model that outputs both object and human predictions at the mesh level, and performs joint optimization on the scene and human poses.
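The abstract does not list the individual loss terms, so the following is only a minimal sketch of the general idea of joint optimization: several per-task losses are combined into one weighted objective, and a physical-plausibility term penalizes body joints that end up inside scene objects. Everything here is an illustrative assumption rather than the paper's formulation, including the function names (`penetration_depth`, `collision_loss`, `joint_loss`), the use of axis-aligned boxes as object proxies, and the term names and weights.

```python
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float, float]
Box = Tuple[Point, Point]  # (min corner, max corner) of an axis-aligned box


def penetration_depth(p: Point, box: Box) -> float:
    """Distance from p to the nearest box face if p is strictly inside, else 0."""
    lo, hi = box
    # Per-axis distance to the closer of the two faces; negative means outside.
    depths = [min(c - l, h - c) for c, l, h in zip(p, lo, hi)]
    shallowest = min(depths)
    return shallowest if shallowest > 0 else 0.0


def collision_loss(joints: Sequence[Point], boxes: Sequence[Box]) -> float:
    """Hypothetical physical term: total penetration of joints into object boxes."""
    return sum(penetration_depth(j, b) for j in joints for b in boxes)


def joint_loss(terms: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of per-task loss values (assumed form of a joint objective)."""
    return sum(weights[name] * value for name, value in terms.items())


# Toy example: one joint inside a unit cube, one joint outside it.
joints = [(0.5, 0.5, 0.5), (2.0, 2.0, 2.0)]
boxes = [((0.0, 0.0, 0.0), (1.0, 1.0, 1.0))]
terms = {
    "human": 1.0,   # placeholder value for a body-mesh loss
    "scene": 2.0,   # placeholder value for an object/layout loss
    "collision": collision_loss(joints, boxes),
}
weights = {"human": 1.0, "scene": 0.5, "collision": 10.0}
total = joint_loss(terms, weights)
```

In a trainable model the same structure would be expressed with differentiable tensor operations, so that gradients from the collision term can adjust both the human pose and the object placement.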

Code Implementations: 1 repo