Room Envelopes: A Synthetic Dataset for Indoor Layout Reconstruction from Images
This work addresses the challenge of reconstructing occluded surfaces in indoor scenes for computer vision applications, but it is incremental, since it primarily introduces a dataset rather than a novel method.
The paper tackles the problem of incomplete 3D scene reconstructions by focusing on predicting structural elements like walls, floors, and ceilings, and presents a synthetic dataset called Room Envelopes to facilitate this task.
Modern scene reconstruction methods are able to accurately recover 3D surfaces that are visible in one or more images. However, the resulting reconstructions are incomplete, missing all occluded surfaces. While much progress has been made on reconstructing entire objects from partial observations using generative models, the structural elements of a scene, like the walls, floors and ceilings, have received less attention. We argue that these scene elements should be relatively easy to predict, since they are typically planar, repetitive and simple, and so less costly approaches may be suitable. In this work, we present a synthetic dataset, Room Envelopes, that facilitates progress on this task by providing a set of RGB images and two associated pointmaps for each image: one capturing the visible surface and one capturing the first surface once fittings and fixtures are removed, that is, the structural layout. As we show, this enables direct supervision for feed-forward monocular geometry estimators that predict both the first visible surface and the first layout surface. This confers an understanding of the scene's extent, as well as the shape and location of its objects.
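To make the per-image structure concrete, the following is a minimal sketch of one training example and the kind of direct supervision it allows. It assumes NumPy pointmaps of shape (H, W, 3) expressed in the camera frame with +z pointing along the viewing direction. The names `RoomEnvelopeSample`, `occlusion_mask`, and `dual_pointmap_loss` are hypothetical, and the summed mean per-pixel Euclidean error is an assumed stand-in, not the loss used by the authors.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class RoomEnvelopeSample:
    """Hypothetical container for one example: an image plus two pointmaps."""
    image: np.ndarray           # (H, W, 3) RGB image
    visible_points: np.ndarray  # (H, W, 3) first visible surface (camera frame, assumed)
    layout_points: np.ndarray   # (H, W, 3) first layout surface, fittings/fixtures removed

    def __post_init__(self):
        # Both pointmaps must be pixel-aligned with the image.
        h, w = self.image.shape[:2]
        for name in ("visible_points", "layout_points"):
            arr = getattr(self, name)
            if arr.shape != (h, w, 3):
                raise ValueError(f"{name} must have shape {(h, w, 3)}, got {arr.shape}")


def occlusion_mask(sample, tol=1e-3):
    """True where the layout surface lies behind the visible surface,
    i.e. an object hides the wall/floor/ceiling at that pixel (assumes +z = depth)."""
    return sample.layout_points[..., 2] > sample.visible_points[..., 2] + tol


def dual_pointmap_loss(pred_visible, pred_layout, sample):
    """Assumed loss: mean per-pixel Euclidean error on each pointmap, summed."""
    err_vis = np.linalg.norm(pred_visible - sample.visible_points, axis=-1).mean()
    err_lay = np.linalg.norm(pred_layout - sample.layout_points, axis=-1).mean()
    return float(err_vis + err_lay)


# Toy 2x2 example; x and y are left at 0 for brevity, only depth (z) varies.
H, W = 2, 2
image = np.zeros((H, W, 3), dtype=np.uint8)
layout = np.zeros((H, W, 3))
layout[..., 2] = 2.0            # a flat wall 2 m from the camera
visible = layout.copy()
visible[1, 0, 2] = 1.0          # an object 1 m away occludes one pixel
sample = RoomEnvelopeSample(image, visible, layout)

print(occlusion_mask(sample))                      # True only at pixel (1, 0)
print(dual_pointmap_loss(visible, layout, sample))  # perfect predictions give 0.0
```

Supervising both pointmaps from the same image is what lets a single feed-forward network learn the visible geometry and the room envelope behind it; the occlusion mask simply marks where the two targets differ.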