CV · Jan 8, 2019

Neural Inverse Rendering of an Indoor Scene from a Single Image

arXiv:1901.02453v3 · 174 citations
AI Analysis

This paper addresses comprehensive scene understanding for applications such as augmented reality and robotics, though the contribution is incremental, since it builds on prior inverse rendering work.

The authors jointly estimate albedo, normals, and lighting for indoor scenes from a single image, outperforming state-of-the-art methods that estimate one or more of these attributes.

Inverse rendering aims to estimate physical attributes of a scene, e.g., reflectance, geometry, and lighting, from image(s). Inverse rendering has been studied primarily for single objects or with methods that solve for only one of the scene attributes. We propose the first learning-based approach that jointly estimates albedo, normals, and lighting of an indoor scene from a single image. Our key contribution is the Residual Appearance Renderer (RAR), which can be trained to synthesize complex appearance effects (e.g., inter-reflection, cast shadows, near-field illumination, and realistic shading), which would be neglected otherwise. This enables us to perform self-supervised learning on real data using a reconstruction loss, based on re-synthesizing the input image from the estimated components. We finetune with real data after pretraining with synthetic data. To this end, we use physically-based rendering to create a large-scale synthetic dataset, which is a significant improvement over prior datasets. Experimental results show that our approach outperforms state-of-the-art methods that estimate one or more scene attributes.
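The reconstruction loss described in the abstract can be illustrated with a minimal sketch. It assumes a simple Lambertian direct renderer lit by a few distant lights, an L1 loss, and a residual supplied as a plain per-pixel array. These are all illustrative assumptions: in the paper the residual comes from a trained network (the RAR), and the function names below are hypothetical, not taken from the authors' code.

```python
def lambertian_shading(normal, light_dirs, light_rgbs):
    """Shade one pixel: sum over distant lights of max(0, n.l) * light colour."""
    shade = [0.0, 0.0, 0.0]
    for direction, rgb in zip(light_dirs, light_rgbs):
        cosine = max(0.0, sum(n * l for n, l in zip(normal, direction)))
        for c in range(3):
            shade[c] += cosine * rgb[c]
    return shade


def direct_render(albedo, normals, light_dirs, light_rgbs):
    """Per-pixel albedo * shading; images are flat lists of RGB pixels."""
    image = []
    for a, n in zip(albedo, normals):
        s = lambertian_shading(n, light_dirs, light_rgbs)
        image.append([a[c] * s[c] for c in range(3)])
    return image


def reconstruction_loss(target, albedo, normals, light_dirs, light_rgbs, residual):
    """Mean absolute error between the target and (direct render + residual).

    The residual stands in for effects the direct renderer cannot express,
    such as inter-reflection or cast shadows (here it is just given data).
    """
    rendered = direct_render(albedo, normals, light_dirs, light_rgbs)
    total, count = 0.0, 0
    for t, r, e in zip(target, rendered, residual):
        for c in range(3):
            total += abs(t[c] - (r[c] + e[c]))
            count += 1
    return total / count


# One-pixel toy scene: grey surface facing a single white light head-on.
albedo = [[0.5, 0.5, 0.5]]
normals = [[0.0, 0.0, 1.0]]
light_dirs = [[0.0, 0.0, 1.0]]
light_rgbs = [[1.0, 1.0, 1.0]]
target = [[0.6, 0.5, 0.5]]  # the red channel has extra light the direct model misses

loss_without_residual = reconstruction_loss(
    target, albedo, normals, light_dirs, light_rgbs, [[0.0, 0.0, 0.0]])
loss_with_residual = reconstruction_loss(
    target, albedo, normals, light_dirs, light_rgbs, [[0.1, 0.0, 0.0]])
```

In this toy scene the residual absorbs the part of the target image that the direct render cannot explain, so the loss drops once it is added. This is the role the abstract assigns to the RAR when training on real images without ground-truth labels.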

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
