CV · Aug 19, 2022

Neural Light Field Estimation for Street Scenes with Differentiable Virtual Object Insertion

arXiv:2208.09480v1 · 44 citations · h-index: 96
Originality: Highly original
AI Analysis

This work addresses realistic virtual object insertion into outdoor scenes for applications such as augmented reality and autonomous driving. It is an incremental improvement over prior lighting estimation work, achieved through a novel hybrid lighting representation.

The paper tackles outdoor lighting estimation for photorealistic virtual object insertion. It proposes a neural method that estimates a 5D HDR light field from a single image, using a hybrid representation that combines an HDR sky dome with volumetric lighting. It reports improved performance over existing outdoor lighting estimation methods, and gains for a 3D object detector trained on the augmented data in an autonomous driving application.

We consider the challenging problem of outdoor lighting estimation for the goal of photorealistic virtual object insertion into photographs. Existing works on outdoor lighting estimation typically simplify the scene lighting into an environment map which cannot capture the spatially-varying lighting effects in outdoor scenes. In this work, we propose a neural approach that estimates the 5D HDR light field from a single image, and a differentiable object insertion formulation that enables end-to-end training with image-based losses that encourage realism. Specifically, we design a hybrid lighting representation tailored to outdoor scenes, which contains an HDR sky dome that handles the extreme intensity of the sun, and a volumetric lighting representation that models the spatially-varying appearance of the surrounding scene. With the estimated lighting, our shadow-aware object insertion is fully differentiable, which enables adversarial training over the composited image to provide additional supervisory signal to the lighting prediction. We experimentally demonstrate that our hybrid lighting representation is more performant than existing outdoor lighting estimation methods. We further show the benefits of our AR object insertion in an autonomous driving application, where we obtain performance gains for a 3D object detector when trained on our augmented data.
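
To make the hybrid representation concrete, the following is a minimal sketch of how a 5D light field (3D position plus 2D direction) could be queried from a voxel volume combined with a sky dome. It is not the paper's implementation: the function names, the nearest-neighbour voxel lookup, the per-sample alpha compositing, and the equirectangular sky dome layout are all assumptions made for illustration, and the learned networks that would predict the volume and the sky are omitted.

```python
import math

def normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def sky_radiance(direction, sky_dome):
    """Look up HDR RGB in an equirectangular sky dome (rows x cols of [r, g, b]).

    Assumption: +y is up, row 0 is the zenith, columns span azimuth [0, 2*pi).
    """
    dx, dy, dz = normalize(direction)
    theta = math.acos(max(-1.0, min(1.0, dy)))   # polar angle from +y
    phi = math.atan2(dz, dx) % (2.0 * math.pi)   # azimuth
    rows, cols = len(sky_dome), len(sky_dome[0])
    r = min(int(theta / math.pi * rows), rows - 1)
    c = min(int(phi / (2.0 * math.pi) * cols), cols - 1)
    return sky_dome[r][c]

def query_light_field(origin, direction, volume, sky_dome,
                      grid_min, grid_max, n_steps=64):
    """Radiance arriving at `origin` from `direction`.

    `volume[i][j][k]` is a hypothetical (rgb, alpha) pair per voxel. Samples
    along the ray are composited front to back; whatever transmittance is
    left after the volume is filled in from the sky dome.
    """
    d = normalize(direction)
    res = (len(volume), len(volume[0]), len(volume[0][0]))
    # The grid diagonal bounds how far a ray can travel inside the volume.
    far = math.sqrt(sum((hi - lo) ** 2 for lo, hi in zip(grid_min, grid_max)))
    radiance = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for s in range(n_steps):
        t = (s + 0.5) * far / n_steps
        p = [o + t * c for o, c in zip(origin, d)]
        if any(x < lo or x >= hi for x, lo, hi in zip(p, grid_min, grid_max)):
            continue  # sample lies outside the volume
        i, j, k = (min(int((x - lo) / (hi - lo) * n), n - 1)
                   for x, lo, hi, n in zip(p, grid_min, grid_max, res))
        rgb, alpha = volume[i][j][k]
        radiance = [acc + transmittance * alpha * ch
                    for acc, ch in zip(radiance, rgb)]
        transmittance *= 1.0 - alpha
    sky = sky_radiance(d, sky_dome)
    return [acc + transmittance * ch for acc, ch in zip(radiance, sky)]
```

With an empty volume (all alphas zero) the query returns the sky dome value alone, and with a fully opaque volume the sky contributes nothing. This mirrors the division of labour described in the abstract: the sky dome handles distant, high-intensity lighting, while the volume models the spatially-varying appearance of the nearby scene.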
