CV · Feb 18, 2022

Spatio-Temporal Outdoor Lighting Aggregation on Image Sequences using Transformer Networks

arXiv:2202.09206v1
Originality: Incremental advance
AI Analysis

This work addresses the robustness issue in lighting estimation for computer vision applications, representing an incremental improvement over existing deep learning approaches.

The paper tackles the problem of robust outdoor lighting estimation from image sequences by aggregating noisy individual estimates using a transformer network, achieving improved accuracy with fewer hyperparameters compared to state-of-the-art methods.

In this work, we focus on outdoor lighting estimation by aggregating individual noisy estimates from images, exploiting the rich image information from wide-angle cameras and/or temporal image sequences. Photographs inherently encode information about the scene's lighting in the form of shading and shadows. Recovering the lighting is an inverse rendering problem and, as such, ill-posed. Recent work based on deep neural networks has shown promising results for single-image lighting estimation, but suffers from a lack of robustness. We tackle this problem by combining lighting estimates from several image views sampled in the angular and temporal domain of an image sequence. For this task, we introduce a transformer architecture that is trained in an end-to-end fashion without any statistical post-processing as required by previous work. To this end, we propose a positional encoding that takes into account the camera calibration and ego-motion estimation to globally register the individual estimates when computing attention between visual words. We show that our method leads to improved lighting estimation while requiring fewer hyper-parameters compared to the state-of-the-art.
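To make the aggregation idea concrete, the following is a minimal sketch and not the paper's transformer. It assumes each view yields a unit lighting-direction estimate in its own camera frame, and that a camera-to-world rotation per view is available from calibration and ego-motion. Instead of a learned positional encoding, it registers the estimates by explicit rotation and fuses them with a single untrained, attention-style softmax weighting. The names `yaw`, `aggregate_directions`, and `temperature` are hypothetical and chosen for illustration only.

```python
import numpy as np


def yaw(theta):
    """Hypothetical helper: rotation about the z-axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def aggregate_directions(dirs_cam, rots_cam_to_world, temperature=0.1):
    """Fuse per-view direction estimates into one world-frame direction.

    dirs_cam:          (N, 3) unit vectors, one estimate per view (camera frame)
    rots_cam_to_world: (N, 3, 3) rotations registering each view globally
    temperature:       assumed softmax sharpness, not a value from the paper
    """
    dirs_cam = np.asarray(dirs_cam, dtype=float)
    rots = np.asarray(rots_cam_to_world, dtype=float)

    # Global registration: d_world[n] = R[n] @ d_cam[n]
    dirs_world = np.einsum("nij,nj->ni", rots, dirs_cam)
    dirs_world /= np.linalg.norm(dirs_world, axis=1, keepdims=True)

    # Use the normalised mean as a query; this degenerates if estimates cancel.
    query = dirs_world.mean(axis=0)
    query /= np.linalg.norm(query)

    # Attention-style weights: softmax over cosine similarity to the query.
    scores = dirs_world @ query / temperature
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()

    fused = weights @ dirs_world
    return fused / np.linalg.norm(fused), weights


# Usage: three consistent views plus one outlier estimate.
true_dir = np.array([0.0, 0.6, 0.8])
rots = np.stack([yaw(0.0), yaw(0.7), yaw(-1.2), np.eye(3)])
dirs = np.stack([rots[0].T @ true_dir, rots[1].T @ true_dir,
                 rots[2].T @ true_dir, np.array([1.0, 0.0, 0.0])])
fused, weights = aggregate_directions(dirs, rots)
print(fused, weights)
```

In this sketch the outlier receives a small weight because it disagrees with the consensus direction. A trained transformer would instead learn how to weight and combine the views, which is the part the abstract says removes the need for statistical post-processing.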

