CV · Dec 15, 2021

Consistent Depth Prediction under Various Illuminations using Dilated Cross Attention

arXiv:2112.08006v1 · Has Code
AI Analysis

This work addresses consistent depth prediction for indoor scenes. The contribution is incremental: it builds on existing methods, adding a new dataset and an architectural tweak.

The paper tackles the problem of consistent depth prediction under varying illumination conditions by introducing a new indoor dataset (Vari) and a dilated cross attention (DCA) block, achieving significant improvements over state-of-the-art methods on the Vari dataset.

In this paper, we aim to solve the problem of consistent depth prediction in complex scenes under various illumination conditions. The existing indoor datasets based on RGB-D sensors or virtual rendering have two critical limitations: sparse depth maps (NYU Depth V2) and non-realistic illumination (SUN CG, SceneNet RGB-D). We propose to use internet 3D indoor scenes and manually tune their illuminations to render photo-realistic RGB photos and their corresponding depth and BRDF maps, obtaining a new indoor depth dataset called the Vari dataset. We propose a simple convolutional block named DCA that applies depthwise separable dilated convolution to encoded features to process global information and reduce parameters. We perform cross attention on these dilated features to retain the consistency of depth prediction under different illuminations. We evaluate our method against current state-of-the-art methods on the Vari dataset and observe a significant improvement in our experiments. We also conduct an ablation study, fine-tune our model on NYU Depth V2, and evaluate on real-world data to further validate the effectiveness of our DCA block. The code, pre-trained weights and Vari dataset are open-sourced.
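
The abstract says the DCA block uses depthwise separable dilated convolution to "process global information and reduce parameters". The sketch below only illustrates the general arithmetic behind those two claims; it is not the paper's implementation. The channel counts, kernel size, and dilation rates are hypothetical values chosen for illustration, and biases are ignored.

```python
def conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Weight count of a depthwise k x k filter (one per input channel)
    followed by a 1 x 1 pointwise convolution (bias omitted)."""
    return c_in * k * k + c_in * c_out


def effective_kernel_size(k, dilation):
    """Spatial extent covered by a k-tap kernel at the given dilation."""
    return dilation * (k - 1) + 1


# Hypothetical sizes, not taken from the paper.
c_in, c_out, k = 256, 256, 3

print("standard conv weights:  ", conv_params(c_in, c_out, k))
print("separable conv weights: ", depthwise_separable_params(c_in, c_out, k))

# Dilation widens the area each filter sees without adding weights.
for dilation in (1, 2, 4):
    print("dilation", dilation, "covers", effective_kernel_size(k, dilation), "pixels per axis")
```

With these assumed sizes, the separable form needs roughly a ninth of the weights of the standard convolution, and a 3-tap kernel at dilation 4 spans 9 pixels per axis. How the paper combines dilation rates and applies cross attention on the resulting features is not specified in the abstract and is not modelled here.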

