CV · Apr 13, 2021

Learning Multi-modal Information for Robust Light Field Depth Estimation

arXiv:2104.05971v1 · 4 citations
AI Analysis

This work addresses robust depth estimation for light field imaging. It is an incremental contribution: it improves on existing focal-stack-based methods by incorporating multi-modal information.

The paper tackles suboptimal depth estimation from light field focal stacks, where defocus blur degrades existing learning-based methods. It proposes a multi-modal learning method that extracts contextual information from both the focal stack and the RGB image, then fuses the two with an attention-guided module. The method outperforms existing methods on two light field datasets, and visual results indicate that it also applies to data captured with mobile phones.
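
To make the focal-stack depth cue concrete, here is a minimal sketch of classical depth-from-focus. This is not the paper's learned method: each pixel simply takes the depth of the focal slice in which it appears sharpest. It assumes grayscale slices stacked as a NumPy array of shape (S, H, W) and a known focus depth per slice; the function names and the squared-Laplacian focus measure are illustrative choices, not taken from the paper.

```python
import numpy as np


def laplacian_energy(img: np.ndarray) -> np.ndarray:
    """Squared discrete Laplacian, used as a per-pixel sharpness (focus) measure."""
    # Edge padding keeps the output the same size as the input.
    padded = np.pad(img, 1, mode="edge")
    lap = (
        padded[:-2, 1:-1] + padded[2:, 1:-1]    # up and down neighbours
        + padded[1:-1, :-2] + padded[1:-1, 2:]  # left and right neighbours
        - 4.0 * padded[1:-1, 1:-1]              # centre pixel
    )
    return lap ** 2


def depth_from_focus(focal_stack: np.ndarray, focus_depths: np.ndarray) -> np.ndarray:
    """Assign each pixel the depth of the slice where it is sharpest.

    focal_stack: (S, H, W) float array of grayscale focal slices.
    focus_depths: (S,) depth at which each slice is focused (assumed known).
    """
    sharpness = np.stack([laplacian_energy(s) for s in focal_stack])  # (S, H, W)
    best = np.argmax(sharpness, axis=0)  # index of the sharpest slice per pixel
    return focus_depths[best]            # (H, W) depth map
```

On a toy stack where the left half of the first slice and the right half of the second slice carry a checkerboard texture, the left half resolves to the first slice's depth and the right half to the second's. In perfectly flat regions every slice scores zero and `np.argmax` falls back to slice 0, which illustrates why focus cues alone are ambiguous without texture or under heavy defocus blur.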

Light field data has been shown to facilitate depth estimation. Most learning-based methods estimate depth from epipolar plane images (EPIs) or sub-aperture images, while fewer methods focus on the focal stack. Existing learning-based methods that estimate depth from the focal stack perform suboptimally because of defocus blur. In this paper, we propose a multi-modal learning method for robust light field depth estimation. We first exploit the internal spatial correlation by designing a context reasoning unit that separately extracts comprehensive contextual information from the focal stack and the RGB images. We then integrate this contextual information with an attention-guided cross-modal fusion module. Extensive experiments demonstrate that our method outperforms existing representative methods on two light field datasets. Moreover, visual results on a mobile phone dataset show that our method can be applied to images captured in everyday settings.
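
As a rough illustration of what an attention-guided cross-modal fusion step can look like, the following sketch blends focal-stack features and RGB features with a per-pixel, per-channel gate computed from both modalities. This is an assumption for illustration only: the gating form, the 1x1-convolution weights `w_gate` and `b_gate`, and the function name `attention_fusion` are hypothetical and do not reproduce the authors' module.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    """Logistic sigmoid via the tanh identity, which avoids exp overflow."""
    return 0.5 * (1.0 + np.tanh(0.5 * x))


def attention_fusion(f_focal: np.ndarray, f_rgb: np.ndarray,
                     w_gate: np.ndarray, b_gate: np.ndarray):
    """Hypothetical gated fusion of two feature maps.

    f_focal, f_rgb: (C, H, W) features from the focal-stack and RGB branches.
    w_gate: (C, 2C) weights of a 1x1 convolution; b_gate: (C,) bias.
    Returns the fused (C, H, W) features and the (C, H, W) gate.
    """
    c, h, w = f_focal.shape
    # Stack both modalities along channels and flatten space: (2C, H*W).
    stacked = np.concatenate([f_focal, f_rgb], axis=0).reshape(2 * c, h * w)
    # A 1x1 convolution is the same linear map applied at every pixel.
    logits = w_gate @ stacked + b_gate[:, None]  # (C, H*W)
    gate = sigmoid(logits).reshape(c, h, w)      # attention weights in [0, 1]
    # Convex combination: gate -> focal-stack features, 1 - gate -> RGB features.
    fused = gate * f_focal + (1.0 - gate) * f_rgb
    return fused, gate
```

With zero weights the gate is 0.5 everywhere and the output is the plain average of the two branches. A learned gate could instead let a network lean on RGB features where the focal stack is heavily blurred, and on focal cues elsewhere.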

Code Implementations: 1 repo
