CV · AI · RO · Oct 4, 2022

FreDSNet: Joint Monocular Depth and Semantic Segmentation with Fast Fourier Convolutions

arXiv:2210.01595v2 · 20 citations · h-index: 30 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses scene understanding in indoor environments using omnidirectional images, but it is incremental, as it combines two existing tasks (depth estimation and semantic segmentation) with a recently proposed convolution approach, fast Fourier convolutions.

The authors tackled the problem of joint monocular depth estimation and semantic segmentation from single panoramic images by introducing FreDSNet, which uses fast Fourier convolutions to leverage 360-degree context, achieving performance similar to state-of-the-art methods in both tasks.

In this work we present FreDSNet, a deep learning solution that obtains semantic 3D understanding of indoor environments from single panoramas. Omnidirectional images offer task-specific advantages for scene understanding problems because they provide 360-degree contextual information about the entire environment. However, the inherent characteristics of omnidirectional images also make it harder to obtain accurate detection and segmentation of objects or a good depth estimation. To overcome these problems, we exploit convolutions in the frequency domain, which obtain a wider receptive field in each convolutional layer. These convolutions allow the network to leverage the whole context information from omnidirectional images. FreDSNet is the first network that jointly provides monocular depth estimation and semantic segmentation from a single panoramic image by exploiting fast Fourier convolutions. Our experiments show that FreDSNet performs similarly to task-specific state-of-the-art methods for semantic segmentation and depth estimation. FreDSNet code is publicly available at https://github.com/Sbrunoberenguel/FreDSNet
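The abstract's key technical idea is that a convolution applied in the frequency domain gives each layer a much wider receptive field. The NumPy sketch below illustrates that idea with a simplified version of the global (spectral) branch of a fast Fourier convolution: a 2D real FFT over the spatial axes, a 1x1 convolution (per-frequency channel mixing) over the stacked real and imaginary parts, a ReLU, and an inverse FFT. It is a sketch under stated assumptions: the function name `spectral_transform`, the `weight` layout and the `relu` flag are hypothetical, batch normalization and the local branch of a full FFC block are omitted, and it is not the FreDSNet implementation (the linked repository holds that).

```python
import numpy as np


def spectral_transform(x, weight, relu=True):
    """Simplified spectral (global) branch of a fast Fourier convolution.

    Hypothetical sketch, not FreDSNet's code.
    x:      feature map of shape (C, H, W)
    weight: hypothetical 1x1-conv weights of shape (2 * C_out, 2 * C),
            acting on stacked real and imaginary parts
    Returns a real feature map of shape (C_out, H, W).
    """
    _, H, W = x.shape
    # Real 2D FFT over the spatial axes: complex array of shape (C, H, W // 2 + 1).
    # Every coefficient depends on every pixel of its channel.
    X = np.fft.rfft2(x, axes=(-2, -1))
    # Treat real and imaginary parts as separate channels: (2C, H, W // 2 + 1).
    Z = np.concatenate([X.real, X.imag], axis=0)
    # A 1x1 convolution in the frequency domain is a channel mix per frequency.
    Z = np.einsum("oc,chw->ohw", weight, Z)
    if relu:
        Z = np.maximum(Z, 0.0)  # pointwise nonlinearity in the spectral domain
    c_out = Z.shape[0] // 2
    # Reassemble complex coefficients and return to the spatial domain.
    Y = Z[:c_out] + 1j * Z[c_out:]
    return np.fft.irfft2(Y, s=(H, W), axes=(-2, -1))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((3, 8, 16))        # (C, H, W) toy feature map
    w = rng.standard_normal((2 * 4, 2 * 3))    # maps 3 channels to 4
    y = spectral_transform(x, w)
    print(y.shape)  # (4, 8, 16)
```

Because every Fourier coefficient depends on every pixel, perturbing a single input pixel can change the output at any spatial location after one such layer, which is the "whole context" property the abstract relies on. A side note, not a claim from the paper: the FFT treats the feature map as periodic, which matches the horizontal wrap-around of an equirectangular panorama but not its vertical axis.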

Code Implementations: 1 repo
