AI | Dec 17, 2018

Learning Common Representation from RGB and Depth Images

arXiv:1812.06873v1 | 10 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of multi-modal data fusion in computer vision, offering a novel approach for cross-modality inference, though it appears incremental in improving feature fusion techniques.

The paper tackles the problem of semantic segmentation and depth prediction from RGB-D images by proposing a deep learning architecture that learns a common representation from both modalities, enabling cross-modality scenarios like using depth for segmentation or RGB for depth estimation. The method is demonstrated on two public datasets, showing effectiveness in both tasks.

We propose a new deep learning architecture for the tasks of semantic segmentation and depth prediction from RGB-D images. We revise the state of the art based on RGB and depth feature fusion, where both modalities are assumed to be available at train and test time. We propose a new architecture where the feature fusion is replaced with a common deep representation. Combined with an encoder-decoder network, the architecture can jointly learn models for semantic segmentation and depth estimation based on their common representation. This representation, inspired by multi-view learning, offers several important advantages, such as using one modality available at test time to reconstruct the missing modality. In the RGB-D case, this enables cross-modality scenarios, such as using depth data for semantic segmentation and RGB images for depth estimation. We demonstrate the effectiveness of the proposed network on two publicly available RGB-D datasets. The experimental results show that the proposed method works well in both semantic segmentation and depth estimation tasks.
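The idea of a common representation shared by two modalities and two tasks can be illustrated with a toy, per-pixel sketch in plain Python. This is not the paper's network: the class name `CommonRepresentationNet`, the layer sizes, the averaging of modality codes, and the `alignment_loss` term are all assumptions made for illustration, and the weights are random rather than trained. It only shows the data flow the abstract describes: one encoder per modality, one decoder per task, and inference that still works when a modality is missing.

```python
import random


def linear(in_dim, out_dim, rng):
    """Return a randomly initialised weight matrix of shape (out_dim, in_dim)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def apply(weights, x):
    """Matrix-vector product of a weight matrix with an input vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(x):
    return [max(0.0, v) for v in x]


class CommonRepresentationNet:
    """Hypothetical toy model: modality encoders share one latent space."""

    def __init__(self, latent_dim=8, num_classes=4, seed=0):
        rng = random.Random(seed)
        self.enc_rgb = linear(3, latent_dim, rng)            # RGB pixel -> latent code
        self.enc_depth = linear(1, latent_dim, rng)          # depth value -> latent code
        self.dec_seg = linear(latent_dim, num_classes, rng)  # latent code -> class scores
        self.dec_depth = linear(latent_dim, 1, rng)          # latent code -> depth

    def encode(self, rgb=None, depth=None):
        """Encode whichever modalities are given into the common space."""
        codes = []
        if rgb is not None:
            codes.append(relu(apply(self.enc_rgb, rgb)))
        if depth is not None:
            codes.append(relu(apply(self.enc_depth, [depth])))
        if not codes:
            raise ValueError("at least one modality is required")
        # Assumption: when both modalities are present, average their codes.
        return [sum(vals) / len(codes) for vals in zip(*codes)]

    def predict(self, rgb=None, depth=None):
        """Run both task decoders on the common representation."""
        z = self.encode(rgb=rgb, depth=depth)
        scores = apply(self.dec_seg, z)
        label = max(range(len(scores)), key=scores.__getitem__)
        return label, apply(self.dec_depth, z)[0]


def alignment_loss(net, rgb, depth):
    """Squared distance between the two single-modality codes.

    Assumption: a term of this kind, added to the task losses during
    training, would pull both encoders toward a shared representation.
    """
    z_rgb = net.encode(rgb=rgb)
    z_depth = net.encode(depth=depth)
    return sum((a - b) ** 2 for a, b in zip(z_rgb, z_depth))


net = CommonRepresentationNet()
both = net.predict(rgb=[0.2, 0.5, 0.1], depth=1.5)  # both modalities available
rgb_only = net.predict(rgb=[0.2, 0.5, 0.1])         # RGB -> segmentation and depth
depth_only = net.predict(depth=1.5)                 # depth -> segmentation and depth
```

Because both decoders read only the latent code, the same `predict` call covers the cross-modality cases mentioned in the abstract: `rgb_only` yields a depth estimate without a depth input, and `depth_only` yields a class label without an RGB input.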

