Dec 17, 2018

Learning Common Representation from RGB and Depth Images

arXiv:1812.06873v1

Originality: Incremental advance

AI Analysis

This work addresses the challenge of multi-modal data fusion in computer vision and offers a new approach to cross-modality inference, though its improvement over existing feature fusion techniques appears incremental.

The paper tackles the problem of semantic segmentation and depth prediction from RGB-D images by proposing a deep learning architecture that learns a common representation from both modalities, enabling cross-modality scenarios like using depth for segmentation or RGB for depth estimation. The method is demonstrated on two public datasets, showing effectiveness in both tasks.

We propose a new deep learning architecture for the tasks of semantic segmentation and depth prediction from RGB-D images. We review the state of the art based on RGB and depth feature fusion, where both modalities are assumed to be available at train and test time. We propose a new architecture where the feature fusion is replaced with a common deep representation. Combined with an encoder-decoder network, the architecture can jointly learn models for semantic segmentation and depth estimation based on their common representation. This representation, inspired by multi-view learning, offers several important advantages, such as using the single modality available at test time to reconstruct the missing one. In the RGB-D case, this enables cross-modality scenarios, such as using depth data for semantic segmentation and RGB images for depth estimation. We demonstrate the effectiveness of the proposed network on two publicly available RGB-D datasets. The experimental results show that the proposed method works well in both semantic segmentation and depth estimation tasks.
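The abstract's central idea can be illustrated with a deliberately simplified linear sketch: both modalities are mapped into one shared code, so either one alone can be decoded into the other. This is not the paper's architecture. It assumes each modality is a feature vector that is a linear function of a shared low-dimensional latent, and it replaces the trained deep encoder-decoder with PCA on the concatenated views plus least-squares encoders and decoders. The names `fit_common_representation`, `rgb_to_depth` and `depth_to_rgb` are hypothetical.

```python
import numpy as np


def _lstsq(a, b):
    """Least-squares solution X of a @ X ~= b (minimum norm if rank-deficient)."""
    return np.linalg.lstsq(a, b, rcond=None)[0]


def fit_common_representation(rgb, depth, k):
    """Fit a k-dimensional shared code for two paired views (rows = samples).

    Hypothetical linear stand-in for a learned common representation: the
    shared code is the top-k PCA subspace of the centred, concatenated views,
    and per-view linear encoders/decoders are fitted by least squares so that
    either view alone can be mapped into (and out of) the shared code.
    """
    mu_r, mu_d = rgb.mean(axis=0), depth.mean(axis=0)
    r, d = rgb - mu_r, depth - mu_d
    joint = np.hstack([r, d])
    # Top-k right singular vectors of the joint data span the shared subspace.
    _, _, vt = np.linalg.svd(joint, full_matrices=False)
    z = joint @ vt[:k].T  # shared code per sample, shape (n, k)
    return {
        "mu_r": mu_r,
        "mu_d": mu_d,
        "enc_r": _lstsq(r, z),  # RGB features -> shared code
        "enc_d": _lstsq(d, z),  # depth features -> shared code
        "dec_r": _lstsq(z, r),  # shared code -> RGB features
        "dec_d": _lstsq(z, d),  # shared code -> depth features
    }


def rgb_to_depth(model, rgb):
    """Cross-modality inference: only RGB is available at test time."""
    z = (rgb - model["mu_r"]) @ model["enc_r"]
    return z @ model["dec_d"] + model["mu_d"]


def depth_to_rgb(model, depth):
    """Cross-modality inference: only depth is available at test time."""
    z = (depth - model["mu_d"]) @ model["enc_d"]
    return z @ model["dec_r"] + model["mu_r"]


if __name__ == "__main__":
    # Toy data (assumption): both views are linear images of a shared 2-D latent.
    rng = np.random.default_rng(0)
    a_r, a_d = rng.normal(size=(2, 6)), rng.normal(size=(2, 4))
    latent_train, latent_test = rng.normal(size=(200, 2)), rng.normal(size=(20, 2))
    model = fit_common_representation(latent_train @ a_r, latent_train @ a_d, k=2)
    depth_hat = rgb_to_depth(model, latent_test @ a_r)
    print("max |depth error|:", np.abs(depth_hat - latent_test @ a_d).max())
```

Under the linear, noise-free assumption above, cross-modality reconstruction on held-out samples is exact up to floating-point error. Real RGB images and depth maps are not linear functions of a shared latent. The sketch therefore only illustrates the data flow (encode the available modality, decode the missing one), not the model evaluated in the paper.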
