Semantic Segmentation of Earth Observation Data Using Multimodal and Multi-scale Deep Networks
This work addresses pixel-wise scene labeling for remote sensing applications, representing an incremental advancement in domain-specific methods.
The paper tackles semantic segmentation of Earth Observation images by adapting a SegNet variant with multi-scale and multimodal fusion strategies, achieving improved state-of-the-art accuracy on the ISPRS Vaihingen dataset.
This work investigates the use of deep fully convolutional neural networks (DFCNNs) for pixel-wise scene labeling of Earth Observation images. In particular, we train a variant of the SegNet architecture on remote sensing data over an urban area and study different strategies for performing accurate semantic segmentation. Our contributions are the following: 1) we efficiently transfer a DFCNN from generic everyday images to remote sensing images; 2) we introduce a multi-kernel convolutional layer for fast aggregation of predictions at multiple scales; 3) we perform data fusion from heterogeneous sensors (optical and laser) using residual correction. Our framework improves state-of-the-art accuracy on the ISPRS Vaihingen 2D Semantic Labeling dataset.
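To make contributions 2) and 3) concrete, the following is a minimal single-channel sketch in plain Python, not the implementation used in this work. It assumes that the multi-kernel layer applies parallel convolutions with several odd kernel sizes to the same feature map and averages the resulting score maps, and that residual correction fuses the two sensor-specific score maps as their mean plus an additive correction term. The kernel sizes (3, 5, 7), the box-filter weights standing in for learned weights, and all function names are illustrative assumptions; in a trained network both the kernels and the correction term would be learned.

```python
from typing import List

Grid = List[List[float]]


def conv2d_same(image: Grid, kernel: Grid) -> Grid:
    """Zero-padded 'same' 2D cross-correlation for an odd, square kernel."""
    k = len(kernel)
    pad = k // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    # Pixels outside the image count as zero padding.
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def box_kernel(size: int) -> Grid:
    """Hypothetical stand-in for learned weights: a normalized box filter."""
    return [[1.0 / (size * size)] * size for _ in range(size)]


def multi_kernel_scores(features: Grid, kernels: List[Grid]) -> Grid:
    """Average the score maps produced by parallel kernels of different sizes."""
    maps = [conv2d_same(features, kernel) for kernel in kernels]
    h, w = len(features), len(features[0])
    return [
        [sum(m[i][j] for m in maps) / len(maps) for j in range(w)]
        for i in range(h)
    ]


def fuse_with_residual(optical: Grid, laser: Grid, correction: Grid) -> Grid:
    """Fuse two score maps as their mean plus an additive correction term.

    In a trained network the correction would come from a small learned
    module; here it is simply passed in (an assumption of this sketch).
    """
    return [
        [(o + l) / 2.0 + c for o, l, c in zip(row_o, row_l, row_c)]
        for row_o, row_l, row_c in zip(optical, laser, correction)
    ]


if __name__ == "__main__":
    features = [[1.0] * 9 for _ in range(9)]
    kernels = [box_kernel(size) for size in (3, 5, 7)]
    scores = multi_kernel_scores(features, kernels)
    print(scores[4][4], scores[0][0])
```

Because every kernel uses "same" padding, the per-scale score maps share the input resolution and can be averaged pixel by pixel in a single forward pass, which is one way to read "fast aggregation" here. Likewise, the additive form of the fusion means that a zero correction falls back to plain averaging of the two sensor predictions.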