Encoder-Decoder based CNN and Fully Connected CRFs for Remote Sensed Image Segmentation
This work addresses the automation of land cover segmentation for remote sensing applications and presents an incremental improvement in segmentation accuracy.
The paper tackles the problem of automating object recognition in very high resolution remote-sensed images, which is challenging because of high intra-class and low inter-class variance. It proposes a CNN-FCRF model that achieves an overall accuracy of 90.5% on the ISPRS Vaihingen Dataset.
With the advancement of remote-sensed imaging, large volumes of very high resolution land cover images can now be obtained. Automating object recognition in these 2D images, however, remains a key issue. High intra-class variance and low inter-class variance in Very High Resolution (VHR) images hamper prediction accuracy in object recognition tasks. Most of the recent successful techniques across computer vision tasks are based on deep supervised learning. In this work, a deep Convolutional Neural Network (CNN) based on a symmetric encoder-decoder architecture with skip connections is employed for the 2D semantic segmentation of the most common land cover object classes: impervious surfaces, buildings, low vegetation, trees, and cars. Atrous convolutions are employed to give the proposed CNN model a large receptive field. Further, the CNN outputs are post-processed using a Fully Connected Conditional Random Field (FCRF) model to refine the CNN pixel label predictions. The proposed CNN-FCRF model achieves an overall accuracy of 90.5% on the ISPRS Vaihingen Dataset.
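To illustrate why atrous (dilated) convolutions enlarge the receptive field without adding parameters, the following is a minimal 1D sketch in NumPy. It is not the model described above: the function names `atrous_conv1d` and `receptive_field`, the kernel size, and the dilation rates are assumptions chosen for illustration only, and the paper's actual layer configuration is not stated here.

```python
import numpy as np


def atrous_conv1d(signal, kernel, rate):
    """'Valid' 1D atrous convolution (cross-correlation form, as used in CNNs).

    Illustrative helper, not taken from the paper. `rate` is the dilation
    rate: the spacing between the input samples that the kernel taps read.
    """
    signal = np.asarray(signal, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    k = len(kernel)
    span = (k - 1) * rate + 1  # number of input samples one output covers
    n_out = len(signal) - span + 1
    if n_out <= 0:
        raise ValueError("signal too short for this kernel size and rate")
    out = np.empty(n_out)
    for i in range(n_out):
        taps = signal[i : i + span : rate]  # every `rate`-th sample, k taps
        out[i] = np.dot(taps, kernel)
    return out


def receptive_field(kernel_size, rates):
    """Receptive field of a stack of stride-1 atrous layers (hypothetical stack)."""
    rf = 1
    for r in rates:
        rf += (kernel_size - 1) * r  # each layer widens the field by (k-1)*rate
    return rf


x = np.arange(10)
dense = atrous_conv1d(x, [1, 1, 1], rate=1)    # each output sees 3 adjacent samples
dilated = atrous_conv1d(x, [1, 1, 1], rate=2)  # each output spans 5 samples, still 3 weights

rf_dense = receptive_field(3, [1, 1, 1])   # three ordinary 3-tap layers
rf_atrous = receptive_field(3, [1, 2, 4])  # same weight count, growing rates
```

Under these assumptions, three stacked 3-tap layers cover 7 input samples with rate 1 throughout, but 15 samples with rates 1, 2, and 4, using the same number of weights. The same reasoning carries over to 2D image convolutions along each spatial axis.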