SBSS: Stacking-Based Semantic Segmentation Framework for Very High Resolution Remote Sensing Image
This work addresses a domain-specific problem for remote sensing image analysis, offering incremental improvements in segmentation efficiency and accuracy.
The paper tackles semantic segmentation of Very High Resolution (VHR) remote sensing images, where large variations in object scale hinder accuracy, by proposing a Stacking-Based Semantic Segmentation (SBSS) framework that learns class-specific scale preferences. SBSS achieves higher accuracy than Multi-Scale (MS) testing at a similar computational cost, and accuracy similar to Single-Scale (SS) testing with a quarter of the memory footprint.
Semantic segmentation of Very High Resolution (VHR) remote sensing images is a fundamental task for many applications. However, large variations in the scales of objects in VHR images make accurate semantic segmentation challenging. Existing semantic segmentation networks can analyse an input image at up to four resizing scales, but this may be insufficient given the diversity of object scales. Therefore, Multi-Scale (MS) test-time data augmentation is often used in practice to obtain more accurate segmentation results; it makes equal use of the segmentation results obtained at the different resizing scales. However, this study found that each class of objects has a preferred resizing scale at which it is segmented more accurately. Based on this observation, a Stacking-Based Semantic Segmentation (SBSS) framework is proposed that learns this behaviour to improve the segmentation results. SBSS contains a learnable Error Correction Module (ECM) for fusing segmentation results and an Error Correction Scheme (ECS) for controlling computational complexity. Two ECSs, ECS-MS and ECS-SS, are proposed and investigated in this study. The floating-point operations (FLOPs) required by ECS-MS and ECS-SS are similar to those of the commonly used MS test and the Single-Scale (SS) test, respectively. Extensive experiments on four datasets (Cityscapes, UAVid, LoveDA and Potsdam) show that SBSS is an effective and flexible framework: it achieved higher accuracy than MS when using ECS-MS, and accuracy similar to SS with a quarter of the memory footprint when using ECS-SS.
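The abstract contrasts MS testing, which averages the per-scale results with equal weight, with SBSS, which exploits the observation that each class prefers a particular resizing scale. The NumPy sketch below illustrates only that contrast. It is not the paper's ECM: the function names, the temperature `tau`, and the idea of deriving per-class scale weights from validation accuracy are hypothetical choices made for illustration. They stand in for a module that the paper learns from data and whose architecture is not described here.

```python
import numpy as np


def ms_fusion(probs):
    """Standard MS test-time fusion: an equal-weight average over scales.

    probs has shape (S, C, H, W): softmax probabilities for C classes from
    S resizing scales, each already resized back to the original H x W.
    """
    return probs.mean(axis=0)


def class_scale_weights(per_scale_class_acc, tau=0.1):
    """Hypothetical stand-in for the learnable ECM (not the paper's module).

    Turns a (S, C) table of per-class accuracy measured at each scale
    (e.g. on a validation set) into per-class weights over scales via a
    softmax with temperature tau; each column sums to 1. A small tau
    concentrates a class's weight on its preferred scale.
    """
    z = np.asarray(per_scale_class_acc, dtype=float) / tau
    z -= z.max(axis=0, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=0, keepdims=True)


def class_weighted_fusion(probs, weights):
    """Fuse scales with a separate weight per (scale, class) pair.

    The score for class c is sum_s weights[s, c] * probs[s, c]; scores are
    renormalised so that each pixel's class scores sum to 1.
    """
    fused = (probs * weights[:, :, None, None]).sum(axis=0)
    return fused / fused.sum(axis=0, keepdims=True)


# Toy example: 2 scales, 2 classes, a single pixel whose true class is 1.
# Scale 0 recognises class 1 well; scale 1 confuses it with class 0.
probs = np.array([
    [[[0.3]], [[0.7]]],  # scale 0: p(class 0), p(class 1)
    [[[0.9]], [[0.1]]],  # scale 1: p(class 0), p(class 1)
])
# Hypothetical per-class accuracy at each scale (rows: scales, cols: classes).
acc = np.array([
    [0.9, 0.9],
    [0.9, 0.5],
])

ms = ms_fusion(probs)
weighted = class_weighted_fusion(probs, class_scale_weights(acc))
print("MS prediction:", int(ms.argmax(axis=0)[0, 0]))  # 0 (wrong)
print("class-weighted prediction:", int(weighted.argmax(axis=0)[0, 0]))  # 1 (correct)
```

With equal weights (0.5 for every scale and class), `class_weighted_fusion` reduces to `ms_fusion`, so MS testing is the special case in which no class is allowed to prefer a scale.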