CV · Apr 30

A generalised pre-training strategy for deep learning networks in semantic segmentation of remotely sensed images

Yuan Fang, Yuanzhi Cai, Jagannath Aryal, Qinfeng Zhu, Hong Huang, Cheng Zhang, Lei Fan

arXiv:2604.27704 · 28.7

AI Analysis

It addresses the domain gap problem in remote sensing segmentation for practitioners who rely on pre-trained models, offering a simple method to improve generalization without requiring large domain-specific datasets.

The paper proposes a pre-training strategy that discourages learning domain-specific features, achieving state-of-the-art results on four remote sensing segmentation datasets: mIoU of 67.4% on iSAID, 56.9% on MFNet and 84.22% on PST900, and mF1 of 91.88% on Potsdam.

In the segmentation of remotely sensed images, deep learning models are typically pre-trained on large image databases such as ImageNet before being fine-tuned on domain-specific datasets. However, the performance of these fine-tuned models is often hindered by the large domain gaps (i.e., differences in scenes and modalities) between ImageNet images and the remotely sensed images being processed. Many researchers have therefore worked to establish large-scale domain-specific image datasets for pre-training, aiming to enhance model performance. Establishing such datasets is challenging and requires significant effort, and the resulting datasets often exhibit limited generalisability to other application scenarios. To address these issues, this study introduces a novel yet simple pre-training strategy designed to guide a model away from learning the domain-specific features of the pre-training dataset, thereby improving the generalisation ability of the pre-trained model. To evaluate the strategy's effectiveness, deep learning models are pre-trained on ImageNet and subsequently fine-tuned on four semantic segmentation datasets with diverse scenes and modalities: iSAID, MFNet, PST900 and Potsdam. Experimental results show that the proposed pre-training strategy led to state-of-the-art accuracies on all four datasets, namely 67.4% mIoU for iSAID, 56.9% mIoU for MFNet, 84.22% mIoU for PST900 and 91.88% mF1 for Potsdam. This research lays the groundwork for developing a unified foundation model applicable to both computer vision and remote sensing applications.
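
The abstract reports results as mIoU and mF1. As a rough illustration of how these two metrics are commonly computed from a per-class confusion matrix, here is a minimal pure-Python sketch. It is not the authors' evaluation code: the function names, the toy labels, and the choice to skip classes that appear in neither the reference nor the prediction are all assumptions, and individual benchmarks may differ in which classes or pixels they ignore.

```python
from typing import List, Sequence, Tuple


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Rows index the reference class, columns index the predicted class."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou_and_f1(cm: List[List[int]]) -> Tuple[float, float]:
    """Average per-class IoU and F1 over the classes present in the data."""
    n = len(cm)
    ious, f1s = [], []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # reference c, predicted other
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, reference other
        if tp + fp + fn == 0:
            continue  # assumption: skip classes absent from both label maps
        ious.append(tp / (tp + fp + fn))
        f1s.append(2 * tp / (2 * tp + fp + fn))
    return sum(ious) / len(ious), sum(f1s) / len(f1s)


# Toy example: flattened label maps for six pixels and three classes (hypothetical data).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
miou, mf1 = mean_iou_and_f1(cm)
print(f"mIoU = {miou:.4f}, mF1 = {mf1:.4f}")
```

Because per-class F1 is never lower than per-class IoU, mF1 figures (as reported for Potsdam) are not directly comparable with mIoU figures (as reported for the other three datasets).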
