CV | Jan 9, 2024

Generic Knowledge Boosted Pre-training For Remote Sensing Images

Ziyue Huang, Mingming Zhang, Yuan Gong, Qingjie Liu, Yunhong Wang

arXiv:2401.04614v216.431 citations | h-index: 47 | Has Code | IEEE Trans Geosci Remote Sens

Originality: Incremental advance

AI Analysis

This work addresses the need for better pre-trained models for remote sensing image understanding. It offers a domain-specific solution and is an incremental advance that builds on existing pre-training methods.

The authors tackle the domain gap between natural and remote sensing images by proposing GeRSP, a pre-training framework that combines self-supervised learning on remote sensing images with supervised learning on natural images. The resulting pre-trained model improves performance on downstream tasks such as object detection, semantic segmentation, and scene classification.

Deep learning models are essential for scene classification, change detection, land cover segmentation, and other remote sensing image understanding tasks. The backbones of most existing remote sensing deep learning models are initialized with weights obtained from ImageNet pre-training (IMP). However, domain gaps exist between remote sensing images and natural images (e.g., ImageNet), so models initialized with IMP weights perform poorly on remote sensing image understanding. Although some pre-training methods have been studied in the remote sensing community, current remote sensing pre-training methods suffer from limited generalization because they use only remote sensing images. In this paper, we propose a novel remote sensing pre-training framework, Generic Knowledge Boosted Remote Sensing Pre-training (GeRSP), to learn robust representations from remote sensing and natural images for remote sensing understanding tasks. GeRSP contains two pre-training branches: (1) a self-supervised pre-training branch that learns domain-related representations from unlabeled remote sensing images, and (2) a supervised pre-training branch that learns general knowledge from labeled natural images. Moreover, GeRSP combines the two pre-training branches using a teacher-student architecture to simultaneously learn representations with general and domain-specific knowledge, which yields a powerful pre-trained model for deep learning model initialization. Finally, we evaluate GeRSP and other remote sensing pre-training methods on three downstream tasks, i.e., object detection, semantic segmentation, and scene classification. The extensive experimental results consistently demonstrate that GeRSP can effectively learn robust representations in a unified manner, improving the performance of remote sensing downstream tasks.
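
The two-branch, teacher-student idea described in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration, not taken from the paper: the cosine consistency loss, the cross-entropy loss, the `ssl_weight` balance term, the exponential-moving-average (EMA) teacher update, and all function names are hypothetical stand-ins for whatever GeRSP actually uses.

```python
import math


def cross_entropy(logits, label):
    """Supervised branch (assumed form): -log softmax(logits)[label]."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[label]


def cosine_distance(a, b):
    """Self-supervised branch (assumed form): 1 - cos(a, b) between the
    student and teacher embeddings of the same remote sensing image."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def combined_loss(rs_student_emb, rs_teacher_emb, nat_logits, nat_label,
                  ssl_weight=1.0):
    """Joint objective (assumed form): supervised loss on a labeled natural
    image plus a weighted self-supervised loss on an unlabeled remote
    sensing image. ssl_weight is a hypothetical balancing hyperparameter."""
    supervised = cross_entropy(nat_logits, nat_label)
    self_supervised = cosine_distance(rs_student_emb, rs_teacher_emb)
    return supervised + ssl_weight * self_supervised


def ema_update(teacher_params, student_params, momentum=0.99):
    """Teacher update (assumed): an exponential moving average of the
    student's parameters, a common choice in teacher-student pre-training."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_params, student_params)]


if __name__ == "__main__":
    # Toy values standing in for real network outputs and weights.
    loss = combined_loss([1.0, 0.0], [0.8, 0.6], [2.0, 0.5, 0.1], 0)
    teacher = ema_update([1.0, 1.0], [0.0, 2.0], momentum=0.9)
    print(loss, teacher)
```

In this sketch a single scalar objective carries both the general-knowledge signal and the domain-specific signal, which is one simple way to read "simultaneously learn representations" in the abstract; the paper's actual formulation may differ.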
