CV | Feb 3, 2023

Self-Supervised In-Domain Representation Learning for Remote Sensing Image Scene Classification

arXiv:2302.01793v1 | 17 citations | h-index: 5
Originality: Incremental advance
AI Analysis

This work addresses performance limitations in remote sensing image classification for researchers and practitioners, offering an incremental improvement over existing self-supervised methods by optimizing dataset selection for pre-training.

The paper tackled the problem of domain differences limiting transfer learning from ImageNet to remote sensing tasks by pre-training in-domain representations using contrastive self-supervised learning (SimSiam), achieving state-of-the-art results on five land cover classification datasets.

Transferring ImageNet pre-trained weights to various remote sensing tasks has produced acceptable results and reduced the need for labeled samples. However, the domain differences between ground imagery and remote sensing images limit the performance of such transfer learning. Recent research has demonstrated that self-supervised learning methods capture visual features that are more discriminative and transferable than the supervised ImageNet weights. Motivated by these facts, we pre-train in-domain representations of remote sensing imagery using contrastive self-supervised learning and transfer the learned features to other related remote sensing datasets. Specifically, we used the SimSiam algorithm to pre-train the in-domain knowledge of remote sensing datasets and then transferred the obtained weights to other scene classification datasets. In this way, we obtained state-of-the-art results on five land cover classification datasets with varying numbers of classes and spatial resolutions. In addition, by conducting appropriate experiments, including feature pre-training using datasets with different attributes, we identified the most influential factors that make a dataset a good choice for obtaining in-domain features. We transferred the features obtained by pre-training SimSiam on remote sensing datasets to various downstream tasks and used them as initial weights for fine-tuning. Moreover, we linearly evaluated the obtained representations in cases where the number of samples per class is limited. Our experiments demonstrated that using a higher-resolution dataset during the self-supervised pre-training stage results in learning more discriminative and general representations.
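For readers unfamiliar with SimSiam, the objective it optimizes during pre-training can be sketched as a symmetric negative cosine similarity between two augmented views of the same image. The snippet below is a minimal illustration in plain Python, not the authors' implementation: the function names `negative_cosine` and `simsiam_loss` are hypothetical, the vectors stand in for the predictor outputs (`p1`, `p2`) and projector outputs (`z1`, `z2`) of the two views, and the stop-gradient that SimSiam applies to `z1` and `z2` is only noted in a comment because plain Python has no autodiff.

```python
import math


def negative_cosine(p, z):
    """Negative cosine similarity between a prediction p and a target z."""
    dot = sum(a * b for a, b in zip(p, z))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_z = math.sqrt(sum(b * b for b in z))
    return -dot / (norm_p * norm_z)


def simsiam_loss(p1, z1, p2, z2):
    """Symmetric SimSiam-style loss for one pair of augmented views.

    p1, p2: predictor outputs for view 1 and view 2 (illustrative names).
    z1, z2: projector outputs for view 1 and view 2; in an autodiff
            framework these would be treated as constants (stop-gradient).
    """
    return 0.5 * negative_cosine(p1, z2) + 0.5 * negative_cosine(p2, z1)


# Perfectly aligned views reach the minimum value of the loss.
aligned = simsiam_loss([2.0, 0.0], [3.0, 0.0], [2.0, 0.0], [3.0, 0.0])

# Orthogonal views give a loss of zero.
orthogonal = simsiam_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0])
```

The loss ranges from -1 (the two views agree completely) to 1 (they point in opposite directions), so pre-training pushes the representations of two augmentations of the same remote sensing image toward each other without requiring negative pairs.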

