CV · Feb 28, 2023

Remote Sensing Scene Classification with Masked Image Modeling (MIM)

arXiv:2302.14256v2 · 6 citations · h-index: 6
AI Analysis

This work addresses remote sensing scene classification, which supports applications such as geological survey and wildfire monitoring, and presents an incremental improvement based on self-supervised learning.

The paper tackles remote sensing scene classification by applying Vision Transformers pretrained with Masked Image Modeling (MIM) to four datasets. These backbones outperform other methods by up to 18% in top-1 accuracy and learn better features than their supervised-learning counterparts, improving top-1 accuracy by up to 5%.
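Top-1 accuracy, the metric behind both figures, counts a prediction as correct only when the highest-scoring class matches the true label. A minimal Python sketch, assuming each prediction is a list of per-class scores; the helper name `top1_accuracy` and the sample data are hypothetical, not taken from the paper:

```python
from typing import Sequence


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of samples whose highest-scoring class equals the true label."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one sample")
    correct = 0
    for row, label in zip(scores, labels):
        # argmax over class scores; on ties, max() keeps the lowest class index
        predicted = max(range(len(row)), key=lambda k: row[k])
        correct += predicted == label
    return correct / len(labels)


# Hypothetical 3-class example (illustrative data, not results from the paper).
scores = [
    [0.1, 0.7, 0.2],  # predicts class 1
    [0.5, 0.3, 0.2],  # predicts class 0
    [0.2, 0.2, 0.6],  # predicts class 2
    [0.4, 0.1, 0.5],  # predicts class 2
]
labels = [1, 0, 1, 2]
print(top1_accuracy(scores, labels))  # 0.75
```

Three of the four hypothetical predictions match their labels, so the score is 0.75; a top-5 variant would instead check whether the label appears among the five highest-scoring classes.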

Remote sensing scene classification has been studied extensively because of its critical role in geological survey, oil exploration, traffic management, earthquake prediction, wildfire monitoring, and intelligence monitoring. In the past, Machine Learning (ML) methods for this task mainly used backbones pretrained with supervised learning (SL). Because Masked Image Modeling (MIM), a self-supervised learning (SSL) technique, has been shown to be a better way to learn visual feature representations, it presents a new opportunity for improving ML performance on the scene classification task. This research explores the potential of MIM-pretrained backbones on four well-known classification datasets: Merced, AID, NWPU-RESISC45, and Optimal-31. Compared with published benchmarks, we show that MIM-pretrained Vision Transformer (ViT) backbones outperform other alternatives (by up to 18% in top-1 accuracy) and that the MIM technique learns better feature representations than its supervised-learning counterparts (by up to 5% in top-1 accuracy). Moreover, we show that general-purpose MIM-pretrained ViTs can achieve performance competitive with the specially designed yet complicated Transformer for Remote Sensing (TRS) framework. Our experimental results also provide a performance baseline for future studies.
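MIM pretraining hides a subset of an image's patches and trains the network to reconstruct them, so the encoder has to infer content from partial context. The abstract does not state which masking scheme the pretrained backbones used; the sketch below assumes MAE-style uniform random masking with a 224×224 input, 16×16 patches, and a 75% mask ratio (MAE's defaults, not necessarily this paper's settings). The helper `random_patch_mask` is hypothetical:

```python
import random


def random_patch_mask(image_size: int, patch_size: int, mask_ratio: float, seed: int = 0):
    """Split a square image into a patch grid and randomly choose which patches to hide.

    Returns (visible, masked) as sorted lists of patch indices. In MAE-style
    pretraining (an assumption here), only the visible patches go through the
    ViT encoder and the reconstruction loss is computed on the masked ones.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    num_patches = (image_size // patch_size) ** 2
    # Keep a fixed number of visible patches per image, as MAE does.
    num_visible = int(num_patches * (1 - mask_ratio))
    order = list(range(num_patches))
    random.Random(seed).shuffle(order)  # uniform random permutation of patch indices
    visible = sorted(order[:num_visible])
    masked = sorted(order[num_visible:])
    return visible, masked


visible, masked = random_patch_mask(image_size=224, patch_size=16, mask_ratio=0.75)
print(len(visible), len(masked))  # 49 147
```

A 224×224 image yields a 14×14 grid of 196 patches, of which 49 stay visible and 147 are masked. In MAE, the reconstruction decoder is used only during pretraining; the encoder is then reused as the backbone for downstream tasks such as scene classification.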

