CV | AI | LG | May 4, 2024

MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning

Vishal Nedungadi, Ankit Kariryaa, Stefan Oehmcke, Serge Belongie, Christian Igel, Nico Lang

arXiv:2405.02771v2 | 29.487 citations | h-index: 15 | Has Code | ECCV

Originality: Incremental advance

AI Analysis

This work addresses the problem of limited labeled data for geospatial applications through multi-modal pretraining. It is rated incremental because it builds on existing MAE and ConvNeXt V2 architectures.

The authors address the lack of labeled training data for Earth observation applications by creating MMEarth, a multi-modal pretraining dataset covering 1.2 million locations. They also develop a Multi-Pretext Masked Autoencoder (MP-MAE) that outperforms MAEs pretrained on ImageNet and MAEs pretrained on domain-specific satellite images on downstream tasks such as image classification and semantic segmentation, with gains in linear probing performance, label efficiency, and parameter efficiency.

The volume of unlabelled Earth observation (EO) data is huge, but many important applications lack labelled training data. However, EO data offers the unique opportunity to pair data from different modalities and sensors automatically based on geographic location and time, at virtually no human labor cost. We seize this opportunity to create MMEarth, a diverse multi-modal pretraining dataset at global scale. Using this new corpus of 1.2 million locations, we propose a Multi-Pretext Masked Autoencoder (MP-MAE) approach to learn general-purpose representations for optical satellite images. Our approach builds on the ConvNeXt V2 architecture, a fully convolutional masked autoencoder (MAE). Drawing upon a suite of multi-modal pretext tasks, we demonstrate that our MP-MAE approach outperforms both MAEs pretrained on ImageNet and MAEs pretrained on domain-specific satellite images. This is shown on several downstream tasks including image classification and semantic segmentation. We find that pretraining with multi-modal pretext tasks notably improves the linear probing performance compared to pretraining on optical satellite images only. This also leads to better label efficiency and parameter efficiency, which are crucial aspects in global-scale applications.
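To make the multi-pretext idea concrete, here is a minimal sketch in plain Python; it is not the authors' implementation. It assumes the encoder sees a randomly masked optical image and that each pretext task contributes one reconstruction loss, combined as a weighted sum. The function names, the 0.75 mask ratio, the use of mean squared error for every task, the equal default weights, and the "elevation" modality are all illustrative assumptions, not details given in the abstract.

```python
import random


def random_patch_mask(num_patches, mask_ratio, rng):
    """Return one boolean per patch; True means the patch is hidden from the encoder."""
    num_masked = int(round(num_patches * mask_ratio))
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


def mse(pred, target):
    """Mean squared error between two equal-length lists of floats."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def multi_pretext_loss(predictions, targets, weights=None):
    """Combine per-modality reconstruction losses into one training objective.

    predictions / targets: dict mapping a modality name to a flat list of floats.
    weights: optional dict of per-task weights (assumed equal if omitted).
    """
    weights = weights or {name: 1.0 for name in targets}
    per_task = {name: mse(predictions[name], targets[name]) for name in targets}
    total = sum(weights[name] * loss for name, loss in per_task.items())
    return total, per_task


rng = random.Random(0)
# Hide 75% of 16 patches (the ratio is an assumed value).
mask = random_patch_mask(num_patches=16, mask_ratio=0.75, rng=rng)

# Toy targets for two hypothetical pretext tasks on one sample.
targets = {"optical": [0.2, 0.4, 0.6], "elevation": [1.0, 2.0, 3.0]}
predictions = {"optical": [0.2, 0.4, 0.6], "elevation": [1.0, 2.0, 5.0]}
total, per_task = multi_pretext_loss(predictions, targets)
```

In this toy case the optical reconstruction is perfect, so the total loss comes entirely from the auxiliary task. In a real training loop, that auxiliary signal is what would push the shared encoder toward representations that also explain the other modalities.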
