CV, Aug 14, 2025

MAESTRO: Masked AutoEncoders for Multimodal, Multitemporal, and Multispectral Earth Observation Data

arXiv:2508.10894v2 | 1 citation | h-index: 7 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the problem of improving self-supervised learning for remote sensing applications, offering incremental advancements in fusion and normalization techniques.

The paper tackles the challenge of adapting self-supervised learning to Earth observation data by introducing MAESTRO, a masked autoencoder with optimized fusion mechanisms and a normalization scheme for reconstruction targets. Across four datasets, it reports state-of-the-art performance on tasks that rely strongly on multitemporal dynamics and competitive results on the others.

Self-supervised learning holds great promise for remote sensing, but standard self-supervised methods must be adapted to the unique characteristics of Earth observation data. We take a step in this direction by conducting a comprehensive benchmark of fusion strategies and normalization schemes of reconstruction targets for multimodal, multitemporal, and multispectral Earth observation data. Based on our findings, we introduce MAESTRO, a novel adaptation of the Masked Autoencoder with optimized fusion mechanisms and a normalization scheme that incorporates a spectral prior as a self-supervisory signal. Evaluated on four Earth observation datasets in both intra- and cross-dataset settings, MAESTRO achieves state-of-the-art performance on tasks that strongly rely on multitemporal dynamics, while also remaining competitive on others. Code to reproduce all our experiments is available at https://github.com/ignf/maestro.
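To make "normalization of reconstruction targets" concrete, the following is a minimal sketch of the per-patch target normalization used as an option in the original Masked Autoencoder, where each patch is standardized by its own mean and variance and the loss is computed on masked patches only. It is not MAESTRO's scheme: the abstract does not specify how the spectral prior enters the normalization, so none of that is modeled here. The function names (`random_mask`, `normalize_patch`, `masked_mse`), the toy patch values, and the 0.75 mask ratio are illustrative assumptions.

```python
import random


def random_mask(num_patches, mask_ratio, rng):
    """Return one boolean per patch; True means the patch is hidden from the encoder."""
    num_masked = int(round(num_patches * mask_ratio))
    masked_ids = set(rng.sample(range(num_patches), num_masked))
    return [i in masked_ids for i in range(num_patches)]


def normalize_patch(patch, eps=1e-6):
    """Standardize a patch with its own mean and (population) variance."""
    mean = sum(patch) / len(patch)
    var = sum((v - mean) ** 2 for v in patch) / len(patch)
    return [(v - mean) / (var + eps) ** 0.5 for v in patch]


def masked_mse(pred, target, mask):
    """Mean squared error, averaged over masked patches only."""
    per_patch = []
    for pred_patch, target_patch, is_masked in zip(pred, target, mask):
        if is_masked:
            sq_err = [(p - t) ** 2 for p, t in zip(pred_patch, target_patch)]
            per_patch.append(sum(sq_err) / len(sq_err))
    return sum(per_patch) / len(per_patch)


# Toy data (assumption): 8 flattened patches of 4 values each.
rng = random.Random(0)
patches = [[float(i * 4 + j) for j in range(4)] for i in range(8)]
mask = random_mask(len(patches), 0.75, rng)

# The reconstruction targets are the normalized patches, not the raw values.
targets = [normalize_patch(p) for p in patches]

# A decoder that always predicts zero scores about 1.0, because every
# normalized target has zero mean and roughly unit variance.
zero_pred = [[0.0] * 4 for _ in patches]
baseline_loss = masked_mse(zero_pred, targets, mask)
```

Because each target patch is standardized independently, a constant prediction gives a loss near 1 regardless of the raw value range, which puts patches of very different brightness on the same scale in the loss.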

