Lightweight Metadata-Aware Mixture-of-Experts Masked Autoencoder for Earth Observation
This work addresses the limited accessibility and reuse of Earth Observation models for downstream tasks by proposing a more compute-efficient architecture.
The authors tackle the computational cost of Earth Observation foundation models by proposing a compact Metadata-aware Mixture-of-Experts Masked Autoencoder with only 2.5M parameters. Despite its small size, the model competes with much larger architectures and shows improved transfer and label efficiency.
Recent advances in Earth Observation have focused on large-scale foundation models. However, these models are computationally expensive, limiting their accessibility and reuse for downstream tasks. In this work, we investigate compact architectures as a practical pathway toward smaller general-purpose EO models. We propose a Metadata-aware Mixture-of-Experts Masked Autoencoder (MoE-MAE) with only 2.5M parameters. The model combines sparse expert routing with geo-temporal conditioning, incorporating imagery alongside latitude/longitude and seasonal/daily cyclic encodings. We pretrain the MoE-MAE on the BigEarthNet-Landsat dataset and evaluate embeddings from its frozen encoder using linear probes. Despite its small size, the model competes with much larger architectures, demonstrating that metadata-aware pretraining improves transfer and label efficiency. To further assess generalization, we evaluate on the EuroSAT-Landsat dataset, which lacks explicit metadata, and still observe competitive performance compared to models with hundreds of millions of parameters. These results suggest that compact, metadata-aware MoE-MAEs are an efficient and scalable step toward future EO foundation models.
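To make the geo-temporal conditioning more concrete, the following is a minimal sketch of how latitude/longitude and seasonal/daily cyclic encodings could be computed before being passed to a model. It is an illustration under stated assumptions, not the authors' implementation: the function names (cyclic_encode, encode_metadata), the sine/cosine form, the 365.25-day and 24-hour periods, and the 8-value output layout are all hypothetical choices not given in the abstract.

```python
import math


def cyclic_encode(value, period):
    """Map a periodic quantity onto the unit circle as a (sin, cos) pair.

    Values that differ by a whole number of periods get the same encoding,
    so e.g. hour 0 and hour 24 are treated as identical.
    """
    angle = 2.0 * math.pi * (value % period) / period
    return math.sin(angle), math.cos(angle)


def encode_metadata(lat_deg, lon_deg, day_of_year, hour_of_day):
    """Build an 8-value metadata vector (hypothetical layout).

    Assumed layout: [lat_sin, lat_cos, lon_sin, lon_cos,
                     doy_sin, doy_cos, hour_sin, hour_cos].
    """
    # Latitude is not periodic; sin/cos of the angle is used here only as
    # a bounded, smooth representation (an assumption of this sketch).
    lat_rad = math.radians(lat_deg)
    lat_sin, lat_cos = math.sin(lat_rad), math.cos(lat_rad)

    # Longitude wraps around at 360 degrees.
    lon_sin, lon_cos = cyclic_encode(lon_deg, 360.0)

    # Seasonal cycle: day of year with an assumed period of 365.25 days.
    doy_sin, doy_cos = cyclic_encode(day_of_year, 365.25)

    # Daily cycle: hour of day with a 24-hour period.
    hour_sin, hour_cos = cyclic_encode(hour_of_day, 24.0)

    return [lat_sin, lat_cos, lon_sin, lon_cos,
            doy_sin, doy_cos, hour_sin, hour_cos]


if __name__ == "__main__":
    # Example: a location in central Europe, late June, mid-morning.
    print(encode_metadata(48.1, 11.6, 172, 10.5))
```

The sine/cosine pair keeps the ends of each cycle adjacent (late December next to early January, 23:00 next to 00:00), which a raw scalar would not. How such a vector is injected into the encoder, and what is done for datasets without metadata such as EuroSAT-Landsat, is not specified in the abstract and is left open here.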