TerraMesh: A Planetary Mosaic of Multimodal Earth Observation Data
This dataset addresses the need for large-scale, multimodal Earth Observation data among researchers and practitioners in remote sensing and AI. The contribution is incremental in that it builds on existing data collection efforts.
The authors address the limited scale, geographic coverage, and sensor variety of public Earth Observation datasets by introducing TerraMesh, a globally diverse, multimodal dataset with over 9 million samples and eight aligned modalities. Models pre-trained on TerraMesh showed improved performance.
Large-scale foundation models in Earth Observation can learn versatile, label-efficient representations by leveraging massive amounts of unlabeled data. However, existing public datasets are often limited in scale, geographic coverage, or sensor variety. We introduce TerraMesh, a new globally diverse, multimodal dataset combining optical, synthetic aperture radar, elevation, and land-cover modalities in an Analysis-Ready Data format. TerraMesh includes over 9 million samples with eight spatiotemporally aligned modalities, enabling large-scale pre-training. We provide detailed data processing steps, comprehensive statistics, and empirical evidence demonstrating improved model performance when pre-trained on TerraMesh. The dataset is hosted at https://huggingface.co/datasets/ibm-esa-geospatial/TerraMesh.