SpectralEarth: Training Hyperspectral Foundation Models at Scale
This work addresses the shortage of large-scale hyperspectral data available to remote sensing researchers, which has held back foundation models in this domain. The contribution is incremental, since it adapts existing self-supervised methods to a new dataset.
The authors address the lack of comprehensive hyperspectral datasets by introducing SpectralEarth, a large-scale multitemporal dataset of 538,974 image patches from 11,636 globally distributed scenes. They also pretrain foundation models on it and report versatility and generalizability across nine downstream datasets.
Foundation models have triggered a paradigm shift in computer vision and are increasingly being adopted in remote sensing, particularly for multispectral imagery. Yet, their potential in hyperspectral imaging (HSI) remains untapped due to the absence of comprehensive and globally representative hyperspectral datasets. To close this gap, we introduce SpectralEarth, a large-scale multitemporal dataset designed to pretrain hyperspectral foundation models leveraging data from the Environmental Mapping and Analysis Program (EnMAP). SpectralEarth comprises 538,974 image patches covering 415,153 unique locations from 11,636 globally distributed EnMAP scenes spanning two years of archive. In addition, 17.5% of these locations include multiple timestamps, enabling multitemporal HSI analysis. Utilizing state-of-the-art self-supervised learning algorithms, we pretrain a series of foundation models on SpectralEarth, integrating a spectral adapter into classical vision backbones to accommodate the unique characteristics of HSI. In tandem, we construct nine downstream datasets for land-cover, crop-type mapping, and tree-species classification, providing benchmarks for model evaluation. Experimental results support the versatility of our models and their generalizability across different tasks and sensors. We also highlight computational efficiency during model fine-tuning.
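The dataset figures quoted in the abstract imply a few derived quantities that help in judging its temporal depth. The short Python sketch below works them out from the three reported counts and the 17.5% share. The function name `derived_stats` and the dictionary keys are illustrative, not from the paper. Because 17.5% is a rounded value, the resulting location count is an approximation.

```python
# Counts as reported in the abstract.
NUM_PATCHES = 538_974        # image patches
NUM_LOCATIONS = 415_153      # unique locations
NUM_SCENES = 11_636          # EnMAP scenes
MULTI_TEMPORAL_SHARE = 0.175 # share of locations with multiple timestamps (rounded)


def derived_stats():
    """Return quantities implied by the reported counts (illustrative helper)."""
    return {
        # Patches beyond the first acquisition of each location, i.e. revisits.
        "revisit_patches": NUM_PATCHES - NUM_LOCATIONS,
        # Approximate number of locations observed more than once.
        "multi_temporal_locations": MULTI_TEMPORAL_SHARE * NUM_LOCATIONS,
        # Average number of patches per unique location.
        "patches_per_location": NUM_PATCHES / NUM_LOCATIONS,
        # Average number of patches extracted per EnMAP scene.
        "patches_per_scene": NUM_PATCHES / NUM_SCENES,
    }


if __name__ == "__main__":
    for name, value in derived_stats().items():
        print(f"{name}: {value:,.2f}")
```

Under these numbers, roughly 72,650 locations carry more than one timestamp, and they account for about 123,800 additional patches, so a revisited location has between one and two extra acquisitions on average.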