Paving the way toward foundation models for irregular and unaligned Satellite Image Time Series
This addresses a problem for remote sensing applications by providing more useful embeddings for real-world uses, though it appears incremental as it builds on existing SSL models.
The paper tackles the challenge of creating foundation models for irregular and unaligned satellite image time series by proposing ALISE, which produces aligned latent representations and shows improved effectiveness over previous self-supervised learning methods in linear probing segmentation tasks.
Although recently several foundation models for satellite remote sensing imagery have been proposed, they fail to address major challenges of real/operational applications. Indeed, embeddings that don't take into account the spectral, spatial and temporal dimensions of the data as well as the irregular or unaligned temporal sampling are of little use for most real world uses. As a consequence, we propose an ALIgned Sits Encoder (ALISE), a novel approach that leverages the spatial, spectral, and temporal dimensions of irregular and unaligned SITS while producing aligned latent representations. Unlike SSL models currently available for SITS, ALISE incorporates a flexible query mechanism to project the SITS into a common and learned temporal projection space. Additionally, thanks to a multi-view framework, we explore integration of instance discrimination along a masked autoencoding task to SITS. The quality of the produced representation is assessed through three downstream tasks: crop segmentation (PASTIS), land cover segmentation (MultiSenGE), and a novel crop change detection dataset. Furthermore, the change detection task is performed without supervision. The results suggest that the use of aligned representations is more effective than previous SSL methods for linear probing segmentation tasks.