Global and Dense Embeddings of Earth: Major TOM Floating in the Latent Space
This work addresses the need for standardized AI-ready datasets in Earth observation and provides a foundational resource for researchers and practitioners in geospatial analysis. The contribution is incremental in that it extends an existing community project rather than starting a new one.
The authors tackle the lack of efficient vector representations for Earth observation data by extending the Major TOM project and releasing four global and dense embedding datasets, resulting in the most comprehensive open dataset of geospatial visual embeddings in terms of Earth's surface covered.
With the ever-increasing volumes of Earth observation data in the archives of large programmes such as Copernicus, there is a growing need for efficient vector representations of the underlying raw data. Extracting feature representations from pretrained deep neural networks is a powerful approach that can provide semantic abstractions of the input data. However, no standard way of doing this for imagery archives containing geospatial data has yet been defined. In this work, an extension is proposed to an existing community project, Major TOM, which focuses on the provision and standardization of open and free AI-ready datasets for Earth observation. Furthermore, four global and dense embedding datasets are released openly and for free along with the publication of this manuscript, resulting in the most comprehensive global open dataset of geospatial visual embeddings in terms of Earth's surface covered.
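To make the idea of a dense embedding concrete, the following is a minimal sketch, not the authors' pipeline. It assumes an image stored as nested lists (rows x columns x channels), a simple non-overlapping tiling, and a hypothetical stand-in encoder (`mean_pool_encoder`) that takes the per-channel mean in place of a pretrained network. The function names, the tiling scheme, and the handling of edge patches are all illustrative assumptions.

```python
from typing import Callable, List, Sequence, Tuple

Image = Sequence[Sequence[Sequence[float]]]  # rows x columns x channels
Embedding = List[float]


def mean_pool_encoder(patch: Image) -> Embedding:
    """Hypothetical stand-in for a pretrained network: per-channel mean."""
    n_channels = len(patch[0][0])
    n_pixels = len(patch) * len(patch[0])
    sums = [0.0] * n_channels
    for row in patch:
        for pixel in row:
            for channel, value in enumerate(pixel):
                sums[channel] += value
    return [total / n_pixels for total in sums]


def dense_embeddings(
    image: Image,
    patch_size: int,
    encoder: Callable[[Image], Embedding] = mean_pool_encoder,
) -> List[Tuple[Tuple[int, int], Embedding]]:
    """Tile the image into non-overlapping square patches and embed each one.

    Returns (top-left pixel offset, embedding) pairs. Partial patches at the
    right and bottom edges are skipped (an assumption made for brevity).
    """
    height, width = len(image), len(image[0])
    results = []
    for top in range(0, height - patch_size + 1, patch_size):
        for left in range(0, width - patch_size + 1, patch_size):
            patch = [
                row[left:left + patch_size]
                for row in image[top:top + patch_size]
            ]
            results.append(((top, left), encoder(patch)))
    return results


# Toy 4x4 single-band image whose pixel value equals its row index.
toy_image = [[[float(r)] for _ in range(4)] for r in range(4)]
for offset, embedding in dense_embeddings(toy_image, patch_size=2):
    print(offset, embedding)
```

In a realistic setting the encoder would be a pretrained vision model and each embedding would be stored alongside the patch's geographic footprint, so that the vectors can be searched or compared without re-reading the raw imagery.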