CVJul 29, 2025
AlphaEarth Foundations: An embedding field model for accurate and efficient global mapping from sparse label dataChristopher F. Brown, Michal R. Kazmierski, Valerie J. Pasquarella et al.
Unprecedented volumes of Earth observation data are continually collected around the world, but high-quality labels remain scarce given the effort required to make physical measurements and observations. This has led to considerable investment in bespoke modeling efforts translating sparse labels into maps. Here we introduce AlphaEarth Foundations, an embedding field model yielding a highly general, geospatial representation that assimilates spatial, temporal, and measurement contexts across multiple sources, enabling accurate and efficient production of maps and monitoring systems from local to global scales. The embeddings generated by AlphaEarth Foundations are the only to consistently outperform a suite of other well-known/widely accepted featurization approaches tested on a diverse set of mapping evaluations without re-training. We have released a dataset of global, annual, analysis-ready embedding field layers from 2017 through 2024.
CVNov 19, 2021
Evaluating Self and Semi-Supervised Methods for Remote Sensing Segmentation TasksChaitanya Patel, Shashank Sharma, Valerie J. Pasquarella et al.
Self- and semi-supervised machine learning techniques leverage unlabeled data for improving downstream task performance. These methods are especially valuable for remote sensing tasks where producing labeled ground truth datasets can be prohibitively expensive but there is easy access to a wealth of unlabeled imagery. We perform a rigorous evaluation of SimCLR, a self-supervised method, and FixMatch, a semi-supervised method, on three remote sensing tasks: riverbed segmentation, land cover mapping, and flood mapping. We quantify performance improvements on these remote sensing segmentation tasks when additional imagery outside of the original supervised dataset is made available for training. We also design experiments to test the effectiveness of these techniques when the test set is domain shifted to sample different geographic areas compared to the training and validation sets. We find that such techniques significantly improve generalization performance when labeled data is limited and there are geographic domain shifts between the training data and the validation/test data.