h-index66
18papers
723citations
Novelty50%
AI Score54

18 Papers

APDec 19, 2022
Annual field-scale maps of tall and short crops at the global scale using GEDI and Sentinel-2

Stefania Di Tommaso, Sherrie Wang, Vivek Vajipey et al.

Crop type maps are critical for tracking agricultural land use and estimating crop production. Remote sensing has proven an efficient and reliable tool for creating these maps in regions with abundant ground labels for model training, yet these labels remain difficult to obtain in many regions and years. NASA's Global Ecosystem Dynamics Investigation (GEDI) spaceborne lidar instrument, originally designed for forest monitoring, has shown promise for distinguishing tall and short crops. In the current study, we leverage GEDI to develop wall-to-wall maps of short vs tall crops on a global scale at 10 m resolution for 2019-2021. Specifically, we show that (1) GEDI returns can reliably be classified into tall and short crops after removing shots with extreme view angles or topographic slope, (2) the frequency of tall crops over time can be used to identify months when tall crops are at their peak height, and (3) GEDI shots in these months can then be used to train random forest models that use Sentinel-2 time series to accurately predict short vs. tall crops. Independent reference data from around the world are then used to evaluate these GEDI-S2 maps. We find that GEDI-S2 performed nearly as well as models trained on thousands of local reference training points, with accuracies of at least 87% and often above 90% throughout the Americas, Europe, and East Asia. Systematic underestimation of tall crop area was observed in regions where crops frequently exhibit low biomass, namely Africa and South Asia, and further work is needed in these systems. Although the GEDI-S2 approach only differentiates tall from short crops, in many landscapes this distinction goes a long way toward mapping the main individual crop types. The combination of GEDI and Sentinel-2 thus presents a very promising path towards global crop mapping with minimal reliance on ground data.

LGDec 3, 2025
Observation-driven correction of numerical weather prediction for marine winds

Matteo Peduto, Qidong Yang, Jonathan Giezendanner et al.

Accurate marine wind forecasts are essential for safe navigation, ship routing, and energy operations, yet they remain challenging because observations over the ocean are sparse, heterogeneous, and temporally variable. We reformulate wind forecasting as observation-informed correction of a global numerical weather prediction (NWP) model. Rather than forecasting winds directly, we learn local correction patterns by assimilating the latest in-situ observations to adjust the Global Forecast System (GFS) output. We propose a transformer-based deep learning architecture that (i) handles irregular and time-varying observation sets through masking and set-based attention mechanisms, (ii) conditions predictions on recent observation-forecast pairs via cross-attention, and (iii) employs cyclical time embeddings and coordinate-aware location representations to enable single-pass inference at arbitrary spatial coordinates. We evaluate our model over the Atlantic Ocean using observations from the International Comprehensive Ocean-Atmosphere Data Set (ICOADS) as reference. The model reduces GFS 10-meter wind RMSE at all lead times up to 48 hours, achieving 45% improvement at 1-hour lead time and 13% improvement at 48-hour lead time. Spatial analyses reveal the most persistent improvements along coastlines and shipping routes, where observations are most abundant. The tokenized architecture naturally accommodates heterogeneous observing platforms (ships, buoys, tide gauges, and coastal stations) and produces both site-specific predictions and basin-scale gridded products in a single forward pass. These results demonstrate a practical, low-latency post-processing approach that complements NWP by learning to correct systematic forecast errors.

CLJan 31, 2024Code
Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data

Chenhui Zhang, Sherrie Wang

Large Vision-Language Models (VLMs) have demonstrated impressive performance on complex tasks involving visual input with natural language instructions. However, it remains unclear to what extent capabilities on natural images transfer to Earth observation (EO) data, which are predominantly satellite and aerial images less common in VLM training data. In this work, we propose a comprehensive benchmark to gauge the progress of VLMs toward being useful tools for EO data by assessing their abilities on scene understanding, localization and counting, and change detection tasks. Motivated by real-world applications, our benchmark includes scenarios like urban monitoring, disaster relief, land use, and conservation. We discover that, although state-of-the-art VLMs like GPT-4V possess extensive world knowledge that leads to strong performance on open-ended tasks like location understanding and image captioning, their poor spatial reasoning limits usefulness on object localization and counting tasks. Our benchmark will be made publicly available at https://vleo.danielz.ch/ and on Hugging Face at https://huggingface.co/collections/mit-ei/vleo-benchmark-datasets-65b789b0466555489cce0d70 for easy model evaluation.

CVSep 12, 2023
Combining Deep Learning and Street View Imagery to Map Smallholder Crop Types

Jordi Laguarta Soler, Thomas Friedel, Sherrie Wang

Accurate crop type maps are an essential source of information for monitoring yield progress at scale, projecting global crop production, and planning effective policies. To date, however, crop type maps remain challenging to create in low and middle-income countries due to a lack of ground truth labels for training machine learning models. Field surveys are the gold standard in terms of accuracy but require an often-prohibitively large amount of time, money, and statistical capacity. In recent years, street-level imagery, such as Google Street View, KartaView, and Mapillary, has become available around the world. Such imagery contains rich information about crop types grown at particular locations and times. In this work, we develop an automated system to generate crop type ground references using deep learning and Google Street View imagery. The method efficiently curates a set of street view images containing crop fields, trains a model to predict crop type by utilizing weakly-labelled images from disparate out-of-domain sources, and combines predicted labels with remote sensing time series to create a wall-to-wall crop type map. We show that, in Thailand, the resulting country-wide map of rice, cassava, maize, and sugarcane achieves an accuracy of 93%. We publicly release the first-ever crop type map for all of Thailand 2022 at 10m-resolution with no gaps. To our knowledge, this is the first time a 10m-resolution, multi-crop map has been created for any smallholder country. As the availability of roadside imagery expands, our pipeline provides a way to map crop types at scale around the globe, especially in underserved smallholder regions.

CVFeb 10
Conformal Prediction Sets for Instance Segmentation

Kerri Lu, Dan M. Kluger, Stephen Bates et al.

Current instance segmentation models achieve high performance on average predictions, but lack principled uncertainty quantification: their outputs are not calibrated, and there is no guarantee that a predicted mask is close to the ground truth. To address this limitation, we introduce a conformal prediction algorithm to generate adaptive confidence sets for instance segmentation. Given an image and a pixel coordinate query, our algorithm generates a confidence set of instance predictions for that pixel, with a provable guarantee for the probability that at least one of the predictions has high Intersection-Over-Union (IoU) with the true object instance mask. We apply our algorithm to instance segmentation examples in agricultural field delineation, cell segmentation, and vehicle detection. Empirically, we find that our prediction sets vary in size based on query difficulty and attain the target coverage, outperforming existing baselines such as Learn Then Test, Conformal Risk Control, and morphological dilation-based methods. We provide versions of the algorithm with asymptotic and finite sample guarantees.

LGFeb 26
Partial recovery of meter-scale surface weather

Jonathan Giezendanner, Qidong Yang, Eric Schmitt et al.

Near-surface atmospheric conditions can differ sharply over tens to hundreds of meters due to land cover and topography, yet this variability is absent from current weather analyses and forecasts. It is unclear whether such meter-scale variability reflects irreducibly chaotic dynamics or contains a component predictable from surface characteristics and large-scale atmospheric forcing. Here we show that a substantial, physically coherent component of meter-scale near-surface weather is statistically recoverable from existing observations. By conditioning coarse atmospheric state on sparse surface station measurements and high-resolution Earth observation data, we infer spatially continuous fields of near-surface wind, temperature, and humidity at 10 m resolution across the contiguous United States. Relative to ERA5, the inferred fields reduce wind error by 29% and temperature and dewpoint error by 6%, while explaining substantially more spatial variance at fixed time steps. They also exhibit physically interpretable structure, including urban heat islands, evapotranspiration-driven humidity contrasts, and wind speed differences across land cover types. Our findings expand the frontier of weather modeling by demonstrating a computationally feasible approach to continental-scale meter-resolution inference. More broadly, they illustrate how conditioning coarse dynamical models on static fine-scale features can reveal previously unresolved components of the Earth system.

MLNov 12, 2025
Masked Mineral Modeling: Continent-Scale Mineral Prospecting via Geospatial Infilling

Sujay Nair, Evan Coleman, Sherrie Wang et al.

Minerals play a critical role in the advanced energy technologies necessary for decarbonization, but characterizing mineral deposits hidden underground remains costly and challenging. Inspired by recent progress in generative modeling, we develop a learning method which infers the locations of minerals by masking and infilling geospatial maps of resource availability. We demonstrate this technique using mineral data for the conterminous United States, and train performant models, with the best achieving Dice coefficients of $0.31 \pm 0.01$ and recalls of $0.22 \pm 0.02$ on test data at 1$\times$1 mi$^2$ spatial resolution. One major advantage of our approach is that it can easily incorporate auxiliary data sources for prediction which may be more abundant than mineral data. We highlight the capabilities of our model by adding input layers derived from geophysical sources, along with a nation-wide ground survey of soils originally intended for agronomic purposes. We find that employing such auxiliary features can improve inference performance, while also enabling model evaluation in regions with no recorded minerals.

LGJan 29
Conformal Prediction for Generative Models via Adaptive Cluster-Based Density Estimation

Qidong Yang, Qianyu Julie Zhu, Jonathan Giezendanner et al.

Conditional generative models map input variables to complex, high-dimensional distributions, enabling realistic sample generation in a diverse set of domains. A critical challenge with these models is the absence of calibrated uncertainty, which undermines trust in individual outputs for high-stakes applications. To address this issue, we propose a systematic conformal prediction approach tailored to conditional generative models, leveraging density estimation on model-generated samples. We introduce a novel method called CP4Gen, which utilizes clustering-based density estimation to construct prediction sets that are less sensitive to outliers, more interpretable, and of lower structural complexity than existing methods. Extensive experiments on synthetic datasets and real-world applications, including climate emulation tasks, demonstrate that CP4Gen consistently achieves superior performance in terms of prediction set volume and structural simplicity. Our approach offers practitioners a powerful tool for uncertainty estimation associated with conditional generative models, particularly in scenarios demanding rigorous and interpretable prediction sets.

LGOct 16, 2024
Local Off-Grid Weather Forecasting with Multi-Modal Earth Observation Data

Qidong Yang, Jonathan Giezendanner, Daniel Salles Civitarese et al.

Urgent applications like wildfire management and renewable energy generation require precise, localized weather forecasts near the Earth's surface. However, forecasts produced by machine learning models or numerical weather prediction systems are typically generated on large-scale regular grids, where direct downscaling fails to capture fine-grained, near-surface weather patterns. In this work, we propose a multi-modal transformer model trained end-to-end to downscale gridded forecasts to off-grid locations of interest. Our model directly combines local historical weather observations (e.g., wind, temperature, dewpoint) with gridded forecasts to produce locally accurate predictions at various lead times. Multiple data modalities are collected and concatenated at station-level locations, treated as a token at each station. Using self-attention, the token corresponding to the target location aggregates information from its neighboring tokens. Experiments using weather stations across the Northeastern United States show that our model outperforms a range of data-driven and non-data-driven off-grid forecasting methods. They also reveal that direct input of station data provides a phase shift in local weather forecasting accuracy, reducing the prediction error by up to 80% compared to pure gridded data based models. This approach demonstrates how to bridge the gap between large-scale weather models and locally accurate forecasts to support high-stakes, location-sensitive decision-making.

CVDec 12, 2023
Taking it further: leveraging pseudo labels for field delineation across label-scarce smallholder regions

Philippe Rufin, Sherrie Wang, Sá Nogueira Lisboa et al.

Transfer learning allows for resource-efficient geographic transfer of pre-trained field delineation models. However, the scarcity of labeled data for complex and dynamic smallholder landscapes, particularly in Sub-Saharan Africa, remains a major bottleneck for large-area field delineation. This study explores opportunities of using sparse field delineation pseudo labels for fine-tuning models across geographies and sensor characteristics. We build on a FracTAL ResUNet trained for crop field delineation in India (median field size of 0.24 ha) and use this pre-trained model to generate pseudo labels in Mozambique (median field size of 0.06 ha). We designed multiple pseudo label selection strategies and compared the quantities, area properties, seasonal distribution, and spatial agreement of the pseudo labels against human-annotated training labels (n = 1,512). We then used the human-annotated labels and the pseudo labels for model fine-tuning and compared predictions against human field annotations (n = 2,199). Our results indicate i) a good baseline performance of the pre-trained model in both field delineation and field size estimation, and ii) the added value of regional fine-tuning with performance improvements in nearly all experiments. Moreover, we found iii) substantial performance increases when using only pseudo labels (up to 77% of the IoU increases and 68% of the RMSE decreases obtained by human labels), and iv) additional performance increases when complementing human annotations with pseudo labels. Pseudo labels can be efficiently generated at scale and thus facilitate domain adaptation in label-scarce settings. The workflow presented here is a stepping stone for overcoming the persisting data gaps in heterogeneous smallholder agriculture of Sub-Saharan Africa, where labels are commonly scarce.

MEJan 30, 2025
Prediction-Powered Inference with Imputed Covariates and Nonuniform Sampling

Dan M. Kluger, Kerri Lu, Tijana Zrnic et al.

Machine learning models are increasingly used to produce predictions that serve as input data in subsequent statistical analyses. For example, computer vision predictions of economic and environmental indicators based on satellite imagery are used in downstream regressions; similarly, language models are widely used to approximate human ratings and opinions in social science research. However, failure to properly account for errors in the machine learning predictions renders standard statistical procedures invalid. Prior work uses what we call the Predict-Then-Debias estimator to give valid confidence intervals when machine learning algorithms impute missing variables, assuming a small complete sample from the population of interest. We expand the scope by introducing bootstrap confidence intervals that apply when the complete data is a nonuniform (i.e., weighted, stratified, or clustered) sample and to settings where an arbitrary subset of features is imputed. Importantly, the method can be applied to many settings without requiring additional calculations. We prove that these confidence intervals are valid under no assumptions on the quality of the machine learning model and are no wider than the intervals obtained by methods that do not use machine learning predictions.

LGSep 3, 2025
Invariant Features for Global Crop Type Classification

Xin-Yi Tong, Sherrie Wang

Accurately obtaining crop type and its spatial distribution at a global scale is critical for food security, agricultural policy-making, and sustainable development. Remote sensing offers an efficient solution for large-scale crop classification, but the limited availability of reliable ground samples in many regions constrains applicability across geographic areas. To address performance declines under geospatial shifts, this study identifies remote sensing features that are invariant to geographic variation and proposes strategies to enhance cross-regional generalization. We construct CropGlobe, a global crop type dataset with 300,000 pixel-level samples from eight countries across five continents, covering six major food and industrial crops (corn, soybeans, rice, wheat, sugarcane, cotton). With broad geographic coverage, CropGlobe enables a systematic evaluation under cross-country, cross-continent, and cross-hemisphere transfer. We compare the transferability of temporal multi-spectral features (Sentinel-2-based 1D/2D median features and harmonic coefficients) and hyperspectral features (from EMIT). To improve generalization under spectral and phenological shifts, we design CropNet, a lightweight and robust CNN tailored for pixel-level crop classification, coupled with temporal data augmentation (time shift, time scale, and magnitude warping) that simulates realistic cross-regional phenology. Experiments show that 2D median temporal features from Sentinel-2 consistently exhibit the strongest invariance across all transfer scenarios, and augmentation further improves robustness, particularly when training data diversity is limited. Overall, the work identifies more invariant feature representations that enhance geographic transferability and suggests a promising path toward scalable, low-cost crop type applications across globally diverse regions.

CVJan 13, 2022
Unlocking large-scale crop field delineation in smallholder farming systems with transfer learning and weak supervision

Sherrie Wang, Francois Waldner, David B. Lobell

Crop field boundaries aid in mapping crop types, predicting yields, and delivering field-scale analytics to farmers. Recent years have seen the successful application of deep learning to delineating field boundaries in industrial agricultural systems, but field boundary datasets remain missing in smallholder systems due to (1) small fields that require high resolution satellite imagery to delineate and (2) a lack of ground labels for model training and validation. In this work, we combine transfer learning and weak supervision to overcome these challenges, and we demonstrate the methods' success in India where we efficiently generated 10,000 new field labels. Our best model uses 1.5m resolution Airbus SPOT imagery as input, pre-trains a state-of-the-art neural network on France field boundaries, and fine-tunes on India labels to achieve a median Intersection over Union (IoU) of 0.86 in India. If using 4.8m resolution PlanetScope imagery instead, the best model achieves a median IoU of 0.72. Experiments also show that pre-training in France reduces the number of India field labels needed to achieve a given performance level by as much as $20\times$ when datasets are small. These findings suggest our method is a scalable approach for delineating crop fields in regions of the world that currently lack field boundary datasets. We publicly release the 10,000 labels and delineation model to facilitate the creation of field boundary maps and new methods by the community.

LGNov 8, 2021
SustainBench: Benchmarks for Monitoring the Sustainable Development Goals with Machine Learning

Christopher Yeh, Chenlin Meng, Sherrie Wang et al.

Progress toward the United Nations Sustainable Development Goals (SDGs) has been hindered by a lack of data on key environmental and socioeconomic indicators, which historically have come from ground surveys with sparse temporal and spatial coverage. Recent advances in machine learning have made it possible to utilize abundant, frequently-updated, and globally available data, such as from satellites or social media, to provide insights into progress toward SDGs. Despite promising early results, approaches to using such data for SDG measurement thus far have largely evaluated on different datasets or used inconsistent evaluation metrics, making it hard to understand whether performance is improving and where additional research would be most fruitful. Furthermore, processing satellite and ground survey data requires domain knowledge that many in the machine learning community lack. In this paper, we introduce SustainBench, a collection of 15 benchmark tasks across 7 SDGs, including tasks related to economic development, agriculture, health, education, water and sanitation, climate action, and life on land. Datasets for 11 of the 15 tasks are released publicly for the first time. Our goals for SustainBench are to (1) lower the barriers to entry for the machine learning community to contribute to measuring and achieving the SDGs; (2) provide standard benchmarks for evaluating machine learning models on tasks across a variety of SDGs; and (3) encourage the development of novel machine learning methods where improved model performance facilitates progress towards the SDGs.

IVSep 10, 2021
Combining GEDI and Sentinel-2 for wall-to-wall mapping of tall and short crops

Stefania Di Tommaso, Sherrie Wang, David B. Lobell

High resolution crop type maps are an important tool for improving food security, and remote sensing is increasingly used to create such maps in regions that possess ground truth labels for model training. However, these labels are absent in many regions, and models trained in other regions on typical satellite features, such as those from optical sensors, often exhibit low performance when transferred. Here we explore the use of NASA's Global Ecosystem Dynamics Investigation (GEDI) spaceborne lidar instrument, combined with Sentinel-2 optical data, for crop type mapping. Using data from three major cropped regions (in China, France, and the United States) we first demonstrate that GEDI energy profiles are capable of reliably distinguishing maize, a crop typically above 2m in height, from crops like rice and soybean that are shorter. We further show that these GEDI profiles provide much more invariant features across geographies compared to spectral and phenological features detected by passive optical sensors. GEDI is able to distinguish maize from other crops within each region with accuracies higher than 84%, and able to transfer across regions with accuracies higher than 82% compared to 64% for transfer of optical features. Finally, we show that GEDI profiles can be used to generate training labels for models based on optical imagery from Sentinel-2, thereby enabling the creation of 10m wall-to-wall maps of tall versus short crops in label-scarce regions. As maize is the second most widely grown crop in the world and often the only tall crop grown within a landscape, we conclude that GEDI offers great promise for improving global crop type maps.

APSep 2, 2021
Two Shifts for Crop Mapping: Leveraging Aggregate Crop Statistics to Improve Satellite-based Maps in New Regions

Dan M. Kluger, Sherrie Wang, David B. Lobell

Crop type mapping at the field level is critical for a variety of applications in agricultural monitoring, and satellite imagery is becoming an increasingly abundant and useful raw input from which to create crop type maps. Still, in many regions crop type mapping with satellite data remains constrained by a scarcity of field-level crop labels for training supervised classification models. When training data is not available in one region, classifiers trained in similar regions can be transferred, but shifts in the distribution of crop types as well as transformations of the features between regions lead to reduced classification accuracy. We present a methodology that uses aggregate-level crop statistics to correct the classifier by accounting for these two types of shifts. To adjust for shifts in the crop type composition we present a scheme for properly reweighting the posterior probabilities of each class that are output by the classifier. To adjust for shifts in features we propose a method to estimate and remove linear shifts in the mean feature vector. We demonstrate that this methodology leads to substantial improvements in overall classification accuracy when using Linear Discriminant Analysis (LDA) to map crop types in Occitanie, France and in Western Province, Kenya. When using LDA as our base classifier, we found that in France our methodology led to percent reductions in misclassifications ranging from 2.8% to 42.2% (mean = 21.9%) over eleven different training departments, and in Kenya the percent reductions in misclassification were 6.6%, 28.4%, and 42.7% for three training regions. While our methodology was statistically motivated by the LDA classifier, it can be applied to any type of classifier. As an example, we demonstrate its successful application to improve a Random Forest classifier.

LGApr 28, 2020
Meta-Learning for Few-Shot Land Cover Classification

Marc Rußwurm, Sherrie Wang, Marco Körner et al.

The representations of the Earth's surface vary from one geographic region to another. For instance, the appearance of urban areas differs between continents, and seasonality influences the appearance of vegetation. To capture the diversity within a single category, like as urban or vegetation, requires a large model capacity and, consequently, large datasets. In this work, we propose a different perspective and view this diversity as an inductive transfer learning problem where few data samples from one region allow a model to adapt to an unseen region. We evaluate the model-agnostic meta-learning (MAML) algorithm on classification and segmentation tasks using globally and regionally distributed datasets. We find that few-shot model adaptation outperforms pre-training with regular gradient descent and fine-tuning on (1) the Sen12MS dataset and (2) DeepGlobe data when the source domain and target domain differ. This indicates that model optimization with meta-learning may benefit tasks in the Earth sciences whose data show a high degree of diversity from region to region, while traditional gradient-based supervised learning remains suitable in the absence of a feature or label shift.

CVMay 8, 2018
Tile2Vec: Unsupervised representation learning for spatially distributed data

Neal Jean, Sherrie Wang, Anshul Samar et al.

Geospatial analysis lacks methods like the word vector representations and pre-trained networks that significantly boost performance across a wide range of natural language and computer vision tasks. To fill this gap, we introduce Tile2Vec, an unsupervised representation learning algorithm that extends the distributional hypothesis from natural language -- words appearing in similar contexts tend to have similar meanings -- to spatially distributed data. We demonstrate empirically that Tile2Vec learns semantically meaningful representations on three datasets. Our learned representations significantly improve performance in downstream classification tasks and, similar to word vectors, visual analogies can be obtained via simple arithmetic in the latent space.