79.5 LG Apr 12 Code
TESSERA: Temporal Embeddings of Surface Spectra for Earth Representation and Analysis
Zhengpeng Feng, Clement Atzberger, Sadiq Jaffer et al.
Satellite Earth-observation (EO) time series in the optical and microwave ranges of the electromagnetic spectrum are often irregular due to orbital patterns and cloud obstruction. Compositing addresses these issues but loses information with respect to vegetation phenology, which is critical for many downstream tasks. Instead, we present TESSERA, a pixel-wise foundation model for multi-modal (Sentinel-1/2) EO time series that learns robust, label-efficient embeddings. During model training, TESSERA uses Barlow Twins and sparse random temporal sampling to enforce invariance to the selection of valid observations. We employ two key regularizers: global shuffling to decorrelate spatial neighborhoods and mix-based regulation to improve invariance under extreme sparsity. We find that for diverse classification, segmentation, and regression tasks, TESSERA embeddings deliver state-of-the-art accuracy with high label efficiency, often requiring only a small task head and minimal computation. To democratize access, adhere to FAIR principles, and simplify use, we release global, annual, 10m, pixel-wise int8 embeddings together with open weights/code and lightweight adaptation heads, thus providing practical tooling for large-scale retrieval and inference at planetary scale. All code and data are available at: https://github.com/ucam-eo/tessera.
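The pairing of Barlow Twins with sparse random temporal sampling mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the TESSERA implementation: the function names `sample_view` and `barlow_twins_loss`, the off-diagonal weight `lambda_offdiag`, and the idea of drawing `k` valid time steps per view are assumptions made for illustration. The loss itself follows the standard Barlow Twins formulation (push the cross-correlation matrix of two views toward the identity).

```python
import numpy as np

def sample_view(series, valid, k, rng):
    """Draw one 'view' of a pixel time series: k randomly chosen valid
    observations (hypothetical helper), kept in temporal order.

    series: array of shape (T, C); valid: boolean mask of shape (T,).
    """
    idx = np.flatnonzero(valid)
    chosen = rng.choice(idx, size=min(k, idx.size), replace=False)
    return series[np.sort(chosen)]

def barlow_twins_loss(z_a, z_b, lambda_offdiag=5e-3, eps=1e-9):
    """Barlow Twins loss for two batches of embeddings, each of shape (N, D)."""
    n = z_a.shape[0]
    # Standardize every embedding dimension across the batch.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    # Cross-correlation matrix between the two views, shape (D, D).
    c = z_a.T @ z_b / n
    diag = np.diag(c)
    # Invariance term: matching dimensions should correlate perfectly.
    on_diag = np.sum((diag - 1.0) ** 2)
    # Redundancy term: different dimensions should be decorrelated.
    off_diag = np.sum(c ** 2) - np.sum(diag ** 2)
    return on_diag + lambda_offdiag * off_diag
```

In a training loop, two calls to `sample_view` on the same pixel would feed an encoder (not shown) whose outputs are passed to `barlow_twins_loss`, so that the embedding becomes insensitive to which valid observations happened to be selected.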
SY Oct 5, 2017
On the Interaction between Personal Comfort Systems and Centralized HVAC Systems in Office Buildings
Rachel Kalaimani, Milan Jain, Srinivasan Keshav et al.
Most modern HVAC systems suffer from two intrinsic problems. First, they cannot meet the diverse comfort requirements of the occupants. Second, they heat or cool an entire zone even when the zone is only partially occupied. Both issues can be mitigated by using personal comfort systems (PCS), which bridge the comfort gap between what is provided by a central HVAC system and the personal preferences of the occupants. In recent work, we have proposed and deployed such a system, called SPOT. We address the question, "How should an existing HVAC system modify its operation to benefit from the availability of PCS like SPOT?" For example, energy consumption could be reduced during sparse occupancy by choosing appropriate thermal setbacks, with the PCS providing the additional offset in thermal comfort required for each occupant. Our control strategy, based on Model Predictive Control (MPC), employs a bi-linear thermal model and has two time-scales to accommodate the physical constraints that limit certain components of the central HVAC system from frequently changing their set points. We compare the energy consumption and comfort offered by our SPOT-aware HVAC system with that of a state-of-the-art MPC-based central HVAC system in multiple settings, including different room layouts and partial deployment of PCS. Numerical evaluations show that our system obtains, on average, 45% (15%) savings in energy in summer (winter), compared with the benchmark system for the case of homogeneous comfort requirements. For heterogeneous comfort requirements, we observe 51% (29%) improvement in comfort in summer (winter) in addition to significant savings in energy.
SY Oct 24, 2018
Using Personal Environmental Comfort Systems to Mitigate the Impact of Occupancy Prediction Errors on HVAC Performance
Milan Jain, Rachel K Kalaimani, Srinivasan Keshav et al.
Heating, Ventilation and Air Conditioning (HVAC) consumes a significant fraction of energy in commercial buildings. Hence, the use of optimization techniques to reduce HVAC energy consumption has been widely studied. Model predictive control (MPC) is one state-of-the-art optimization technique for HVAC control, which converts the control problem to a sequence of optimization problems, each over a finite time horizon. In a typical MPC, future system state is estimated from a model using predictions of model inputs, such as building occupancy and outside air temperature. Consequently, as prediction accuracy deteriorates, MPC performance (in terms of occupant comfort and building energy use) degrades. In this work, we use a custom-built building thermal simulator to systematically investigate the impact of occupancy prediction errors on occupant comfort and energy consumption. Our analysis shows that in our test building, as occupancy prediction error increases from 5% to 20%, the performance of an MPC-based HVAC controller becomes worse than that of even a simple static schedule. However, when combined with a personal environmental control (PEC) system, HVAC controllers are considerably more robust to prediction errors. Thus, we quantify the effectiveness of PECs in mitigating the impact of forecast errors on MPC control for HVAC systems.
73.2 LG Apr 10
Below-ground Fungal Biodiversity Can be Monitored Using Self-Supervised Learning Satellite Features
Robin Young, Michael E. Van Nuland, E. Toby Kiers et al.
Mycorrhizal fungi are vital to terrestrial ecosystem functioning. Yet monitoring their biodiversity at landscape scales is often unfeasible due to time and cost constraints. Current predictions suggest that 90% of mycorrhizal diversity hotspots remain unprotected, opening questions of how to broadly and effectively map underground fungal communities. Here, we show that self-supervised learning (SSL) applied to satellite imagery can predict below-ground ectomycorrhizal fungal richness across diverse environments. Our models explain over half the variance in species richness across ~12,000 field samples spanning Europe and Asia. SSL-derived features prove to be the single most informative predictor, subsuming the majority of information contained in climate, soil, and land cover datasets. Using this approach, we achieve a 10,000-fold increase in spatial resolution over existing techniques, moving from 1km landscape averages to 10m habitat-scale observations with nearly no systematic bias. As satellite observations are dynamic rather than static, this enables temporal monitoring of below-ground biodiversity at landscape scales for the first time. We analyze multi-year trends in predicted fungal richness across UK National Park woodlands, finding that ancient forests may be losing ectomycorrhizal diversity at disproportionate rates. These results establish SSL satellite features as a scalable tool for extending sparse field observations to continuous, high-resolution biodiversity maps for monitoring the invisible half of terrestrial ecosystems.
AI Dec 27, 2022
Proceedings of AAAI 2022 Fall Symposium: The Role of AI in Responding to Climate Challenges
Feras A. Batarseh, Priya L. Donti, Ján Drgoňa et al.
Climate change is one of the most pressing challenges of our time, requiring rapid action across society. As artificial intelligence (AI) tools are rapidly deployed, it is crucial to understand how they will impact climate action. On the one hand, AI can support applications in climate change mitigation (reducing or preventing greenhouse gas emissions), adaptation (preparing for the effects of a changing climate), and climate science. These applications have implications in areas ranging as widely as energy, agriculture, and finance. At the same time, AI is used in many ways that hinder climate action (e.g., by accelerating the use of greenhouse gas-emitting fossil fuels). In addition, AI technologies have a carbon and energy footprint themselves. This symposium brought together participants from across academia, industry, government, and civil society to explore these intersections of AI with climate change, as well as how each of these sectors can contribute to solutions.
CV Sep 14, 2025 Code
Scaling Up Forest Vision with Synthetic Data
Yihang She, Andrew Blake, David Coomes et al.
Accurate tree segmentation is a key step in extracting individual tree metrics from forest laser scans, and is essential to understanding ecosystem functions in carbon cycling and beyond. Over the past decade, tree segmentation algorithms have advanced rapidly due to developments in AI. However, existing public 3D forest datasets are not large enough to build robust tree segmentation systems. Motivated by the success of synthetic data in other domains such as self-driving, we investigate whether similar approaches can help with tree segmentation. In place of expensive field data collection and annotation, we use synthetic data during pretraining, and then require only minimal real forest plot annotation for fine-tuning. We have developed a new synthetic data generation pipeline to do this for forest vision tasks, integrating advances in game engines with physics-based LiDAR simulation. As a result, we have produced a comprehensive, diverse, annotated 3D forest dataset on an unprecedented scale. Extensive experiments with a state-of-the-art tree segmentation algorithm and a popular real dataset show that our synthetic data can substantially reduce the need for labelled real data. After fine-tuning on just a single real forest plot of less than 0.1 hectare, the pretrained model achieves segmentations that are competitive with a model trained on the full-scale real data. We have also identified critical factors for successful use of synthetic data: physics, diversity, and scale, paving the way for more robust 3D forest vision systems in the future. Our data generation pipeline and the resulting dataset are available at https://github.com/yihshe/CAMP3D.git.
LG Mar 5, 2024 Code
From Spectra to Biophysical Insights: End-to-End Learning with a Biased Radiative Transfer Model
Yihang She, Clement Atzberger, Andrew Blake et al.
Advances in machine learning have boosted the use of Earth observation data for climate change research. Yet, the interpretability of machine-learned representations remains a challenge, particularly in understanding forests' biophysical reactions to climate change. Traditional methods in remote sensing that invert radiative transfer models (RTMs) to retrieve biophysical variables from spectral data often fail to account for biases inherent in the RTM, especially for complex forests. We propose to integrate RTMs into an auto-encoder architecture, creating an end-to-end learning approach. Our method not only corrects biases in RTMs but also outperforms traditional techniques for variable retrieval like neural network regression. Furthermore, our framework has general potential for inverting biased physical models. The code is available at https://github.com/yihshe/ai-refined-rtm.git.
LG Aug 20, 2025 Code
DualNILM: Energy Injection Identification Enabled Disaggregation with Deep Multi-Task Learning
Xudong Wang, Guoming Tang, Junyu Xue et al.
Non-Intrusive Load Monitoring (NILM) offers a cost-effective method to obtain fine-grained appliance-level energy consumption in smart homes and building applications. However, the increasing adoption of behind-the-meter (BTM) energy sources such as solar panels and battery storage poses new challenges for conventional NILM methods that rely solely on at-the-meter data. The energy injected from the BTM sources can obscure the power signatures of individual appliances, leading to a significant decrease in NILM performance. To address this challenge, we present DualNILM, a deep multi-task learning framework designed for the dual tasks of appliance state recognition and injected energy identification. Using a Transformer-based architecture that integrates sequence-to-point and sequence-to-sequence strategies, DualNILM effectively captures multiscale temporal dependencies in the aggregate power consumption patterns, allowing for accurate appliance state recognition and energy injection identification. Extensive evaluation on self-collected and synthesized datasets demonstrates that DualNILM maintains excellent performance on both tasks, substantially outperforming conventional methods. Our work underscores the framework's potential for robust energy disaggregation in modern energy systems with renewable penetration. Synthetic photovoltaic-augmented datasets with a realistic injection simulation methodology will be open-sourced after review.
15.4 LG Apr 4
Spatiotemporal Interpolation of GEDI Biomass with Calibrated Uncertainty
Robin Young, Srinivasan Keshav
Monitoring deforestation-driven carbon emissions requires both spatially explicit and temporally continuous estimates of aboveground biomass density (AGBD) with calibrated uncertainty. NASA's Global Ecosystem Dynamics Investigation (GEDI) provides reliable LIDAR-derived AGBD, but its orbital sampling causes irregular spatiotemporal coverage, and occasional operational interruptions, including a 13-month hibernation from March 2023 to April 2024, leave extended gaps in the observational record. Prior work has used machine learning approaches to fill GEDI's spatial gaps using satellite-derived features, but temporal interpolation of biomass through unobserved periods, particularly across active disturbance events, remains largely unaddressed. Moreover, standard ensemble methods for biomass mapping have been shown to produce systematically miscalibrated prediction intervals. To address these gaps, we extend the Attentive Neural Process (ANP) framework, previously applied to spatial biomass interpolation, to jointly sparse spatiotemporal settings using geospatial foundation model embeddings. We treat space and time symmetrically, empirically validating a form of space-for-time substitution in which observations from nearby locations at other times inform predictions at held-out periods. Our results demonstrate that the ANP produces well-calibrated uncertainty estimates across disturbance regimes, supporting its use in Measurement, Reporting, and Verification (MRV) applications that require reliable uncertainty quantification for forest carbon accounting.
LG Jan 23
Embedding-based Crop Type Classification in the Groundnut Basin of Senegal
Madeline C. Lisaius, Srinivasan Keshav, Andrew Blake et al.
Crop type maps from satellite remote sensing are important tools for food security, local livelihood support and climate change mitigation in smallholder regions of the world, but most satellite-based methods are not well suited to smallholder conditions. To address this gap, we establish four criteria for a useful embedding-based approach: 1) performance, 2) plausibility, 3) transferability and 4) accessibility. We then evaluate geospatial foundation model (FM) embedding-based approaches using TESSERA and AlphaEarth against current baseline methods for a region in the groundnut basin of Senegal. We find that the TESSERA-based approach to land cover and crop type mapping fulfills the selection criteria best, and in one temporal transfer example shows 28% higher accuracy compared to the next best method. These results indicate that TESSERA embeddings are an effective approach for crop type classification and mapping tasks in Senegal.
LG Jan 23
Interpolation of GEDI Biomass Estimates with Calibrated Uncertainty Quantification
Robin Young, Srinivasan Keshav
Reliable wall-to-wall biomass density estimation from NASA's GEDI mission requires interpolating sparse LIDAR observations across heterogeneous landscapes. While machine learning approaches like Random Forest and XGBoost are widely used, they treat spatial predictions of GEDI observations from multispectral or SAR remote sensing data as independent without adapting to the varying difficulty of heterogeneous landscapes. We demonstrate these approaches generally fail to produce calibrated prediction intervals. We show that this stems from conflating ensemble variance with aleatoric uncertainty and ignoring local spatial context. To resolve this, we introduce Attentive Neural Processes (ANPs), a probabilistic meta-learning architecture that explicitly conditions predictions on local observation sets and exploits geospatial foundation model embeddings. Unlike static ensembles, ANPs learn a flexible spatial covariance function, allowing estimates to be more uncertain in complex landscapes and less in homogeneous areas. We validate this approach across five distinct biomes ranging from tropical Amazonian forests to boreal, temperate, and alpine ecosystems, demonstrating that ANPs achieve competitive accuracy while maintaining near-ideal uncertainty calibration. We demonstrate the operational utility of the method through few-shot adaptation, where the model recovers most of the performance gap in cross-region transfer using minimal local data. This work provides a scalable, theoretically rigorous alternative to ensemble variance for continental scale earth observation.
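What "calibrated prediction intervals" means in practice can be checked with a short sketch: for each nominal level, count how often the true value falls inside the predicted interval. The abstract does not specify how calibration was measured, so the Gaussian predictive distribution and the function names `empirical_coverage` and `calibration_curve` below are assumptions for illustration only.

```python
import numpy as np
from statistics import NormalDist

def empirical_coverage(y_true, mu, sigma, level):
    """Fraction of targets inside the central `level` interval of
    N(mu, sigma^2). Assumes a Gaussian predictive distribution."""
    # Two-sided z-score for the requested nominal level (1.645 for 0.90).
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    inside = np.abs(y_true - mu) <= z * sigma
    return float(inside.mean())

def calibration_curve(y_true, mu, sigma, levels=(0.5, 0.8, 0.9, 0.95)):
    """Map each nominal level to its observed coverage.

    A well-calibrated model gives observed coverage close to nominal;
    lower values indicate overconfident (too narrow) intervals.
    """
    return {lvl: empirical_coverage(y_true, mu, sigma, lvl) for lvl in levels}
```

Plotting the values returned by `calibration_curve` against the nominal levels gives a reliability diagram; points below the diagonal correspond to the miscalibration the abstract attributes to treating ensemble variance as predictive uncertainty.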
DC Apr 25, 2024
Hybrid Heterogeneous Clusters Can Lower the Energy Consumption of LLM Inference Workloads
Grant Wilkins, Srinivasan Keshav, Richard Mortier
Both the training and use of Large Language Models (LLMs) require large amounts of energy. Their increasing popularity, therefore, raises critical concerns regarding the energy efficiency and sustainability of data centers that host them. This paper addresses the challenge of reducing energy consumption in data centers running LLMs. We propose a hybrid data center model that uses a cost-based scheduling framework to dynamically allocate LLM tasks across hardware accelerators that differ in their energy efficiencies and computational capabilities. Specifically, our workload-aware strategy determines whether tasks are processed on energy-efficient processors or high-performance GPUs based on the number of input and output tokens in a query. Our analysis of a representative LLM dataset finds that this hybrid strategy can reduce CPU+GPU energy consumption by 7.5% compared to a workload-unaware baseline.
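The token-based routing idea can be sketched as a small cost model that sends each query to whichever accelerator is estimated to use less energy. This is an illustration under stated assumptions, not the paper's scheduler: the linear energy model, the `Accelerator` fields, and all numeric coefficients are hypothetical, and a real cost function would likely also account for latency.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Accelerator:
    """Hypothetical per-device energy model (all values in joules)."""
    name: str
    fixed_joules: float             # per-query overhead
    joules_per_input_token: float
    joules_per_output_token: float

def estimated_energy(acc, n_in, n_out):
    """Linear energy estimate for a query with n_in input and n_out output tokens."""
    return (acc.fixed_joules
            + acc.joules_per_input_token * n_in
            + acc.joules_per_output_token * n_out)

def route(n_in, n_out, accelerators):
    """Pick the accelerator with the lowest estimated energy for this query."""
    return min(accelerators, key=lambda a: estimated_energy(a, n_in, n_out))

# Made-up coefficients: the efficient device has low overhead but a high
# per-token cost; the GPU has high overhead but a low per-token cost.
EFFICIENT = Accelerator("efficient", 1.0, 0.05, 0.5)
GPU = Accelerator("gpu", 20.0, 0.01, 0.1)
```

With these made-up numbers, `route(32, 32, [EFFICIENT, GPU])` selects the efficient device while `route(2000, 500, [EFFICIENT, GPU])` selects the GPU, which mirrors the workload-aware split the abstract describes.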
LG Jun 25, 2025
TESSERA: Temporal Embeddings of Surface Spectra for Earth Representation and Analysis
Zhengpeng Feng, Clement Atzberger, Sadiq Jaffer et al.
Satellite Earth-observation (EO) time series in the optical and microwave ranges of the electromagnetic spectrum are often irregular due to orbital patterns and cloud obstruction. Compositing addresses these issues but loses information with respect to vegetation phenology, which is critical for many downstream tasks. Instead, we present TESSERA, a pixel-wise foundation model for multi-modal (Sentinel-1/2) EO time series that learns robust, label-efficient embeddings. During model training, TESSERA uses Barlow Twins and sparse random temporal sampling to enforce invariance to the selection of valid observations. We employ two key regularizers: global shuffling to decorrelate spatial neighborhoods and mix-based regulation to improve invariance under extreme sparsity. We find that for diverse classification, segmentation, and regression tasks, TESSERA embeddings deliver state-of-the-art accuracy with high label efficiency, often requiring only a small task head and minimal computation. To democratize access, adhere to FAIR principles, and simplify use, we release global, annual, 10m, pixel-wise int8 embeddings together with open weights/code and lightweight adaptation heads, thus providing practical tooling for large-scale retrieval and inference at planetary scale. The model training/inference code, downstream task code, and pre-generated embeddings can be accessed at https://github.com/ucam-eo
CY Aug 12, 2021
How Computer Science Can Aid Forest Restoration
Gemma Gordon, Amelia Holcomb, Tom Kelly et al.
The world faces two interlinked crises: climate change and loss of biodiversity. Forest restoration on degraded lands and surplus croplands can play a significant role both in sequestering carbon and re-establishing biodiversity. There is a considerable body of research and practice that addresses forest restoration. However, there has been little work by computer scientists to bring powerful computational techniques to bear on this important area of work, perhaps due to a lack of awareness. In an attempt to bridge this gap, we present our vision of how techniques from computer science, broadly speaking, can aid current practice in forest restoration.
SI May 25, 2021
Climate Action During COVID-19 Recovery and Beyond: A Twitter Text Mining Study
Mohammad S. Parsa, Lukasz Golab, Srinivasan Keshav
The Coronavirus pandemic created a global crisis that prompted immediate large-scale action, including economic shutdowns and mobility restrictions. These actions have had devastating effects on the economy, but some positive effects on the environment. As the world recovers from the pandemic, we ask the following question: What is the public attitude towards climate action during COVID-19 recovery and beyond? We answer this question by analyzing discussions on the Twitter social media platform. We find that most discussions support climate action and point out lessons learned during pandemic response that can shape future climate policy, although skeptics continue to have a presence. Additionally, concerns arise in the context of climate action during the pandemic, such as mitigating the risk of COVID-19 transmission on public transit.