Teja Kattenborn

h-index33

5papers

54citations

Novelty36%

AI Score44

Ranked #48,486 of 194,257 authors (top 25%)#16,955 in CV (top 29%)

5 Papers

11.6CVNov 1, 2023Code

OpenForest: A data catalogue for machine learning in forest monitoring

Arthur Ouaknine, Teja Kattenborn, Etienne Laliberté et al.

Forests play a crucial role in Earth's system processes and provide a suite of social and economic ecosystem services, but are significantly impacted by human activities, leading to a pronounced disruption of the equilibrium within ecosystems. Advancing forest monitoring worldwide offers advantages in mitigating human impacts and enhancing our comprehension of forest composition, alongside the effects of climate change. While statistical modeling has traditionally found applications in forest biology, recent strides in machine learning and computer vision have reached important milestones using remote sensing data, such as tree species identification, tree crown segmentation and forest biomass assessments. For this, the significance of open access data remains essential in enhancing such data-driven algorithms and methodologies. Here, we provide a comprehensive and extensive overview of 86 open access forest datasets across spatial scales, encompassing inventories, ground-based, aerial-based, satellite-based recordings, and country or world maps. These datasets are grouped in OpenForest, a dynamic catalogue open to contributions that strives to reference all available open access forest datasets. Moreover, in the context of these datasets, we aim to inspire research in machine learning applied to forest biology by establishing connections between contemporary topics, perspectives and challenges inherent in both domains. We hope to encourage collaborations among scientists, fostering the sharing and exploration of diverse datasets through the application of machine learning methods for large-scale forest monitoring. OpenForest is available at https://github.com/RolnickLab/OpenForest .

5.2CVAug 5, 2024

Earth System Data Cubes: Avenues for advancing Earth system research

David Montero, Guido Kraemer, Anca Anghelea et al.

Recent advancements in Earth system science have been marked by the exponential increase in the availability of diverse, multivariate datasets characterised by moderate to high spatio-temporal resolutions. Earth System Data Cubes (ESDCs) have emerged as one suitable solution for transforming this flood of data into a simple yet robust data structure. ESDCs achieve this by organising data into an analysis-ready format aligned with a spatio-temporal grid, facilitating user-friendly analysis and diminishing the need for extensive technical data processing knowledge. Despite these significant benefits, the completion of the entire ESDC life cycle remains a challenging task. Obstacles are not only of a technical nature but also relate to domain-specific problems in Earth system research. There exist barriers to realising the full potential of data collections in light of novel cloud-based technologies, particularly in curating data tailored for specific application domains. These include transforming data to conform to a spatio-temporal grid with minimum distortions and managing complexities such as spatio-temporal autocorrelation issues. Addressing these challenges is pivotal for the effective application of Artificial Intelligence (AI) approaches. Furthermore, adhering to open science principles for data dissemination, reproducibility, visualisation, and reuse is crucial for fostering sustainable research. Overcoming these challenges offers a substantial opportunity to advance data-driven Earth system research, unlocking the full potential of an integrated, multidimensional view of Earth system processes. This is particularly true when such research is coupled with innovative research paradigms and technological progress.

10.2CVJul 9, 2025Code

GreenHyperSpectra: A multi-source hyperspectral dataset for global vegetation trait prediction

Eya Cherif, Arthur Ouaknine, Luke A. Brown et al. · mila

Plant traits such as leaf carbon content and leaf mass are essential variables in the study of biodiversity and climate change. However, conventional field sampling cannot feasibly cover trait variation at ecologically meaningful spatial scales. Machine learning represents a valuable solution for plant trait prediction across ecosystems, leveraging hyperspectral data from remote sensing. Nevertheless, trait prediction from hyperspectral data is challenged by label scarcity and substantial domain shifts (\eg across sensors, ecological distributions), requiring robust cross-domain methods. Here, we present GreenHyperSpectra, a pretraining dataset encompassing real-world cross-sensor and cross-ecosystem samples designed to benchmark trait prediction with semi- and self-supervised methods. We adopt an evaluation framework encompassing in-distribution and out-of-distribution scenarios. We successfully leverage GreenHyperSpectra to pretrain label-efficient multi-output regression models that outperform the state-of-the-art supervised baseline. Our empirical analyses demonstrate substantial improvements in learning spectral representations for trait prediction, establishing a comprehensive methodological framework to catalyze research at the intersection of representation learning and plant functional traits assessment. All code and data are available at: https://github.com/echerif18/HyspectraSSL.

3.6CVNov 10, 2025

PlantTraitNet: An Uncertainty-Aware Multimodal Framework for Global-Scale Plant Trait Inference from Citizen Science Data

Ayushi Sharma, Johanna Trost, Daniel Lusk et al.

Global plant maps of plant traits, such as leaf nitrogen or plant height, are essential for understanding ecosystem processes, including the carbon and energy cycles of the Earth system. However, existing trait maps remain limited by the high cost and sparse geographic coverage of field-based measurements. Citizen science initiatives offer a largely untapped resource to overcome these limitations, with over 50 million geotagged plant photographs worldwide capturing valuable visual information on plant morphology and physiology. In this study, we introduce PlantTraitNet, a multi-modal, multi-task uncertainty-aware deep learning framework that predictsfour key plant traits (plant height, leaf area, specific leaf area, and nitrogen content) from citizen science photos using weak supervision. By aggregating individual trait predictions across space, we generate global maps of trait distributions. We validate these maps against independent vegetation survey data (sPlotOpen) and benchmark them against leading global trait products. Our results show that PlantTraitNet consistently outperforms existing trait maps across all evaluated traits, demonstrating that citizen science imagery, when integrated with computer vision and geospatial AI, enables not only scalable but also more accurate global trait mapping. This approach offers a powerful new pathway for ecological research and Earth system modeling.

6.5CVOct 17, 2024Code

DivShift: Exploring Domain-Specific Distribution Shifts in Large-Scale, Volunteer-Collected Biodiversity Datasets

Elena Sierra, Lauren E. Gillespie, Salim Soltani et al.

Large-scale, volunteer-collected datasets of community-identified natural world imagery like iNaturalist have enabled marked performance gains for fine-grained visual classification of species using machine learning methods. However, such data -- sometimes referred to as citizen science data -- are opportunistic and lack a structured sampling strategy. This volunteer-collected biodiversity data contains geographic, temporal, taxonomic, observers, and sociopolitical biases that can have significant effects on biodiversity model performance, but whose impacts are unclear for fine-grained species recognition performance. Here we introduce Diversity Shift (DivShift), a framework for quantifying the effects of domain-specific distribution shifts on machine learning model performance. To diagnose the performance effects of biases specific to volunteer-collected biodiversity data, we also introduce DivShift - North American West Coast (DivShift-NAWC), a curated dataset of almost 7.5 million iNaturalist images across the western coast of North America partitioned across five types of expert-verified bias. We compare species recognition performance across these bias partitions using a diverse variety of species- and ecosystem-focused accuracy metrics. We observe that these biases confound model performance less than expected from the underlying label distribution shift, and that more data leads to better model performance but the magnitude of these improvements are bias-specific. These findings imply that while the structure within natural world images provides generalization improvements for biodiversity monitoring tasks, the biases present in volunteer-collected biodiversity data can also affect model performance; thus these models should be used with caution in downstream biodiversity monitoring tasks.