Sara Si-Moussi

CV
h-index112
4papers
2citations
Novelty38%
AI Score38

4 Papers

CVNov 26, 2025
BotaCLIP: Contrastive Learning for Botany-Aware Representation of Earth Observation Data

Selene Cerna, Sara Si-Moussi, Wilfried Thuiller et al.

Foundation models have demonstrated a remarkable ability to learn rich, transferable representations across diverse modalities such as images, text, and audio. In modern machine learning pipelines, these representations often replace raw data as the primary input for downstream tasks. In this paper, we address the challenge of adapting a pre-trained foundation model to inject domain-specific knowledge, without retraining from scratch or incurring significant computational costs. To this end, we introduce BotaCLIP, a lightweight multimodal contrastive framework that adapts a pre-trained Earth Observation foundation model (DOFA) by aligning high-resolution aerial imagery with botanical relevés. Unlike generic embeddings, BotaCLIP internalizes ecological structure through contrastive learning with a regularization strategy that mitigates catastrophic forgetting. Once trained, the resulting embeddings serve as transferable representations for downstream predictors. Motivated by real-world applications in biodiversity modeling, we evaluated BotaCLIP representations in three ecological tasks: plant presence prediction, butterfly occurrence modeling, and soil trophic group abundance estimation. The results showed consistent improvements over those derived from DOFA and supervised baselines. More broadly, this work illustrates how domain-aware adaptation of foundation models can inject expert knowledge into data-scarce settings, enabling frugal representation learning.

APJun 16, 2025
EUNIS Habitat Maps: Enhancing Thematic and Spatial Resolution for Europe through Machine Learning

Sara Si-Moussi, Stephan Hennekens, Sander Mücher et al.

The EUNIS habitat classification is crucial for categorising European habitats, supporting European policy on nature conservation and implementing the Nature Restoration Law. To meet the growing demand for detailed and accurate habitat information, we provide spatial predictions for 260 EUNIS habitat types at hierarchical level 3, together with independent validation and uncertainty analyses. Using ensemble machine learning models, together with high-resolution satellite imagery and ecologically meaningful climatic, topographic and edaphic variables, we produced a European habitat map indicating the most probable EUNIS habitat at 100-m resolution across Europe. Additionally, we provide information on prediction uncertainty and the most probable habitats at level 3 within each EUNIS level 1 formation. This product is particularly useful for both conservation and restoration purposes. Predictions were cross-validated at European scale using a spatial block cross-validation and evaluated against independent data from France (forests only), the Netherlands and Austria. The habitat maps obtained strong predictive performances on the validation datasets with distinct trade-offs in terms of recall and precision across habitat formations.

LGJul 13, 2025
Continental-scale habitat distribution modelling with multimodal earth observation foundation models

Sara Si-Moussi, Stephan Hennekens, Sander Mucher et al.

Habitats integrate the abiotic conditions, vegetation composition and structure that support biodiversity and sustain nature's contributions to people. Most habitats face mounting pressures from human activities, which requires accurate, high-resolution habitat mapping for effective conservation and restoration. Yet, current habitat maps often fall short in thematic or spatial resolution because they must (1) model several mutually exclusive habitat types that co-occur across landscapes and (2) cope with severe class imbalance that complicates exhaustive multi-class training. Here, we evaluated how high-resolution remote sensing (RS) data and Artificial Intelligence (AI) tools can improve habitat mapping across large geographical extents at fine spatial and thematic resolution. Using vegetation plots from the European Vegetation Archive, we modelled the distribution of Level 3 EUNIS habitat types across Europe and assessed multiple modelling strategies against independent validation datasets. Strategies that exploited the hierarchical nature of habitat classifications resolved classification ambiguities, especially in fragmented habitats. Integrating satellite-borne multispectral and radar imagery, particularly through Earth Observation (EO) Foundation models (EO-FMs), enhanced within-formation discrimination and overall performance. Finally, ensemble machine learning that corrects class imbalance boosted predictive accuracy even further. Our methodological framework is transferable beyond Europe and adaptable to other classification systems. Future research should advance temporal modelling of habitat dynamics, extend to habitat segmentation and quality assessment, and exploit next-generation EO data paired with higher-quality in situ observations.

MLJul 12, 2025
Uncovering symmetric and asymmetric species associations from community and environmental data

Sara Si-Moussi, Esther Galbrun, Mickael Hedde et al.

There is no much doubt that biotic interactions shape community assembly and ultimately the spatial co-variations between species. There is a hope that the signal of these biotic interactions can be observed and retrieved by investigating the spatial associations between species while accounting for the direct effects of the environment. By definition, biotic interactions can be both symmetric and asymmetric. Yet, most models that attempt to retrieve species associations from co-occurrence or co-abundance data internally assume symmetric relationships between species. Here, we propose and validate a machine-learning framework able to retrieve bidirectional associations by analyzing species community and environmental data. Our framework (1) models pairwise species associations as directed influences from a source to a target species, parameterized with two species-specific latent embeddings: the effect of the source species on the community, and the response of the target species to the community; and (2) jointly fits these associations within a multi-species conditional generative model with different modes of interactions between environmental drivers and biotic associations. Using both simulated and empirical data, we demonstrate the ability of our framework to recover known asymmetric and symmetric associations and highlight the properties of the learned association networks. By comparing our approach to other existing models such as joint species distribution models and probabilistic graphical models, we show its superior capacity at retrieving symmetric and asymmetric interactions. The framework is intuitive, modular and broadly applicable across various taxonomic groups.