QM, LG, PE | Jan 3, 2024

On the selection and effectiveness of pseudo-absences for species distribution modeling with deep learning

arXiv:2401.02989v1 | 19 citations | h-index: 66 | Ecological Informatics
Originality: Incremental advance
AI Analysis

This work addresses a specific issue in ecology and conservation for researchers using presence-only data, but it is incremental as it builds on existing pseudo-absence methods with neural network adaptations.

The paper tackles the selection and use of pseudo-absences in multi-species distribution modeling with deep learning, where class imbalance and geographic bias are the main challenges. It reports improved results over competing approaches on a benchmark dataset covering six regions.

Species distribution modeling is a highly versatile tool for understanding the intricate relationship between environmental conditions and species occurrences. However, the available data often lacks information on confirmed species absence and is limited to opportunistically sampled, presence-only observations. To overcome this limitation, a common approach is to employ pseudo-absences, which are specific geographic locations designated as negative samples. While pseudo-absences are well-established for single-species distribution models, their application in the context of multi-species neural networks remains underexplored. Notably, the significant class imbalance between species presences and pseudo-absences is often left unaddressed. Moreover, the existence of different types of pseudo-absences (e.g., random and target-group background points) adds complexity to the selection process. Determining the optimal combination of pseudo-absence types is difficult and depends on the characteristics of the data, particularly considering that certain types of pseudo-absences can be used to mitigate geographic biases. In this paper, we demonstrate that these challenges can be effectively tackled by integrating pseudo-absences in the training of multi-species neural networks through modifications to the loss function. This adjustment involves assigning different weights to the distinct terms of the loss function, thereby addressing both the class imbalance and the choice of pseudo-absence types. Additionally, we propose a strategy to set these loss weights using spatial block cross-validation with presence-only data. We evaluate our approach using a benchmark dataset containing independent presence-absence data from six different regions and report improved results when compared to competing approaches.
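To make the idea of weighting the distinct loss terms concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's exact formulation: the function name `weighted_pa_loss`, the weights `w_pos`, `w_tg` and `w_rand`, and the per-term normalisation are all hypothetical. It also assumes that target-group pseudo-absences are the unobserved species at sites where some other species was recorded, and that random pseudo-absences are all species at randomly sampled background sites.

```python
import numpy as np


def weighted_pa_loss(probs, labels, is_random_site,
                     w_pos=1.0, w_tg=0.5, w_rand=0.5, eps=1e-7):
    """Weighted binary cross-entropy with three separately weighted terms.

    probs          : (n_sites, n_species) predicted presence probabilities
    labels         : (n_sites, n_species) 1 where a species was observed, else 0
    is_random_site : (n_sites,) True for randomly sampled background sites
    All weights are illustrative hyperparameters (an assumption of this sketch).
    """
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    labels = np.asarray(labels, dtype=float)
    rand = np.asarray(is_random_site, dtype=float)[:, None]

    # Masks selecting the entries that belong to each loss term.
    pos_mask = labels                          # observed presences
    tg_mask = (1.0 - rand) * (1.0 - labels)    # target-group pseudo-absences
    rand_mask = rand * (1.0 - labels)          # random pseudo-absences

    pos_nll = -np.log(probs)        # penalty for missing a presence
    neg_nll = -np.log(1.0 - probs)  # penalty for predicting presence at a pseudo-absence

    def masked_mean(values, mask):
        # Average over the selected entries only, so that the weights,
        # not the raw counts, control the balance between terms.
        return (values * mask).sum() / max(mask.sum(), 1.0)

    return (w_pos * masked_mean(pos_nll, pos_mask)
            + w_tg * masked_mean(neg_nll, tg_mask)
            + w_rand * masked_mean(neg_nll, rand_mask))


# Toy usage: two presence sites (one species each) and one random background site.
example_labels = np.array([[1, 0], [0, 1], [0, 0]])
example_is_random = np.array([False, False, True])
example_probs = np.full((3, 2), 0.5)
example_loss = weighted_pa_loss(example_probs, example_labels, example_is_random)
print(example_loss)
```

In this sketch, raising `w_tg` relative to `w_rand` puts more emphasis on target-group background points, while the ratio of `w_pos` to the other two weights addresses the imbalance between presences and pseudo-absences. The abstract proposes choosing such weights with spatial block cross-validation on presence-only data; that selection step is not shown here.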

