LG · ML · Dec 26, 2024

Applying the maximum entropy principle to neural networks enhances multi-species distribution models

arXiv:2412.19217v3
h-index: 20
Originality: Highly original
AI Analysis

This work aims to improve species distribution models built from citizen science data, which ecologists and conservationists rely on. It proposes a hybrid of neural networks and the maximum entropy principle that improves prediction accuracy on spatially biased presence-only datasets.

The authors tackled the problem of modeling species distributions from presence-only data with sampling biases by proposing DeepMaxent, a method that combines neural networks with the maximum entropy principle to automatically learn shared features among species. The results show that DeepMaxent outperforms Maxent and other leading species distribution models across all tested regions and taxonomic groups, particularly in areas with uneven sampling.

The rapid expansion of citizen science initiatives has led to significant growth in biodiversity databases, particularly in presence-only (PO) observations. PO data are invaluable for understanding species distributions and their dynamics, but their use in a Species Distribution Model (SDM) is limited by sampling biases and the lack of information on absences. Poisson point processes are widely used for SDMs, with Maxent being one of the most popular methods. Maxent maximises the entropy of a probability distribution across sites as a function of predefined transformations of variables, called features. In contrast, neural networks and deep learning have emerged as promising techniques for automatic feature extraction from complex input variables. Arbitrarily complex transformations of input variables can be learned from the data efficiently through backpropagation and stochastic gradient descent (SGD). In this paper, we propose DeepMaxent, which harnesses neural networks to automatically learn shared features among species, using the maximum entropy principle. To do so, it employs a normalised Poisson loss in which, for each species, presence probabilities across sites are modelled by a neural network. We evaluate DeepMaxent on a benchmark dataset known for its spatial sampling biases, using PO data for calibration and presence-absence (PA) data for validation across six regions with different biological groups and covariates. Our results indicate that DeepMaxent performs better than Maxent and other leading SDMs across all regions and taxonomic groups. The method performs particularly well in regions of uneven sampling, demonstrating substantial potential to increase SDM performance. In particular, our approach yields more accurate predictions than traditional single-species models, which opens up new possibilities for methodological enhancement.
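To make the idea of a loss normalised across sites more concrete, here is a minimal sketch in plain Python. It rests on an assumption not confirmed by the abstract: that for each species the loss can be written as a cross-entropy between the normalised presence counts and a softmax over per-site scores. The paper's exact weighting and normalisation may differ. The function names (`log_softmax`, `species_loss`, `multi_species_loss`) and the toy data are hypothetical. In the setting described above, the scores would come from a neural network shared across species; here they are hard-coded.

```python
import math

def log_softmax(scores):
    """Log-probabilities over sites for one species (numerically stable)."""
    top = max(scores)
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
    return [s - log_norm for s in scores]

def species_loss(scores, counts):
    """Cross-entropy between normalised presence counts and the
    predicted distribution over sites (assumed form of the loss)."""
    total = sum(counts)
    if total == 0:
        return 0.0  # a species with no records contributes nothing
    log_p = log_softmax(scores)
    return -sum(c / total * lp for c, lp in zip(counts, log_p))

def multi_species_loss(score_matrix, count_matrix):
    """Average of the per-species losses; both matrices are indexed
    as [species][site]."""
    losses = [species_loss(s, c) for s, c in zip(score_matrix, count_matrix)]
    return sum(losses) / len(losses)

# Toy example: 2 species, 4 sites (illustrative values only).
scores = [[0.0, 0.0, 0.0, 0.0],    # species 0: uniform scores
          [2.0, 0.0, -1.0, 0.5]]   # species 1: favours site 0
counts = [[1, 1, 1, 1],            # presence-only record counts per site
          [3, 0, 0, 1]]
loss = multi_species_loss(scores, counts)
```

Because each species' scores are normalised over all sites, only relative differences between sites matter: adding a constant to every score of a species leaves its loss unchanged.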
