CV AI | Jan 2

WildIng: A Wildlife Image Invariant Representation Model for Geographical Domain Shift

Julian D. Santamaria, Claudia Isaza, Jhony H. Giraldo

arXiv:2601.00993v11.5 | h-index: 14 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses a critical limitation in automated wildlife monitoring for ecologists and conservationists, though it is an incremental improvement over existing foundation models.

The paper tackles the failure of deep learning models to generalize wildlife identification across geographical domains. For example, CLIP with an adapter trained on an African dataset reaches 84.77% accuracy, but drops to 16.17% when tested on an American dataset. The paper introduces WildIng, which integrates text descriptions with image features to build a representation that is more robust to such shifts, improving the accuracy of foundation models such as BioCLIP by 30% under geographical domain shift conditions.

Wildlife monitoring is crucial for studying biodiversity loss and climate change. Camera trap images provide a non-intrusive method for analyzing animal populations and identifying ecological patterns over time. However, manual analysis is time-consuming and resource-intensive. Deep learning, particularly foundation models, has been applied to automate wildlife identification, achieving strong performance when tested on data from the same geographical locations as their training sets. Yet, despite their promise, these models struggle to generalize to new geographical areas, leading to significant performance drops. For example, training an advanced vision-language model, such as CLIP with an adapter, on an African dataset achieves an accuracy of 84.77%. However, this accuracy drops to 16.17% when the model is tested on an American dataset. This limitation partly arises because existing models rely predominantly on image-based representations, making them sensitive to geographical data distribution shifts, such as variation in background, lighting, and environmental conditions. To address this, we introduce WildIng, a Wildlife image Invariant representation model for geographical domain shift. WildIng integrates text descriptions with image features, creating a representation that is more robust to geographical domain shifts. By leveraging textual descriptions, our approach captures consistent semantic information, such as detailed descriptions of the appearance of the species, improving generalization across different geographical locations. Experiments show that WildIng enhances the accuracy of foundation models such as BioCLIP by 30% under geographical domain shift conditions. We evaluate WildIng on two datasets collected from different regions, namely America and Africa. The code and models are publicly available at https://github.com/Julian075/CATALOG/tree/WildIng.
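
The abstract's central idea, combining an image embedding with an embedding of a textual description so that the result depends less on background and lighting, can be illustrated with a toy sketch. This is not the authors' architecture (see the linked repository for that). The function names, the weighted-sum fusion, the `alpha` parameter, and the hand-picked 3-dimensional vectors are all assumptions made for illustration only.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along the given axis."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)

def fuse(image_emb, text_emb, alpha=0.5):
    """Hypothetical fusion: weighted sum of unit-norm image and
    description embeddings, re-normalized. An assumption, not the
    fusion used in the paper."""
    mixed = alpha * l2_normalize(image_emb) + (1.0 - alpha) * l2_normalize(text_emb)
    return l2_normalize(mixed)

def classify(emb, class_embs):
    """Pick the class whose embedding has the highest cosine similarity."""
    sims = l2_normalize(emb) @ l2_normalize(class_embs).T
    return sims.argmax(axis=-1)

# Toy setup: three classes, one unit vector per class (made-up values).
class_embs = np.eye(3)

# Image embedding pulled toward class 0 by background cues,
# although the animal actually belongs to class 1.
image_emb = np.array([[0.6, 0.5, 0.0]])

# Embedding of a textual description of the animal's appearance,
# which points at class 1 regardless of the background.
text_emb = np.array([[0.0, 1.0, 0.0]])

image_only_pred = classify(image_emb, class_embs)
fused_pred = classify(fuse(image_emb, text_emb), class_embs)
print("image only:", image_only_pred[0], "| fused:", fused_pred[0])
```

In this toy case the image-only prediction follows the background cue, while the fused representation follows the description. Real embeddings would come from a vision-language encoder such as CLIP or BioCLIP, and how the descriptions are generated and combined is defined by the paper, not by this sketch.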
