Evaluating Deep Learning Models for African Wildlife Image Classification: From DenseNet to Vision Transformers
The paper addresses wildlife conservation in Africa by providing practical model-selection insights, but its contribution is incremental: it applies existing methods to a specific dataset.
This paper compares deep learning models for classifying African wildlife images in support of biodiversity monitoring. Vision Transformer ViT-H/14 achieved 99% accuracy but at high computational cost, while DenseNet-201 reached 67% accuracy and was deployed for real-time use.
Wildlife populations in Africa face severe threats, with vertebrate numbers declining by over 65% in the past five decades. In response, image classification using deep learning has emerged as a promising tool for biodiversity monitoring and conservation. This paper presents a comparative study of deep learning models for automatically classifying African wildlife images, focusing on transfer learning with frozen feature extractors. Using a public dataset of four species (buffalo, elephant, rhinoceros, and zebra), we evaluate the performance of DenseNet-201, ResNet-152, EfficientNet-B4, and Vision Transformer ViT-H/14. DenseNet-201 achieved the best performance among the convolutional networks (67% accuracy), while ViT-H/14 achieved the highest overall accuracy (99%) but at a significantly higher computational cost, which raises deployment concerns. Our experiments highlight the trade-offs between accuracy, resource requirements, and deployability. The best-performing CNN (DenseNet-201) was integrated into a Hugging Face Gradio Space for real-time field use, demonstrating the feasibility of deploying lightweight models in conservation settings. This work contributes to African-grounded AI research by offering practical insights into model selection, dataset preparation, and responsible deployment of deep learning tools for wildlife conservation.