LG · Apr 16, 2024

AudioProtoPNet: An interpretable deep learning model for bird sound classification

arXiv:2404.10420v3 · 34 citations · h-index: 6 · Ecological Informatics
Originality: Incremental advance
AI Analysis

This work addresses the need for interpretable models in acoustic bird monitoring, serving ornithologists and machine learning engineers. It is an incremental advance, however, because it adapts an existing interpretable method (ProtoPNet) to a new domain.

The study tackles the black-box nature of deep learning models in bird sound classification by introducing AudioProtoPNet, an interpretable model that outperformed the state-of-the-art model Perch with an average AUROC of 0.90 and a cmAP of 0.42. These are relative improvements of 7.1% and 16.7% over Perch, respectively.
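To make the two reported metrics concrete, here is a pure-Python sketch of both. It assumes that cmAP (class-wise mean average precision) is the mean of per-class average precision over classes with at least one positive label, and that AUROC is macro-averaged over classes that have both positives and negatives. BirdSet's actual evaluation code may handle ties, empty classes, or averaging differently. The function names and toy data are hypothetical.

```python
from typing import List, Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AP for one class: mean precision at the rank of each positive.

    Ties are broken by sort order (no tie averaging), a simplification.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions = []
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions)


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC for one class: P(random positive outscores random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def column(matrix: Sequence[Sequence[float]], j: int) -> List[float]:
    return [row[j] for row in matrix]


def cmap(scores, labels) -> float:
    """Average the per-class AP over classes that have at least one positive."""
    aps = [average_precision(column(scores, j), column(labels, j))
           for j in range(len(labels[0])) if any(column(labels, j))]
    return sum(aps) / len(aps)


def macro_auroc(scores, labels) -> float:
    """Average the per-class AUROC over classes with both positives and negatives."""
    vals = []
    for j in range(len(labels[0])):
        y = column(labels, j)
        if 0 < sum(y) < len(y):
            vals.append(auroc(column(scores, j), y))
    return sum(vals) / len(vals)


# Toy multi-label example: 4 recordings (rows) x 2 hypothetical species (columns)
scores = [[0.9, 0.1], [0.8, 0.7], [0.3, 0.6], [0.2, 0.4]]
labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
print(cmap(scores, labels), macro_auroc(scores, labels))
```

Because cmAP averages per class, rare species weigh as much as common ones, which is one reason it tends to be much lower than AUROC on imbalanced bird datasets. The quoted relative improvements are presumably computed as (new − old) / old on these averaged scores.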

Deep learning models have significantly advanced acoustic bird monitoring by being able to recognize numerous bird species based on their vocalizations. However, traditional deep learning models are black boxes that provide no insight into their underlying computations, limiting their usefulness to ornithologists and machine learning engineers. Explainable models could facilitate debugging, knowledge discovery, trust, and interdisciplinary collaboration.

This study introduces AudioProtoPNet, an adaptation of the Prototypical Part Network (ProtoPNet) for multi-label bird sound classification. It is an inherently interpretable model that uses a ConvNeXt backbone to extract embeddings, with the classification layer replaced by a prototype learning classifier trained on these embeddings. The classifier learns prototypical patterns of each bird species' vocalizations from spectrograms of training instances. During inference, audio recordings are classified by comparing them to the learned prototypes in the embedding space, providing explanations for the model's decisions and insights into the most informative embeddings of each bird species.

The model was trained on the BirdSet training dataset, which consists of 9,734 bird species and over 6,800 hours of recordings. Its performance was evaluated on the seven test datasets of BirdSet, covering different geographical regions. AudioProtoPNet outperformed the state-of-the-art model Perch, achieving an average AUROC of 0.90 and a cmAP of 0.42, with relative improvements of 7.1% and 16.7% over Perch, respectively. These results demonstrate that even for the challenging task of multi-label bird sound classification, it is possible to develop powerful yet inherently interpretable deep learning models that provide valuable insights for ornithologists and machine learning engineers.
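The prototype classifier described above, which compares embeddings to learned prototypes, can be illustrated with a minimal NumPy sketch of a ProtoPNet-style prototype layer. It assumes the original ProtoPNet's log-similarity on squared L2 distance, global max pooling over the embedding grid, and an independent sigmoid per species for multi-label output. AudioProtoPNet's actual similarity function, loss, and training procedure may differ. The function names, toy sizes, and weights are hypothetical, and the backbone is replaced by random embeddings.

```python
import numpy as np


def prototype_logits(feature_map, prototypes, class_weights, eps=1e-4):
    """Score one spectrogram's embedding grid against learned prototypes.

    feature_map:   (H, W, D) embedding grid from a backbone (e.g. a ConvNeXt)
    prototypes:    (P, D) learned prototype vectors
    class_weights: (P, C) last-layer weights linking prototypes to species
    Returns per-class logits (C,), per-prototype max similarity (P,),
    and the flat grid index of each prototype's best-matching patch (P,).
    """
    patches = feature_map.reshape(-1, feature_map.shape[-1])  # (H*W, D)
    # Squared L2 distance between every patch and every prototype: (H*W, P)
    d2 = ((patches[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    # ProtoPNet-style log-activation: large when a patch is close to a prototype
    sim = np.log((d2 + 1.0) / (d2 + eps))
    best = sim.max(axis=0)      # global max pooling over the grid
    where = sim.argmax(axis=0)  # patch that "explains" each prototype
    return best @ class_weights, best, where


def multilabel_probs(logits):
    # Multi-label output: an independent sigmoid per species, not a softmax
    return 1.0 / (1.0 + np.exp(-logits))


rng = np.random.default_rng(0)
H, W, D, P = 4, 6, 8, 3  # toy freq x time grid, embedding dim, prototype count
feature_map = rng.normal(size=(H, W, D))
prototypes = rng.normal(size=(P, D))
prototypes[0] = feature_map[1, 2]  # plant an exact match for prototype 0
# ProtoPNet-style init: weight 1 to the prototype's own class, -0.5 to others
class_weights = np.array([[1.0, -0.5], [1.0, -0.5], [-0.5, 1.0]])

logits, best, where = prototype_logits(feature_map, prototypes, class_weights)
print(np.unravel_index(where[0], (H, W)))  # grid cell (freq, time) of the match
print(multilabel_probs(logits))
```

The returned `where` indices are what make the decision inspectable: each prototype's contribution to a species score can be traced to one cell of the embedding grid. Mapping that cell back to a time-frequency region of the original spectrogram requires upsampling, which this sketch omits.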
