SD, LG, AS, ML · Jan 8, 2019

Presence-absence estimation in audio recordings of tropical frog communities

arXiv:1901.02495v1 · 1 citation
Originality: Incremental advance
AI Analysis

This work gives ecologists a non-invasive tool for studying frog communities in tropical environments. The advance is incremental because it builds on existing audio detection methods.

The researchers tackled the problem of automatically detecting frog species presence in tropical audio recordings by developing a Gaussian mixture model classifier with a modified filter-bank, achieving an average weighted error rate of 0.9% in cross-validation and 96.66% accuracy in real-world tests.
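A likelihood-ratio detector built on Gaussian mixture models can be sketched in a few lines. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the diagonal covariances, the single background model, the zero threshold, the 2-D toy features, and every name in it (`gmm_loglik`, `llr_score`, `is_present`, `SPECIES_GMM`, `BACKGROUND_GMM`) are hypothetical. The paper itself uses 20 cepstral features and 64 Gaussians per model.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_loglik(x, gmm):
    """Log-likelihood of one frame under a GMM given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))

def llr_score(frames, species_gmm, background_gmm):
    """Average per-frame log-likelihood ratio: species model vs. background model."""
    total = sum(
        gmm_loglik(f, species_gmm) - gmm_loglik(f, background_gmm) for f in frames
    )
    return total / len(frames)

def is_present(frames, species_gmm, background_gmm, threshold=0.0):
    """Declare the species present when the average ratio exceeds the threshold."""
    return llr_score(frames, species_gmm, background_gmm) > threshold

# Hypothetical toy models in a 2-D feature space (assumption, for illustration only).
SPECIES_GMM = [(0.5, (2.0, 2.0), (0.5, 0.5)), (0.5, (3.0, 1.0), (0.5, 0.5))]
BACKGROUND_GMM = [(1.0, (0.0, 0.0), (2.0, 2.0))]

call_frames = [(2.1, 1.9), (2.9, 1.2), (2.4, 1.6)]     # frames near the species model
noise_frames = [(0.2, -0.3), (-0.5, 0.1), (0.4, 0.6)]  # frames near the background

print(is_present(call_frames, SPECIES_GMM, BACKGROUND_GMM))
print(is_present(noise_frames, SPECIES_GMM, BACKGROUND_GMM))
```

Running one such test per species model on a recording yields one binary presence-absence variable per species. In practice the threshold would be tuned on validation data rather than fixed at zero.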

One non-invasive way to study frog communities is to analyze long-term acoustic recordings that contain their calls. Machine learning tools that extract ecological information have made this immense task more tractable. We explored a likelihood-ratio audio detector based on Gaussian mixture model (GMM) classification of 10 frog species, and applied it to estimate presence-absence in audio recordings from an actual amphibian monitoring effort carried out at Yasuní National Park in the Ecuadorian Amazonia. A modified filter-bank was used to extract 20 cepstral features that model the spectral content of frog calls. Experiments were carried out to investigate the hyperparameters and the minimum frog-call time needed to train an accurate GMM classifier. With 64 Gaussians and 12 seconds of frog-call training audio, the classifier achieved an average weighted error rate of 0.9% in 10-fold cross-validation for nine-species classification, compared with 3% for MFCC and 1.8% for PLP features. For testing, 10 GMMs were trained on the entire training-validation dataset and used to analyze 23.5 hours of unidentified real-world audio (141 ten-minute samples) recorded at two frog communities in 2001 with analog equipment. To evaluate automatic presence-absence estimation, we characterized each audio sample with 10 binary variables, one per frog species, and manually labeled a subset of 18 samples using headphones. A recall of 87.5% and a precision of 100%, with an average accuracy of 96.66%, suggest that the algorithm generalizes well and provide evidence that this approach is valid for studying real-world audio recorded in a tropical acoustic environment. Finally, we applied the algorithm to the available corpus and show its potential to give insights into the temporal reproductive behavior of frogs.
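To make the evaluation figures concrete, the sketch below computes precision, recall, and accuracy from binary presence-absence decisions. The counts are hypothetical and are not taken from the paper. They are one set of numbers that reproduces the reported figures, assuming the metrics are pooled over all 18 samples × 10 species = 180 decisions. The abstract says "average accuracy", so the authors may instead have averaged per species.

```python
def presence_metrics(tp, fp, fn, tn):
    """Precision, recall, and accuracy from pooled binary decisions."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy

# 18 manually labeled samples, each described by 10 binary species variables.
decisions = 18 * 10

# Hypothetical counts (assumption): no false positives, 6 missed detections.
tp, fp, fn = 42, 0, 6
tn = decisions - tp - fp - fn

precision, recall, accuracy = presence_metrics(tp, fp, fn, tn)
print(f"precision={precision:.4f} recall={recall:.4f} accuracy={accuracy:.4f}")
```

Under this pooling assumption, a precision of 100% means every reported presence was correct, and all errors are missed detections. The accuracy works out to 174/180, which rounds to 96.67% and truncates to the reported 96.66%.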
