SD LG AS | Mar 1, 2025

Synthetic data enables context-aware bioacoustic sound event detection

Benjamin Hoffman, David Robinson, Marius Miron, Vittorio Baglione, Daniela Canestrari, Damian Elias, Eva Trapote, Felix Effenberger, Maddie Cusimano, Masato Hagiwara, Olivier Pietquin

arXiv:2503.00296v2 | 3 citations | h-index: 23

Originality: Incremental advance

AI Analysis

The work gives ecologists and ethologists a training-free tool for bioacoustic analysis, though the advance is incremental because it builds on existing synthetic-data and transformer methods.

The authors tackle few-shot bioacoustic sound event detection by training a transformer-based model on over 8.8 thousand hours of synthetically generated, strongly labeled audio. The model outperforms previously published methods, with a 64% relative improvement over other training-free methods, and the authors also introduce a public benchmark of 13 tasks.

We propose a methodology for training foundation models that enhances their in-context learning capabilities within the domain of bioacoustic signal processing. We use synthetically generated training data, introducing a domain-randomization-based pipeline that constructs diverse acoustic scenes with temporally strong labels. We generate over 8.8 thousand hours of strongly labeled audio and train a query-by-example, transformer-based model to perform few-shot bioacoustic sound event detection. Our second contribution is a public benchmark of 13 diverse few-shot bioacoustics tasks. Our model outperforms previously published methods, and improves relative to other training-free methods by $64\%$. We demonstrate that this is due to an increase in model size and data scale, as well as algorithmic improvements. We make our trained model available via an API, to provide ecologists and ethologists with a training-free tool for bioacoustic sound event detection.
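To make the idea of a domain-randomized scene with temporally strong labels concrete, here is a minimal illustrative sketch. It is not the authors' pipeline: windowed sine bursts stand in for animal vocalizations, the background is plain Gaussian noise, and every name and parameter range (`make_scene`, `labels_to_frames`, the sample rate, the frequency and gain ranges) is a hypothetical choice made for illustration. What it shows is the general pattern: randomize the scene parameters, mix events into a background, and record each event's exact onset and offset as the label.

```python
import math
import random


def make_scene(rng, sr=8000, duration_s=5.0, max_events=4):
    """Build one synthetic scene; return (audio samples, [(onset_s, offset_s), ...]).

    Illustrative stand-in only: tone bursts replace real vocalizations.
    """
    n = int(sr * duration_s)
    # Background noise at a randomized level.
    noise_std = rng.uniform(0.01, 0.05)
    audio = [rng.gauss(0.0, noise_std) for _ in range(n)]

    labels = []
    for _ in range(rng.randint(1, max_events)):
        # Randomize the duration, position, pitch, and loudness of each event.
        ev_dur = rng.uniform(0.1, 1.0)
        onset = rng.uniform(0.0, duration_s - ev_dur)
        length = int(ev_dur * sr)
        start = min(int(onset * sr), n - length)
        freq = rng.uniform(300.0, 3000.0)
        gain = rng.uniform(0.1, 1.0)
        for i in range(length):
            # Hann window avoids clicks at the event boundaries.
            window = 0.5 - 0.5 * math.cos(2 * math.pi * i / (length - 1))
            audio[start + i] += gain * window * math.sin(2 * math.pi * freq * i / sr)
        # Strong label: the exact time span where the event was placed.
        labels.append((start / sr, (start + length) / sr))

    labels.sort()
    return audio, labels


def labels_to_frames(labels, duration_s, hop_s=0.02):
    """Convert (onset, offset) pairs into a per-frame boolean activity mask."""
    n_frames = int(round(duration_s / hop_s))
    mask = [False] * n_frames
    for onset, offset in labels:
        first = int(onset / hop_s)
        last = min(n_frames, math.ceil(offset / hop_s))
        for f in range(first, last):
            mask[f] = True
    return mask


if __name__ == "__main__":
    rng = random.Random(0)
    audio, labels = make_scene(rng)
    mask = labels_to_frames(labels, duration_s=5.0)
    print(len(audio), labels, sum(mask))
```

Because the generator places every event itself, the labels come for free and are exact to the sample, which is what distinguishes strong labels from clip-level (weak) ones. A realistic pipeline would presumably draw events and backgrounds from recorded audio and apply further augmentations; those details are not given in the abstract and are not modeled here.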
