Active Bird2Vec: Towards End-to-End Bird Sound Monitoring with Transformers
This work addresses the need for more efficient and scalable bird sound monitoring in support of conservation and environmental decision-making. It appears incremental, however, because it builds on existing self-supervised and active learning methods.
The paper tackles the problem of bird sound monitoring by proposing an end-to-end approach using transformers to process raw audio directly, aiming to reduce reliance on labeled datasets and improve environmental assessment.
We propose a shift towards end-to-end learning in bird sound monitoring by combining self-supervised learning (SSL) and deep active learning (DAL). Leveraging transformer models, we aim to bypass traditional spectrogram conversions, enabling direct raw audio processing. ActiveBird2Vec is set to generate high-quality bird sound representations through SSL, potentially accelerating the assessment of environmental changes and decision-making processes for wind farms. Additionally, we seek to exploit the wide variety of bird vocalizations through DAL, reducing the reliance on datasets extensively labeled by human experts. We plan to curate a comprehensive set of tasks through Hugging Face Datasets, enhancing the future comparability and reproducibility of bioacoustic research. We will compare several transformer models to evaluate their proficiency in bird sound recognition tasks. We aim to accelerate the progress of avian bioacoustic research and contribute to more effective conservation strategies.
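To make the DAL component concrete, the following is a minimal sketch of one common query strategy, entropy-based uncertainty sampling, in which the clips the current model is least sure about are sent to a human expert for labeling. The abstract does not state which query strategy ActiveBird2Vec uses, so the strategy itself, the function names `entropy` and `select_queries`, and the example probabilities are all illustrative assumptions.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_queries(pool_probs, budget):
    """Return indices of the `budget` most uncertain clips, most uncertain first.

    Assumption: `pool_probs` holds one softmax output per unlabeled clip,
    produced by whatever classifier sits on top of the learned representations.
    """
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical predictions for four unlabeled clips over three species.
pool_probs = [
    [0.98, 0.01, 0.01],  # confident prediction, low labeling value
    [0.34, 0.33, 0.33],  # near-uniform, most uncertain
    [0.70, 0.20, 0.10],
    [0.50, 0.49, 0.01],  # confusion between two species
]

# Indices of the two clips to hand to the annotator in this round.
queries = select_queries(pool_probs, budget=2)
```

In a full active learning loop, the selected clips would be labeled, added to the training set, and the model retrained before the next round of selection; other strategies (margin-based, diversity-based, or hybrid) would only change the scoring function.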