SD AI LG AS
Nov 26, 2025

Advancing Marine Bioacoustics with Deep Generative Models: A Hybrid Augmentation Strategy for Southern Resident Killer Whale Detection

Bruno Padovese, Fabio Frazao, Michael Dowd, Ruth Joy

arXiv:2511.21872v1

Originality: Incremental advance

AI Analysis

This work addresses data scarcity in automated detection of Southern Resident Killer Whale vocalizations for conservation efforts, representing an incremental improvement over existing augmentation techniques.

The study tackled the problem of limited annotated datasets for marine mammal vocalization detection by evaluating deep generative models for data augmentation, finding that a hybrid strategy combining diffusion-based synthesis with traditional methods achieved the best performance with an F1-score of 0.81.

Automated detection and classification of marine mammal vocalizations is critical for conservation and management efforts but is hindered by limited annotated datasets and the acoustic complexity of real-world marine environments. Data augmentation has proven to be an effective strategy to address this limitation by increasing dataset diversity and improving model generalization without requiring additional field data. However, most augmentation techniques used to date rely on effective but relatively simple transformations, leaving open the question of whether deep generative models can provide additional benefits. In this study, we evaluate the potential of deep generative models for data augmentation in marine mammal call detection, including Variational Autoencoders, Generative Adversarial Networks, and Denoising Diffusion Probabilistic Models. Using Southern Resident Killer Whale (Orcinus orca) vocalizations from two long-term hydrophone deployments in the Salish Sea, we compare these approaches against traditional augmentation methods such as time-shifting and vocalization masking. While all generative approaches improved classification performance relative to the baseline, diffusion-based augmentation yielded the highest recall (0.87) and overall F1-score (0.75). A hybrid strategy combining generative-based synthesis with traditional methods achieved the best overall performance with an F1-score of 0.81. We hope this study encourages further exploration of deep generative models as complementary augmentation strategies to advance acoustic monitoring of threatened marine mammal populations.
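
The traditional augmentations named in the abstract can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything below is an assumption: the function names `time_shift` and `time_mask` are hypothetical, the signal is a plain list of samples, the shift is circular, and the masking shown is a generic zero-fill of a random span, which may differ from the paper's "vocalization masking".

```python
import random


def time_shift(signal, max_shift, rng):
    """Circularly shift a 1-D signal by a random offset in [-max_shift, max_shift].

    Hypothetical helper: the paper's actual time-shifting may pad or crop
    instead of wrapping samples around.
    """
    if not signal:
        return []
    shift = rng.randint(-max_shift, max_shift) % len(signal)
    if shift == 0:
        return list(signal)
    # Samples pushed past the end wrap around to the start.
    return signal[-shift:] + signal[:-shift]


def time_mask(signal, mask_len, rng, fill=0.0):
    """Overwrite one random contiguous span of mask_len samples with `fill`.

    The span is clipped to the signal length. This is a generic masking
    sketch, not necessarily the paper's vocalization masking.
    """
    n = len(signal)
    width = min(mask_len, n)
    start = rng.randint(0, n - width)
    return signal[:start] + [fill] * width + signal[start + width:]


# Usage: chain both transforms on a toy clip with a seeded generator.
rng = random.Random(0)
clip = [0.0, 0.1, 0.5, 0.9, 0.4, 0.1, 0.0, 0.0]
augmented = time_mask(time_shift(clip, max_shift=3, rng=rng), mask_len=2, rng=rng)
```

Both functions return new lists and leave the input untouched, so one recording can yield several augmented copies. In practice these operations would more likely run on NumPy arrays or spectrograms; plain lists are used here only to keep the sketch dependency-free.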

