BioSEN: A Bio-acoustic Signal Enhancement Network for Animal Vocalizations
BioSEN is an efficient enhancement model for animal vocalizations, a domain that audio enhancement research has largely overlooked, and is intended to support biodiversity monitoring and conservation.
BioSEN adapts speech enhancement methods to bioacoustics, matching or exceeding state-of-the-art speech enhancement models on three bioacoustic datasets at significantly lower computational cost.
Most work in audio enhancement targets human speech, while bioacoustics is less studied due to noisy recordings and the distinct traits of animal sounds. To fill this gap, we adapt speech enhancement methods and build BioSEN, a model made for bioacoustic signals. BioSEN has three modules: a multi-scale dual-axis attention unit for time-frequency feature extraction, a bio-harmonic multi-scale enhancement unit for capturing harmonic structures, and an energy-adaptive gating connection unit that uses frequency weights to keep vocalizations from being removed as noise. Tests on three bioacoustic datasets show that BioSEN matches or exceeds state-of-the-art speech enhancement models while using far less computation. These results show BioSEN's strength for bioacoustic audio enhancement and its promise for biodiversity monitoring and conservation.
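To make the idea behind the energy-adaptive gating connection concrete, the following is a minimal NumPy sketch of a frequency-weighted skip connection. It is an illustration under stated assumptions, not BioSEN's implementation: the function name `energy_adaptive_gate`, the log-energy standardization, the sigmoid weighting, and the `temperature` parameter are all hypothetical choices, and the abstract does not specify how the frequency weights are actually computed.

```python
import numpy as np


def energy_adaptive_gate(skip, decoded, temperature=1.0):
    """Fuse a skip feature map into decoded features using per-frequency weights.

    Hypothetical sketch: both inputs have shape (freq_bins, frames).
    Bands carrying more energy in `skip` receive weights closer to 1,
    so energetic vocalization bands are passed through rather than suppressed.
    """
    # Mean energy of each frequency bin across time.
    band_energy = np.mean(skip ** 2, axis=1)
    # Log-compress, then standardize across bins (assumed normalization).
    log_energy = np.log(band_energy + 1e-8)
    z = (log_energy - log_energy.mean()) / (log_energy.std() + 1e-8)
    # Sigmoid maps standardized energies to weights in (0, 1).
    weights = 1.0 / (1.0 + np.exp(-z / temperature))
    # Scale each frequency bin of the skip path, then add to the decoder path.
    fused = decoded + weights[:, None] * skip
    return fused, weights


# Toy usage: a 16-bin feature map with one strong "vocalization" band at bin 5.
rng = np.random.default_rng(0)
skip = 0.01 * rng.standard_normal((16, 50))
skip[5] += 1.0
fused, weights = energy_adaptive_gate(skip, np.zeros((16, 50)))
print(int(np.argmax(weights)))  # index of the most strongly passed band
```

In this toy setting the band containing the strong component receives the largest weight, while low-energy bands are attenuated on the skip path. A learned gate would presumably replace the fixed sigmoid used here.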