Douglas Gillespie

h-index: 28

3 papers

3,052 citations

3.9 · CV · Apr 5, 2023 · Code

Learning Stage-wise GANs for Whistle Extraction in Time-Frequency Spectrograms

Pu Li, Marie Roch, Holger Klinck et al.

Whistle contour extraction aims to trace animal whistles in time-frequency spectrograms as polylines. For toothed whales, whistle extraction results can serve as the basis for analyzing animal abundance, species identity, and social activities. Over the last few decades, as long-term recording systems have become affordable, automated whistle extraction algorithms have been proposed to process large volumes of recording data. Recently, a deep learning-based method demonstrated superior performance in extracting whistles under varying noise conditions. However, training such networks requires a large amount of labor-intensive annotation, which is not available for many species. To overcome this limitation, we present a framework of stage-wise generative adversarial networks (GANs) that compiles new whistle data suitable for deep model training in three stages: generation of background noise in the spectrogram, generation of whistle contours, and generation of whistle signals. By separating the generation of the different components of each sample, our framework composes visually promising whistle data and labels even when little expert-annotated data is available. Regardless of the amount of human-annotated data, the proposed data augmentation framework consistently improves the performance of the whistle extraction model, with a maximum increase of 1.69 in the whistle extraction mean F1-score. Our stage-wise GAN also outperforms a single GAN in improving whistle extraction models with augmented data. The data and code will be available at https://github.com/Paul-LiPu/CompositeGAN_WhistleAugment.
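
The three-stage composition described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: each GAN generator is replaced by a hypothetical, hand-written random stand-in (`generate_background`, `generate_contour`, `render_whistle`), and the spectrogram size, sweep shape, and `snr` value are illustrative assumptions. What it shows is why staging helps: the contour is fixed before the whistle is rendered, so a pixel-level label mask comes for free with every synthetic sample.

```python
import math
import random

# Hypothetical placeholders: in the paper each stage is a GAN generator,
# here each is a simple random process so the sketch stays self-contained.


def generate_background(n_freq, n_time, rng):
    """Stage 1 stand-in: Gaussian noise in place of a generated background spectrogram."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n_time)] for _ in range(n_freq)]


def generate_contour(n_freq, n_time, rng):
    """Stage 2 stand-in: an upward sweep with a small wiggle, one (time, freq_bin) point per frame."""
    f0 = rng.uniform(0.2, 0.4) * n_freq
    slope = rng.uniform(0.1, 0.3) * n_freq / n_time
    contour = []
    for t in range(n_time):
        f = int(round(f0 + slope * t + 2.0 * math.sin(t / 5.0)))
        contour.append((t, min(n_freq - 1, max(0, f))))  # clamp to valid bins
    return contour


def render_whistle(background, contour, snr):
    """Stage 3 stand-in: add energy along the contour; the contour doubles as the label mask."""
    spec = [row[:] for row in background]  # copy so the background stays untouched
    label = [[0] * len(background[0]) for _ in background]
    for t, f in contour:
        spec[f][t] += snr
        label[f][t] = 1
    return spec, label


def compose_sample(n_freq=64, n_time=100, snr=6.0, seed=0):
    """Chain the three stages into one (spectrogram, label) training pair."""
    rng = random.Random(seed)
    background = generate_background(n_freq, n_time, rng)
    contour = generate_contour(n_freq, n_time, rng)
    return render_whistle(background, contour, snr)


spec, label = compose_sample()
print(len(spec), len(spec[0]), sum(map(sum, label)))  # 64 100 100
```

Swapping any placeholder for a trained generator would leave `compose_sample` unchanged, which is the practical appeal of keeping the stages separate.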

6.3 · SD · Jul 5

Classifying bioacoustic data without individual call annotations using temporal convolutional networks and feature extractors

Laia Garrobé Fonollosa, Douglas Gillespie, Lina Stankovic et al.

Passive Acoustic Monitoring (PAM) generates large bioacoustic datasets for which detailed auditing and labelling is often impractical, resulting in weak annotations (e.g., presence/absence of a species over several minutes of recording). To capture the complex temporal patterns and key features of long audio segments, we propose a framework comprising dataset standardisation, feature extraction, and classification via Temporal Convolutional Networks (TCNs). This approach removes the need for heuristic decision rules or time-consuming strong labels. As a case study, we use sperm whale (Physeter macrocephalus) click trains in 4-minute recordings, drawn from a dataset comprising diverse sources and deployment conditions to maximise generalisability. Our TCN classifiers achieve recall rates exceeding 0.83 at a 0.13 false positive rate, comparable to agreement rates between expert annotators. We compare two feature extraction methods, Variational AutoEncoders (VAEs) and traditional handpicked features, and find that they yield similar performance, with the VAE-based classifiers performing more stably across datasets and recording conditions. These results offer a way to leverage numerous existing annotated bioacoustic datasets to train automatic classification models, overcoming previous limitations associated with weak labels.
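
A rough sketch of a TCN building block and of the two reported metrics follows, in plain Python. It assumes a single scalar feature per time frame (for example one handpicked feature or one VAE latent dimension, both hypothetical here), fixed rather than learned kernel weights, residual dilated causal convolutions with ReLU, and mean pooling to a clip-level presence score. The paper's actual architecture, feature dimensions, and training procedure are not described in this abstract.

```python
import math


def causal_dilated_conv1d(x, weights, dilation, bias=0.0):
    """y[t] = bias + sum_i weights[i] * x[t - i*dilation]; inputs before t=0 are treated as zero."""
    y = []
    for t in range(len(x)):
        acc = bias
        for i, w in enumerate(weights):
            idx = t - i * dilation
            if idx >= 0:  # causal: only current and past frames contribute
                acc += w * x[idx]
        y.append(acc)
    return y


def receptive_field(kernel_size, dilations):
    """Number of input frames that can influence one output frame of the stack."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def tcn_presence_score(features, kernel, dilations):
    """Residual dilated causal conv layers (ReLU), mean pooling over time, then a sigmoid."""
    h = list(features)
    for d in dilations:
        out = [max(0.0, v) for v in causal_dilated_conv1d(h, kernel, d)]
        h = [a + b for a, b in zip(h, out)]  # residual connection keeps the sequence length
    return sigmoid(sum(h) / len(h))


def recall_and_fpr(y_true, y_pred):
    """Clip-level recall TP/(TP+FN) and false positive rate FP/(FP+TN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return recall, fpr


print(receptive_field(3, [1, 2, 4, 8]))  # 31
```

Doubling the dilation at each layer makes the receptive field grow exponentially with depth (31 frames for kernel size 3 and dilations 1, 2, 4, 8), which is one reason TCNs are a reasonable fit for long, weakly labelled segments.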

4.3 · QM · May 18, 2020

Learning Deep Models from Synthetic Data for Extracting Dolphin Whistle Contours

Pu Li, Xiaobai Liu, K. J. Palmer et al.

We present a learning-based method for extracting whistles of toothed whales (Odontoceti) in hydrophone recordings. Our method represents audio signals as time-frequency spectrograms and decomposes each spectrogram into a set of time-frequency patches. A deep neural network learns archetypical patterns (e.g., crossings, frequency modulated sweeps) from the spectrogram patches and predicts time-frequency peaks that are associated with whistles. We also developed a comprehensive method to synthesize training samples from background environments and train the network with minimal human annotation effort. We applied the proposed learn-from-synthesis method to a subset of the public Detection, Classification, Localization, and Density Estimation (DCLDE) 2011 workshop data to extract whistle confidence maps, which we then processed with an existing contour extractor to produce whistle annotations. The F1-score of our best synthesis method was 0.158 greater than our baseline whistle extraction algorithm (~25% improvement) when applied to common dolphin (Delphinus spp.) and bottlenose dolphin (Tursiops truncatus) whistles.
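
The patch-based pipeline above can be sketched as follows. This is an illustration under assumptions, not the published method: the network itself is omitted (a confidence map is taken as given), `extract_patches` and `peaks_from_confidence` are hypothetical helpers, and picking local maxima along frequency above a threshold is a simple stand-in for the network's peak prediction and the existing contour extractor. The `f1_score` helper uses the standard definition, the harmonic mean of precision and recall.

```python
def extract_patches(spec, size, stride):
    """Tile a [freq][time] spectrogram into size x size windows; return (row, col, patch) tuples."""
    n_freq, n_time = len(spec), len(spec[0])
    patches = []
    for r in range(0, n_freq - size + 1, stride):
        for c in range(0, n_time - size + 1, stride):
            patch = [row[c:c + size] for row in spec[r:r + size]]
            patches.append((r, c, patch))
    return patches


def peaks_from_confidence(conf, threshold):
    """For each time column, keep frequency bins that are local maxima with confidence >= threshold."""
    n_freq, n_time = len(conf), len(conf[0])
    peaks = []
    for t in range(n_time):
        for f in range(n_freq):
            v = conf[f][t]
            if v < threshold:
                continue
            below = conf[f - 1][t] if f > 0 else float("-inf")
            above = conf[f + 1][t] if f < n_freq - 1 else float("-inf")
            if v >= below and v > above:  # one peak per plateau
                peaks.append((t, f))
    return peaks


def f1_score(tp, fp, fn):
    """Harmonic mean of precision TP/(TP+FP) and recall TP/(TP+FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With a threshold of 0.5, a time column of confidences [0.1, 0.8, 0.3] yields a single peak at frequency bin 1.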