Holger Klinck

LG
h-index: 16
Papers: 10
Citations: 722
Novelty: 28%
AI Score: 25

10 Papers

CV | Apr 5, 2023 | Code
Learning Stage-wise GANs for Whistle Extraction in Time-Frequency Spectrograms

Pu Li, Marie Roch, Holger Klinck et al.

Whistle contour extraction aims to derive animal whistles from time-frequency spectrograms as polylines. For toothed whales, whistle extraction results can serve as the basis for analyzing animal abundance, species identity, and social activities. During the last few decades, as long-term recording systems have become affordable, automated whistle extraction algorithms have been proposed to process large volumes of recording data. Recently, a deep learning-based method demonstrated superior performance in extracting whistles under varying noise conditions. However, training such networks requires a large amount of labor-intensive annotation, which is not available for many species. To overcome this limitation, we present a framework of stage-wise generative adversarial networks (GANs), which compile new whistle data suitable for deep model training via three stages: generation of background noise in the spectrogram, generation of whistle contours, and generation of whistle signals. By separating the generation of different components in the samples, our framework composes visually promising whistle data and labels even when few expert-annotated data are available. Regardless of the amount of human-annotated data, the proposed data augmentation framework leads to a consistent improvement in performance of the whistle extraction model, with a maximum increase of 1.69 in the whistle extraction mean F1-score. Our stage-wise GAN also surpasses a single GAN in improving whistle extraction models with augmented data. The data and code will be available at https://github.com/Paul-LiPu/CompositeGAN_WhistleAugment.

AS | Jun 6, 2019 | Code
GIBBONFINDR: An R package for the detection and classification of acoustic signals

Dena J. Clink, Holger Klinck

Recent improvements in recording technology, data storage and battery life have led to an increased interest in the use of passive acoustic monitoring for a variety of research questions. One of the main obstacles in implementing wide-scale acoustic monitoring programs in terrestrial environments is the lack of user-friendly, open-source programs for processing large sound archives. Here we describe the new, open-source R package GIBBONFINDR, which has functions for detection, classification and visualization of acoustic signals using a variety of readily available machine learning algorithms in the R programming environment. We provide a case study showing how GIBBONFINDR functions can be used in a workflow to detect and classify Bornean gibbon (Hylobates muelleri) calls in long-term acoustic data sets recorded in Danum Valley Conservation Area, Sabah, Malaysia. Machine learning is currently one of the most rapidly growing fields, with applications across many disciplines, and our goal is to make commonly used signal processing techniques and machine learning algorithms readily available for ecologists who are interested in incorporating bioacoustics techniques into their research.

LG | Dec 12, 2023
BIRB: A Generalization Benchmark for Information Retrieval in Bioacoustics

Jenny Hamer, Eleni Triantafillou, Bart van Merriënboer et al.

The ability of a machine learning model to cope with differences in training and deployment conditions (e.g., in the presence of distribution shift or the generalization to new classes altogether) is crucial for real-world use cases. However, most empirical work in this area has focused on the image domain with artificial benchmarks constructed to measure individual aspects of generalization. We present BIRB, a complex benchmark centered on the retrieval of bird vocalizations from passively-recorded datasets given focal recordings from a large citizen science corpus available for training. We propose a baseline system for this collection of tasks using representation learning and a nearest-centroid search. Our thorough empirical evaluation and analysis surface open research directions, suggesting that BIRB fills the need for a more realistic and complex benchmark to drive progress on robustness to distribution shifts and generalization of ML models.
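The nearest-centroid search mentioned in this abstract can be illustrated with a minimal sketch. It is not the BIRB baseline itself: the function names, the use of cosine similarity, and the toy 2-D embeddings are assumptions made for illustration, and the representation-learning step that would produce real embeddings is omitted.

```python
import numpy as np

def class_centroids(embeddings, labels):
    """Average the exemplar embeddings of each class into one centroid.

    embeddings: array of shape (n_exemplars, dim)
    labels:     array of shape (n_exemplars,) with one class label per row
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    centroids = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def rank_by_centroid(corpus_embeddings, centroid):
    """Rank corpus items by cosine similarity to one class centroid.

    Assumes no embedding has zero norm (hypothetical simplification).
    Returns (indices sorted from most to least similar, raw scores).
    """
    corpus = np.asarray(corpus_embeddings, dtype=float)
    corpus_unit = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    centroid_unit = centroid / np.linalg.norm(centroid)
    scores = corpus_unit @ centroid_unit
    return np.argsort(-scores), scores

# Toy data: two classes of labeled focal exemplars, three unlabeled corpus items.
exemplars = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = np.array(["a", "a", "b", "b"])
corpus = np.array([[0.0, 1.0], [1.0, 0.0], [0.7, 0.7]])

classes, centroids = class_centroids(exemplars, labels)
order_a, scores_a = rank_by_centroid(corpus, centroids[0])  # retrieval for class "a"
```

In this toy setup the corpus item closest in direction to the class "a" centroid is ranked first; a retrieval metric would then be computed over such rankings.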

LG | Oct 25, 2021
Seeing biodiversity: perspectives in machine learning for wildlife conservation

Devis Tuia, Benjamin Kellenberger, Sara Beery et al.

Data acquisition in animal ecology is rapidly accelerating due to inexpensive and accessible sensors such as smartphones, drones, satellites, audio recorders and bio-logging devices. These new technologies and the data they generate hold great potential for large-scale environmental monitoring and understanding, but are limited by current data processing approaches which are inefficient in how they ingest, digest, and distill data into relevant information. We argue that machine learning, and especially deep learning approaches, can meet this analytic challenge to enhance our understanding, monitoring capacity, and conservation of wildlife species. Incorporating machine learning into ecological workflows could improve inputs for population and behavior models and eventually lead to integrated hybrid modeling tools, with ecological models acting as constraints for machine learning models and the latter providing data-supported insights. In essence, by combining new machine learning approaches with ecological domain knowledge, animal ecologists can capitalize on the abundance of data generated by modern sensor technologies in order to reliably estimate population abundances, study animal behavior and mitigate human/wildlife conflicts. To succeed, this approach will require close collaboration and cross-disciplinary education between the computer science and animal ecology communities in order to ensure the quality of machine learning approaches and train a new generation of data scientists in ecology and conservation.

LG | Aug 20, 2021
Parsing Birdsong with Deep Audio Embeddings

Irina Tolkova, Brian Chu, Marcel Hedman et al.

Monitoring of bird populations has played a vital role in conservation efforts and in understanding biodiversity loss. The automation of this process has been facilitated by both sensing technologies, such as passive acoustic monitoring, and accompanying analytical tools, such as deep learning. However, machine learning models frequently have difficulty generalizing to examples not encountered in the training data. In our work, we present a semi-supervised approach to identify characteristic calls and environmental noise. We utilize several methods to learn a latent representation of audio samples, including a convolutional autoencoder and two pre-trained networks, and group the resulting embeddings for a domain expert to identify cluster labels. We show that our approach can improve classification precision and provide insight into the latent structure of environmental acoustic datasets.

QM | May 18, 2020
Learning Deep Models from Synthetic Data for Extracting Dolphin Whistle Contours

Pu Li, Xiaobai Liu, K. J. Palmer et al.

We present a learning-based method for extracting whistles of toothed whales (Odontoceti) in hydrophone recordings. Our method represents audio signals as time-frequency spectrograms and decomposes each spectrogram into a set of time-frequency patches. A deep neural network learns archetypical patterns (e.g., crossings, frequency modulated sweeps) from the spectrogram patches and predicts time-frequency peaks that are associated with whistles. We also developed a comprehensive method to synthesize training samples from background environments and train the network with minimal human annotation effort. We applied the proposed learn-from-synthesis method to a subset of the public Detection, Classification, Localization, and Density Estimation (DCLDE) 2011 workshop data to extract whistle confidence maps, which we then processed with an existing contour extractor to produce whistle annotations. The F1-score of our best synthesis method was 0.158 greater than our baseline whistle extraction algorithm (~25% improvement) when applied to common dolphin (Delphinus spp.) and bottlenose dolphin (Tursiops truncatus) whistles.
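The step of decomposing a spectrogram into time-frequency patches can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the patch size or whether patches overlap, so the 64x64 non-overlapping tiling, the zero padding, and the function name `split_into_patches` are all hypothetical.

```python
import numpy as np

def split_into_patches(spec, patch_h=64, patch_w=64):
    """Tile a (n_freq, n_time) spectrogram into non-overlapping patches.

    The spectrogram is zero-padded at the high-frequency and late-time edges
    so both dimensions become multiples of the patch size (an assumption;
    the paper's actual patching scheme is not described in the abstract).
    Returns an array of shape (n_patches, patch_h, patch_w), ordered row by row.
    """
    spec = np.asarray(spec, dtype=float)
    n_freq, n_time = spec.shape
    pad_h = (-n_freq) % patch_h
    pad_w = (-n_time) % patch_w
    padded = np.pad(spec, ((0, pad_h), (0, pad_w)), mode="constant")
    rows = padded.shape[0] // patch_h
    cols = padded.shape[1] // patch_w
    # Split each axis into (block index, offset within block), then bring
    # the two block indices together before flattening them.
    patches = (
        padded.reshape(rows, patch_h, cols, patch_w)
        .swapaxes(1, 2)
        .reshape(rows * cols, patch_h, patch_w)
    )
    return patches

# Toy spectrogram: 100 frequency bins by 130 time frames.
spec = np.arange(100 * 130, dtype=float).reshape(100, 130)
patches = split_into_patches(spec, 64, 64)
```

Each patch would then be passed to the network, and the per-patch predictions reassembled into a whistle confidence map of the original size.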

SD | Nov 1, 2019
Long-distance Detection of Bioacoustic Events with Per-channel Energy Normalization

Vincent Lostanlen, Kaitlin Palmer, Elly Knight et al.

This paper proposes to perform unsupervised detection of bioacoustic events by pooling the magnitudes of spectrogram frames after per-channel energy normalization (PCEN). Although PCEN was originally developed for speech recognition, it also has beneficial effects in enhancing animal vocalizations, despite the presence of atmospheric absorption and intermittent noise. We prove that PCEN generalizes logarithm-based spectral flux, yet with a tunable time scale for background noise estimation. In comparison with pointwise logarithm, PCEN reduces false alarm rate by 50x in the near field and 5x in the far field, both on avian and marine bioacoustic datasets. Such improvements come at moderate computational cost and require no human intervention, thus heralding a promising future for PCEN in bioacoustics.
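A minimal sketch of the idea follows: PCEN applied to a spectrogram, then pooled per frame into a detection function. The abstract states neither the formula nor the pooling operator, so the sketch assumes the commonly used PCEN definition (a first-order smoother M, gain exponent alpha, bias delta, root r) and max pooling across frequency channels. All parameter values and function names are illustrative, not those of the paper.

```python
import numpy as np

def pcen(E, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-channel energy normalization of a non-negative spectrogram.

    E has shape (n_frames, n_channels). Each channel is divided by a
    smoothed version of its own past energy, M, computed as
        M[t] = (1 - s) * M[t - 1] + s * E[t]
    and then compressed with a biased root:
        PCEN = (E / (eps + M)**alpha + delta)**r - delta**r
    """
    E = np.asarray(E, dtype=float)
    M = np.empty_like(E)
    M[0] = E[0]  # initialize the smoother with the first frame
    for t in range(1, E.shape[0]):
        M[t] = (1.0 - s) * M[t - 1] + s * E[t]
    return (E / (eps + M) ** alpha + delta) ** r - delta ** r

def detection_function(E, **pcen_kwargs):
    """Pool PCEN magnitudes across channels (max pooling is an assumption)."""
    return pcen(E, **pcen_kwargs).max(axis=1)

# Toy input: stationary background with one transient event at frame 50.
E = np.ones((100, 8))
E[50, 3] = 100.0
d = detection_function(E)
peak_frame = int(np.argmax(d))
```

The smoothing coefficient `s` sets the time scale over which background noise is estimated, which corresponds to the tunable time scale the abstract refers to.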

CV | Apr 19, 2018
Recognizing Birds from Sound - The 2018 BirdCLEF Baseline System

Stefan Kahl, Thomas Wilhelm-Stein, Holger Klinck et al.

Reliable identification of bird species in recorded audio files would be a transformative tool for researchers, conservation biologists, and birders. In recent years, artificial neural networks have greatly improved the detection quality of machine learning systems for bird species recognition. We present a baseline system using convolutional neural networks. We publish our code base as reference for participants in the 2018 LifeCLEF bird identification task and discuss our experiments and potential improvements.

SD | Oct 12, 2016
RAVEN X High Performance Data Mining Toolbox for Bioacoustic Data Analysis

Peter J. Dugan, Holger Klinck, Marie A. Roch et al.

The objective of this work is to integrate high performance computing (HPC) technologies and bioacoustics data-mining capabilities by offering a MATLAB-based toolbox called Raven-X. Raven-X will provide a hardware-independent solution for processing large acoustic datasets; the toolkit will be available to the community at no cost. This goal will be achieved by leveraging prior work that successfully deployed MATLAB-based HPC tools within Cornell University's Bioacoustics Research Program (BRP). These tools enabled commonly available multi-core computers to process data at accelerated rates to detect and classify whale sounds in large multi-channel sound archives. Through this collaboration, we will expand on this effort, which was featured through MathWorks research and industry forums, incorporate new cutting-edge detectors and classifiers, and disseminate Raven-X to the broader bioacoustics community.

SD | May 5, 2016
Early and Late Time Acoustic Measures for Underwater Seismic Airgun Signals In Long-Term Acoustic Data Sets

Peter Dugan, Melania Guerra, Dimitri Ponirakis et al.

This work presents a new toolkit for describing the acoustic properties of the ocean environment before, during and after a sound event caused by an underwater seismic air-gun. The toolkit uses existing sound measures, but uniquely applies these to capture the early time period (actual pulse) and late time period (reverberation and multiple arrivals). In total, 183 features are produced for each air-gun sound. This toolkit was utilized on data retrieved from a field deployment encompassing five marine autonomous recording units during a 46-day seismic air-gun survey in Baffin Bay, Greenland. Using this toolkit, a total of 147 million data points were identified from the Greenland deployment recordings. The feasibility of extracting a large number of features was then evaluated using two separate methods: a serial computer and a high performance system. Results indicate that data extraction took an estimated 216 hours on the serial system and 18 hours on the high performance computer. This paper provides an analytical description of the new toolkit along with details for using it to identify relevant data.