SDLGASNov 17, 2022

Balanced Deep CCA for Bird Vocalization Detection

ETH Zurich
arXiv:2211.09376v16 citationsh-index: 32
Originality Incremental advance
AI Analysis

This work addresses the problem of sparse and imbalanced labeled data in multi-modal event detection, particularly for bird vocalization monitoring, with potential utility in low-resource scenarios, though it is incremental as it builds on existing DCCA methods.

The authors tackled the challenge of detecting bird vocalizations using limited labeled data by developing a self-supervised learning method that learns correlations between sound and vibration signals, resulting in improved performance over classical deep canonical correlation analysis for downstream detection tasks.

Event detection improves when events are captured by two different modalities rather than just one. But to train detection systems on multiple modalities is challenging, in particular when there is abundance of unlabelled data but limited amounts of labeled data. We develop a novel self-supervised learning technique for multi-modal data that learns (hidden) correlations between simultaneously recorded microphone (sound) signals and accelerometer (body vibration) signals. The key objective of this work is to learn useful embeddings associated with high performance in downstream event detection tasks when labeled data is scarce and the audio events of interest (songbird vocalizations) are sparse. We base our approach on deep canonical correlation analysis (DCCA) that suffers from event sparseness. We overcome the sparseness of positive labels by first learning a data sampling model from the labelled data and by applying DCCA on the output it produces. This method that we term balanced DCCA (b-DCCA) improves the performance of the unsupervised embeddings on the downstream supervised audio detection task compared to classsical DCCA. Because data labels are frequently imbalanced, our method might be of broad utility in low-resource scenarios.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes