Nov 15, 2020

Unsupervised Contrastive Learning of Sound Event Representations

arXiv:2011.07616v1
71 citations
Originality: Incremental advance
AI Analysis

This work addresses data scarcity and label noise issues in sound event research, offering a domain-specific incremental improvement.

The paper tackles the problem of limited labeled data in sound event recognition. It proposes an unsupervised contrastive learning method that learns representations by contrasting augmented views of sound events. When labels are scarce or noisy, the pre-trained representations improve classification performance and outperform supervised baselines.
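A contrastive objective of this kind pulls together the embeddings of two views of the same clip and pushes apart the embeddings of other clips in the batch. The sketch below assumes a SimCLR-style NT-Xent (normalized temperature-scaled cross-entropy) loss; the summary above does not name the loss, and the function name and default temperature are illustrative assumptions, not the authors' code.

```python
import numpy as np

def nt_xent_loss(z1: np.ndarray, z2: np.ndarray, temperature: float = 0.2) -> float:
    """Assumed NT-Xent loss for N positive pairs (z1[k], z2[k]); hypothetical helper."""
    n = z1.shape[0]
    # Stack both views and L2-normalize so dot products are cosine similarities.
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / temperature  # (2N, 2N) scaled similarity matrix
    np.fill_diagonal(sim, -np.inf)  # a view is never its own positive or negative
    # Row k's positive is row k + N (and vice versa); all other rows are negatives.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable row-wise log-softmax.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    # Average cross-entropy of picking the positive, over all 2N anchors.
    return float(-log_prob[np.arange(2 * n), pos].mean())
```

With matching views the loss is near zero; when each view is paired with the wrong clip it becomes large, which is the signal that drives the encoder during pre-training.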

Self-supervised representation learning can mitigate the limitations of recognition tasks with little manually labeled data but abundant unlabeled data, a common scenario in sound event research. In this work, we explore unsupervised contrastive learning as a way to learn sound event representations. To this end, we propose to use the pretext task of contrasting differently augmented views of sound events. The views are computed primarily by mixing training examples with unrelated backgrounds, followed by other data augmentations. We analyze the main components of our method via ablation experiments. We evaluate the learned representations using linear evaluation and in two in-domain downstream sound event classification tasks: one with limited manually labeled data and one with noisily labeled data. Our results suggest that unsupervised contrastive pre-training can mitigate the impact of data scarcity and increase robustness against noisy labels, outperforming supervised baselines.
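The view-generation step described in the abstract (mix each example with an unrelated background, then apply further augmentations) can be sketched as follows. This is a minimal illustration under stated assumptions: the background is drawn from a different example in the same batch, it is energy-matched to the example and mixed with a small random weight so the example stays dominant, and the "other augmentations" are stand-ins (random circular time shift plus Gaussian noise), not the paper's exact augmentation set. All function names and parameter values are hypothetical.

```python
import numpy as np

def mix_back(x: np.ndarray, background: np.ndarray, rng: np.random.Generator,
             max_weight: float = 0.05) -> np.ndarray:
    """Hypothetical mixing step: add an energy-matched background with a small weight."""
    lam = rng.uniform(0.0, max_weight)  # small weight keeps the example dominant
    e_x = np.sqrt(np.mean(x ** 2))  # RMS energy of the example
    e_b = np.sqrt(np.mean(background ** 2)) + 1e-12  # avoid division by zero
    return (1.0 - lam) * x + lam * (e_x / e_b) * background

def make_two_views(batch: np.ndarray, rng: np.random.Generator,
                   max_weight: float = 0.05, noise_std: float = 0.01):
    """Return two independently augmented views of each example (needs batch size >= 2)."""
    n = batch.shape[0]
    views = []
    for _ in range(2):
        # Assumption: backgrounds come from a *different* example in the same batch.
        bg_idx = (np.arange(n) + rng.integers(1, n, size=n)) % n
        mixed = np.stack([mix_back(batch[i], batch[bg_idx[i]], rng, max_weight)
                          for i in range(n)])
        # Placeholder "other augmentations": random circular shift along time + noise.
        shifts = rng.integers(0, batch.shape[-1], size=n)
        shifted = np.stack([np.roll(mixed[i], shifts[i], axis=-1) for i in range(n)])
        views.append(shifted + rng.normal(0.0, noise_std, size=shifted.shape))
    return views[0], views[1]
```

The two returned views of each clip would form a positive pair for a contrastive loss, while views of other clips in the batch act as negatives. Keeping the mixing weight small is the design choice that lets the background act as a perturbation rather than changing which sound event the view contains.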

Code Implementations: 1 repo