SD LG AS · Feb 2, 2024

On the Transferability of Large-Scale Self-Supervision to Few-Shot Audio Classification

Calum Heggan, Sam Budgett, Timothy Hospedales, Mehrdad Yaghoobi

arXiv:2402.01274v3 · 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)

Originality: Synthesis-oriented

AI Analysis

It addresses few-shot learning in acoustics, but the contribution is incremental: it applies existing self-supervised models to a new domain.

This study addresses the lack of evaluation of self-supervised learning for few-shot audio classification. It reports state-of-the-art performance on some few-shot problems, such as SpeechCommandsv2, and strong correlations between speech-based few-shot tasks and various other downstream audio benchmarks.

In recent years, self-supervised learning has gained prominence for its capacity to learn robust feature representations from unlabelled data. Networks pretrained through self-supervision serve as effective feature extractors for downstream tasks, including Few-Shot Learning. While the evaluation of unsupervised approaches for few-shot learning is well-established in imagery, it is notably absent in acoustics. This study addresses this gap by assessing the few-shot audio classification performance of large-scale self-supervised models. Additionally, we explore the relationship between a model's few-shot learning capability and its results on other downstream task benchmarks. Our findings reveal state-of-the-art performance in some few-shot problems such as SpeechCommandsv2, as well as strong correlations between speech-based few-shot problems and various downstream audio tasks.
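The abstract treats a self-supervised network as a frozen feature extractor for few-shot classification. Below is a minimal sketch of scoring one N-way K-shot episode under that setup. It assumes the embeddings have already been produced by some pretrained encoder (not shown), and it assumes a nearest-prototype classifier with cosine similarity; the paper's actual classifier and evaluation protocol may differ. All function names and the toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def l2_normalise(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def prototypes(support: Dict[str, List[Vector]]) -> Dict[str, Vector]:
    """Average the K support embeddings of each class into one unit-length prototype."""
    protos = {}
    for label, vecs in support.items():
        dim = len(vecs[0])
        mean = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
        protos[label] = l2_normalise(mean)
    return protos


def classify(query: Vector, protos: Dict[str, Vector]) -> str:
    """Return the label whose prototype has the highest cosine similarity to the query."""
    q = l2_normalise(query)
    # For unit vectors, cosine similarity is just the dot product.
    return max(protos, key=lambda label: sum(a * b for a, b in zip(q, protos[label])))


def episode_accuracy(support: Dict[str, List[Vector]],
                     queries: List[Tuple[Vector, str]]) -> float:
    """Fraction of query embeddings assigned to their true class in one episode."""
    protos = prototypes(support)
    correct = sum(classify(v, protos) == y for v, y in queries)
    return correct / len(queries)


# Toy 3-way 2-shot episode with hand-made 3-D "embeddings" (hypothetical data,
# standing in for frozen encoder outputs on keyword clips).
support = {
    "yes": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "no": [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]],
    "stop": [[0.0, 0.0, 1.0], [0.0, 0.1, 0.9]],
}
queries = [
    ([0.8, 0.2, 0.0], "yes"),
    ([0.2, 0.7, 0.1], "no"),
    ([0.1, 0.0, 0.9], "stop"),
]
print(episode_accuracy(support, queries))  # → 1.0
```

Because the encoder stays frozen, only the prototypes depend on the labelled support set. In a full evaluation, accuracy would typically be averaged over many randomly sampled episodes.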
