Prototypical Contrastive Learning For Improved Few-Shot Audio Classification
This work addresses the scarcity of labeled data in audio classification, but its contribution is incremental because it builds on existing few-shot learning methods.
The paper tackles few-shot audio classification by integrating a supervised contrastive loss into prototypical training, achieving state-of-the-art performance in the 5-way, 5-shot setting on the MetaAudio benchmark.
Few-shot learning has emerged as a powerful paradigm for training models with limited labeled data, addressing scenarios where large-scale annotation is impractical. While it has been studied extensively in the image domain, few-shot learning for audio classification remains relatively underexplored. In this work, we investigate the effect of integrating a supervised contrastive loss into prototypical few-shot training for audio classification. In particular, we demonstrate that an angular loss further improves performance over the standard contrastive loss. Our method applies SpecAugment followed by a self-attention mechanism to combine the information from multiple augmented versions of an input into a single unified embedding. We evaluate our approach on MetaAudio, a benchmark comprising five datasets with predefined splits, standardized preprocessing, and a comprehensive set of few-shot learning models for comparison. The proposed approach achieves state-of-the-art performance in a 5-way, 5-shot setting.
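To make the core idea concrete, the following is a minimal sketch of how a prototypical episode loss might be combined with a standard supervised contrastive term. It is an illustration under stated assumptions, not the authors' implementation: the function names, the weighting factor `weight`, the `temperature` value, and the choice to apply the contrastive term to the pooled support and query embeddings are all hypothetical. The angular variant of the loss, SpecAugment, and the self-attention fusion mentioned above are not covered here, and embeddings are plain Python lists standing in for the output of an audio encoder.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(l - m) for l in logits))
    return [l - lse for l in logits]

def prototypes(support, support_labels):
    """Class prototype = mean of the support embeddings of that class."""
    sums, counts = {}, {}
    for emb, lab in zip(support, support_labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(emb)
            counts[lab] = 0
        sums[lab] = [s + x for s, x in zip(sums[lab], emb)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}

def prototypical_loss(query, query_labels, protos):
    """Cross-entropy over negative squared Euclidean distances to prototypes."""
    classes = sorted(protos)
    total = 0.0
    for emb, lab in zip(query, query_labels):
        logits = [-sum((x - p) ** 2 for x, p in zip(emb, protos[c]))
                  for c in classes]
        total -= log_softmax(logits)[classes.index(lab)]
    return total / len(query)

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Standard supervised contrastive loss on L2-normalized embeddings.

    For each anchor, same-label samples are positives and all other samples
    form the softmax denominator. Anchors without positives are skipped.
    """
    z = [normalize(e) for e in embeddings]
    total, anchors = 0.0, 0
    for i in range(len(z)):
        others = [j for j in range(len(z)) if j != i]
        positives = [j for j in others if labels[j] == labels[i]]
        if not positives:
            continue
        log_probs = log_softmax([dot(z[i], z[j]) / temperature for j in others])
        total -= sum(log_probs[others.index(j)] for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0

def combined_loss(support, support_labels, query, query_labels,
                  weight=0.5, temperature=0.1):
    """Prototypical loss plus a weighted contrastive term (weight is assumed)."""
    protos = prototypes(support, support_labels)
    proto_term = prototypical_loss(query, query_labels, protos)
    supcon_term = supervised_contrastive_loss(
        support + query, support_labels + query_labels, temperature)
    return proto_term + weight * supcon_term
```

In a 5-way, 5-shot episode, `support` would hold 25 embeddings (five per class) and `query` the held-out samples of the same five classes. The contrastive term pulls same-class embeddings together on the unit sphere, which is intended to complement the distance-to-prototype objective.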