CL · AI · May 13

Scaling few-shot spoken word classification with generative meta-continual learning

arXiv:2605.13075 · 56.0
AI Analysis

This work addresses the scalability of few-shot spoken word classification for applications requiring many classes, but the results are incremental as GeMCL does not outperform fully-finetuned baselines.

The paper investigates scaling few-shot spoken word classification to 1000 classes with only five shots per class, using the Generative Meta-Continual Learning (GeMCL) algorithm. GeMCL achieves comparable performance to a frozen HuBERT model with a repeatedly trained classifier head while adapting 2000 times faster and training on less than half the data for two orders of magnitude less time.

Few-shot spoken word classification has largely been developed for applications where a small number of classes is considered, and so the potential of larger-scale few-shot spoken word classification remains untapped. This paper investigates the potential of a spoken word classifier to sequentially learn to distinguish between 1000 classes when it is given only five shots per class. We demonstrate that this scaling capability exists by training a model using the Generative Meta-Continual Learning (GeMCL) algorithm and comparing it to repeatedly trained or finetuned baselines. We find that GeMCL produces exceptionally stable performance, and although it does not always outperform a repeatedly fully-finetuned HuBERT model or a frozen HuBERT model with a repeatedly trained classifier head, it produces comparable performance to the latter while adapting 2000 times faster, having been trained on less than half of the data for two orders of magnitude less time.
