AS, AI, SP · Dec 24, 2024

Text-Aware Adapter for Few-Shot Keyword Spotting

arXiv:2412.18142v1 · 3 citations · h-index: 5 · ICASSP
Originality: Incremental advance
AI Analysis

This work builds on keyword spotting that users can personalize without recording speech at enrollment, and contributes an incremental improvement: efficient few-shot adaptation to specific keywords.

The paper targets the limited accuracy on specific keywords of flexible keyword spotting with text enrollment. It proposes a text-aware adapter for few-shot transfer learning and reports significant gains on 35 keywords from Google Speech Commands V2, while adding only 0.14% to the total parameter count.

Recent advances in flexible keyword spotting (KWS) with text enrollment allow users to personalize keywords without uttering them during enrollment. However, there is still room for improvement in target keyword performance. In this work, we propose a novel few-shot transfer learning method, called text-aware adapter (TA-adapter), designed to enhance a pre-trained flexible KWS model for specific keywords with limited speech samples. To adapt the acoustic encoder, we leverage a jointly pre-trained text encoder to generate a text embedding that acts as a representative vector for the keyword. By fine-tuning only a small portion of the network while keeping the core components' weights intact, the TA-adapter proves highly efficient for few-shot KWS, enabling a seamless return to the original pre-trained model. In our experiments, the TA-adapter demonstrated significant performance improvements across 35 distinct keywords from the Google Speech Commands V2 dataset, with only a 0.14% increase in the total number of parameters.
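The recipe in the abstract (freeze the pre-trained network, condition a small trainable module on the keyword's text embedding, and drop that module to return to the original model) can be illustrated with a toy sketch. It assumes a FiLM-style scale-and-shift adapter, a random vector standing in for the jointly pre-trained text encoder's output, a single linear layer standing in for the acoustic encoder, and a squared-error loss toward a toy target instead of a real KWS objective. The names `TextAwareAdapterSketch` and `encode` are hypothetical, and this is not the authors' TA-adapter architecture.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class TextAwareAdapterSketch:
    """Hypothetical FiLM-style adapter (an assumption, not the paper's design):
    the keyword's text embedding is projected to a per-channel scale and
    shift that modulate the frozen acoustic encoder's features."""

    def __init__(self, feat_dim, text_dim):
        # Zero init gives scale = 1 and shift = 0, so training starts
        # from exactly the pre-trained model's behaviour.
        self.A = [[0.0] * text_dim for _ in range(feat_dim)]  # text -> scale
        self.B = [[0.0] * text_dim for _ in range(feat_dim)]  # text -> shift

    def num_params(self):
        return 2 * len(self.A) * len(self.A[0])

    def apply(self, h, e):
        scale = [1.0 + s for s in matvec(self.A, e)]
        shift = matvec(self.B, e)
        return [g * hi + b for g, hi, b in zip(scale, h, shift)]

    def sgd_step(self, h, e, target, lr):
        """One gradient step on 0.5 * ||apply(h, e) - target||^2 that
        updates only the adapter's own parameters A and B."""
        y = self.apply(h, e)
        for i, (yi, ti, hi) in enumerate(zip(y, target, h)):
            r = yi - ti  # dLoss/dy_i, computed before any update
            for j, ej in enumerate(e):
                self.A[i][j] -= lr * r * hi * ej  # dy_i/dA[i][j] = h_i * e_j
                self.B[i][j] -= lr * r * ej       # dy_i/dB[i][j] = e_j
        return 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(y, target))


def encode(frozen_w, x, adapter=None, text_emb=None):
    """Toy frozen linear 'acoustic encoder'; adapter=None is the original model."""
    h = matvec(frozen_w, x)
    return h if adapter is None else adapter.apply(h, text_emb)


random.seed(0)
in_dim, feat_dim, text_dim = 256, 32, 2
scale_w = in_dim ** -0.5  # keeps feature magnitudes small so lr=0.1 is stable
W = [[random.uniform(-1, 1) * scale_w for _ in range(in_dim)] for _ in range(feat_dim)]
W_before = [row[:] for row in W]
e = [random.uniform(-1, 1) for _ in range(text_dim)]  # stand-in keyword text embedding
x = [random.uniform(-1, 1) for _ in range(in_dim)]    # one toy acoustic input
base_out = encode(W, x)
target = [v + 0.5 for v in base_out]                  # toy adaptation target

adapter = TextAwareAdapterSketch(feat_dim, text_dim)
assert encode(W, x, adapter, e) == base_out  # zero init: identical to base model

losses = [adapter.sgd_step(encode(W, x), e, target, lr=0.1) for _ in range(200)]
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
print(f"adapter overhead: {100 * adapter.num_params() / (feat_dim * in_dim):.2f}% of encoder params")

assert W == W_before             # frozen encoder weights never changed
assert encode(W, x) == base_out  # dropping the adapter restores the original model
```

Zero-initialising the adapter means fine-tuning starts from the pre-trained behaviour, and because the base weights are never written, removing the adapter is a lossless way back. The toy dimensions give an overhead well above the 0.14% reported in the abstract; that figure depends on the real model's sizes.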

