AS LG SD Aug 31, 2023

Improving Small Footprint Few-shot Keyword Spotting with Supervision on Auxiliary Data

arXiv:2309.00647v1, 2 citations, h-index: 10
Originality: Incremental advance
AI Analysis

This work addresses the problem of data scarcity for few-shot keyword spotting models, particularly for small-footprint applications, though it is incremental in leveraging existing multi-task learning techniques.

The paper tackles few-shot keyword spotting with limited annotated data by using automatically annotated reading speech as auxiliary supervision, and reports notable performance improvements over competitive methods on the FS-KWS benchmark.

Few-shot keyword spotting (FS-KWS) models usually require large-scale annotated datasets to generalize to unseen target keywords. However, existing KWS datasets are limited in scale, and gathering keyword-like labeled data is a costly undertaking. To mitigate this issue, we propose a framework that uses easily collectible, unlabeled reading speech data as an auxiliary source. Self-supervised learning has been widely adopted for learning representations from unlabeled data; however, it is known to suit large models with enough capacity and is not practical for training a small-footprint FS-KWS model. Instead, we automatically annotate and filter the data to construct a keyword-like dataset, LibriWord, enabling supervision on auxiliary data. We then adopt multi-task learning, which helps the model strengthen its representation power using out-of-domain auxiliary data. Our method notably improves performance over competitive methods on the FS-KWS benchmark.
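The abstract does not spell out how the reading speech is annotated and filtered, so the following is only a minimal sketch of what a "keyword-like" filter over word-aligned segments could look like. The `WordSegment` record, the `filter_keyword_like` function, and every threshold are hypothetical and chosen for illustration; they are not the paper's procedure or values.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class WordSegment:
    """One word-aligned audio span (hypothetical record, e.g. from a forced aligner)."""
    word: str
    start: float  # seconds
    end: float    # seconds


def filter_keyword_like(segments, min_chars=3, min_dur=0.3, max_dur=1.0, min_count=2):
    """Keep segments that resemble isolated keywords.

    All thresholds are illustrative assumptions, not values from the paper.
    """
    # First pass: drop words with too few characters or an out-of-range duration.
    candidates = [
        s for s in segments
        if len(s.word) >= min_chars and min_dur <= (s.end - s.start) <= max_dur
    ]
    # Second pass: keep only words with enough examples to form few-shot episodes.
    counts = Counter(s.word for s in candidates)
    return [s for s in candidates if counts[s.word] >= min_count]


segments = [
    WordSegment("the", 0.00, 0.10),     # duration too short
    WordSegment("window", 0.10, 0.60),
    WordSegment("a", 0.60, 0.95),       # too few characters
    WordSegment("window", 2.00, 2.45),
    WordSegment("garden", 3.00, 3.50),  # only one example
]
kept = filter_keyword_like(segments)
print([(s.word, s.start) for s in kept])
```

In this toy input only the two "window" segments survive. A real pipeline would also need to cut the audio at the aligned boundaries and would likely tune the thresholds against the target keyword distribution.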

