Jul 25, 2020

Few-Shot Keyword Spotting With Prototypical Networks

arXiv:2007.14463v1 · 41 citations
AI Analysis

This work addresses a limitation of existing deep learning systems, which require large datasets for a fixed set of pre-defined keywords, and enables more flexible voice interfaces.

The paper tackles the problem of recognizing new, user-defined keywords in voice interfaces from limited data. It formulates this as few-shot keyword spotting and approaches it with metric learning, demonstrating keyword spotting from only a small number of samples per keyword.
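The metric-learning idea can be illustrated with the nearest-prototype rule used by prototypical networks: each keyword's prototype is the mean embedding of its few support samples, and a query is assigned to the keyword whose prototype is closest in squared Euclidean distance. The following is a minimal pure-Python sketch, assuming embeddings have already been produced by some encoder; the helper names and the 2-D toy vectors are hypothetical, and the encoder used in the paper is not shown.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def compute_prototypes(support):
    """support maps keyword label -> list of K embedding vectors (the 'shots').

    Each prototype is simply the mean of that keyword's support embeddings.
    """
    return {label: mean_vector(embs) for label, embs in support.items()}


def classify(query, prototypes):
    """Return (predicted label, probabilities) for one query embedding.

    Probabilities are a softmax over negative squared distances to the prototypes.
    """
    labels = list(prototypes)
    logits = [-squared_euclidean(query, prototypes[lab]) for lab in labels]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = {lab: e / total for lab, e in zip(labels, exps)}
    best = max(probs, key=probs.get)
    return best, probs


# Hypothetical 2-way, 2-shot episode with hand-made 2-D "embeddings".
support = {
    "yes": [[1.0, 0.0], [0.8, 0.2]],
    "stop": [[0.0, 1.0], [0.2, 0.8]],
}
prototypes = compute_prototypes(support)
label, probs = classify([0.9, 0.1], prototypes)
print(label)  # -> yes
```

Because only the prototypes depend on the support set, adding a new user-defined keyword in this sketch just means averaging a few embeddings of it; no classifier layer is retrained.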

Keyword spotting, the task of recognizing a particular command or keyword, is widely used in voice interfaces such as Amazon's Alexa and Google Home. To recognize a set of keywords, most recent deep learning based approaches train a neural network on a large number of samples to identify certain pre-defined keywords. This prevents the system from recognizing new, user-defined keywords. We therefore first formulate this problem as few-shot keyword spotting and approach it using metric learning. To enable this research, we also synthesize and publish a Few-shot Google Speech Commands dataset. We then propose a solution to the few-shot keyword spotting problem using temporal and dilated convolutions on prototypical networks. Our comparative experimental results demonstrate keyword spotting of new keywords using just a small number of samples.
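Dilated temporal convolutions let an encoder cover a long span of audio frames with few layers. Below is a minimal single-channel sketch of a 1-D dilated convolution along the time axis ("valid" padding, stride 1), plus the receptive-field arithmetic for a stack of such layers. It illustrates the operation under those assumptions only: the kernel size, the dilation rates (1, 2, 4), the all-ones kernels, and the toy input are hypothetical and are not taken from the paper.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Single-channel 'valid' 1-D convolution with dilation, stride 1.

    Computed as cross-correlation (no kernel flip), as most deep learning
    libraries do. Output length is len(signal) - dilation * (len(kernel) - 1);
    if that is not positive, the result is an empty list.
    """
    span = dilation * (len(kernel) - 1)  # distance between first and last tap
    return [
        sum(w * signal[t + i * dilation] for i, w in enumerate(kernel))
        for t in range(len(signal) - span)
    ]


def receptive_field(kernel_size, dilations):
    """Number of input frames seen by one output of stacked stride-1 dilated convs."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical toy input: one feature value per time frame.
frames = list(range(8))
print(dilated_conv1d(frames, [1, 1, 1], dilation=2))  # -> [6, 9, 12, 15]

# Stack three layers with doubling dilation over 20 hypothetical frames.
x = list(range(20))
for d in (1, 2, 4):
    x = dilated_conv1d(x, [1, 1, 1], dilation=d)
print(len(x), receptive_field(3, (1, 2, 4)))  # -> 6 15
```

With kernel size 3 and dilations 1, 2, 4, three layers see 1 + 2·(1 + 2 + 4) = 15 frames, whereas three undilated layers of the same kernel size would see only 7.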

Code Implementations: 1 repo
