CL · LG · Dec 16, 2019

Predicting detection filters for small footprint open-vocabulary keyword spotting

arXiv:1912.07575v2 · 22 citations
Originality: Incremental advance
AI Analysis

The paper enables small-footprint, customizable keyword detection on devices, but the advance is incremental: it builds on existing neural methods and adds specific optimizations.

The paper tackles the problem of open-vocabulary keyword spotting for customizable voice interfaces without task-specific data, proposing a neural network under 250KB that predicts detection filters and outperforms baselines by a large margin on multiple tasks.

In this paper, we propose a fully-neural approach to open-vocabulary keyword spotting, that allows the users to include a customizable voice interface to their device and that does not require task-specific data. We present a keyword detection neural network weighing less than 250KB, in which the topmost layer performing keyword detection is predicted by an auxiliary network, that may be run offline to generate a detector for any keyword. We show that the proposed model outperforms acoustic keyword spotting baselines by a large margin on two tasks of detecting keywords in utterances and three tasks of detecting isolated speech commands. We also propose a method to fine-tune the model when specific training data is available for some keywords, which yields a performance similar to a standard speech command neural network while keeping the ability of the model to be applied to new keywords.
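The central idea in the abstract, a detection layer whose parameters are predicted by an auxiliary network from the keyword itself, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the phoneme inventory, the dimensions, the mean-pooling keyword summary, the single linear detection filter, and the names `predict_detector` and `detect` are all hypothetical, and the parameters are random rather than trained.

```python
import math
import random

random.seed(0)

# Hypothetical sizes and phoneme inventory (not from the paper).
FEAT_DIM = 16   # size of each acoustic encoder output frame
EMB_DIM = 8     # size of each phoneme embedding
PHONEMES = ["k", "ae", "t", "d", "ao", "g"]


def rand_matrix(rows, cols, scale):
    """Random, untrained parameters standing in for learned weights."""
    return [[random.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


phone_emb = rand_matrix(len(PHONEMES), EMB_DIM, 1.0)
W_filter = rand_matrix(EMB_DIM, FEAT_DIM, 0.1)
w_bias = [random.gauss(0.0, 0.1) for _ in range(EMB_DIM)]


def predict_detector(phonemes):
    """Auxiliary network (toy): keyword phonemes -> (weights, bias) of the top layer.

    This can run offline, once per keyword.
    """
    rows = [phone_emb[PHONEMES.index(p)] for p in phonemes]
    # Mean-pool the phoneme embeddings into one keyword summary vector.
    summary = [sum(col) / len(rows) for col in zip(*rows)]
    weights = [
        sum(s * W_filter[i][j] for i, s in enumerate(summary))
        for j in range(FEAT_DIM)
    ]
    bias = sum(s * b for s, b in zip(summary, w_bias))
    return weights, bias


def detect(frames, weights, bias):
    """Apply the predicted top layer to encoder frames; return per-frame scores in [0, 1]."""
    scores = []
    for frame in frames:
        logit = sum(f * w for f, w in zip(frame, weights)) + bias
        scores.append(1.0 / (1.0 + math.exp(-logit)))
    return scores


# Generate a detector for one keyword, then score stand-in encoder output.
weights, bias = predict_detector(["k", "ae", "t"])
frames = rand_matrix(20, FEAT_DIM, 1.0)
scores = detect(frames, weights, bias)
```

In this sketch only `detect` would need to run on the device; `predict_detector` produces a small set of weights per keyword, which matches the abstract's description of an auxiliary network that may be run offline.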

