LG, SD, AS, ML | Nov 26, 2018

DONUT: CTC-based Query-by-Example Keyword Spotting

arXiv:1811.10736v1 | 35 citations
Originality: Incremental advance
AI Analysis

This paper addresses the need for user-adaptable wakeword systems in ubiquitous voice-controlled devices. It offers an incremental improvement by combining existing CTC-based methods with the convenience of query-by-example.

The paper tackles personalized custom wakeword detection for voice-controlled devices by introducing DONUT, a CTC-based query-by-example algorithm that lets users train a wakeword from a few examples while keeping computational requirements low enough for embedded systems.

Keyword spotting--or wakeword detection--is an essential feature for hands-free operation of modern voice-controlled devices. With such devices becoming ubiquitous, users might want to choose a personalized custom wakeword. In this work, we present DONUT, a CTC-based algorithm for online query-by-example keyword spotting that enables custom wakeword detection. The algorithm works by recording a small number of training examples from the user, generating a set of label sequence hypotheses from these training examples, and detecting the wakeword by aggregating the scores of all the hypotheses given a new audio recording. Our method combines the generalization and interpretability of CTC-based keyword spotting with the user-adaptation and convenience of a conventional query-by-example system. DONUT has low computational requirements and is well-suited for both learning and inference on embedded systems without requiring private user data to be uploaded to the cloud.
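The three steps in the abstract (record examples, derive label sequence hypotheses, aggregate hypothesis scores on new audio) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a pretrained CTC acoustic model has already produced per-frame log posteriors as a `(T, V)` NumPy array with the blank at index 0. It takes one greedy best-path hypothesis per enrollment recording as a simple way to obtain hypotheses, and it aggregates with a log-sum-exp over hypotheses. The function names and the aggregation rule are illustrative assumptions.

```python
import numpy as np

BLANK = 0  # assumed index of the CTC blank symbol


def greedy_hypothesis(log_probs):
    """Best-path decode: argmax per frame, collapse repeats, drop blanks."""
    labels, prev = [], BLANK
    for k in log_probs.argmax(axis=1):
        if k != BLANK and k != prev:
            labels.append(int(k))
        prev = k
    return labels


def ctc_log_likelihood(log_probs, labels):
    """log P(labels | audio) via the standard CTC forward recursion."""
    # Extended sequence with blanks interleaved: [b, l1, b, l2, ..., b]
    ext = [BLANK]
    for label in labels:
        ext += [label, BLANK]
    S, T = len(ext), log_probs.shape[0]

    alpha = np.full(S, -np.inf)
    alpha[0] = log_probs[0, ext[0]]
    if S > 1:
        alpha[1] = log_probs[0, ext[1]]

    for t in range(1, T):
        prev, alpha = alpha, np.full(S, -np.inf)
        for s in range(S):
            cands = [prev[s]]
            if s >= 1:
                cands.append(prev[s - 1])
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                cands.append(prev[s - 2])
            alpha[s] = np.logaddexp.reduce(cands) + log_probs[t, ext[s]]

    # Valid paths end in the last label or the trailing blank.
    tail = alpha[-2:] if S > 1 else alpha[-1:]
    return float(np.logaddexp.reduce(tail))


def wakeword_score(log_probs, hypotheses):
    """Aggregate per-hypothesis scores (assumed rule: log-sum-exp)."""
    scores = [ctc_log_likelihood(log_probs, h) for h in hypotheses]
    return float(np.logaddexp.reduce(scores))
```

In use, enrollment would call `greedy_hypothesis` once per recorded example to build the hypothesis list, and detection would compare `wakeword_score` on new audio against a threshold tuned on held-out data. The threshold and its tuning are hypothetical here.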

Code Implementations: 1 repo
