LG SD AS ML Nov 26, 2018

DONUT: CTC-based Query-by-Example Keyword Spotting

Loren Lugosch, Samuel Myer, Vikrant Singh Tomar

arXiv:1811.10736v1 9.435 citations

Originality: Incremental advance

AI Analysis

This work addresses the need for user-adaptable wakeword systems in ubiquitous voice-controlled devices. It offers an incremental improvement by combining existing CTC-based methods with query-by-example convenience.

The paper tackles personalized custom wakeword detection for voice-controlled devices by introducing DONUT, a CTC-based query-by-example algorithm that lets users train a wakeword from a few recorded examples and whose low computational requirements suit embedded systems.

Keyword spotting--or wakeword detection--is an essential feature for hands-free operation of modern voice-controlled devices. With such devices becoming ubiquitous, users might want to choose a personalized custom wakeword. In this work, we present DONUT, a CTC-based algorithm for online query-by-example keyword spotting that enables custom wakeword detection. The algorithm works by recording a small number of training examples from the user, generating a set of label sequence hypotheses from these training examples, and detecting the wakeword by aggregating the scores of all the hypotheses given a new audio recording. Our method combines the generalization and interpretability of CTC-based keyword spotting with the user-adaptation and convenience of a conventional query-by-example system. DONUT has low computational requirements and is well-suited for both learning and inference on embedded systems without requiring private user data to be uploaded to the cloud.
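The three steps named in the abstract (collect a few examples, derive label sequence hypotheses, aggregate hypothesis scores on new audio) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes an acoustic model has already produced per-frame log posteriors of shape (frames, labels) with label 0 as the CTC blank. The function names are hypothetical, greedy decoding stands in for whatever hypothesis search the paper uses, and the log-mean-exp aggregation rule is an assumption chosen for illustration.

```python
import numpy as np


def ctc_log_likelihood(log_probs, labels, blank=0):
    """Return log p(labels | audio) using the CTC forward algorithm.

    log_probs: array of shape (T, V) holding per-frame log posteriors.
    labels: list of non-blank label ids.
    """
    # Interleave blanks: [blank, l1, blank, l2, ..., blank]
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    num_states = len(ext)

    alpha = np.full(num_states, -np.inf)
    alpha[0] = log_probs[0, ext[0]]
    if num_states > 1:
        alpha[1] = log_probs[0, ext[1]]

    for t in range(1, log_probs.shape[0]):
        new_alpha = np.full(num_states, -np.inf)
        for s in range(num_states):
            candidates = [alpha[s]]
            if s >= 1:
                candidates.append(alpha[s - 1])
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(alpha[s - 2])
            new_alpha[s] = np.logaddexp.reduce(candidates) + log_probs[t, ext[s]]
        alpha = new_alpha

    if num_states > 1:
        return np.logaddexp(alpha[-1], alpha[-2])
    return alpha[-1]


def greedy_hypothesis(log_probs, blank=0):
    """Best-path decoding: argmax per frame, collapse repeats, drop blanks.

    Assumption: a simple stand-in for the paper's hypothesis generation.
    """
    hypothesis = []
    previous = blank
    for k in log_probs.argmax(axis=1):
        if k != previous and k != blank:
            hypothesis.append(int(k))
        previous = k
    return hypothesis


def enroll(example_log_probs):
    """Build the set of distinct hypotheses from the user's examples."""
    hypotheses = []
    for log_probs in example_log_probs:
        hyp = greedy_hypothesis(log_probs)
        if hyp and hyp not in hypotheses:
            hypotheses.append(hyp)
    return hypotheses


def wakeword_score(log_probs, hypotheses):
    """Aggregate per-hypothesis CTC scores (assumed rule: log-mean-exp)."""
    scores = np.array([ctc_log_likelihood(log_probs, h) for h in hypotheses])
    return np.logaddexp.reduce(scores) - np.log(len(scores))
```

In use, `enroll` would run once on the posteriors of the user's recordings, and `wakeword_score` would be compared against a tuned threshold for each incoming audio window. Both the threshold and the windowing are outside this sketch.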
