AS CL SD
May 17, 2020

Wake Word Detection with Alignment-Free Lattice-Free MMI

arXiv:2005.08347v3
19 citations
AI Analysis

The paper improves wake word detection for personal digital assistants, though the contribution is incremental: it builds on the existing LF-MMI training algorithm.

The paper tackles wake word detection for always-on spoken language interfaces by training hybrid DNN/HMM systems from partially labeled data. On two real data sets it reports a 50%–90% reduction in false rejection rates at pre-specified false alarm rates over the best previously published figures.

Always-on spoken language interfaces, e.g. personal digital assistants, rely on a wake word to start processing spoken input. We present novel methods to train a hybrid DNN/HMM wake word detection system from partially labeled training data, and to use it in on-line applications: (i) we remove the prerequisite of frame-level alignments in the LF-MMI training algorithm, permitting the use of un-transcribed training examples that are annotated only for the presence/absence of the wake word; (ii) we show that the classical keyword/filler model must be supplemented with an explicit non-speech (silence) model for good performance; (iii) we present an FST-based decoder to perform online detection. We evaluate our methods on two real data sets, showing 50%–90% reduction in false rejection rates at pre-specified false alarm rates over the best previously published figures, and re-validate them on a third (large) data set.
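
The headline metric, false rejection rate (FRR) at a pre-specified false alarm rate, can be illustrated with a minimal sketch. This is not the paper's evaluation code. It assumes each utterance already has a single detection score, that a detection fires when the score is strictly above a threshold, and that the false alarm operating point is given as a count budget on the negative set. The function names `frr_at_false_alarm_budget` and `relative_reduction` and the toy scores are hypothetical.

```python
from typing import Sequence, Tuple


def frr_at_false_alarm_budget(
    positive_scores: Sequence[float],
    negative_scores: Sequence[float],
    max_false_alarms: int,
) -> Tuple[float, float]:
    """Return (threshold, FRR) at a fixed false-alarm budget.

    Assumption: a detection fires when score > threshold. The threshold is
    the lowest one that keeps false alarms on the negatives within budget.
    """
    if not positive_scores:
        raise ValueError("need at least one positive (wake word) example")
    if max_false_alarms < 0:
        raise ValueError("max_false_alarms must be non-negative")

    ranked = sorted(negative_scores, reverse=True)
    if max_false_alarms < len(ranked):
        # Negatives strictly above the (k+1)-th highest score number at most k.
        threshold = ranked[max_false_alarms]
    else:
        # Budget covers every negative, so any score may fire.
        threshold = float("-inf")

    # A wake word utterance is falsely rejected if it does not fire.
    rejected = sum(1 for s in positive_scores if s <= threshold)
    return threshold, rejected / len(positive_scores)


def relative_reduction(baseline_frr: float, new_frr: float) -> float:
    """Relative FRR reduction, e.g. 0.5 means a 50% reduction."""
    return (baseline_frr - new_frr) / baseline_frr


# Toy scores, invented for illustration only.
pos = [0.9, 0.8, 0.4, 0.95]   # utterances containing the wake word
neg = [0.1, 0.5, 0.3, 0.85]   # utterances without the wake word
threshold, frr = frr_at_false_alarm_budget(pos, neg, max_false_alarms=1)
```

Fixing the false alarm budget first and then reading off the FRR is what makes two systems comparable at the same operating point; a "50%–90% reduction" is then the relative change in FRR between a baseline and a new system at that point.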

Code Implementations: 1 repo