CL
Dec 6, 2018

End-to-End Streaming Keyword Spotting

arXiv:1812.02802v2
73 citations
Originality: Incremental advance
AI Analysis

This work addresses efficient keyword detection for applications such as voice assistants. It is an incremental advance, with reported gains in detection quality, model size, and computation.

The paper tackles keyword spotting in audio streams by proposing an end-to-end deep neural network system with a memoized topology and training method, achieving significant improvements in detection quality, model size, and computational efficiency.

We present a system for keyword spotting that, except for a frontend component for feature generation, is entirely contained in a deep neural network (DNN) model trained "end-to-end" to predict the presence of the keyword in a stream of audio. This work makes two main contributions. The first is an efficient memoized neural network topology that aims to make better use of the parameters and associated computations in the DNN by holding a memory of previous activations distributed over the depth of the DNN. The second is a method to train the DNN, end-to-end, to produce the keyword spotting score. This system significantly outperforms previous approaches in quality of detection as well as in size and computation.
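To make "a memory of previous activations distributed over the depth of the DNN" concrete, here is a minimal sketch of one way such a streaming model could be organised. It is an illustration under stated assumptions, not the authors' architecture: the class names `MemoizedLayer` and `StreamingSpotter`, the layer sizes, the per-unit time filter, the ReLU and sigmoid choices, and the random (untrained) weights are all hypothetical. The only idea taken from the abstract is that each layer keeps its own short buffer of past activations, so a new audio frame costs one small update per layer instead of reprocessing a whole window.

```python
import math
import random
from collections import deque


class MemoizedLayer:
    """Hypothetical streaming layer that remembers its own recent projections."""

    def __init__(self, in_dim, out_dim, memory, rng):
        # Feature weights: project the current input frame to out_dim values.
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                  for _ in range(out_dim)]
        # Time weights: one filter per unit over the last `memory` projections.
        self.t = [[rng.uniform(-0.1, 0.1) for _ in range(memory)]
                  for _ in range(out_dim)]
        # Per-layer memory, zero-initialised; maxlen drops the oldest entry.
        self.buf = deque([[0.0] * out_dim for _ in range(memory)], maxlen=memory)

    def step(self, frame):
        # Only the newest frame is projected; older projections are reused.
        proj = [sum(wi * xi for wi, xi in zip(row, frame)) for row in self.w]
        self.buf.append(proj)
        out = []
        for unit, taps in enumerate(self.t):
            acc = sum(tap * past[unit] for tap, past in zip(taps, self.buf))
            out.append(max(0.0, acc))  # ReLU
        return out


class StreamingSpotter:
    """Stack of memoized layers that emits one keyword score per frame."""

    def __init__(self, feat_dim=8, hidden=16, memory=4, depth=3, seed=0):
        rng = random.Random(seed)
        dims = [feat_dim] + [hidden] * depth
        self.layers = [MemoizedLayer(dims[i], dims[i + 1], memory, rng)
                       for i in range(depth)]
        self.out_w = [rng.uniform(-0.1, 0.1) for _ in range(hidden)]

    def step(self, frame):
        h = frame
        for layer in self.layers:
            h = layer.step(h)
        logit = sum(w * v for w, v in zip(self.out_w, h))
        return 1.0 / (1.0 + math.exp(-logit))  # score in (0, 1)


if __name__ == "__main__":
    spotter = StreamingSpotter()
    frame_rng = random.Random(1)
    for _ in range(5):
        # Stand-in for one frame of frontend features.
        frame = [frame_rng.uniform(-1.0, 1.0) for _ in range(8)]
        print(round(spotter.step(frame), 4))
```

Because the memory is spread across layers, the effective temporal context grows with depth while each step stays cheap. In an end-to-end setup the weights above would be learned so that the final score itself indicates keyword presence. Training is omitted from this sketch.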

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
