ASSD
Jul 11, 2018

Efficient keyword spotting using time delay neural networks

arXiv:1807.04353v2
40 citations
Originality: Incremental advance
AI Analysis

This work addresses efficient keyword spotting for speech recognition applications, offering incremental improvements in accuracy and computational savings.

The paper tackles live keyword spotting by introducing a two-stage time delay neural network trained with transfer learning, achieving significant improvements in false accept and false reject rates on both public and in-house datasets, and reducing computational complexity by up to 89% compared to prior methods.

This paper describes a novel method of live keyword spotting using a two-stage time delay neural network. The model is trained using transfer learning: initial training with phone targets from a large speech corpus is followed by training with keyword targets from a smaller data set. The accuracy of the system is evaluated on two separate tasks. The first is the freely available Google Speech Commands dataset. The second is an in-house task specifically developed for keyword spotting. The results show significant improvements in false accept and false reject rates in both clean and noisy environments when compared with previously known techniques. Furthermore, we investigate various techniques to reduce computation in terms of multiplications per second of audio. Compared to recently published work, the proposed system provides up to 89% savings on computational complexity.
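The abstract measures cost in multiplications per second of audio. As a rough illustration of how such a figure can be estimated for a time delay neural network, the sketch below counts the multiplications in a stack of layers, each of which splices several input frames and may be evaluated at a reduced frame rate. The layer sizes, the 100 frames-per-second input rate, and the names `TdnnLayer`, `mults_per_second`, and `relative_savings` are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class TdnnLayer:
    """One TDNN layer, described only by the sizes needed to count multiplies."""
    in_dim: int      # feature dimension of each input frame
    out_dim: int     # number of output units
    context: int     # number of input frames spliced together
    stride: int = 1  # layer is evaluated once every `stride` input frames


def mults_per_second(layers, frames_per_second=100.0):
    """Estimate multiplications per second of audio for a stack of layers.

    Each evaluation of a layer costs in_dim * context * out_dim multiplies.
    A stride greater than 1 lowers the rate at which the layer (and every
    layer above it) is evaluated.
    """
    rate = float(frames_per_second)
    total = 0.0
    for layer in layers:
        rate /= layer.stride  # output frames per second for this layer
        total += rate * layer.in_dim * layer.context * layer.out_dim
    return total


def relative_savings(baseline, proposed):
    """Fraction of the baseline cost that the proposed system avoids."""
    return 1.0 - proposed / baseline


# Hypothetical two-layer example (not the paper's architecture).
example = [
    TdnnLayer(in_dim=40, out_dim=100, context=3),
    TdnnLayer(in_dim=100, out_dim=100, context=3, stride=2),
]
example_cost = mults_per_second(example)
```

Under these assumed sizes, the first layer contributes 100 × 40 × 3 × 100 multiplications per second, and the second, running at half the frame rate, contributes 50 × 100 × 3 × 100. Frame subsampling of this kind is one common way to cut cost; the abstract does not say which techniques the authors used.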
