SD · CL · AS · Mar 29, 2018

Attention-based End-to-End Models for Small-Footprint Keyword Spotting

arXiv:1803.10916v1 · 117 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient keyword spotting systems in resource-constrained devices, representing an incremental improvement over existing methods.

The paper tackles the problem of small-footprint keyword spotting by proposing an attention-based end-to-end neural model, which achieves a false rejection rate of 1.02% at 1.0 false alarm per hour with about 84K parameters, outperforming recent Deep KWS approaches.

In this paper, we propose an attention-based end-to-end neural approach for small-footprint keyword spotting (KWS), which aims to simplify the pipelines of building a production-quality KWS system. Our model consists of an encoder and an attention mechanism. The encoder transforms the input signal into a high level representation using RNNs. Then the attention mechanism weights the encoder features and generates a fixed-length vector. Finally, by linear transformation and softmax function, the vector becomes a score used for keyword detection. We also evaluate the performance of different encoder architectures, including LSTM, GRU and CRNN. Experiments on real-world wake-up data show that our approach outperforms the recent Deep KWS approach by a large margin and the best performance is achieved by CRNN. To be more specific, with ~84K parameters, our attention-based model achieves 1.02% false rejection rate (FRR) at 1.0 false alarm (FA) per hour.
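The scoring path described in the abstract (encoder features, attention weights, a fixed-length vector, then a linear layer and softmax) can be illustrated with a small NumPy sketch. This is not the authors' implementation. The additive scoring function `v^T tanh(W h_t + b)`, the two-class output, all dimensions, and every name below are assumptions chosen for illustration, and random features stand in for the output of the RNN encoder.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_kws_score(H, W, b, v, U, u_bias):
    """Pool encoder features with soft attention and score the result.

    H: (T, d) encoder outputs, one row per frame (assumed given).
    W, b, v: parameters of an additive attention scorer (assumed form).
    U, u_bias: linear layer mapping the pooled vector to 2 classes
               (keyword / non-keyword, an assumption of this sketch).
    """
    e = np.tanh(H @ W.T + b) @ v   # (T,) unnormalized score per frame
    alpha = softmax(e)             # (T,) attention weights, sum to 1
    c = alpha @ H                  # (d,) fixed-length weighted average
    probs = softmax(U @ c + u_bias)  # (2,) class posteriors
    return probs, alpha

# Hypothetical sizes: 50 frames, 8-dim encoder features, 16-dim attention space.
rng = np.random.default_rng(0)
T, d, a = 50, 8, 16
H = rng.standard_normal((T, d))       # stand-in for RNN encoder outputs
W = rng.standard_normal((a, d)) * 0.1
b = np.zeros(a)
v = rng.standard_normal(a) * 0.1
U = rng.standard_normal((2, d)) * 0.1
u_bias = np.zeros(2)

probs, alpha = attention_kws_score(H, W, b, v, U, u_bias)
keyword_score = probs[1]  # compared against a threshold to trigger detection
```

Whatever the number of input frames `T`, the pooled vector `c` has the encoder's feature dimension, which is what lets a single linear layer produce the detection score.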

Code Implementations: 3 repos