CL, AI, LG · Mar 15, 2017

Convolutional Recurrent Neural Networks for Small-Footprint Keyword Spotting

arXiv:1703.05390v3 · 195 citations
Originality: Incremental advance
AI Analysis

This work addresses keyword spotting for human-technology interfaces. It is an incremental advance that combines convolutional and recurrent layers to raise detection accuracy while keeping model size and latency low.

The paper tackles keyword spotting with a Convolutional Recurrent Neural Network (CRNN) designed to maximize detection accuracy at a low false alarm rate while minimizing footprint size and latency. With only about 230k parameters, the model achieves 97.71% accuracy at 0.5 false alarms per hour at a 5 dB signal-to-noise ratio.
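Reporting accuracy "at 0.5 false alarms per hour" means the detection threshold is fixed first by the false-alarm budget, and accuracy is then measured at that threshold. The following Python sketch illustrates that procedure under simplifying assumptions that are not taken from the paper: each negative (non-keyword) score counts as one independent detection opportunity, a detection fires only when a score is strictly above the cutoff, and accuracy is read as the fraction of keyword samples detected (one minus the false-rejection rate). The function name `accuracy_at_fa_rate` and the toy scores are hypothetical.

```python
import math
from typing import Sequence, Tuple


def accuracy_at_fa_rate(
    pos_scores: Sequence[float],
    neg_scores: Sequence[float],
    neg_hours: float,
    max_fa_per_hour: float,
) -> Tuple[float, float]:
    """Return (accuracy, achieved FA/hour) at the lowest threshold within the FA budget.

    Simplifying assumptions (hypothetical, not the paper's exact protocol):
    - each negative score is one detection opportunity;
    - a detection fires only when a score is strictly above the cutoff;
    - accuracy = fraction of keyword (positive) samples detected.
    """
    allowed = math.floor(max_fa_per_hour * neg_hours)  # false alarms we can afford
    ranked = sorted(neg_scores, reverse=True)
    # Placing the cutoff at the (allowed + 1)-th highest negative score lets at most
    # `allowed` negatives fire; negatives tied with the cutoff do not fire.
    cutoff = ranked[allowed] if allowed < len(ranked) else -math.inf
    detected = sum(1 for s in pos_scores if s > cutoff)
    false_alarms = sum(1 for s in neg_scores if s > cutoff)
    return detected / len(pos_scores), false_alarms / neg_hours


if __name__ == "__main__":
    keyword = [0.9, 0.8, 0.6, 0.3]                # toy scores on keyword utterances
    background = [0.95, 0.7, 0.5, 0.4, 0.2, 0.1]  # toy scores on 2 hours of non-keyword audio
    print(accuracy_at_fa_rate(keyword, background, neg_hours=2.0, max_fa_per_hour=0.5))
```

In the toy run, 2 hours of background audio at 0.5 FA/hour allows a single false alarm, so the cutoff lands on the second-highest background score (0.7) and the call returns `(0.5, 0.5)`. Lowering the FA budget can only raise the cutoff, which is why accuracy figures are always quoted together with an FA rate.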

Keyword spotting (KWS) constitutes a major component of human-technology interfaces. The goals for KWS are to maximize detection accuracy at a low false alarm (FA) rate while minimizing footprint size, latency, and complexity. Towards achieving them, we study Convolutional Recurrent Neural Networks (CRNNs). Inspired by large-scale state-of-the-art speech recognition systems, we combine the strengths of convolutional layers and recurrent layers to exploit local structure and long-range context. We analyze the effect of architecture parameters and propose training strategies to improve performance. With only ~230k parameters, our CRNN model yields acceptably low latency and achieves 97.71% accuracy at 0.5 FA/hour for 5 dB signal-to-noise ratio.
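To make the "architecture parameters" and the ~230k-parameter footprint concrete, this Python sketch counts the trainable parameters of a generic CRNN: one unpadded 2-D convolution over a single-channel (time × frequency) spectrogram, a stack of optionally bidirectional GRU layers, a dense layer that reads only the final recurrent output, and a softmax output layer. Every layer size in the hypothetical `CRNNConfig` is a placeholder, not the configuration reported in the paper. The GRU count assumes one bias vector per gate; frameworks that keep separate input and recurrent biases (PyTorch's `nn.GRU`, for example) add `3 * units` per layer and direction.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class CRNNConfig:
    # Hypothetical placeholder sizes, NOT the paper's reported configuration.
    n_frames: int = 151                 # input time frames
    n_freq: int = 40                    # frequency channels per frame
    conv_filters: int = 32
    kernel: Tuple[int, int] = (20, 5)   # (time, frequency)
    stride: Tuple[int, int] = (8, 2)    # (time, frequency)
    rnn_units: int = 32
    rnn_layers: int = 2
    bidirectional: bool = True
    dense_units: int = 64
    n_classes: int = 2                  # e.g. keyword vs. non-keyword


def conv_out(size: int, kernel: int, stride: int) -> int:
    """Output length along one axis of an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1


def gru_params(input_dim: int, units: int) -> int:
    """Update, reset and candidate blocks: input weights, recurrent weights, one bias each."""
    return 3 * (units * input_dim + units * units + units)


def count_params(cfg: CRNNConfig) -> Dict[str, int]:
    kt, kf = cfg.kernel
    st, sf = cfg.stride
    time_steps = conv_out(cfg.n_frames, kt, st)
    freq_bins = conv_out(cfg.n_freq, kf, sf)

    # Conv layer on a single-channel input: one kt x kf kernel plus a bias per filter.
    conv = cfg.conv_filters * (kt * kf + 1)

    # At each time step the RNN sees all frequency bins of all filter maps, flattened.
    rnn_in = freq_bins * cfg.conv_filters
    directions = 2 if cfg.bidirectional else 1
    rnn = 0
    for _ in range(cfg.rnn_layers):
        rnn += directions * gru_params(rnn_in, cfg.rnn_units)
        rnn_in = directions * cfg.rnn_units  # next layer consumes concatenated directions

    # Dense layer on the final recurrent output, then the softmax output layer.
    dense = rnn_in * cfg.dense_units + cfg.dense_units
    output = cfg.dense_units * cfg.n_classes + cfg.n_classes

    return {
        "time_steps": time_steps,
        "conv": conv,
        "rnn": rnn,
        "dense": dense,
        "output": output,
        "total": conv + rnn + dense + output,
    }


if __name__ == "__main__":
    for name, value in count_params(CRNNConfig()).items():
        print(f"{name:>10}: {value:,}")
```

With the placeholder defaults the total is 143,074, and most of it (116,928) sits in the first GRU layer, whose input width is the number of frequency bins left after the convolution times the number of filters. In this sketch, a larger frequency stride or fewer convolution filters is therefore the most direct way to shrink the footprint, while the time stride mainly shortens the sequence the recurrent layers must process.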

