CL, NE
Jul 19, 2016

Trainable Frontend For Robust and Far-Field Keyword Spotting

arXiv:1607.05666v1
153 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of hands-free communication in far-field conditions for speech recognition systems. It represents an incremental improvement with specific gains.

The paper tackled the problem of robust and far-field keyword spotting by introducing a novel frontend called per-channel energy normalization (PCEN), which significantly improved recognition performance on noisy and far-field evaluation sets.

Robust and far-field speech recognition is critical to enable true hands-free communication. In far-field conditions, signals are attenuated due to distance. To improve robustness to loudness variation, we introduce a novel frontend called per-channel energy normalization (PCEN). The key ingredient of PCEN is the use of an automatic gain control based dynamic compression to replace the widely used static (such as log or root) compression. We evaluate PCEN on the keyword spotting task. On our large rerecorded noisy and far-field eval sets, we show that PCEN significantly improves recognition performance. Furthermore, we model PCEN as neural network layers and optimize high-dimensional PCEN parameters jointly with the keyword spotting acoustic model. The trained PCEN frontend demonstrates significant further improvements without increasing model complexity or inference-time cost.
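To make the abstract's contrast between static and dynamic compression concrete, here is a minimal NumPy sketch of PCEN as it is commonly formulated: each filterbank channel is divided by a smoothed version of its own energy (the automatic gain control step), and the result is passed through a root compression with an offset. The function name `pcen`, the default parameter values, and the choice to initialise the smoother with the first frame are assumptions made for illustration; none of them is stated in the abstract above, and this is not the authors' implementation.

```python
import numpy as np


def pcen(energy, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-channel energy normalization (illustrative sketch).

    energy: array of shape (frames, channels) with non-negative
            filterbank energies.
    s:      smoothing coefficient of the first-order IIR smoother.
    alpha:  gain-normalization strength (1.0 would divide out the
            smoothed energy completely).
    delta, r: offset and exponent of the root compression.
    All default values are assumptions chosen for illustration.
    """
    energy = np.asarray(energy, dtype=np.float64)
    smoothed = np.empty_like(energy)

    # Assumption: start the smoother from the first frame.
    m = energy[0]
    for t in range(energy.shape[0]):
        # M(t, f) = (1 - s) * M(t - 1, f) + s * E(t, f)
        m = (1.0 - s) * m + s * energy[t]
        smoothed[t] = m

    # Automatic gain control: divide each channel by its smoothed energy.
    gain_normalised = energy / (eps + smoothed) ** alpha

    # Root compression with an offset, in place of a static log.
    return (gain_normalised + delta) ** r - delta ** r


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    quiet = rng.random((50, 8)) + 0.1   # synthetic energies
    loud = 100.0 * quiet                # same signal, 100x louder

    # Compare how far each frontend moves when loudness changes.
    pcen_shift = np.max(np.abs(pcen(loud) - pcen(quiet)))
    log_shift = np.max(np.abs(np.log(loud) - np.log(quiet)))
    print("max PCEN shift:", pcen_shift)
    print("max log shift: ", log_shift)
```

With `alpha` close to 1, scaling the input by a constant changes the gain-normalised term only by a factor of roughly that constant raised to the power `1 - alpha`, which is the sense in which the frontend is robust to loudness variation. In the trained variant described in the abstract, quantities such as `alpha`, `delta`, and `r` would become per-channel parameters learned jointly with the acoustic model; that step is not shown here.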

Code Implementations: 2 repos
