SD · CL · LG · AS · Apr 3, 2023

Dual-Attention Neural Transducers for Efficient Wake Word Spotting in Speech Recognition

arXiv:2304.01905v2 · 6 citations · h-index: 24
Originality: Incremental advance
AI Analysis

This work addresses efficiency and accuracy challenges in wake word spotting for speech recognition systems, representing an incremental improvement over existing methods.

The paper tackles inefficient wake word recognition in speech recognition systems by proposing a dual-attention neural biasing architecture that dynamically switches compute paths based on wake word detection. The reported results are a 90% reduction in compute cost (measured in FLOPs) for wake word audio frames, a 16% relative improvement in wake word F1 score, and a 3% relative improvement in generic rare word error rate, with only a 1% increase in parameter count.

We present dual-attention neural biasing, an architecture designed to boost wake word (WW) recognition and improve inference-time latency on speech recognition tasks. This architecture enables a dynamic switch between its runtime compute paths by exploiting WW spotting to select which branch of its attention networks to execute for an input audio frame. With this approach, we effectively improve WW spotting accuracy while saving runtime compute cost, as measured in floating point operations (FLOPs). Using an in-house de-identified dataset, we demonstrate that the proposed dual-attention network can reduce the compute cost by $90\%$ for WW audio frames, with only a $1\%$ increase in the number of parameters. This architecture improves WW F1 score by $16\%$ relative and improves generic rare word error rate by $3\%$ relative compared to the baselines.
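The branch-switching idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`attention`, `attention_flops`, `dual_attention_step`), the single-head dot-product attention, the rough FLOP estimate, and the entry counts are all assumptions made for illustration. The entry counts (2 versus 20) were picked so that the toy cost ratio comes out at 10:1; they are not the paper's configuration. The sketch also assumes a WW spotter has already produced a per-frame flag.

```python
import math


def attention(query, keys, values):
    """Single-head dot-product attention over lists of key/value vectors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]


def attention_flops(num_entries, dim):
    """Rough multiply-add count: one pass for scores, one for the weighted sum."""
    return 2 * num_entries * dim


def dual_attention_step(frame, is_ww_frame, ww_entries, generic_entries):
    """Run exactly one attention branch for this frame (hypothetical gate).

    `is_ww_frame` stands in for the output of a WW spotter; only the selected
    branch is executed, so the other branch costs nothing for this frame.
    """
    entries = ww_entries if is_ww_frame else generic_entries
    context = attention(frame, entries, entries)
    return context, attention_flops(len(entries), len(frame))


# Toy sizes chosen for illustration only.
dim = 4
ww_entries = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
generic_entries = [[0.1 * (i + j) for j in range(dim)] for i in range(20)]
frame = [0.5, 0.2, 0.1, 0.0]

ww_context, ww_cost = dual_attention_step(frame, True, ww_entries, generic_entries)
generic_context, generic_cost = dual_attention_step(frame, False, ww_entries, generic_entries)
saving = 1 - ww_cost / generic_cost  # fraction of attention cost avoided on a WW frame
```

In this sketch the saving comes entirely from the small branch attending over far fewer entries than the generic branch; how the paper's two branches actually differ in size is not stated in the abstract.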
