CL · Mar 15, 2018

Advancing Connectionist Temporal Classification With Attention Modeling

arXiv:1803.05563v1 · 51 citations
Originality: Incremental advance
AI Analysis

This work improves speech recognition accuracy for voice assistant applications, representing an incremental advancement in neural network-based methods.

The authors tackled speech recognition by incorporating attention modeling into the Connectionist Temporal Classification framework, achieving a 20% relative reduction in word error rates on a 3400-hour voice assistant task.
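The "relative reduction" figure above is a ratio against the baseline error rate, not an absolute difference in percentage points. A minimal sketch of the arithmetic follows; the function name and the example error rates are hypothetical and are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative word error rate reduction as a fraction of the baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical numbers for illustration only (not results from the paper):
# a baseline WER of 10.0% improved to 8.0% is a 2-point absolute drop,
# which corresponds to a 20% relative reduction.
example = relative_wer_reduction(10.0, 8.0)
print(f"{example:.0%}")
```

Under this definition, the same 20% relative reduction would correspond to a smaller absolute drop on a stronger baseline, for example from 5.0% to 4.0%.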

In this study, we propose advancing all-neural speech recognition by directly incorporating attention modeling within the Connectionist Temporal Classification (CTC) framework. In particular, we derive new context vectors using time convolution features to model attention as part of the CTC network. To further improve attention modeling, we utilize content information extracted from a network representing an implicit language model. Finally, we introduce vector-based attention weights that are applied on context vectors across both time and their individual components. We evaluate our system on a 3400-hour Microsoft Cortana voice assistant task and demonstrate that our proposed model consistently outperforms the baseline model, achieving about 20% relative reduction in word error rates.
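The idea of context vectors built from a window of frames, with attention weights applied across both time and individual components, can be illustrated with a small NumPy sketch. This is a simplified illustration under stated assumptions, not the authors' model: the scoring matrix `W`, the `tanh` scoring function, the zero padding, and the half-window size `tau` are all hypothetical choices, and the implicit language model component is omitted.

```python
import numpy as np


def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def windowed_context(h, W, tau):
    """Compute one context vector per frame from a local window of encoder outputs.

    h:   (T, d) array of encoder hidden states.
    W:   (d, d) hypothetical scoring matrix (an assumption, not from the paper).
    tau: half-width of the time window, so each window spans 2 * tau + 1 frames.
    """
    T, d = h.shape
    # Zero-pad in time so every frame has a full window (an assumed boundary rule).
    padded = np.pad(h, ((tau, tau), (0, 0)))
    contexts = np.empty((T, d), dtype=float)
    for u in range(T):
        window = padded[u : u + 2 * tau + 1]  # (2 * tau + 1, d)
        # One score per frame and per component, i.e. vector-valued attention.
        scores = np.tanh(window @ W)
        # Normalise over time separately for each component.
        alpha = softmax(scores, axis=0)
        contexts[u] = (alpha * window).sum(axis=0)
    return contexts


# Usage with random stand-in data: 6 frames of 4-dimensional hidden states.
rng = np.random.default_rng(0)
h = rng.standard_normal((6, 4))
W = rng.standard_normal((4, 4))
c = windowed_context(h, W, tau=1)
print(c.shape)
```

In a CTC system the resulting context vectors would feed the output layer in place of the raw hidden states. With scalar attention, `alpha` would have a single weight per frame; the vector-valued form lets each component attend to different frames in the window.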

