CL · SD · AS · Jun 10, 2019

Word-level Speech Recognition with a Letter to Word Encoder

arXiv:1906.04323v2 · 9 citations
Originality: Incremental advance
AI Analysis

The work addresses efficiency and generalization in speech recognition, though it appears incremental because it builds on existing sequence models.

The paper tackles speech recognition by proposing a direct-to-word sequence model that learns word embeddings from letters, achieving word error rate gains over sub-word models and enabling prediction of unseen words without retraining.
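A minimal sketch can make the open-vocabulary claim concrete. It assumes that the model scores each acoustic output vector against word embeddings computed from spellings, so adding a word only means computing one more embedding. A conventional word-level softmax, by contrast, has no trained output row for an unseen word. Everything below is hypothetical and is not the authors' architecture: `DIM`, `ALPHABET`, the fixed random letter and position vectors, the letter-times-position pooling, `word_embedding`, and `predict`. In the paper, the word network is learned jointly with the acoustic model.

```python
import math
import random

# Hypothetical stand-in for the paper's learned word network. Letter and
# position vectors here are fixed random draws; in a real model they (and the
# pooling network) would be trained jointly with the acoustic model.
DIM = 16
ALPHABET = "abcdefghijklmnopqrstuvwxyz'"
MAX_LEN = 32  # longest spelling this toy encoder supports

_rng = random.Random(0)
LETTER_EMB = {c: [_rng.gauss(0.0, 1.0) for _ in range(DIM)] for c in ALPHABET}
POS_EMB = [[_rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(MAX_LEN)]


def word_embedding(word):
    """Map a lowercase spelling to a unit-norm vector.

    Each letter vector is multiplied elementwise by a position vector and the
    results are summed, so anagrams such as "cat" and "act" get different
    embeddings.
    """
    if len(word) > MAX_LEN:
        raise ValueError("word longer than MAX_LEN")
    acc = [0.0] * DIM
    for i, ch in enumerate(word):
        letter, pos = LETTER_EMB[ch], POS_EMB[i]
        for d in range(DIM):
            acc[d] += letter[d] * pos[d]
    norm = math.sqrt(sum(x * x for x in acc)) or 1.0
    return [x / norm for x in acc]


def predict(acoustic_vec, lexicon):
    """Return the lexicon word whose embedding has the highest dot product
    with one acoustic output vector. The lexicon is just a list of spellings,
    so it can grow at inference time without touching any trained weights."""
    def score(word):
        return sum(a * e for a, e in zip(acoustic_vec, word_embedding(word)))
    return max(lexicon, key=score)


# Usage: "zebra" was never in the training lexicon, but once its spelling is
# added, an (idealized) acoustic vector equal to its embedding decodes to it.
train_lexicon = ["the", "cat", "sat"]
test_lexicon = train_lexicon + ["zebra"]
print(predict(word_embedding("zebra"), test_lexicon))  # zebra
```

The position vectors matter. With a plain sum of letter vectors, "cat" and "act" would collapse to the same embedding and the model could not tell them apart.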

We propose a direct-to-word sequence model which uses a word network to learn word embeddings from letters. The word network can be integrated seamlessly with arbitrary sequence models including Connectionist Temporal Classification and encoder-decoder models with attention. We show our direct-to-word model can achieve word error rate gains over sub-word level models for speech recognition. We also show that our direct-to-word approach retains the ability to predict words not seen at training time without any retraining. Finally, we demonstrate that a word-level model can use a larger stride than a sub-word level model while maintaining accuracy. This makes the model more efficient both for training and inference.
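For the CTC variant, simple arithmetic shows why word-level targets tolerate a larger stride. A CTC alignment needs at least one output frame per label, plus a blank between identical adjacent labels. Word sequences are much shorter than letter sequences, so they still fit after heavier downsampling. The sketch below assumes 10 ms input frames, a 1 s utterance of "hello world", "|" as a letter-level word separator, and the stride values shown. None of these values come from the paper, and the helper names are hypothetical.

```python
# Illustrative check of why a word-level CTC model tolerates a larger stride.
# Assumptions (not from the paper): 10 ms input frames, "|" as the letter-level
# word separator, and a 1 s utterance of "hello world".


def min_ctc_frames(labels):
    """Fewest output frames any CTC alignment of `labels` can use.

    Every label needs at least one frame, and identical adjacent labels need
    a blank between them, so T >= U + (number of adjacent repeats).
    """
    repeats = sum(1 for a, b in zip(labels, labels[1:]) if a == b)
    return len(labels) + repeats


def output_frames(duration_ms, frame_ms=10, stride=1):
    """Output frames after downsampling the input frames by an integer stride
    (rounding up, i.e. assuming padding at the end)."""
    n_in = duration_ms // frame_ms
    return -(-n_in // stride)  # ceiling division


def feasible(labels, duration_ms, stride, frame_ms=10):
    """True if CTC can align `labels` at all at this stride."""
    return output_frames(duration_ms, frame_ms, stride) >= min_ctc_frames(labels)


letters = list("hello|world")  # 11 tokens, one adjacent repeat ("ll")
words = ["hello", "world"]     # 2 tokens

for stride in (8, 10, 32):
    print(stride, output_frames(1000, stride=stride),
          feasible(letters, 1000, stride), feasible(words, 1000, stride))
```

This prints `8 13 True True`, `10 10 False True`, and `32 4 False True`. At stride 8, the letter sequence barely fits: 13 frames against a minimum of 12. At stride 10, it cannot be aligned at all, while the two-word target still fits at stride 32. This check is only a necessary condition. It says nothing about whether accuracy is maintained, which is the paper's empirical claim. Attention-based decoders have no such hard length constraint. The efficiency gain follows from the frame count: every layer after the downsampling, and the loss itself, runs over roughly 1/stride as many frames.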

