LG CL NE
Mar 30, 2017

Simplified End-to-End MMI Training and Voting for ASR

arXiv:1703.10356v2
3 citations
Originality: Incremental advance
AI Analysis

This work addresses speech recognition for applications requiring efficient and accurate systems, but it is incremental as it builds on existing MMI and CTC methods.

The paper proposes a simplified end-to-end training method for speech recognition based on the maximum mutual information (MMI) criterion. The method compares favorably to connectionist temporal classification (CTC) in performance, robustness, decoding time, disk footprint, and alignment quality, and its good alignments enable an ensemble method that reduces the word error rate.

A simplified speech recognition system that uses the maximum mutual information (MMI) criterion is considered. End-to-end training using gradient descent is suggested, similarly to the training of connectionist temporal classification (CTC). We use an MMI criterion with a simple language model in the training stage, and a standard HMM decoder. Our method compares favorably to CTC in terms of performance, robustness, decoding time, disk footprint and quality of alignments. The good alignments enable the use of a straightforward ensemble method, obtained by simply averaging the predictions of several neural network models that were trained separately end-to-end. The ensemble method yields a considerable reduction in the word error rate.
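
The ensemble step described in the abstract can be illustrated with a minimal sketch. It assumes each model emits one probability vector per audio frame over the same label set, and that the models agree on frame timing, which is presumably why the abstract ties the ensemble to alignment quality. The function names, the list-of-lists data layout, and the toy numbers are hypothetical and not taken from the paper.

```python
from typing import List, Sequence

# Hypothetical layout: posteriors[frame][label] for a single model.
Posteriors = List[List[float]]


def average_posteriors(model_outputs: Sequence[Posteriors]) -> Posteriors:
    """Frame-by-frame arithmetic mean of label probabilities across models."""
    if not model_outputs:
        raise ValueError("need at least one model output")
    num_frames = len(model_outputs[0])
    num_labels = len(model_outputs[0][0]) if num_frames else 0
    # Averaging only makes sense if every model covers the same frames and labels.
    for out in model_outputs:
        if len(out) != num_frames or any(len(f) != num_labels for f in out):
            raise ValueError("all models must share the same frames x labels shape")
    n = len(model_outputs)
    return [
        [sum(out[t][k] for out in model_outputs) / n for k in range(num_labels)]
        for t in range(num_frames)
    ]


def best_label_per_frame(posteriors: Posteriors) -> List[int]:
    """Index of the most probable label in each frame (for inspection only)."""
    return [max(range(len(frame)), key=frame.__getitem__) for frame in posteriors]


# Toy example: two separately trained models, two frames, three labels.
model_a = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
model_b = [[0.5, 0.4, 0.1], [0.3, 0.3, 0.4]]
ensemble = average_posteriors([model_a, model_b])
```

In the system the abstract describes, the averaged predictions would be passed to the HMM decoder; the per-frame argmax above is only a quick way to inspect the result.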

