CL · AS · Sep 12, 2016

The Microsoft 2016 Conversational Speech Recognition System

arXiv:1609.03528v2 · 292 citations
AI Analysis

This work advances the state of the art in conversational speech recognition, though the contribution is incremental in that it builds on existing neural network techniques.

The paper tackles conversational speech recognition by combining neural-network-based acoustic and language modeling. Its combined system reaches a word error rate of 6.2% on the NIST 2000 Switchboard task, an improvement over previously reported results.

We describe Microsoft's conversational speech recognition system, in which we combine recent developments in neural-network-based acoustic and language modeling to advance the state of the art on the Switchboard recognition task. Inspired by machine learning ensemble techniques, the system uses a range of convolutional and recurrent neural networks. I-vector modeling and lattice-free MMI training provide significant gains for all acoustic model architectures. Language model rescoring with multiple forward and backward running RNNLMs, and word posterior-based system combination provide a 20% boost. The best single system uses a ResNet architecture acoustic model with RNNLM rescoring, and achieves a word error rate of 6.9% on the NIST 2000 Switchboard task. The combined system has an error rate of 6.2%, representing an improvement over previously reported results on this benchmark task.
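The error rates quoted above are word error rates (WER). As a rough illustration of the metric, WER is conventionally the word-level edit distance between a reference transcript and a recognizer hypothesis (substitutions, deletions, and insertions), divided by the number of reference words. The sketch below is a minimal implementation under that definition. It is not the scoring pipeline used in the paper, it applies none of the text normalization a benchmark scorer would, and the function name `word_error_rate` and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: whitespace tokenization, no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

On the figures in the abstract, going from 6.9% (best single system) to 6.2% (combined system) is an absolute reduction of 0.7 points, or roughly 10% relative (0.7 / 6.9).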

