CLApr 27, 2016

The IBM 2016 English Conversational Telephone Speech Recognition System

arXiv:1604.08242v2108 citations
Originality Synthesis-oriented
AI Analysis

This work addresses speech recognition for telephone conversations, representing an incremental improvement in a specific domain.

The paper tackled the problem of English conversational telephone speech recognition by developing a system that achieved a record 6.6% word error rate on the Switchboard subset of the Hub5 2000 evaluation testset.

We describe a collection of acoustic and language modeling techniques that lowered the word error rate of our English conversational telephone LVCSR system to a record 6.6% on the Switchboard subset of the Hub5 2000 evaluation testset. On the acoustic side, we use a score fusion of three strong models: recurrent nets with maxout activations, very deep convolutional nets with 3x3 kernels, and bidirectional long short-term memory nets which operate on FMLLR and i-vector features. On the language modeling side, we use an updated model "M" and hierarchical neural network LMs.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes