LGAICLASMLNov 8, 2018

A Comparison of Lattice-free Discriminative Training Criteria for Purely Sequence-Trained Neural Network Acoustic Models

arXiv:1811.03700v26 citations
Originality Synthesis-oriented
AI Analysis

This work provides incremental improvements to training methods for speech recognition systems.

The paper compared three lattice-free discriminative training criteria (MMI, bMMI, sMBR) for neural network acoustic models in LVCSR tasks, showing that LF-bMMI outperforms LF-MMI with a 5% relative WER reduction on Switchboard datasets.

In this work, three lattice-free (LF) discriminative training criteria for purely sequence-trained neural network acoustic models are compared on LVCSR tasks, namely maximum mutual information (MMI), boosted maximum mutual information (bMMI) and state-level minimum Bayes risk (sMBR). We demonstrate that, analogous to LF-MMI, a neural network acoustic model can also be trained from scratch using LF-bMMI or LF-sMBR criteria respectively without the need of cross-entropy pre-training. Furthermore, experimental results on Switchboard-300hrs and Switchboard+Fisher-2100hrs datasets show that models trained with LF-bMMI consistently outperform those trained with plain LF-MMI and achieve a relative word error rate (WER) reduction of 5% over competitive temporal convolution projected LSTM (TDNN-LSTMP) LF-MMI baselines.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes