CL · SD · AS · Oct 18, 2021

Efficient Sequence Training of Attention Models using Approximative Recombination

arXiv:2110.09245v2 · 3 citations
Originality: Incremental advance
AI Analysis

This work addresses a computational bottleneck in training attention-based speech recognition models, offering an incremental improvement for researchers and practitioners in automatic speech recognition.

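The paper tackles the intractable sum over all word sequences that sequence discriminative training for speech recognition requires. It proposes approximative recombination of hypotheses that share a common local history during beam search, and an analysis of the resulting approximation error shows that the effective beam size grows by several orders of magnitude without significantly raising computational cost. The technique is then applied to sequence discriminative training of attention-based encoder-decoder models on the LibriSpeech task.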

Sequence discriminative training is a great tool to improve the performance of an automatic speech recognition system. It does, however, necessitate a sum over all possible word sequences, which is intractable to compute in practice. Current state-of-the-art systems with unlimited label context circumvent this problem by limiting the summation to an n-best list of relevant competing hypotheses obtained from beam search. This work proposes to perform (approximative) recombinations of hypotheses during beam search, if they share a common local history. The error that is incurred by the approximation is analyzed and it is shown that using this technique the effective beam size can be increased by several orders of magnitude without significantly increasing the computational requirements. Lastly, it is shown that this technique can be used to effectively perform sequence discriminative training for attention-based encoder-decoder acoustic models on the LibriSpeech task.
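The recombination idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Everything named here is an assumption made for illustration: the function `beam_search`, the parameter `history_len` (how many trailing labels define the "common local history"), the choice to sum the scores of merged hypotheses, and the hypothetical scorer `toy_model`. Hypotheses whose last `history_len` labels agree are merged into one beam entry that carries their summed probability mass, and only the best-scoring prefix is kept as the representative for future expansion. That last step is where the approximation enters when the model actually depends on the full history.

```python
import math


def log_add(a, b):
    """Numerically stable log(exp(a) + exp(b)) for finite inputs."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def beam_search(step_log_probs, num_steps, beam_size, history_len):
    """Toy beam search with (approximative) recombination.

    step_log_probs: hypothetical scorer, maps a label prefix (tuple)
        to a dict {label: log-probability of that label next}.
    history_len: number of trailing labels that must match for two
        hypotheses to be merged (an assumption of this sketch).
    Returns a list of (representative_prefix, summed_log_score).
    """
    beam = [((), 0.0)]
    for _ in range(num_steps):
        # local history -> (representative prefix, its score, summed score)
        merged = {}
        for prefix, score in beam:
            for label, log_p in step_log_probs(prefix).items():
                new_prefix = prefix + (label,)
                new_score = score + log_p
                key = new_prefix[max(0, len(new_prefix) - history_len):]
                if key not in merged:
                    merged[key] = (new_prefix, new_score, new_score)
                else:
                    rep, rep_score, total = merged[key]
                    if new_score > rep_score:
                        rep, rep_score = new_prefix, new_score
                    # Recombine: accumulate the mass of both hypotheses.
                    merged[key] = (rep, rep_score, log_add(total, new_score))
        # Prune on the summed score, keeping at most beam_size entries.
        ranked = sorted(merged.values(), key=lambda h: h[2], reverse=True)
        beam = [(rep, total) for rep, _, total in ranked[:beam_size]]
    return beam


def toy_model(prefix):
    """Hypothetical scorer that depends only on the last label."""
    if prefix and prefix[-1] == "a":
        return {"a": math.log(0.7), "b": math.log(0.3)}
    return {"a": math.log(0.4), "b": math.log(0.6)}


def covered_mass(beam):
    """Total probability mass represented by the beam entries."""
    return sum(math.exp(score) for _, score in beam)


with_recombination = beam_search(toy_model, num_steps=3, beam_size=2, history_len=1)
without_recombination = beam_search(toy_model, num_steps=3, beam_size=2, history_len=3)
print(covered_mass(with_recombination), covered_mass(without_recombination))
```

Because `toy_model` looks only at the last label, merging on a history of length 1 is exact in this example, so two beam entries stand in for all eight length-3 sequences. Setting `history_len` to the full sequence length disables merging and reduces the sketch to an ordinary n-best beam, which covers only part of the probability mass at the same beam size.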