CL · LG · NE · Mar 1, 2017

Gram-CTC: Automatic Unit Selection and Target Decomposition for Sequence Labelling

arXiv:1703.00096v2 · 56 citations
Originality: Incremental advance
AI Analysis

This paper addresses a key limitation of sequence labeling for speech recognition: the basic units and the target decomposition are fixed in advance. Its approach could improve accuracy and computational efficiency, though as an extension of CTC it is an incremental advance.

The paper tackles the problem that sequence labeling models rely on fixed basic units and fixed decompositions, which leads to suboptimal performance. It proposes Gram-CTC, a loss function that automatically learns the best units and decompositions. The authors report improved performance and efficiency on large vocabulary speech recognition tasks, including results that outperform the state of the art on a standard benchmark.

Most existing sequence labelling models rely on a fixed decomposition of a target sequence into a sequence of basic units. These methods suffer from two major drawbacks: 1) the set of basic units is fixed, such as the set of words, characters or phonemes in speech recognition, and 2) the decomposition of target sequences is fixed. These drawbacks usually result in sub-optimal performance of modeling sequences. In this paper, we extend the popular CTC loss criterion to alleviate these limitations, and propose a new loss function called Gram-CTC. While preserving the advantages of CTC, Gram-CTC automatically learns the best set of basic units (grams), as well as the most suitable decomposition of target sequences. Unlike CTC, Gram-CTC allows the model to output variable number of characters at each time step, which enables the model to capture longer term dependency and improves the computational efficiency. We demonstrate that the proposed Gram-CTC improves CTC in terms of both performance and efficiency on the large vocabulary speech recognition task at multiple scales of data, and that with Gram-CTC we can outperform the state-of-the-art on a standard speech benchmark.
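To make the notion of a "decomposition" concrete, the following minimal sketch lists every way a target string can be split into grams from a given gram set, and counts those splits with a small dynamic program. It is an illustration only, not the paper's loss function or training procedure: the gram set, the example word, and the function names `decompositions` and `count_decompositions` are all hypothetical choices made for this example.

```python
def decompositions(target, grams):
    """Return every split of `target` into consecutive pieces drawn from `grams`.

    `grams` is an illustrative, hand-picked set of non-empty strings,
    not a gram set learned by a model.
    """
    max_len = max(len(g) for g in grams)
    results = []

    def extend(start, pieces):
        # All of `target` has been covered: record one full decomposition.
        if start == len(target):
            results.append(list(pieces))
            return
        # Try every gram that could begin at position `start`.
        for end in range(start + 1, min(len(target), start + max_len) + 1):
            piece = target[start:end]
            if piece in grams:
                pieces.append(piece)
                extend(end, pieces)
                pieces.pop()

    extend(0, [])
    return results


def count_decompositions(target, grams):
    """Count the same splits without enumerating them.

    ways[i] holds the number of decompositions of the prefix target[:i].
    """
    ways = [0] * (len(target) + 1)
    ways[0] = 1  # the empty prefix has exactly one (empty) decomposition
    for i in range(1, len(target) + 1):
        for g in grams:
            if i >= len(g) and target[i - len(g):i] == g:
                ways[i] += ways[i - len(g)]
    return ways[-1]


# Hypothetical gram set: all needed single characters plus three bigrams.
example_grams = {"h", "e", "l", "o", "he", "ll", "lo"}
example_splits = decompositions("hello", example_grams)
example_count = count_decompositions("hello", example_grams)
```

With this toy gram set, "hello" can be written character by character or with fewer, longer units such as `["he", "ll", "o"]`. A character-only model is restricted to the first form, whereas the abstract describes Gram-CTC as choosing among such alternatives automatically and emitting a variable number of characters per time step.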

