AS · CL · LG · Oct 6, 2021

CTC Variations Through New WFST Topologies

arXiv:2110.03098v3 · 24 citations
Originality: Incremental advance
AI Analysis

This work addresses efficiency issues in speech recognition systems, offering incremental improvements for practitioners dealing with large-scale models.

The paper tackles memory consumption and graph size in Connectionist Temporal Classification (CTC) for automatic speech recognition by proposing three new WFST topologies, which reduce memory consumption by up to four times with only a small accuracy drop.

This paper presents novel Weighted Finite-State Transducer (WFST) topologies to implement Connectionist Temporal Classification (CTC)-like algorithms for automatic speech recognition. Three new CTC variants are proposed: (1) "compact-CTC", in which direct transitions between units are replaced with <epsilon> back-off transitions; (2) "minimal-CTC", which only adds <blank> self-loops when used in WFST composition; and (3) the "selfless-CTC" variants, which disallow self-loops for non-blank units. Compact-CTC yields WFST decoding graphs that are 1.5 times smaller and halves memory consumption when training CTC models with the LF-MMI objective, without hurting recognition accuracy. Minimal-CTC reduces graph size by a factor of two and memory consumption by a factor of four, at the cost of a small accuracy drop. Selfless-CTC can improve accuracy for models with a wide context window.
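The topologies described above can be sketched as plain arc lists to make the size differences concrete. This is a minimal illustration based only on the abstract's wording, not the paper's implementation: the state numbering (state 0 as the <blank> state), the `BLANK` and `EPS` ids, the `(src, dst, input, output)` arc layout, and all function names are assumptions, and weights and final states are omitted. Only one selfless variant is shown (self-loops removed from the standard topology).

```python
BLANK = 0   # assumed id for <blank>
EPS = -1    # assumed id for <epsilon>

def standard_ctc(num_units):
    """Fully connected topology: being in state i means the last label was i."""
    arcs = []
    labels = range(num_units + 1)  # 0 is <blank>, 1..num_units are units
    for src in labels:
        for dst in labels:
            # A unit is emitted only when its state is entered from another
            # state; self-loops and arcs into <blank> emit nothing.
            out = EPS if (dst == src or dst == BLANK) else dst
            arcs.append((src, dst, dst, out))
    return arcs

def compact_ctc(num_units):
    """Direct unit-to-unit arcs replaced by <epsilon> back-off to state 0."""
    arcs = [(0, 0, BLANK, EPS)]            # <blank> self-loop
    for u in range(1, num_units + 1):
        arcs.append((0, u, u, u))          # enter unit u and emit it
        arcs.append((u, u, u, EPS))        # repeated frames of unit u
        arcs.append((u, 0, EPS, EPS))      # <epsilon> back-off arc
    return arcs

def minimal_ctc(num_units):
    """Single state: a <blank> self-loop plus one emitting loop per unit."""
    arcs = [(0, 0, BLANK, EPS)]
    arcs += [(0, 0, u, u) for u in range(1, num_units + 1)]
    return arcs

def selfless_ctc(num_units):
    """Standard topology with self-loops on non-blank states removed."""
    return [a for a in standard_ctc(num_units)
            if not (a[0] == a[1] and a[0] != BLANK)]

if __name__ == "__main__":
    for build in (standard_ctc, compact_ctc, minimal_ctc, selfless_ctc):
        print(build.__name__, len(build(3)))
```

Under these assumptions, the standard topology has (N+1)^2 arcs for N units, while the compact and minimal sketches have 3N+1 and N+1 arcs respectively, so their size grows linearly rather than quadratically with the number of units. These counts describe the topology alone; the graph-size and memory figures quoted above refer to the composed decoding and training graphs.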

