CL · SD · AS · Feb 21, 2023

Efficient CTC Regularization via Coarse Labels for End-to-End Speech Translation

arXiv:2302.10871v1 · 268 citations · h-index: 49
Originality: Incremental advance
AI Analysis

This work is aimed at researchers and practitioners concerned with the efficiency of speech translation models, and offers an incremental improvement over existing CTC regularization methods.

The paper tackles the computational inefficiency of Connectionist Temporal Classification (CTC) regularization in end-to-end speech translation. It proposes CoLaCTC, which shrinks the CTC label space by merging vocabulary labels with simple heuristics such as modulo operations, achieving up to 1.77x training speedup while maintaining or improving performance across multiple languages.

For end-to-end speech translation, regularizing the encoder with the Connectionist Temporal Classification (CTC) objective, using the source transcript or target translation as labels, can greatly improve quality metrics. However, CTC demands an extra prediction layer over the vocabulary space, which brings non-negligible model parameters and computational overhead, even though this layer is typically not used at inference. In this paper, we re-examine the need for genuine vocabulary labels for CTC regularization and explore strategies to reduce the CTC label space, targeting improved efficiency without quality degradation. We propose coarse labeling for CTC (CoLaCTC), which merges vocabulary labels via simple heuristic rules, such as truncation, division or modulo (MOD) operations. Despite its simplicity, our experiments on 4 source and 8 target languages show that CoLaCTC, particularly with MOD, can compress the label space aggressively to 256 and even further, gaining training efficiency (1.18x to 1.77x speedup depending on the original vocabulary size) while still delivering comparable or better performance than the CTC baseline. We also show that CoLaCTC generalizes to CTC regularization regardless of whether the transcript or the translation is used for labeling.
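The abstract names three merging rules (truncation, division, modulo) without giving formulas. Below is a minimal Python sketch of one possible reading, assuming integer token IDs in [0, V), a coarse label budget K, and the CTC blank reserved at index 0 (so coarse labels are shifted by 1). Several details are assumptions and are not taken from the paper: the division rule is read as contiguous blocks of ceil(V/K) IDs, the truncation rule as clamping every ID ≥ K−1 into the last label, and the projection head as a linear layer with bias. All function names (`mod_label`, `div_label`, `trunc_label`, `coarsen`, `ctc_head_params`) and the 512-dimensional encoder in the usage lines are hypothetical. This is not the authors' implementation.

```python
from typing import Callable, List


def mod_label(token_id: int, num_coarse: int) -> int:
    """MOD rule: token IDs with the same remainder share one coarse label."""
    return token_id % num_coarse


def div_label(token_id: int, num_coarse: int, vocab_size: int) -> int:
    """Division rule (assumed reading): contiguous blocks of ceil(V/K) IDs share a label."""
    block = -(-vocab_size // num_coarse)  # ceil(vocab_size / num_coarse)
    return token_id // block


def trunc_label(token_id: int, num_coarse: int) -> int:
    """Truncation rule (assumed reading): every ID >= K-1 collapses into the last label."""
    return min(token_id, num_coarse - 1)


def coarsen(token_ids: List[int], rule: Callable[[int], int]) -> List[int]:
    """Map a label sequence to coarse labels, shifted by 1 so index 0 stays the CTC blank."""
    return [rule(t) + 1 for t in token_ids]


def ctc_head_params(hidden_dim: int, num_labels: int) -> int:
    """Weights + bias of a linear CTC projection over num_labels classes plus one blank."""
    return (hidden_dim + 1) * (num_labels + 1)


if __name__ == "__main__":
    V, K, d = 32000, 256, 512  # hypothetical vocabulary, coarse budget, encoder width
    ids = [5, 261, 31999]
    print(coarsen(ids, lambda t: mod_label(t, K)))       # [6, 6, 256]
    print(coarsen(ids, lambda t: div_label(t, K, V)))    # [1, 3, 256]
    print(coarsen(ids, lambda t: trunc_label(t, K)))     # [6, 256, 256]
    print(ctc_head_params(d, V), ctc_head_params(d, K))  # 16416513 131841
```

Under these assumptions the CTC projection shrinks from about 16.4M to about 0.13M parameters when V = 32000 is compressed to K = 256, which is where the efficiency gain comes from. One caveat: merging can turn distinct neighbouring tokens into identical consecutive labels (as IDs 5 and 261 do under MOD here), and CTC requires a blank frame between repeated labels, so the minimum number of encoder frames needed for a valid alignment can grow slightly.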

Code Implementations: 1 repo
