CL · SD · AS · Dec 19, 2022

SegAugment: Maximizing the Utility of Speech Translation Data with Segmentation-based Augmentations

arXiv:2212.09699v3 · 135 citations · h-index: 35
Originality: Incremental advance
AI Analysis

This work addresses data scarcity in speech translation, particularly in low-resource scenarios, by increasing the utility of existing datasets, though it is an incremental improvement over existing augmentation methods.

The paper tackles the lack of data in end-to-end speech translation by proposing SegAugment, a data augmentation strategy that generates multiple sentence-level versions of document-level datasets. It yields an average gain of 2.5 BLEU points across eight language pairs in MuST-C and, when combined with a strong system, new state-of-the-art results on MuST-C.

End-to-end Speech Translation is hindered by a lack of available data resources. While most of these resources are document-based, a sentence-level version is available, but it is a single, static segmentation, which can limit the usefulness of the data. We propose a new data augmentation strategy, SegAugment, that addresses this issue by generating multiple alternative sentence-level versions of a dataset. Our method uses an audio segmentation system to re-segment the speech of each document under different length constraints, and then obtains the target text for each new segment via alignment methods. Experiments demonstrate consistent gains across eight language pairs in MuST-C, with an average increase of 2.5 BLEU points, and gains of up to 5 BLEU in low-resource scenarios in mTEDx. Furthermore, when combined with a strong system, SegAugment establishes new state-of-the-art results in MuST-C. Finally, we show that the method can also augment sentence-level datasets, and that it enables Speech Translation models to close the gap between manual and automatic segmentation at inference time.
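The re-segment-then-align idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes a hypothetical per-frame speech probability (the kind of output a segmentation classifier might produce), splits each document with a recursive divide-and-conquer heuristic that cuts at the lowest-probability frame until every segment fits a `(min_len, max_len)` frame constraint, and, as a simple stand-in for the paper's alignment methods, assigns each word from hypothetical word-level timestamps to the segment that contains its midpoint. The names `pdac_split`, `segment_versions`, and `assign_words` are illustrative only.

```python
from typing import Dict, List, Sequence, Tuple

Segment = Tuple[int, int]  # half-open frame interval [start, end)


def pdac_split(probs: Sequence[float], start: int, end: int,
               min_len: int, max_len: int) -> List[Segment]:
    """Recursively cut [start, end) at its least speech-like frame until
    every piece is at most max_len frames (when min_len allows it)."""
    if end - start <= max_len:
        return [(start, end)]
    # Only cut where both resulting pieces keep at least min_len frames.
    candidates = range(start + min_len, end - min_len + 1)
    if not candidates:
        return [(start, end)]  # cannot split without breaking min_len
    # Assumption: the lowest speech probability marks the best pause to cut at.
    cut = min(candidates, key=lambda i: probs[i])
    return (pdac_split(probs, start, cut, min_len, max_len)
            + pdac_split(probs, cut, end, min_len, max_len))


def segment_versions(probs: Sequence[float],
                     configs: Sequence[Tuple[int, int]]
                     ) -> Dict[Tuple[int, int], List[Segment]]:
    """One segmentation of the same document per (min_len, max_len) pair."""
    versions: Dict[Tuple[int, int], List[Segment]] = {}
    for min_len, max_len in configs:
        if min_len < 1:
            raise ValueError("min_len must be >= 1 so that splits make progress")
        versions[(min_len, max_len)] = pdac_split(probs, 0, len(probs),
                                                  min_len, max_len)
    return versions


def assign_words(words: Sequence[Tuple[str, float, float]],
                 segments: Sequence[Segment],
                 frame_rate: float) -> List[str]:
    """Attach each (token, start_sec, end_sec) word to the segment containing
    its midpoint; returns one text string per segment."""
    texts: List[List[str]] = [[] for _ in segments]
    for token, start_sec, end_sec in words:
        mid_frame = (start_sec + end_sec) / 2 * frame_rate
        for k, (a, b) in enumerate(segments):
            if a <= mid_frame < b:
                texts[k].append(token)
                break  # words outside every segment are silently dropped
    return [" ".join(t) for t in texts]


if __name__ == "__main__":
    # Toy document: 12 frames at 1 frame/s, with pauses near frames 3 and 8.
    probs = [0.9, 0.9, 0.9, 0.1, 0.9, 0.9, 0.9, 0.9, 0.2, 0.9, 0.9, 0.9]
    words = [("hello", 0.0, 2.0), ("world", 4.0, 6.0), ("again", 9.0, 11.0)]
    for cfg, segs in segment_versions(probs, [(2, 6), (2, 12)]).items():
        print(cfg, segs, assign_words(words, segs, frame_rate=1.0))
    # (2, 6) [(0, 3), (3, 8), (8, 12)] ['hello', 'world', 'again']
    # (2, 12) [(0, 12)] ['hello world again']
```

Running the same audio through several length configurations yields distinct segmentations of identical speech, and pairing each with its aligned text gives additional training examples without new recordings. Obtaining the target-language translation for each new segment is not covered by this sketch.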

Code Implementations: 1 repo