CL · LG · AS · Nov 20, 2019

On Using SpecAugment for End-to-End Speech Translation

arXiv:1911.08876v1 · 675 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses overfitting in speech translation for researchers and practitioners, but it is incremental as it adapts an existing method to a new domain.

The paper applies SpecAugment, a simple data augmentation technique, to end-to-end speech translation and reports improvements of up to +2.2% BLEU on LibriSpeech En->Fr and +1.2% BLEU on IWSLT En->De, which it attributes to reduced overfitting.

This work investigates a simple data augmentation technique, SpecAugment, for end-to-end speech translation. SpecAugment is a low-cost method that is applied directly to the audio input features and consists of masking blocks of frequency channels and/or time steps. We apply SpecAugment to end-to-end speech translation tasks and achieve up to +2.2% BLEU on LibriSpeech Audiobooks En->Fr and +1.2% on IWSLT TED-talks En->De by alleviating overfitting to some extent. We also examine the effectiveness of the method in a variety of data scenarios and show that it leads to significant improvements irrespective of the amount of training data.
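The masking described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `spec_augment`, the number and width of the masks, and the mask value of 0.0 are all assumptions chosen for illustration, and the features are represented as a plain list of frames (one list of frequency channels per time step) to keep the example dependency-free.

```python
import random


def spec_augment(features, num_freq_masks=2, max_freq_width=8,
                 num_time_masks=2, max_time_width=20, rng=None):
    """Return a masked copy of `features` (list of frames, each a list of channels).

    All default values are illustrative assumptions, not settings from the paper.
    """
    if rng is None:
        rng = random.Random()

    # Copy every frame so the caller's features are left untouched.
    masked = [list(frame) for frame in features]
    num_steps = len(masked)
    num_channels = len(masked[0]) if masked else 0

    # Frequency masking: zero a block of consecutive channels in every frame.
    for _ in range(num_freq_masks):
        width = rng.randint(0, min(max_freq_width, num_channels))
        start = rng.randint(0, num_channels - width)
        for frame in masked:
            frame[start:start + width] = [0.0] * width

    # Time masking: zero a block of consecutive frames across all channels.
    for _ in range(num_time_masks):
        width = rng.randint(0, min(max_time_width, num_steps))
        start = rng.randint(0, num_steps - width)
        for t in range(start, start + width):
            masked[t] = [0.0] * num_channels

    return masked
```

In a training loop, a function like this would typically be called on each utterance's features every time the utterance is sampled, so the model sees a differently masked view of the same audio in each epoch. Passing a seeded `random.Random` makes the masks reproducible.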
