CL · SD · AS · Jun 3, 2020

Self-Training for End-to-End Speech Translation

arXiv:2006.02490v2 · 69 citations
AI Analysis

This work addresses data scarcity in speech translation. Its contribution is incremental, however, because it builds on existing self-training methods.

The paper tackles data scarcity in end-to-end speech translation by training on pseudo-labels generated from unlabeled audio. It achieves gains of 8.3 and 5.7 BLEU over a strong semi-supervised baseline on MuST-C English-French and English-German, respectively, and reaches state-of-the-art performance on those datasets.

One of the main challenges for end-to-end speech translation is data scarcity. We leverage pseudo-labels generated from unlabeled audio by a cascade and an end-to-end speech translation model. This provides 8.3 and 5.7 BLEU gains over a strong semi-supervised baseline on the MuST-C English-French and English-German datasets, reaching state-of-the-art performance. The effect of the quality of the pseudo-labels is investigated. Our approach is shown to be more effective than simply pre-training the encoder on the speech recognition task. Finally, we demonstrate the effectiveness of self-training by directly generating pseudo-labels with an end-to-end model instead of a cascade model.
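The pseudo-labeling idea in the abstract can be illustrated with a minimal sketch. It assumes toy lookup tables in place of real ASR, MT, and speech translation models, and it represents audio clips as id strings. The names `cascade_teacher`, `pseudo_label`, `self_train`, and `memorizing_trainer` are hypothetical and do not come from the paper's code. The sketch does not model pseudo-label filtering or the paper's actual training setup. It only shows how a teacher labels unlabeled audio and how a student is then trained on gold plus pseudo-labeled examples.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Audio = str  # hypothetical: a clip is identified by an id string in this toy
Translator = Callable[[Audio], str]  # any speech-to-translation model


@dataclass(frozen=True)
class Example:
    audio: Audio
    translation: str
    is_pseudo: bool  # True when the target was produced by a teacher model


def cascade_teacher(asr: Callable[[Audio], str], mt: Callable[[str], str]) -> Translator:
    """Chain ASR (audio -> transcript) and MT (transcript -> translation)."""
    return lambda audio: mt(asr(audio))


def pseudo_label(teacher: Translator, unlabeled: Sequence[Audio]) -> List[Example]:
    """Use the teacher's output as the target for each unlabeled clip."""
    return [Example(a, teacher(a), is_pseudo=True) for a in unlabeled]


def self_train(
    labeled: Sequence[Example],
    unlabeled: Sequence[Audio],
    teacher: Translator,
    train_fn: Callable[[List[Example]], Translator],
) -> Translator:
    """Train a student on gold examples plus teacher-generated pseudo-labels."""
    data = list(labeled) + pseudo_label(teacher, unlabeled)
    return train_fn(data)


# --- Toy usage: lookup tables stand in for trained models (hypothetical). ---
ASR_TABLE: Dict[Audio, str] = {"clip1": "hello", "clip2": "thank you"}
MT_TABLE: Dict[str, str] = {"hello": "bonjour", "thank you": "merci"}


def memorizing_trainer(data: List[Example]) -> Translator:
    """Stand-in for end-to-end ST training: the 'student' memorizes its targets."""
    table = {ex.audio: ex.translation for ex in data}
    return lambda audio: table[audio]


labeled = [Example("clip0", "au revoir", is_pseudo=False)]
teacher = cascade_teacher(ASR_TABLE.__getitem__, MT_TABLE.__getitem__)
student = self_train(labeled, ["clip1", "clip2"], teacher, memorizing_trainer)
print(student("clip0"), student("clip2"))  # au revoir merci
```

To sketch the end-to-end variant the abstract mentions, pass a single speech translation callable as `teacher` instead of the composed cascade. The rest of the code is unchanged, because the student only ever sees (audio, translation) pairs.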

