AS · CL · SD · Jun 22, 2020

Self-Supervised Representations Improve End-to-End Speech Translation

arXiv:2006.12124v2 · 43 citations
AI Analysis

This work addresses data scarcity in end-to-end speech translation, but the contribution is incremental because it builds on existing pre-training methods.

The paper tackles data scarcity in end-to-end speech-to-text translation by exploring self-supervised pre-trained speech representations. It shows that these representations consistently improve translation performance and enable cross-lingual transfer to a variety of languages with little or no tuning.

End-to-end speech-to-text translation can provide a simpler and smaller system but faces the challenge of data scarcity. Pre-training methods can leverage unlabeled data and have been shown to be effective in data-scarce settings. In this work, we explore whether self-supervised pre-trained speech representations can benefit the speech translation task in both high- and low-resource settings, whether they can transfer well to other languages, and whether they can be effectively combined with other common methods that help improve low-resource end-to-end speech translation, such as using a pre-trained high-resource speech recognition system. We demonstrate that self-supervised pre-trained features can consistently improve translation performance, and that cross-lingual transfer allows the approach to extend to a variety of languages with little or no tuning.

