CL | SD | AS | May 19, 2023

DUB: Discrete Unit Back-translation for Speech Translation

arXiv:2305.11411v1236 citationsHas Code
Originality: Incremental advance
AI Analysis

This work addresses the challenge of making speech translation as effective as text-based machine translation. The advance is incremental: it applies back-translation, an established machine translation technique, to a newer way of representing speech (discrete units).

The paper tackles speech-to-text translation (ST) by bridging the modality gap between speech and text. It proposes Discrete Unit Back-translation (DUB), which lets machine translation techniques be applied to direct ST. DUB yields an average gain of 5.5 BLEU on MuST-C En-De/Fr/Es and, in the low-resource language scenario, performs comparably to existing methods that rely on large-scale external data.

How can speech-to-text translation (ST) perform as well as machine translation (MT)? The key point is to bridge the modality gap between speech and text so that useful MT techniques can be applied to ST. Recently, the approach of representing speech with unsupervised discrete units yields a new way to ease the modality problem. This motivates us to propose Discrete Unit Back-translation (DUB) to answer two questions: (1) Is it better to represent speech with discrete units than with continuous features in direct ST? (2) How much benefit can useful MT techniques bring to ST? With DUB, the back-translation technique can successfully be applied on direct ST and obtains an average boost of 5.5 BLEU on MuST-C En-De/Fr/Es. In the low-resource language scenario, our method achieves comparable performance to existing methods that rely on large-scale external data. Code and models are available at https://github.com/0nutation/DUB.
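The two ideas in the abstract, representing speech as discrete units and applying back-translation to those units, can be illustrated with a toy sketch. This is not the authors' implementation (see the linked repository for that). It assumes that units are obtained by nearest-centroid assignment of frame features, that consecutive repeated units are collapsed (a common practice in unit-based work, not confirmed by the abstract), and that back-translation means generating synthetic unit sequences from target-language text with a text-to-unit model. The names `quantize`, `collapse_repeats`, `back_translate`, and `toy_text_to_unit` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[float]   # one feature vector per speech frame
Units = List[int]         # a speech utterance as a sequence of unit ids


def quantize(frames: Sequence[Frame], centroids: Sequence[Frame]) -> Units:
    """Assign each frame to its nearest centroid (squared Euclidean distance)."""
    def sqdist(a: Frame, b: Frame) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(centroids)), key=lambda k: sqdist(frame, centroids[k]))
        for frame in frames
    ]


def collapse_repeats(units: Units) -> Units:
    """Merge runs of identical consecutive units into a single unit."""
    collapsed: Units = []
    for unit in units:
        if not collapsed or collapsed[-1] != unit:
            collapsed.append(unit)
    return collapsed


def back_translate(
    real_pairs: Sequence[Tuple[Units, str]],
    monolingual_targets: Sequence[str],
    text_to_unit: Callable[[str], Units],
) -> List[Tuple[Units, str]]:
    """Extend real (units, text) pairs with synthetic pairs.

    Each synthetic pair couples a target-language sentence with the unit
    sequence that the (assumed) text-to-unit model generates for it.
    """
    synthetic = [(text_to_unit(text), text) for text in monolingual_targets]
    return list(real_pairs) + synthetic


# Toy data: three 2-D centroids and five frames (made-up values).
centroids = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
frames = [(0.1, 0.0), (0.2, -0.1), (0.9, 1.2), (4.8, 5.1), (5.2, 4.9)]

frame_units = quantize(frames, centroids)      # one unit id per frame
utterance_units = collapse_repeats(frame_units)


def toy_text_to_unit(text: str) -> Units:
    """Stand-in for a trained text-to-unit model: word length modulo 3."""
    return [len(word) % 3 for word in text.split()]


real_pairs = [(utterance_units, "guten Tag")]
augmented = back_translate(real_pairs, ["guten Morgen"], toy_text_to_unit)
```

In a real system the centroids would come from clustering features of a self-supervised speech encoder, and the text-to-unit model would be a trained sequence-to-sequence network. The unit-to-text translation model would then be trained on `augmented` instead of `real_pairs` alone.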

Code Implementations: 1 repo
