CL · LG · SD · AS · May 27, 2023

Translatotron 3: Speech to Speech Translation with Monolingual Data

arXiv:2305.17547v3 · 30 citations
Originality: Incremental advance
AI Analysis

The paper addresses speech-to-speech translation without paired data, a problem relevant to researchers and practitioners, and offers an incremental advance over prior methods.

The paper tackles unsupervised direct speech-to-speech translation using only monolingual data, achieving an 18.14 BLEU point improvement over a baseline cascade system on a Spanish-English dataset.

This paper presents Translatotron 3, a novel approach to unsupervised direct speech-to-speech translation from monolingual speech-text datasets by combining masked autoencoder, unsupervised embedding mapping, and back-translation. Experimental results in speech-to-speech translation tasks between Spanish and English show that Translatotron 3 outperforms a baseline cascade system, reporting an improvement of $18.14$ BLEU points on the synthesized Unpaired-Conversational dataset. In contrast to supervised approaches that necessitate real paired data, or specialized modeling to replicate para-/non-linguistic information such as pauses, speaking rates, and speaker identity, Translatotron 3 showcases its capability to retain this information. Audio samples can be found at http://google-research.github.io/lingvo-lab/translatotron3

