CL · SD · AS · Nov 8, 2022

SpeechMatrix: A Large-Scale Mined Corpus of Multilingual Speech-to-Speech Translations

Meta AI
arXiv:2211.04508v1 · 238 citations · h-index: 48
Originality: Incremental advance
AI Analysis

This corpus is a valuable resource for speech translation researchers, though the contribution is incremental: it applies existing mining techniques to a new domain.

The authors address the scarcity of multilingual speech-to-speech translation data by mining SpeechMatrix, a large-scale corpus built from European Parliament recordings. The corpus contains 418,000 hours of speech across 136 language pairs, and the authors establish baseline models showing gains from pre-training and Mixture-of-Experts scaling.

We present SpeechMatrix, a large-scale multilingual corpus of speech-to-speech translations mined from real speech of European Parliament recordings. It contains speech alignments in 136 language pairs with a total of 418 thousand hours of speech. To evaluate the quality of this parallel speech, we train bilingual speech-to-speech translation models on mined data only and establish extensive baseline results on EuroParl-ST, VoxPopuli and FLEURS test sets. Enabled by the multilinguality of SpeechMatrix, we also explore multilingual speech-to-speech translation, a topic that few other works have addressed. We also demonstrate that model pre-training and sparse scaling using Mixture-of-Experts bring large gains to translation performance. The mined data and models are freely available.

