CL · AS · Mar 1, 2023

MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation

Meta AI
arXiv:2303.00628v2 · 56 citations · h-index: 41 · Has Code
Originality: Synthesis-oriented
AI Analysis

MuAViC provides a new benchmark for researchers in audio-visual speech processing, though the contribution is incremental: it extends existing datasets to multilingual and translation tasks.

The authors address the problem of building robust speech recognition and translation models by introducing MuAViC, a multilingual audio-visual corpus with 1200 hours of speech in 9 languages. Their baseline results show that the corpus is effective for building noise-robust models.

We introduce MuAViC, a multilingual audio-visual corpus for robust speech recognition and robust speech-to-text translation providing 1200 hours of audio-visual speech in 9 languages. It is fully transcribed and covers 6 English-to-X translation as well as 6 X-to-English translation directions. To the best of our knowledge, this is the first open benchmark for audio-visual speech-to-text translation and the largest open benchmark for multilingual audio-visual speech recognition. Our baseline results show that MuAViC is effective for building noise-robust speech recognition and translation models. We make the corpus available at https://github.com/facebookresearch/muavic.
