CL, Mar 21, 2022

XTREME-S: Evaluating Cross-lingual Speech Representations

Alexis Conneau, Ankur Bapna, Yu Zhang, Min Ma, Patrick von Platen, Anton Lozhkov, Colin Cherry, Ye Jia, Clara Rivera, Mihir Kale, Daan Van Esch, Vera Axelrod

Stanford

arXiv:2203.10752v35.027 citations, h-index: 56

Originality: Synthesis-oriented

AI Analysis

This benchmark aims to simplify the evaluation of multilingual speech representations and to catalyze research in universal speech representation learning.

The authors introduced XTREME-S, a benchmark for evaluating cross-lingual speech representations across 102 languages and 4 task families, establishing baselines using XLS-R and mSLAM.

We introduce XTREME-S, a new benchmark to evaluate universal cross-lingual speech representations in many languages. XTREME-S covers four task families: speech recognition, classification, speech-to-text translation and retrieval. Covering 102 languages from 10+ language families, 3 different domains and 4 task families, XTREME-S aims to simplify multilingual speech representation evaluation, as well as catalyze research in "universal" speech representation learning. This paper describes the new benchmark and establishes the first speech-only and speech-text baselines using XLS-R and mSLAM on all downstream tasks. We motivate the design choices and detail how to use the benchmark. Datasets and fine-tuning scripts are made easily accessible at https://hf.co/datasets/google/xtreme_s.
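
As a rough illustration of how the datasets at the URL above might be accessed, here is a minimal sketch using the Hugging Face `datasets` library. It rests on several assumptions not stated in the abstract: that each subset is exposed as a configuration named `<task>.<language>` (for example `minds14.fr-FR`), that a `train` split exists, and that the installed `datasets` version still supports this repository's loading mechanism. The helper names `xtreme_s_config` and `load_xtreme_s` are hypothetical.

```python
def xtreme_s_config(task: str, language: str) -> str:
    """Build a configuration name of the form '<task>.<language>'.

    The naming scheme is an assumption for illustration; check the
    dataset page for the configurations that actually exist.
    """
    return f"{task}.{language}"


def load_xtreme_s(task: str, language: str, split: str = "train"):
    """Download one XTREME-S subset from the Hugging Face Hub.

    Requires network access and the `datasets` package. The import is
    kept local so the helper above works without the package installed.
    """
    from datasets import load_dataset

    return load_dataset(
        "google/xtreme_s",
        xtreme_s_config(task, language),
        split=split,
    )
```

Calling `load_xtreme_s("minds14", "fr-FR")` would then return the corresponding split, provided the assumed configuration name is valid. Newer releases of `datasets` may need extra options or may not support script-based repositories at all.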
