SD CL AS Jun 12, 2024

ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets

Jiatong Shi, Shih-Heng Wang, William Chen, Martijn Bartelds, Vanya Bannihatti Kumar, Jinchuan Tian, Xuankai Chang, Dan Jurafsky, Karen Livescu, Hung-yi Lee, Shinji Watanabe

arXiv:2406.08641v122.227 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for more comprehensive benchmarking of multilingual speech models, though it is incremental as it builds on an existing benchmark.

The paper introduces ML-SUPERB 2.0, a benchmark for evaluating pre-trained speech models across various configurations, finding performance improvements over the original but noting that results depend on downstream model design and show large variations between languages and datasets.

ML-SUPERB evaluates self-supervised learning (SSL) models on the tasks of language identification and automatic speech recognition (ASR). This benchmark treats the models as feature extractors and uses a single shallow downstream model, which can be fine-tuned for a downstream task. However, real-world use cases may require different configurations. This paper presents ML-SUPERB 2.0, which is a new benchmark for evaluating pre-trained SSL and supervised speech models across downstream models, fine-tuning setups, and efficient model adaptation approaches. We find performance improvements over the setup of ML-SUPERB. However, performance depends on the downstream model design. Also, we find large performance differences between languages and datasets, suggesting the need for more targeted approaches to improve multilingual ASR performance.
