MUSCAT: MUltilingual, SCientific ConversATion Benchmark
This benchmark fills a gap in evaluating multilingual ASR for realistic mixed-language scenarios, but the contribution is primarily dataset creation rather than a new method.
The authors introduce MUSCAT, a benchmark for evaluating multilingual ASR systems on mixed-language input, scientific vocabulary, and code-switching. Experiments show that current state-of-the-art ASR systems still struggle on this dataset, indicating an open challenge.
The goal of multilingual speech technology is to facilitate seamless communication between people who speak different languages, so that every participant has the experience of being a multilingual speaker. To create this experience, speech technology must address several challenges: handling mixed multilingual input, domain-specific vocabulary, and code-switching. However, there is currently no dataset that benchmarks this situation. We propose a new benchmark to evaluate whether current Automatic Speech Recognition (ASR) systems can handle these challenges. The benchmark consists of bilingual discussions on scientific papers between multiple speakers, each conversing in a different language. We provide a standard evaluation framework that goes beyond Word Error Rate (WER), enabling consistent comparison of ASR performance across languages. Experimental results demonstrate that the proposed dataset remains an open challenge for state-of-the-art ASR systems. The dataset is available at https://huggingface.co/datasets/goodpiku/muscat-eval

Keywords: multilingual, speech recognition, audio segmentation, speaker diarization
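For readers unfamiliar with the baseline metric, the following is a minimal sketch of how WER is conventionally computed: the word-level edit distance between a reference transcript and an ASR hypothesis, divided by the number of reference words. It is not the authors' evaluation framework. The function name `word_error_rate`, the whitespace tokenization, and the example sentences are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Illustrative WER: word-level Levenshtein distance / reference length.

    Assumption: words are separated by whitespace and compared exactly,
    with no text normalization (casing, punctuation) applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical code-switched utterance: the English term "encoder"
# is misrecognized as two words, giving one substitution and one insertion.
reference = "wir nutzen einen transformer encoder"
hypothesis = "wir nutzen einen transformer in coder"
print(word_error_rate(reference, hypothesis))
```

Plain WER weights every word equally, so an error on a scientific term counts the same as an error on a function word. This is one reason an evaluation framework that goes beyond WER is useful for this kind of data.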