AI · CL · SD · AS · Dec 1, 2024

A Comparative Study of LLM-based ASR and Whisper in Low Resource and Code Switching Scenario

arXiv:2412.00721v2 · 5 citations · h-index: 28
Originality: Synthesis-oriented
AI Analysis

This work addresses ASR challenges for low-resource languages and code-switching scenarios, but it is incremental as it compares existing methods on new data.

This study tackled the problem of automatic speech recognition in low-resource and Mandarin-English code-switching scenarios by comparing LLM-based ASR systems against Whisper, finding that LLM-based ASR achieved a 12.8% relative gain over Whisper in low-resource settings, while Whisper performed better in code-switching ASR.

Large Language Models (LLMs) have shown exceptional performance across diverse NLP tasks, and their integration with speech encoders is rapidly emerging as a dominant trend in the Automatic Speech Recognition (ASR) field. Previous work has mainly concentrated on leveraging LLMs for speech recognition in English and Chinese, while their potential for addressing speech recognition challenges in low-resource settings remains underexplored. Hence, in this work, we explore the capability of LLMs in low-resource ASR and Mandarin-English code-switching ASR. We also evaluate and compare the recognition performance of LLM-based ASR systems against the Whisper model. Extensive experiments demonstrate that LLM-based ASR yields a relative gain of 12.8% over the Whisper model in low-resource ASR, while Whisper performs better in Mandarin-English code-switching ASR. We hope that this study can shed light on ASR for low-resource scenarios.
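For readers unfamiliar with the term, a "relative gain" in ASR is usually read as a relative reduction in an error rate (for example word or character error rate) with respect to a baseline. The abstract does not state which metric underlies the 12.8% figure, so the sketch below only assumes that reading. The function name `relative_gain` and the two error rates are hypothetical and chosen for illustration; they are not taken from the paper.

```python
def relative_gain(baseline_error: float, system_error: float) -> float:
    """Relative error reduction of a system over a baseline, in percent.

    Assumes both arguments are error rates on the same scale
    (e.g. both in percent), where lower is better.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - system_error) / baseline_error


# Hypothetical error rates, for illustration only (not from the paper).
baseline = 20.0  # assumed baseline error rate, in percent
system = 17.5    # assumed error rate of the compared system, in percent

print(f"{relative_gain(baseline, system):.1f}%")  # prints "12.5%"
```

Under this reading, a relative gain describes how much of the baseline's error was removed, which is different from the absolute difference between the two error rates (2.5 points in the example above).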

