CL AI SD AS

Dec 14, 2024

Efficient Adaptation of Multilingual Models for Japanese ASR

Mark Bajo, Haruka Fukukawa, Ryuji Morita, Yuma Ogasawara

arXiv:2412.10705v1

2 citations

h-index: 1

Originality: Incremental advance

AI Analysis

The work provides a scalable approach to improving ASR in resource-constrained environments and for languages with complex writing systems like Japanese, though it is incremental because it builds on existing fine-tuning methods.

This study improves Japanese automatic speech recognition (ASR) by fine-tuning the multilingual Whisper-Tiny model. End-to-end training reduces its character error rate (CER) from 32.7 to 14.7, outperforming the larger Whisper-Base model, which scores 20.2.
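Since the headline numbers are character error rates, a minimal sketch of how CER is commonly computed may help: the character-level edit distance between reference and hypothesis, divided by the reference length. The function names and the example sentence are illustrative assumptions; the paper's own scoring pipeline (including any text normalization) is not described here and may differ.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference characters reaches the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate as a fraction of the reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which an error rate dropped relative to its starting value."""
    return (before - after) / before


# Hypothetical example: the hypothesis drops one of seven reference characters.
print(f"CER: {cer('今日は晴れです', '今日は晴です'):.3f}")

# CER values reported in the summary (32.7 baseline, 20.8 LoRA, 14.7 end-to-end).
print(f"LoRA relative reduction: {relative_reduction(32.7, 20.8):.1%}")
print(f"E2E relative reduction:  {relative_reduction(32.7, 14.7):.1%}")
```

Under this definition, the reported drop from 32.7 to 14.7 is a relative reduction of roughly 55%, and the LoRA result of 20.8 is a reduction of roughly 36%.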

This study explores fine-tuning multilingual ASR (Automatic Speech Recognition) models, specifically OpenAI's Whisper-Tiny, to improve performance in Japanese. While multilingual models like Whisper offer versatility, they often lack precision in specific languages. Conversely, monolingual models like ReazonSpeech excel in language-specific tasks but are less adaptable. Using Japanese-specific datasets and Low-Rank Adaptation (LoRA) along with end-to-end (E2E) training, we fine-tuned Whisper-Tiny to bridge this gap. Our results show that fine-tuning reduced Whisper-Tiny's Character Error Rate (CER) from 32.7 to 20.8 with LoRA and to 14.7 with end-to-end fine-tuning, surpassing Whisper-Base's CER of 20.2. However, challenges with domain-specific terms remain, highlighting the need for specialized datasets. These findings demonstrate that fine-tuning multilingual models can achieve strong language-specific performance while retaining their flexibility. This approach provides a scalable solution for improving ASR in resource-constrained environments and languages with complex writing systems like Japanese.
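To make the LoRA side of the comparison concrete, the sketch below shows the general low-rank update, W' = W + (alpha / r) * B A, and how few parameters it trains compared with a full weight matrix. It is not the authors' training code: the rank and scaling values are hypothetical, and the 384-wide projection is an assumption based on the published Whisper-Tiny model width.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def lora_effective_weight(W, A, B, alpha, rank):
    """Return W + (alpha / rank) * B @ A.

    W is d_out x d_in and stays frozen; B (d_out x rank) and
    A (rank x d_in) are the only trained matrices.
    """
    scale = alpha / rank
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


def full_params(d_out, d_in):
    """Trainable parameters when the whole matrix is fine-tuned."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Trainable parameters when only the low-rank factors are tuned."""
    return rank * (d_out + d_in)


# Tiny worked example with a rank-1 update.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]
A = [[3.0, 4.0]]
print(lora_effective_weight(W, A, B, alpha=2.0, rank=1))

# Assumed sizes: one 384 x 384 projection, hypothetical rank 8.
D_MODEL, RANK = 384, 8
print(full_params(D_MODEL, D_MODEL), lora_params(D_MODEL, D_MODEL, RANK))
```

With these assumed sizes, the low-rank factors hold 6,144 trainable values against 147,456 for the full matrix. That gap is one plausible reading of why the abstract reports a higher CER for LoRA (20.8) than for end-to-end fine-tuning (14.7), which updates every weight.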
