ASCL Oct 19, 2024

End-to-End Transformer-based Automatic Speech Recognition for Northern Kurdish: A Pioneering Approach

arXiv:2410.16330v1 · 7 citations · h-index: 10
Originality: Synthesis-oriented
AI Analysis

This work addresses speech recognition for speakers of Northern Kurdish, an under-resourced language, but it is incremental: it applies existing methods to new data.

This paper tackles automatic speech recognition for the low-resource Northern Kurdish language by fine-tuning the pre-trained Whisper model, achieving a Word Error Rate of 10.5% and a Character Error Rate of 5.7% on a test set.

Automatic Speech Recognition (ASR) for low-resource languages remains a challenging task due to limited training data. This paper introduces a comprehensive study exploring the effectiveness of Whisper, a pre-trained ASR model, for Northern Kurdish (Kurmanji), an under-resourced language spoken in the Middle East. We investigate three fine-tuning strategies: vanilla, specific parameters, and additional modules. Using a Northern Kurdish fine-tuning speech corpus containing approximately 68 hours of validated transcribed data, our experiments demonstrate that the additional module fine-tuning strategy significantly improves ASR accuracy on a specialized test set, achieving a Word Error Rate (WER) of 10.5% and a Character Error Rate (CER) of 5.7% with Whisper version 3. These results underscore the potential of sophisticated transformer models for low-resource ASR and emphasize the importance of tailored fine-tuning techniques for optimal performance.
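For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how WER and CER are commonly computed: the Levenshtein (edit) distance between a reference transcript and a hypothesis, divided by the reference length, over words for WER and over characters for CER. The function names, the whitespace tokenization, and the choice to count spaces as characters are assumptions made for illustration; the paper's own text normalization and scoring tools are not described here and may differ.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions,
    insertions, and deletions each cost 1)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / reference word count.
    Assumes whitespace tokenization (an illustrative choice)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance / reference
    length. Spaces are counted as characters (an illustrative choice)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    hypothesis = "the cat sit on mat"
    print(f"WER: {wer(reference, hypothesis):.3f}")
    print(f"CER: {cer(reference, hypothesis):.3f}")
```

Because both rates are normalized by the reference length, a hypothesis with many insertions can score above 100%. Corpus-level figures such as those reported in the abstract are usually obtained by summing edit distances and reference lengths over all utterances rather than by averaging per-utterance rates.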

