Spaiche: Extending State-of-the-Art ASR Models to Swiss German Dialects
This work addresses ASR for Swiss German speakers, but its contribution is incremental: it builds on existing models and datasets.
The paper aims to improve automatic speech recognition (ASR) for Swiss German dialects, which are low-resource, by fine-tuning OpenAI's Whisper model with a novel semantic loss, achieving state-of-the-art results on Swiss German datasets.
Recent breakthroughs in NLP have greatly increased the presence of ASR systems in our daily lives. However, for many low-resource languages, ASR models still need improvement, due in part to the difficulty of acquiring pertinent data. This project aims to advance research on ASR models for Swiss German dialects by providing insights into the performance of state-of-the-art ASR models on recently published Swiss German speech datasets. We propose a novel loss that takes into account the semantic distance between the predicted and the ground-truth labels. We outperform current state-of-the-art results by fine-tuning OpenAI's Whisper model on Swiss German datasets.
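The abstract does not give the form of the proposed loss, so the following is only an illustrative sketch of what a semantics-aware token loss could look like, not the authors' formulation. It assumes a standard cross-entropy term plus a weighted cosine distance between the target token's embedding and the expected embedding under the predicted distribution. The function names, the `weight` parameter, and the toy embeddings are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def semantic_aware_loss(logits, target_id, embeddings, weight=0.5):
    """Cross-entropy plus a weighted semantic-distance penalty (assumed form).

    logits:     unnormalized scores, one per vocabulary entry
    target_id:  index of the ground-truth token
    embeddings: one embedding vector per vocabulary entry (hypothetical)
    weight:     hypothetical trade-off between the two terms
    """
    probs = softmax(logits)
    cross_entropy = -math.log(probs[target_id])

    # Expected embedding under the predicted distribution.
    dim = len(embeddings[0])
    expected = [
        sum(p * emb[d] for p, emb in zip(probs, embeddings))
        for d in range(dim)
    ]
    semantic = cosine_distance(expected, embeddings[target_id])
    return cross_entropy + weight * semantic


# Toy vocabulary: index 0 is the target, index 1 is a near-synonym,
# index 2 is unrelated. The embeddings are made up for illustration.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
near_miss = semantic_aware_loss([0.0, 2.0, 0.0], 0, toy_embeddings)
far_miss = semantic_aware_loss([0.0, 0.0, 2.0], 0, toy_embeddings)
print(near_miss < far_miss)
```

In this toy setup both wrong predictions assign the same probability to the target, so their cross-entropy terms are identical; only the semantic term separates them, penalizing the unrelated prediction more than the near-synonym.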