CLASMar 26, 2025

Dolphin: A Large-Scale Automatic Speech Recognition Model for Eastern Languages

arXiv:2503.20212v110 citationsh-index: 5Has Code
Originality Incremental advance
AI Analysis

This work addresses the need for improved speech recognition in diverse Eastern languages, benefiting users and developers in those regions, though it is incremental as it extends an existing architecture.

The paper tackles the problem of automatic speech recognition for Eastern languages by introducing Dolphin, a large-scale multilingual model based on Whisper, which significantly outperforms current state-of-the-art open-source models across 40 Eastern languages and 22 Chinese dialects.

This report introduces Dolphin, a large-scale multilingual automatic speech recognition (ASR) model that extends the Whisper architecture to support a wider range of languages. Our approach integrates in-house proprietary and open-source datasets to refine and optimize Dolphin's performance. The model is specifically designed to achieve notable recognition accuracy for 40 Eastern languages across East Asia, South Asia, Southeast Asia, and the Middle East, while also supporting 22 Chinese dialects. Experimental evaluations show that Dolphin significantly outperforms current state-of-the-art open-source models across various languages. To promote reproducibility and community-driven innovation, we are making our trained models and inference source code publicly available.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes