ASCL · Mar 27, 2024

PhoWhisper: Automatic Speech Recognition for Vietnamese

arXiv:2406.02555v1 · 27 citations · h-index: 7 · Has Code · Tiny Papers @ ICLR
Originality: Synthesis-oriented
AI Analysis

This work addresses speech recognition for Vietnamese speakers, but it is incremental as it applies an existing method to new data.

The authors tackled Vietnamese automatic speech recognition by fine-tuning the Whisper model on an 844-hour dataset with diverse accents, achieving state-of-the-art performance on benchmark datasets.

We introduce PhoWhisper in five versions for Vietnamese automatic speech recognition. PhoWhisper's robustness is achieved through fine-tuning the Whisper model on an 844-hour dataset that encompasses diverse Vietnamese accents. Our experimental study demonstrates the state-of-the-art performance of PhoWhisper on benchmark Vietnamese ASR datasets. We have open-sourced PhoWhisper at: https://github.com/VinAIResearch/PhoWhisper
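
The abstract reports results on benchmark ASR datasets without naming the metric on this page. ASR systems are usually compared by word error rate (WER), so the sketch below assumes that metric. The function name `word_error_rate` and the example sentences are illustrative and are not taken from the PhoWhisper repository. It computes the word-level edit distance between a reference transcript and a model hypothesis, then divides by the reference length.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count.

    Assumption: whitespace tokenization with no text normalization.
    Real ASR evaluations often lowercase and strip punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts, for illustration only.
    reference = "tôi đang học tiếng Việt"
    hypothesis = "tôi đang học tiếng"
    print(f"WER: {word_error_rate(reference, hypothesis):.2f}")
```

A lower WER is better. Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.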

Code Implementations: 1 repo
