AS · CL · Mar 27, 2024

PhoWhisper: Automatic Speech Recognition for Vietnamese

Thanh-Thien Le, Linh The Nguyen, Dat Quoc Nguyen

arXiv:2406.02555v110.831 citations · h-index: 7 · Has Code · Tiny Papers @ ICLR

Originality: Synthesis-oriented

AI Analysis

This work addresses automatic speech recognition for Vietnamese, but its contribution is incremental: it applies an existing method (fine-tuning Whisper) to new data.

The authors tackled Vietnamese automatic speech recognition by fine-tuning the Whisper model on an 844-hour dataset with diverse accents, achieving state-of-the-art performance on benchmark datasets.

We introduce PhoWhisper in five versions for Vietnamese automatic speech recognition. PhoWhisper's robustness is achieved through fine-tuning the Whisper model on an 844-hour dataset that encompasses diverse Vietnamese accents. Our experimental study demonstrates state-of-the-art performances of PhoWhisper on benchmark Vietnamese ASR datasets. We have open-sourced PhoWhisper at: https://github.com/VinAIResearch/PhoWhisper
