PhoWhisper: Automatic Speech Recognition for Vietnamese
This work addresses Vietnamese speech recognition, but its contribution is incremental: it applies an existing method (fine-tuning Whisper) to new data.
The authors tackle Vietnamese automatic speech recognition (ASR) by fine-tuning the Whisper model on an 844-hour dataset covering diverse Vietnamese accents, and report state-of-the-art performance on benchmark Vietnamese ASR datasets.
We introduce PhoWhisper in five versions for Vietnamese automatic speech recognition (ASR). PhoWhisper's robustness is achieved through fine-tuning the Whisper model on an 844-hour dataset that encompasses diverse Vietnamese accents. Our experimental study demonstrates the state-of-the-art performance of PhoWhisper on benchmark Vietnamese ASR datasets. We have open-sourced PhoWhisper at: https://github.com/VinAIResearch/PhoWhisper
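The abstract does not name the metric behind its benchmark results. ASR systems are commonly scored by word error rate (WER): the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The following is a minimal sketch of that computation, assuming whitespace tokenization and no text normalization. The function name `word_error_rate` and the example sentences are illustrative and are not taken from the PhoWhisper repository.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the word-level edit distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly,
    with no casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative transcripts: the hypothesis drops one of four reference words.
print(word_error_rate("xin chào các bạn", "xin chào bạn"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference. Published WER figures also depend on the text normalization applied before scoring, so numbers from this sketch are only comparable to each other.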