CL · Apr 11, 2019

A high quality and phonetic balanced speech corpus for Vietnamese

Pham Ngoc Phuong, Quoc Truong Do, Luong Chi Mai

arXiv:1904.05569v1 · 0.24 citations

Originality Synthesis-oriented

AI Analysis

The corpus is a valuable resource for researchers and developers working on Vietnamese speech technology, though the contribution is incremental because it builds on existing corpus creation methods.

The authors tackled the lack of a high-quality, phonetically balanced speech corpus for Vietnamese by creating one with 5400 utterances from 12 speakers, designed to support speech synthesis and adaptation.

This paper presents a high-quality Vietnamese speech corpus that can be used for analyzing Vietnamese speech characteristics as well as for building speech synthesis models. The corpus consists of 5400 clean-speech utterances spoken by 12 speakers, 6 male and 6 female. The corpus is designed with phonetic balance in mind so that it can be used for speech synthesis and, in particular, speech adaptation approaches. Specifically, all speakers utter a common dataset of 250 phonetically balanced sentences. To increase the variety of speech contexts, each speaker also utters another 200 non-shared, phonetically balanced sentences. The speakers were selected to cover a wide range of ages and come from different regions of northern Vietnam. The audio was recorded in a soundproof studio room at a 48 kHz sampling rate, 16-bit PCM, mono channel.
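The figures in the abstract can be cross-checked with a little arithmetic, and the stated recording format can be verified on any WAV file. The following is a minimal sketch, not the authors' tooling: the function names are hypothetical, the count of distinct sentences assumes the non-shared sets do not overlap between speakers, and the format check assumes the files are distributed as standard WAV.

```python
import wave

# Figures stated in the abstract.
NUM_SPEAKERS = 12
SHARED_SENTENCES = 250    # common set read by every speaker
PRIVATE_SENTENCES = 200   # non-shared set read by one speaker only


def total_utterances():
    """Each speaker reads the shared set plus a private set."""
    return NUM_SPEAKERS * (SHARED_SENTENCES + PRIVATE_SENTENCES)


def distinct_sentences():
    """Assumption: private sets do not overlap across speakers."""
    return SHARED_SENTENCES + NUM_SPEAKERS * PRIVATE_SENTENCES


def matches_corpus_format(path):
    """Return True if a WAV file is 48 kHz, 16-bit PCM, mono."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == 48000
            and wav.getsampwidth() == 2   # 2 bytes = 16 bits per sample
            and wav.getnchannels() == 1
        )
```

Here `total_utterances()` gives 12 × (250 + 200) = 5400, which agrees with the utterance count quoted in the abstract.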
