CL · Oct 28, 2017

JSUT corpus: free large-scale Japanese speech corpus for end-to-end speech synthesis

Ryosuke Sonobe, Shinnosuke Takamichi, Hiroshi Saruwatari

arXiv:1711.00354v1 · 12.4182 citations

Originality: Synthesis-oriented

AI Analysis

The JSUT corpus gives speech synthesis researchers and developers a key resource and fills a gap for Japanese-language applications. The contribution is incremental: it supplies a missing dataset rather than introducing new methods.

The authors tackled the lack of a free large-scale Japanese speech corpus for end-to-end speech synthesis by creating the JSUT corpus, which includes 10 hours of reading-style speech data with transcriptions covering all main pronunciations of daily-use Japanese characters.

Thanks to improvements in machine learning techniques including deep learning, a free large-scale speech corpus that can be shared between academic institutions and commercial companies has an important role. However, such a corpus for Japanese speech synthesis does not exist. In this paper, we designed a novel Japanese speech corpus, named the "JSUT corpus," that is aimed at achieving end-to-end speech synthesis. The corpus consists of 10 hours of reading-style speech data and its transcription and covers all of the main pronunciations of daily-use Japanese characters. In this paper, we describe how we designed and analyzed the corpus. The corpus is freely available online.
