SD AS Aug 17, 2019

JVS corpus: free Japanese multi-speaker voice corpus

Shinnosuke Takamichi, Kentaro Mitsui, Yuki Saito, Tomoki Koriyama, Naoko Tanji, Hiroshi Saruwatari

arXiv:1908.06248v1 17.9102 citations

Originality Synthesis-oriented

AI Analysis

The JVS corpus provides a free multi-speaker dataset for researchers and companies working on tasks such as voice conversion and multi-speaker modeling. The contribution is incremental, since it builds on prior work such as the JSUT corpus.

To address the need for accessible Japanese voice data in speech synthesis research, the authors constructed the JVS corpus: 100 speakers and 30 hours of voice data in three styles (normal, whisper, and falsetto), of which 22 hours are parallel normal voices.

Thanks to improvements in machine learning techniques, including deep learning, speech synthesis is becoming a machine learning task. To accelerate speech synthesis research, we are developing Japanese voice corpora reasonably accessible from not only academic institutions but also commercial companies. In 2017, we released the JSUT corpus, which contains 10 hours of reading-style speech uttered by a single speaker, for end-to-end text-to-speech synthesis. For more general use in speech synthesis research, e.g., voice conversion and multi-speaker modeling, in this paper, we construct the JVS corpus, which contains voice data of 100 speakers in three styles (normal, whisper, and falsetto). The corpus contains 30 hours of voice data including 22 hours of parallel normal voices. This paper describes how we designed the corpus and summarizes the specifications. The corpus is available at our project page.
