Creating New Language and Voice Components for the Updated MaryTTS Text-to-Speech Synthesis Platform
The new workflow provides a more accessible and modern tool for researchers and developers working on text-to-speech synthesis, though the contribution is incremental because it builds upon the existing MaryTTS platform.
The authors address the problem of creating language and voice components for the MaryTTS text-to-speech platform. Their new workflow replaces the previous toolkit with an efficient, flexible process built on modern build automation and cloud infrastructure, enabling support for new languages, custom voices, and state-of-the-art DNN-based synthesis.
We present a new workflow for creating components for MaryTTS, a text-to-speech synthesis platform popular with researchers and developers. The workflow extends the platform to support new languages and custom synthetic voices, and it replaces the previous toolkit with an efficient, flexible process that leverages modern build automation and cloud-hosted infrastructure. Moreover, it is compatible with the updated MaryTTS architecture, enabling new features and state-of-the-art paradigms such as synthesis based on deep neural networks (DNNs). Like MaryTTS itself, the new tools are free, open-source software (FOSS) and promote the use of open data.