CL

Apr 4, 2019

Multi-reference Tacotron by Intercross Training for Style Disentangling, Transfer and Control in Speech Synthesis

Yanyao Bian, Changbin Chen, Yongguo Kang, Zhenglin Pan

arXiv:1904.02373v13.946 citations

Originality: Incremental advance

AI Analysis

This work addresses the need for more expressive and controllable speech synthesis for applications in voice assistants and media, representing an incremental improvement over existing methods.

The paper tackles the problem of independently controlling specific speech features in synthesis. It introduces a multi-reference Tacotron trained with intercross training, and its experiments demonstrate individual style control and transfer.

Speech style control and transfer techniques aim to enrich the diversity and expressiveness of synthesized speech. Existing approaches model all speech styles as a single representation and therefore cannot control a specific speech feature independently. To address this issue, we introduce a novel multi-reference structure to Tacotron and propose an intercross training approach, which together ensure that each sub-encoder of the multi-reference encoder independently disentangles and controls a specific style. Experimental results show that our model is able to control and transfer desired speech styles individually.
