CL · Apr 4, 2019

Multi-reference Tacotron by Intercross Training for Style Disentangling, Transfer and Control in Speech Synthesis

arXiv:1904.02373v1 · 46 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for more expressive and controllable speech synthesis for applications in voice assistants and media, representing an incremental improvement over existing methods.

The paper tackles the problem of independently controlling specific speech features in synthesis by introducing a multi-reference Tacotron with intercross training. Its experiments demonstrate individual style control and transfer.

Speech style control and transfer techniques aim to enrich the diversity and expressiveness of synthesized speech. Existing approaches model all speech styles into one representation, lacking the ability to control a specific speech feature independently. To address this issue, we introduce a novel multi-reference structure to Tacotron and propose an intercross training approach, which together ensure that each sub-encoder of the multi-reference encoder independently disentangles and controls a specific style. Experimental results show that our model is able to control and transfer desired speech styles individually.

