SD CL AS
May 20, 2023

EE-TTS: Emphatic Expressive TTS with Linguistic Information

Yi Zhong, Chen Zhang, Xule Liu, Chenxi Sun, Weishan Deng, Haifeng Hu, Zhongqian Sun

arXiv:2305.12107v2
9.56 citations

Originality: Incremental advance

AI Analysis

This work addresses the problem of generating more natural and expressive synthetic speech for TTS applications, representing an incremental improvement over previous methods.

The paper tackles the challenge of producing highly expressive speech in text-to-speech (TTS) systems by focusing on emphasis. It proposes EE-TTS, which uses multi-level linguistic information to enhance expressiveness, and reports MOS improvements of 0.49 in expressiveness and 0.67 in naturalness over the baseline.

While current TTS systems perform well in synthesizing high-quality speech, producing highly expressive speech remains a challenge. Emphasis, a critical factor in determining the expressiveness of speech, has attracted increasing attention. Previous works usually enhance emphasis by adding intermediate features, but they cannot guarantee the overall expressiveness of the speech. To address this, we propose Emphatic Expressive TTS (EE-TTS), which leverages multi-level linguistic information from syntax and semantics. EE-TTS contains an emphasis predictor that identifies appropriate emphasis positions from text and a conditioned acoustic model that synthesizes expressive speech from emphasis and linguistic information. Experimental results indicate that EE-TTS outperforms the baseline, with MOS improvements of 0.49 in expressiveness and 0.67 in naturalness. EE-TTS also shows strong generalization across different datasets according to AB test results.
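For readers unfamiliar with the metric, a MOS (mean opinion score) is the average of listener ratings, conventionally on a 1 to 5 scale, and a "MOS improvement" is the difference between the means of two systems. The sketch below illustrates that arithmetic only. The function names and the ratings are hypothetical placeholders, not data or code from the paper, and the paper's actual evaluation protocol may differ.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Return the average of listener ratings on an assumed 1-5 scale."""
    if not ratings:
        raise ValueError("ratings must be non-empty")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie in [1, 5]")
    return mean(ratings)


def mos_gain(system_ratings, baseline_ratings):
    """Difference between the MOS of a system and the MOS of a baseline."""
    return mean_opinion_score(system_ratings) - mean_opinion_score(baseline_ratings)


# Hypothetical listener ratings, made up for illustration only.
baseline = [3, 4, 3, 4]
proposed = [4, 4, 5, 4]

gain = mos_gain(proposed, baseline)
print(f"MOS gain over baseline: {gain:.2f}")
```

A real listening test would typically also report confidence intervals, since a difference in means from a small panel of listeners can be noisy.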
