CL · Mar 20, 2018

Expressivity in TTS from Semantics and Pragmatics

arXiv:1803.07295v1 · 2 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for more natural and expressive TTS in text and dialogue applications, though it appears incremental as it extends an existing system to new text types.

The authors tackle expressive text-to-speech (TTS) by analyzing text at the phonetic, phonological, syntactic, and semantic levels and by drawing on a restricted list of pragmatically marked phrases that receive specialized intonational contours. The system then restructures the text into a poem-like form, with Breath Groups as lines and stanzas at paragraph boundaries, to make the reading more expressive.

In this paper we present ongoing work to produce an expressive TTS reader that can be used in both text and dialogue applications. The system, called SPARSAR, has so far been used to read (English) poetry, but it can now be applied to any text. The text is fully analyzed at the phonetic and phonological levels as well as at the syntactic and semantic levels. In addition, the system has access to a restricted list of typical pragmatically marked phrases and expressions that are used to convey specific discourse functions and speech acts and that need specialized intonational contours. The text is transformed into a poem-like structure, where each line corresponds to a Breath Group that is semantically and syntactically consistent. Stanzas correspond to paragraph boundaries. Analogical parameters are related to ToBI theoretical indices, but their number is doubled. In this paper, we concentrate on short stories and fables.
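The abstract does not say how SPARSAR segments prose into Breath Groups; it relies on full syntactic and semantic analysis. Purely to illustrate the target data structure, here is a minimal sketch under loudly stated assumptions, not the authors' method. It uses a crude heuristic instead of parsing: paragraphs become stanzas, each paragraph is split at clause punctuation (with a hypothetical word cap) into breath-group lines, and lines containing an entry from a tiny, invented list of pragmatically marked phrases are tagged. `MARKED_PHRASES`, its labels, `BreathGroup`, `to_poem_structure`, and `max_words` are all hypothetical names.

```python
import re
from dataclasses import dataclass, field

# HYPOTHETICAL stand-in for SPARSAR's restricted list of pragmatically marked
# phrases; the real list and its intonational-contour labels are not given here.
MARKED_PHRASES = {
    "once upon a time": "narrative_opening",
    "of course": "assertion",
    "oh no": "exclamation",
}


@dataclass
class BreathGroup:
    """One line of the poem-like structure (heuristic, not SPARSAR's)."""
    text: str
    marked: list[str] = field(default_factory=list)


def to_poem_structure(text: str, max_words: int = 8) -> list[list[BreathGroup]]:
    """Return stanzas (one per paragraph), each a list of breath-group lines.

    Assumption: clause punctuation plus a word cap approximates breath groups;
    the paper instead uses syntactic and semantic consistency.
    """
    stanzas = []
    # Paragraph boundaries (blank lines) become stanza boundaries.
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        lines = []
        # Split at clause-level punctuation, keeping the mark with its chunk.
        for chunk in re.findall(r"[^,;:.!?]+[,;:.!?]?", paragraph):
            words = chunk.split()
            # Cap overly long chunks at max_words words per line.
            for i in range(0, len(words), max_words):
                piece = " ".join(words[i:i + max_words])
                lower = piece.lower()
                # Tag the line with any marked phrase it contains.
                marked = [label for phrase, label in MARKED_PHRASES.items()
                          if phrase in lower]
                lines.append(BreathGroup(piece, marked))
        stanzas.append(lines)
    return stanzas


if __name__ == "__main__":
    fable = ("Once upon a time, a fox saw a crow.\n\n"
             "Of course, the fox wanted the cheese.")
    for n, stanza in enumerate(to_poem_structure(fable), start=1):
        for line in stanza:
            print(n, line.text, line.marked)
```

For the two-paragraph fable above, this yields two stanzas of two lines each, with "Once upon a time," tagged `narrative_opening` and "Of course," tagged `assertion`. Plain substring matching keeps the sketch short; a real system would match marked phrases on tokens and use parse information to place line breaks.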
