CL · Mar 20, 2018

Expressivity in TTS from Semantics and Pragmatics

arXiv:1803.07295v1 · 2 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for more natural and expressive TTS in text and dialogue applications, though it appears incremental as it extends an existing system to new text types.

The authors address expressive text-to-speech (TTS) by analyzing text at the phonetic, phonological, syntactic, and semantic levels, and by drawing on a list of pragmatically marked phrases that call for specialized intonational contours. The system restructures the text into a poem-like form, with lines corresponding to breath groups and stanzas to paragraphs, to improve expressivity.

In this paper we present ongoing work to produce an expressive TTS reader that can be used in both text and dialogue applications. The system, called SPARSAR, has so far been used to read (English) poetry, but it can now be applied to any text. The text is fully analyzed at the phonetic and phonological levels, and at the syntactic and semantic levels. In addition, the system has access to a restricted list of typical pragmatically marked phrases and expressions that are used to convey specific discourse functions and speech acts and that need specialized intonational contours. The text is transformed into poem-like structures, where each line corresponds to a Breath Group that is semantically and syntactically consistent. Stanzas correspond to paragraph boundaries. Analogical parameters are related to ToBI theoretical indices, but their number is doubled. In this paper, we concentrate on short stories and fables.
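
The poem-like layout described in the abstract (one line per Breath Group, one stanza per paragraph) can be illustrated with a minimal sketch. This is not SPARSAR's method: the abstract says breath groups come from syntactic and semantic analysis, whereas the sketch below substitutes a plain punctuation heuristic. The function name `to_poem_structure` and the `max_words` limit are hypothetical, introduced only for illustration.

```python
import re


def to_poem_structure(text, max_words=8):
    """Return a list of stanzas, each a list of lines.

    Assumption: paragraphs are separated by blank lines, and a breath
    group is approximated by a punctuation-delimited chunk of at most
    `max_words` words. A real system would use syntactic and semantic
    analysis instead of this heuristic.
    """
    stanzas = []
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        lines = []
        # Break at punctuation, keeping each mark attached to its chunk.
        for chunk in re.findall(r"[^,;:.!?]+[,;:.!?]*", paragraph):
            words = chunk.split()
            # Split overly long chunks into pieces of at most max_words words.
            for i in range(0, len(words), max_words):
                lines.append(" ".join(words[i:i + max_words]))
        if lines:
            stanzas.append(lines)
    return stanzas


fable = (
    "The fox saw the grapes, but he could not reach them.\n\n"
    "So he walked away."
)
for stanza in to_poem_structure(fable):
    print("\n".join(stanza))
    print()
```

With the two-paragraph input above, the first stanza has two lines (split at the comma) and the second stanza has one line.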
