AS CL LG SD

Aug 20, 2020

Laughter Synthesis: Combining Seq2seq modeling with Transfer Learning

Noé Tits, Kevin El Haddad, Thierry Dutoit

arXiv:2008.09483v15.114 citations

Has Code

Originality: Incremental advance

AI Analysis

This work addresses the under-explored area of nonverbal expression synthesis, specifically laughter, within expressive speech synthesis, with applications such as amusement-controlled TTS. It is an incremental advance because it builds on existing TTS methods.

The paper tackles the problem of synthesizing nonverbal expressions, specifically laughter, by proposing an audio laughter synthesis system based on sequence-to-sequence TTS and transfer learning. In listening tests, the system achieves higher perceived naturalness than an HMM-based method.

Despite the growing interest in expressive speech synthesis, the synthesis of nonverbal expressions is an under-explored area. In this paper we propose an audio laughter synthesis system based on a sequence-to-sequence TTS synthesis system. We leverage transfer learning by training a deep learning model to generate both speech and laughs from annotations. We evaluate our model with a listening test, comparing its performance to that of an HMM-based laughter synthesis system, and find that it reaches higher perceived naturalness. Our solution is a first step towards a TTS system able to synthesize speech with control over the amusement level and with laughter integration.
