Laughter Synthesis: Combining Seq2seq modeling with Transfer Learning
This work addresses an under-explored area of expressive speech synthesis, with applications such as amusement-controlled TTS. The contribution is incremental, since it builds on existing TTS methods.
The paper tackles the synthesis of nonverbal expressions, specifically laughter, by proposing an audio laughter synthesis system based on sequence-to-sequence TTS and transfer learning. In a listening test, the system achieves higher perceived naturalness than an HMM-based method.
Despite the growing interest in expressive speech synthesis, the synthesis of nonverbal expressions remains an under-explored area. In this paper we propose an audio laughter synthesis system based on a sequence-to-sequence TTS system. We leverage transfer learning by training a deep learning model to generate both speech and laughter from annotations. We evaluate our model with a listening test, comparing it to an HMM-based laughter synthesis system, and find that it reaches higher perceived naturalness. Our solution is a first step towards a TTS system that can synthesize speech with control over the amusement level and with integrated laughter.
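One way a single model could generate both speech and laughter from annotations is to encode both kinds of annotation in one shared symbol vocabulary, so the same sequence-to-sequence encoder consumes either input. The sketch below illustrates only that idea. The symbol inventories, the `build_vocab` and `encode` helpers, and the `<pad>` token are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical symbol inventories (assumptions for illustration only;
# the paper's actual annotation sets are not specified here).
SPEECH_SYMBOLS = ["a", "h", "l", "o"]
LAUGH_SYMBOLS = ["<laugh_vowel>", "<laugh_fricative>", "<laugh_inhale>"]


def build_vocab(*inventories: List[str]) -> Dict[str, int]:
    """Map every symbol from every inventory into one shared ID space."""
    vocab: Dict[str, int] = {"<pad>": 0}  # ID 0 reserved for padding
    for inventory in inventories:
        for symbol in inventory:
            if symbol not in vocab:
                vocab[symbol] = len(vocab)
    return vocab


def encode(annotation: List[str], vocab: Dict[str, int]) -> List[int]:
    """Convert an annotation sequence into the integer IDs a model would read."""
    return [vocab[symbol] for symbol in annotation]


vocab = build_vocab(SPEECH_SYMBOLS, LAUGH_SYMBOLS)

# A speech annotation and a laughter annotation share the same ID space,
# so one encoder could be trained on speech and then fine-tuned on laughter.
speech_ids = encode(["h", "a", "l", "o"], vocab)
laugh_ids = encode(["<laugh_fricative>", "<laugh_vowel>", "<laugh_inhale>"], vocab)
```

With a shared vocabulary of this kind, representations learned from the larger speech data can in principle be reused when training on the smaller set of laughter annotations, which is the general motivation for transfer learning in this setting.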