AS · CL · LG · SD · Aug 20, 2020

Laughter Synthesis: Combining Seq2seq modeling with Transfer Learning

arXiv:2008.09483v1 · 14 citations
Originality: Incremental advance
AI Analysis

This work addresses the under-explored area of expressive speech synthesis for applications such as amusement-controlled TTS. The advance is incremental because it builds on existing TTS methods.

The paper tackles the synthesis of nonverbal expressions, specifically laughter. It proposes an audio laughter synthesis system based on sequence-to-sequence TTS and transfer learning, which achieves higher perceived naturalness than an HMM-based method in listening tests.

Despite the growing interest in expressive speech synthesis, synthesis of nonverbal expressions is an under-explored area. In this paper we propose an audio laughter synthesis system based on a sequence-to-sequence TTS synthesis system. We leverage transfer learning by training a deep learning model to generate both speech and laughs from annotations. We evaluate our model with a listening test, comparing its performance to that of an HMM-based laughter synthesis system, and find that it reaches higher perceived naturalness. Our solution is a first step towards a TTS system able to synthesize speech with control over the amusement level and with laughter integration.

Code Implementations: 1 repo