AS, CL · Sep 18, 2025

SynParaSpeech: Automated Synthesis of Paralinguistic Datasets for Speech Generation and Understanding

Bingsong Bai, Qihang Lu, Wenbing Yang, Zihan Sun, Yueran Hou, Peilei Jia, Songbai Pu, Ruibo Fu, Yingming Gao, Ya Li, Jun Gao

arXiv:2509.14946v3 · 4.34 citations · h-index: 7 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the scarcity and low quality of the paralinguistic data available to researchers and developers in speech generation and understanding. It is an incremental advance, since it builds on existing methods for dataset creation.

The authors address the lack of high-quality public datasets of paralinguistic sounds in speech by proposing an automated framework for generating such data and using it to build the SynParaSpeech dataset: 118.75 hours of data across 6 paralinguistic categories, with precise timestamps. The dataset improves both paralinguistic speech synthesis and paralinguistic event detection.

Paralinguistic sounds, like laughter and sighs, are crucial for synthesizing more realistic and engaging speech. However, existing methods typically depend on proprietary datasets, while publicly available resources often suffer from incomplete speech, inaccurate or missing timestamps, and limited real-world relevance. To address these problems, we propose an automated framework for generating large-scale paralinguistic data and apply it to construct the SynParaSpeech dataset. The dataset comprises 6 paralinguistic categories with 118.75 hours of data and precise timestamps, all derived from natural conversational speech. Our contributions lie in introducing the first automated method for constructing large-scale paralinguistic datasets and releasing the SynParaSpeech corpus, which advances speech generation through more natural paralinguistic synthesis and enhances speech understanding by improving paralinguistic event detection. The dataset and audio samples are available at https://github.com/ShawnPi233/SynParaSpeech.

