AS AI SD | Sep 15, 2024

Stutter-Solver: End-to-end Multi-lingual Dysfluency Detection

Xuanru Zhou, Cheol Jun Cho, Ayati Sharma, Brittany Morin, David Baquirin, Jet Vonk, Zoe Ezzes, Zachary Miller, Boon Lead Tee, Maria Luisa Gorno Tempini, Jiachen Lian, Gopala Anumanchipalli

arXiv:2409.09621v19.717 citations | h-index: 98 | Has Code

Originality: Incremental advance

AI Analysis

The work addresses the need for scalable, generalizable dysfluency detection tools in speech processing applications, a strong gain within that specific area.

The paper tackles the problem of detecting dysfluencies like stuttering across languages by proposing Stutter-Solver, an end-to-end framework that achieves state-of-the-art performance on all available dysfluency corpora.

Current de-facto dysfluency modeling methods utilize template matching algorithms which are not generalizable to out-of-domain real-world dysfluencies across languages, and are not scalable with increasing amounts of training data. To handle these problems, we propose Stutter-Solver: an end-to-end framework that detects dysfluency with accurate type and time transcription, inspired by the YOLO object detection algorithm. Stutter-Solver can handle co-dysfluencies and is a natural multi-lingual dysfluency detector. To leverage scalability and boost performance, we also introduce three novel dysfluency corpora: VCTK-Pro, VCTK-Art, and AISHELL3-Pro, simulating natural spoken dysfluencies including repetition, block, missing, replacement, and prolongation through articulatory-encodec and TTS-based methods. Our approach achieves state-of-the-art performance on all available dysfluency corpora. Code and datasets are open-sourced at https://github.com/eureka235/Stutter-Solver
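
To make the YOLO analogy concrete, the sketch below shows one common way a detection-style model's raw outputs can be turned into a "type and time" transcription: candidate regions carry a dysfluency type, start and end times, and a confidence score, and near-duplicate candidates are pruned with a one-dimensional non-maximum suppression. This is an illustrative assumption, not code from the paper or its repository; the `Region` class, the `decode` function, and both thresholds are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass

# Dysfluency types named in the abstract.
TYPES = ("repetition", "block", "missing", "replacement", "prolongation")


@dataclass(frozen=True)
class Region:
    """A hypothetical candidate detection: what happened and when."""
    kind: str     # one of TYPES
    start: float  # seconds
    end: float    # seconds
    score: float  # detector confidence in [0, 1]


def temporal_iou(a: Region, b: Region) -> float:
    """Intersection-over-union of two time intervals (1D analogue of box IoU)."""
    inter = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - inter
    return inter / union if union > 0 else 0.0


def decode(candidates, score_thresh=0.5, iou_thresh=0.5):
    """Drop low-confidence candidates, then apply per-type 1D NMS."""
    kept = []
    for cand in sorted(candidates, key=lambda r: r.score, reverse=True):
        if cand.score < score_thresh:
            break  # remaining candidates score even lower
        # Suppress only against kept regions of the same type, so
        # overlapping regions of different types (co-dysfluencies) survive.
        if all(k.kind != cand.kind or temporal_iou(k, cand) < iou_thresh
               for k in kept):
            kept.append(cand)
    return sorted(kept, key=lambda r: r.start)


# Assumed toy input: two near-duplicate repetitions, an overlapping
# prolongation, and a low-confidence block.
candidates = [
    Region("repetition", 0.50, 1.10, 0.92),
    Region("repetition", 0.55, 1.15, 0.80),
    Region("prolongation", 0.90, 1.40, 0.75),
    Region("block", 2.00, 2.30, 0.30),
]
result = decode(candidates)
for r in result:
    print(f"{r.kind}: {r.start:.2f}s to {r.end:.2f}s (score {r.score:.2f})")
```

In this toy input the second repetition is suppressed as a duplicate of the first, the block falls below the score threshold, and the prolongation is kept even though it overlaps the repetition in time, which is how a per-type suppression rule can accommodate co-dysfluencies.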

