AS · AI · SD · Sep 15, 2024

Stutter-Solver: End-to-end Multi-lingual Dysfluency Detection

arXiv:2409.09621v1 · 17 citations · h-index: 98 · Has Code
Originality: Incremental advance
AI Analysis

The work addresses the need for scalable and generalizable dysfluency detection tools in speech processing applications, a strong gain within that specific area.

The paper tackles the problem of detecting dysfluencies like stuttering across languages by proposing Stutter-Solver, an end-to-end framework that achieves state-of-the-art performance on all available dysfluency corpora.

Current de-facto dysfluency modeling methods utilize template matching algorithms which are not generalizable to out-of-domain real-world dysfluencies across languages, and are not scalable with increasing amounts of training data. To handle these problems, we propose Stutter-Solver: an end-to-end framework that detects dysfluency with accurate type and time transcription, inspired by the YOLO object detection algorithm. Stutter-Solver can handle co-dysfluencies and is a natural multi-lingual dysfluency detector. To leverage scalability and boost performance, we also introduce three novel dysfluency corpora: VCTK-Pro, VCTK-Art, and AISHELL3-Pro, simulating natural spoken dysfluencies including repetition, block, missing, replacement, and prolongation through articulatory-encodec and TTS-based methods. Our approach achieves state-of-the-art performance on all available dysfluency corpora. Code and datasets are open-sourced at https://github.com/eureka235/Stutter-Solver
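The abstract describes detection "with accurate type and time transcription" in the style of YOLO, including overlapping co-dysfluencies. As a rough illustration only, the sketch below shows one generic way region-style predictions could be post-processed in one dimension: temporal IoU plus non-maximum suppression applied per dysfluency type, so that overlapping regions of different types survive. The `Region` class, the function names, and the 0.5 threshold are all hypothetical and are not taken from the Stutter-Solver code.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Region:
    """Hypothetical candidate region: a time span, a dysfluency type, a confidence."""
    start: float  # seconds
    end: float    # seconds
    kind: str     # e.g. "repetition", "block", "prolongation"
    score: float  # model confidence in [0, 1]


def temporal_iou(a: Region, b: Region) -> float:
    """Intersection-over-union of two time intervals."""
    inter = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - inter
    return inter / union if union > 0 else 0.0


def nms_1d(regions: List[Region], iou_threshold: float = 0.5) -> List[Region]:
    """Greedy 1-D non-maximum suppression, applied within each type.

    Assumption: only same-type overlaps are suppressed, so co-occurring
    dysfluencies of different types can both be reported.
    """
    kept: List[Region] = []
    for cand in sorted(regions, key=lambda r: r.score, reverse=True):
        if all(cand.kind != k.kind or temporal_iou(cand, k) < iou_threshold
               for k in kept):
            kept.append(cand)
    # Return in chronological order, like a time-aligned transcription.
    return sorted(kept, key=lambda r: r.start)
```

Suppressing only within a type is a design choice made for this sketch; a real system might instead use learned objectness scores or a different overlap rule.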

