CL · AI · Apr 8, 2019

Giving Attention to the Unexpected: Using Prosody Innovations in Disfluency Detection

arXiv:1904.04388v11099 citations
Originality: Incremental advance
AI Analysis

This work addresses disfluency detection for speech processing applications. It is an incremental advance: it builds on existing text-based methods by adding prosodic features.

The paper tackles the problem of detecting disfluencies in spontaneous speech by integrating prosodic cues, which most detection algorithms ignore, and reports gains over a high-accuracy text-only model.

Disfluencies in spontaneous speech are known to be associated with prosodic disruptions. However, most algorithms for disfluency detection use only word transcripts. Integrating prosodic cues has proved difficult because of the many sources of variability affecting the acoustic correlates. This paper introduces a new approach to extracting acoustic-prosodic cues using text-based distributional prediction of acoustic cues to derive vector z-score features (innovations). We explore both early and late fusion techniques for integrating text and prosody, showing gains over a high-accuracy text-only model.
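
The "innovation" idea in the abstract can be illustrated with a minimal sketch: an acoustic cue is compared with the distribution predicted from text alone, and the per-dimension z-score of that deviation becomes the feature. The function names, the example values, and the simple weighted-average late fusion below are illustrative assumptions, not the authors' implementation. In the paper the predicted mean and spread come from a text-based model; here they are simply passed in as inputs.

```python
def innovation(observed, pred_mean, pred_std, eps=1e-8):
    """Vector z-score of observed acoustic cues against a text-predicted distribution.

    observed, pred_mean, pred_std are equal-length sequences of floats.
    pred_mean and pred_std are assumed to come from some text-based predictor
    (hypothetical here); eps guards against division by a zero spread.
    """
    return [(o - m) / max(s, eps) for o, m, s in zip(observed, pred_mean, pred_std)]


def early_fusion(text_features, prosody_features):
    """Early fusion (assumed form): concatenate features before a single classifier."""
    return list(text_features) + list(prosody_features)


def late_fusion(text_score, prosody_score, weight=0.5):
    """Late fusion (assumed form): combine the scores of two separately trained models."""
    return weight * text_score + (1.0 - weight) * prosody_score


# Hypothetical two-dimensional acoustic cue for one word.
z = innovation(observed=[3.0, 120.0], pred_mean=[2.0, 100.0], pred_std=[0.5, 10.0])
print(z)  # [2.0, 2.0]: both cues are two predicted standard deviations above expectation

fused_input = early_fusion([1.0, 0.0], z)
fused_score = late_fusion(text_score=0.8, prosody_score=0.4)
```

A large absolute z-score marks a word whose prosody is unexpected given its text, which is the kind of signal the abstract associates with disfluencies.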

