CL · MM · SD · AS · Mar 30, 2022

Span Classification with Structured Information for Disfluency Detection in Spoken Utterances

arXiv:2203.16028v2 · 7 citations · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the accurate identification of disfluencies in transcripts of spoken utterances, a step that matters for downstream speech-processing applications. It is an incremental advance that integrates structured (dependency-tree) information into the existing span-classification paradigm.

The paper tackles disfluency detection in spoken utterances by proposing a novel architecture that combines contextual information from transformers with structured information from dependency trees, encoded with graph convolutional networks (GCNs). It reports state-of-the-art results on the English Switchboard dataset, outperforming prior methods by a significant margin.
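As a rough illustration of the structured-information branch, the sketch below runs one graph-convolution step over a dependency tree in plain Python. It assumes the common formulation H' = ReLU(D⁻¹(A + I)HW), with the tree treated as undirected and each row normalized by its degree. The head indices, toy features, and weights are hypothetical, and the paper's exact normalization, edge directions, and handling of dependency labels may differ.

```python
from typing import List

Matrix = List[List[float]]


def adjacency_from_heads(heads: List[int]) -> Matrix:
    """Build an undirected adjacency matrix with self-loops from dependency heads.

    heads[i] is the 0-based index of token i's head, or -1 for the root.
    """
    n = len(heads)
    adj = [[0.0] * n for _ in range(n)]
    for i, h in enumerate(heads):
        adj[i][i] = 1.0  # self-loop keeps each token's own features
        if h >= 0:
            adj[i][h] = 1.0  # dependent -> head
            adj[h][i] = 1.0  # head -> dependent (tree treated as undirected)
    return adj


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(h: Matrix, adj: Matrix, w: Matrix) -> Matrix:
    """One GCN step: row-normalize (A + I), aggregate neighbours, project with W, ReLU."""
    norm_adj = [[v / sum(row) for v in row] for row in adj]  # every row has a self-loop, so sum >= 1
    out = matmul(matmul(norm_adj, h), w)
    return [[max(0.0, v) for v in row] for row in out]


if __name__ == "__main__":
    # "I want a flight": I->want, want=root, a->flight, flight->want (hypothetical parse)
    heads = [1, -1, 3, 1]
    h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]  # toy token features
    w = [[1.0, 0.0], [0.0, 1.0]]  # identity weights keep the aggregation visible
    for row in gcn_layer(h, adjacency_from_heads(heads), w):
        print(row)
```

Stacking several such layers lets information travel along longer dependency paths, which is how a GCN can relate tokens that are far apart in the linear sequence. In a combined model, the GCN output would typically be fused with the transformer's token representations (for example by concatenation); that fusion step is an assumption here, not a detail stated above.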

Existing approaches in disfluency detection focus on solving a token-level classification task for identifying and removing disfluencies in text. Moreover, most works focus on leveraging only contextual information captured by the linear sequences in text, thus ignoring the structured information in text which is efficiently captured by dependency trees. In this paper, building on the span classification paradigm of entity recognition, we propose a novel architecture for detecting disfluencies in transcripts from spoken utterances, incorporating both contextual information through transformers and long-distance structured information captured by dependency trees, through graph convolutional networks (GCNs). Experimental results show that our proposed model achieves state-of-the-art results on the widely used English Switchboard corpus for disfluency detection and outperforms prior art by a significant margin. We make all our code publicly available on GitHub (https://github.com/Sreyan88/Disfluency-Detection-with-Span-Classification).
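The contrast between token-level tagging and span classification can be sketched as follows. This is a minimal illustration, not the released code: it enumerates every candidate span up to a hypothetical `max_width` (the units a span classifier would score), converts token-level `D`/`O` tags into gold spans, and deletes spans labelled disfluent to recover fluent text. The tag names and the example utterance, in which both the reparandum "to Boston" and the interregnum "uh I mean" are assumed to be tagged disfluent, are illustrative.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # inclusive (start, end) token indices


def enumerate_spans(n_tokens: int, max_width: int) -> List[Span]:
    """All candidate spans of width 1..max_width, as a span classifier would score them."""
    return [(i, j) for i in range(n_tokens)
            for j in range(i, min(i + max_width, n_tokens))]


def tags_to_spans(tags: List[str]) -> List[Span]:
    """Merge maximal runs of token-level 'D' (disfluent) tags into gold spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    for i, tag in enumerate(tags):
        if tag == "D" and start is None:
            start = i  # a disfluent run begins
        elif tag != "D" and start is not None:
            spans.append((start, i - 1))  # the run ended on the previous token
            start = None
    if start is not None:
        spans.append((start, len(tags) - 1))  # run reaches the end of the utterance
    return spans


def remove_spans(tokens: List[str], spans: List[Span]) -> List[str]:
    """Drop every token covered by a disfluent span, keeping the fluent text."""
    drop = {k for s, e in spans for k in range(s, e + 1)}
    return [t for k, t in enumerate(tokens) if k not in drop]


if __name__ == "__main__":
    tokens = "I want a flight to Boston uh I mean to Denver".split()
    tags = ["O"] * 4 + ["D"] * 5 + ["O"] * 2  # hypothetical gold labels
    gold = tags_to_spans(tags)
    print(gold)
    print(len(enumerate_spans(len(tokens), max_width=5)))
    print(" ".join(remove_spans(tokens, gold)))
```

One design consequence worth noting: `max_width` must be at least as large as the longest disfluent span, otherwise that span is never scored. In the example the disfluency covers five tokens, so `max_width=5` yields 45 candidate spans, one of which is the gold span (4, 8).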

Code Implementations: 1 repo
