Hierarchical RNN with Static Sentence-Level Attention for Text-Based Speaker Change Detection
This work addresses the problem of identifying speaker changes in dialog transcripts, for applications such as processing transcripts with missing speaker data, and offers incremental improvements over existing neural methods.
The paper tackles text-based speaker change detection by formulating it as a matching problem and proposing a hierarchical RNN with static sentence-level attention, achieving significantly better performance than non-attention neural networks and feature-based approaches.
Speaker change detection (SCD) is an important task in dialog modeling. Our paper addresses the problem of text-based SCD, which differs from existing audio-based studies and is useful in various scenarios, for example, processing dialog transcripts where speaker identities are missing (e.g., OpenSubtitle), and enhancing audio SCD with textual information. We formulate text-based SCD as a matching problem of utterances before and after a certain decision point; we propose a hierarchical recurrent neural network (RNN) with static sentence-level attention. Experimental results show that neural networks consistently achieve better performance than feature-based approaches, and that our attention-based model significantly outperforms non-attention neural networks.
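The matching formulation can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the architecture details, so the plain tanh RNN cells, the single learned query vector used for attention (scoring each sentence state on its own, which is how "static" is interpreted here), the concatenation-based matching layer, and all names (`init_params`, `encode_side`, `change_probability`) are assumptions made for illustration. Weights are random and untrained, so the output only demonstrates the data flow.

```python
import math
import random

def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def rnn_states(inputs, W_in, W_rec):
    # Plain tanh RNN (an assumption; the paper may use a gated cell).
    # Returns the hidden state after every input step.
    h = [0.0] * len(W_rec)
    states = []
    for x in inputs:
        pre = [a + b for a, b in zip(matvec(W_in, x), matvec(W_rec, h))]
        h = [math.tanh(p) for p in pre]
        states.append(h)
    return states

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def static_attention(states, query):
    # Assumed form of static attention: each sentence state is scored
    # against one learned query vector, independent of the other side.
    weights = softmax([sum(q * s for q, s in zip(query, st)) for st in states])
    pooled = [sum(w * st[i] for w, st in zip(weights, states))
              for i in range(len(query))]
    return pooled, weights

def init_params(word_dim, hidden, seed=0):
    # Hypothetical parameter layout with random, untrained values.
    rng = random.Random(seed)
    return {
        "word_in": rand_matrix(hidden, word_dim, rng),
        "word_rec": rand_matrix(hidden, hidden, rng),
        "sent_in": rand_matrix(hidden, hidden, rng),
        "sent_rec": rand_matrix(hidden, hidden, rng),
        "query": [rng.uniform(-0.5, 0.5) for _ in range(hidden)],
        "w_out": [rng.uniform(-0.5, 0.5) for _ in range(2 * hidden)],
        "b_out": 0.0,
    }

def encode_side(utterances, p):
    # Word-level RNN: the last hidden state represents each utterance.
    sent_vecs = [rnn_states(u, p["word_in"], p["word_rec"])[-1]
                 for u in utterances]
    # Sentence-level RNN over utterance vectors, then attention pooling.
    sent_states = rnn_states(sent_vecs, p["sent_in"], p["sent_rec"])
    return static_attention(sent_states, p["query"])

def change_probability(before, after, p):
    # Match the utterances before and after the decision point.
    left, _ = encode_side(before, p)
    right, _ = encode_side(after, p)
    logit = sum(w * f for w, f in zip(p["w_out"], left + right)) + p["b_out"]
    return 1.0 / (1.0 + math.exp(-logit))

if __name__ == "__main__":
    rng = random.Random(1)
    # Each utterance is a list of 4-dimensional word vectors (random stand-ins).
    make_utt = lambda n: [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(n)]
    params = init_params(word_dim=4, hidden=3)
    prob = change_probability([make_utt(3), make_utt(2)], [make_utt(4)], params)
    print(f"P(speaker change) = {prob:.3f}")
```

In this sketch each side of the decision point is encoded separately and the two pooled vectors are compared by a single logistic layer; a trained model would learn all parameters from labeled decision points.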