ASSD Nov 5, 2020

Multi-class Spectral Clustering with Overlaps for Speaker Diarization

arXiv:2011.02900v1
37 citations
Originality: Incremental advance
AI Analysis

This work addresses the problem of accurately attributing speech to speakers in recordings with overlapping speech, which is crucial for applications such as meeting transcription. The approach is incremental, however, because it builds on existing overlap detection and clustering techniques.

The paper tackles speaker diarization with overlapping speech by developing a spectral clustering method informed by an overlap detector, achieving a test diarization error rate of 24.0% on the AMI meeting corpus, which is a 15.2% relative improvement over a strong baseline.
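The two figures quoted above fix the baseline error rate by simple arithmetic. The following is a minimal sketch of that calculation; the function names are hypothetical, and the baseline value is derived here from the quoted numbers rather than stated in the summary.

```python
def implied_baseline(der, rel_improvement):
    """Baseline DER implied by a final DER and a relative improvement.

    rel_improvement = (baseline - der) / baseline, solved for baseline.
    """
    return der / (1.0 - rel_improvement)


def relative_improvement(baseline, der):
    """Relative reduction in DER with respect to the baseline."""
    return (baseline - der) / baseline


# Values quoted in the summary: 24.0% DER, 15.2% relative improvement.
baseline = implied_baseline(24.0, 0.152)
print(round(baseline, 1))  # 28.3
```

Under these numbers the baseline DER works out to roughly 28.3%, which means the absolute reduction is about 4.3 percentage points.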

This paper describes a method for overlap-aware speaker diarization. Given an overlap detector and a speaker embedding extractor, our method performs spectral clustering of segments informed by the output of the overlap detector. This is achieved by transforming the discrete clustering problem into a convex optimization problem which is solved by eigen-decomposition. Thereafter, we discretize the solution by alternately using singular value decomposition and a modified version of non-maximal suppression which is constrained by the output of the overlap detector. Furthermore, we detail an HMM-DNN based overlap detector which performs frame-level classification and enforces duration constraints through HMM state transitions. Our method achieves a test diarization error rate (DER) of 24.0% on the mixed-headset setting of the AMI meeting corpus, which is a relative improvement of 15.2% over a strong agglomerative hierarchical clustering baseline, and compares favorably with other overlap-aware diarization methods. Further analysis on the LibriCSS data demonstrates the effectiveness of the proposed method in high overlap conditions.
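The clustering pipeline described in the abstract (relax, eigen-decompose, then alternate between SVD and constrained non-maximal suppression) can be sketched roughly as follows. This is an illustrative NumPy sketch in the spirit of classic multiclass spectral clustering, not the paper's implementation. The function names, the row-based rotation initialization, the cap of two speakers per overlapped segment, and the toy affinity matrix are all assumptions made for the example.

```python
import numpy as np


def constrained_nms(scores, overlap_mask):
    """Keep the top-scoring speaker per segment, plus the runner-up
    for segments flagged as overlapped (assumed cap: two speakers)."""
    labels = np.zeros_like(scores)
    order = np.argsort(-scores, axis=1)
    rows = np.arange(scores.shape[0])
    labels[rows, order[:, 0]] = 1.0
    overlapped = np.flatnonzero(overlap_mask)
    labels[overlapped, order[overlapped, 1]] = 1.0
    return labels


def init_rotation(relaxed):
    """Assumed initialization: pick K rows of the relaxed solution
    that are as mutually orthogonal as possible."""
    n, k = relaxed.shape
    rotation = np.zeros((k, k))
    rotation[:, 0] = relaxed[0]
    closeness = np.zeros(n)
    for j in range(1, k):
        closeness += np.abs(relaxed @ rotation[:, j - 1])
        rotation[:, j] = relaxed[np.argmin(closeness)]
    return rotation


def overlap_aware_spectral_clustering(affinity, num_speakers, overlap_mask, n_iter=20):
    # Relaxed (continuous) solution: top-K eigenvectors of the
    # symmetrically normalized affinity matrix.
    degree = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    normalized = d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(normalized)  # eigenvalues in ascending order
    relaxed = d_inv_sqrt[:, None] * eigvecs[:, -num_speakers:]
    relaxed /= np.linalg.norm(relaxed, axis=1, keepdims=True)

    # Discretization: alternate constrained NMS with an SVD-based
    # (orthogonal Procrustes) update of the rotation.
    rotation = init_rotation(relaxed)
    labels = np.zeros_like(relaxed)
    for _ in range(n_iter):
        labels = constrained_nms(relaxed @ rotation, overlap_mask)
        u, _, vt = np.linalg.svd(labels.T @ relaxed)
        rotation = (u @ vt).T
    return labels


# Toy affinity (made up for illustration): segments 0-2 from one speaker,
# 3-5 from another, and segment 6 overlapped between the two.
affinity = np.full((7, 7), 0.1)
affinity[:3, :3] = 1.0
affinity[3:6, 3:6] = 1.0
affinity[6, :] = 0.6
affinity[:, 6] = 0.6
affinity[6, 6] = 1.0
overlap_mask = np.array([False] * 6 + [True])

labels = overlap_aware_spectral_clustering(affinity, 2, overlap_mask)
print(labels)
```

Each row of `labels` is a segment and each column a speaker. In this sketch the overlap detector's output enters only through `overlap_mask`, which decides whether a segment may receive a second speaker label during discretization.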

