Reformulating DOVER-Lap Label Mapping as a Graph Partitioning Problem
This work addresses a computational bottleneck for speech processing researchers and practitioners who need to combine multiple diarization hypotheses efficiently. The contribution is incremental in that it builds on the authors' earlier DOVER-Lap method.
The paper tackles the computational inefficiency of DOVER-Lap's label mapping for combining speaker diarization outputs by reformulating it as a graph partitioning problem, resulting in a modified algorithm that performs similarly while being tractable and providing approximation bounds.
We recently proposed DOVER-Lap, a method for combining overlap-aware speaker diarization system outputs. DOVER-Lap improved upon its predecessor DOVER by using a label mapping method based on globally-informed greedy search. In this paper, we analyze this label mapping in the framework of a maximum orthogonal graph partitioning problem, and present three inferences. First, we show that the running time of DOVER-Lap label mapping is exponential in the input size, which poses a challenge when combining a large number of hypotheses. Second, we revisit the DOVER label mapping algorithm and propose a modification which performs similarly to DOVER-Lap while being computationally tractable. We also derive an approximation bound for this algorithm in terms of the maximum number of speakers in any hypothesis. Finally, we describe a randomized local search algorithm which provides a near-optimal $(1-\epsilon)$-approximate solution to the problem with high probability. We empirically demonstrate the effectiveness of our methods on the AMI meeting corpus. Our code is publicly available: https://github.com/desh2608/dover-lap.
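To make the graph partitioning view concrete, the following is a minimal sketch of a swap-based randomized local search on a toy instance. It is not the algorithm from the paper or the code in the linked repository, and it carries no $(1-\epsilon)$ guarantee. The names `partition_weight` and `local_search`, the `weight` dictionary layout, and the assumption that every hypothesis has the same number of speakers are all hypothetical choices made for illustration. Each hypothesis is one part of a multipartite graph, edge weights stand in for pairwise speaker overlap, and the goal is to group speakers into cliques with one speaker per hypothesis so that the total within-clique weight is maximized.

```python
import itertools
import random


def partition_weight(perms, weight):
    """Sum the edge weights inside the cliques induced by `perms`.

    perms[k][c] is the speaker index of hypothesis k placed in clique c.
    weight[(k, i, l, j)] is the (assumed) overlap between speaker i of
    hypothesis k and speaker j of hypothesis l, stored only for k < l.
    """
    total = 0.0
    for k, l in itertools.combinations(range(len(perms)), 2):
        for c in range(len(perms[k])):
            total += weight.get((k, perms[k][c], l, perms[l][c]), 0.0)
    return total


def local_search(num_hyps, num_spk, weight, restarts=20, seed=0):
    """Random restarts followed by greedy swaps that strictly improve the score."""
    rng = random.Random(seed)
    best_perms, best_score = None, float("-inf")
    for _ in range(restarts):
        # Random initial partition: shuffle every hypothesis except the first.
        perms = [list(range(num_spk)) for _ in range(num_hyps)]
        for p in perms[1:]:
            rng.shuffle(p)
        score = partition_weight(perms, weight)
        improved = True
        while improved:
            improved = False
            for k in range(num_hyps):
                for a, b in itertools.combinations(range(num_spk), 2):
                    # Tentatively swap two speakers of hypothesis k between cliques.
                    perms[k][a], perms[k][b] = perms[k][b], perms[k][a]
                    new_score = partition_weight(perms, weight)
                    if new_score > score + 1e-12:
                        score = new_score
                        improved = True
                    else:
                        # Undo the swap if it did not help.
                        perms[k][a], perms[k][b] = perms[k][b], perms[k][a]
        if score > best_score:
            best_perms, best_score = [p[:] for p in perms], score
    return best_perms, best_score


# Toy instance: 3 hypotheses, 3 speakers each. Speaker i of hypothesis k
# corresponds to "true" speaker (i - k) % 3, and matching speakers overlap
# with weight 1.0.
toy_weight = {}
for k, l in itertools.combinations(range(3), 2):
    for s in range(3):
        toy_weight[(k, (s + k) % 3, l, (s + l) % 3)] = 1.0

toy_perms, toy_score = local_search(3, 3, toy_weight)
```

In this sketch each candidate swap recomputes the full objective, which keeps the code short at the cost of efficiency; a practical version would update the score incrementally and would need to handle hypotheses with differing speaker counts.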