SD · AI · AS · Jul 22, 2025

SDBench: A Comprehensive Benchmark Suite for Speaker Diarization

arXiv:2507.16136v2 · 1 citation · h-index: 2 · Has Code · INTERSPEECH
Originality: Synthesis-oriented
AI Analysis

SDBench gives researchers and practitioners in speech processing a standardized tool for comparing speaker diarization systems more reliably. Its contribution is incremental, since it builds on existing datasets and methods.

The authors address inconsistent evaluation in speaker diarization by introducing SDBench, a benchmark suite that integrates 13 datasets and supports reproducible analysis. They demonstrate its usefulness by building SpeakerKit, which runs 9.6x faster than Pyannote v3 with comparable error rates.

Even state-of-the-art speaker diarization systems exhibit high variance in error rates across different datasets, representing numerous use cases and domains. Furthermore, comparing across systems requires careful application of best practices such as dataset splits and metric definitions to allow for apples-to-apples comparison. We propose SDBench (Speaker Diarization Benchmark), an open-source benchmark suite that integrates 13 diverse datasets with built-in tooling for consistent and fine-grained analysis of speaker diarization performance for various on-device and server-side systems. SDBench enables reproducible evaluation and easy integration of new systems over time. To demonstrate the efficacy of SDBench, we built SpeakerKit, an inference efficiency-focused system built on top of Pyannote v3. SDBench enabled rapid execution of ablation studies that led to SpeakerKit being 9.6x faster than Pyannote v3 while achieving comparable error rates. We benchmark 6 state-of-the-art systems including Deepgram, AWS Transcribe, and Pyannote AI API, revealing important trade-offs between accuracy and speed.
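The abstract stresses consistent metric definitions and reports results as error rates and a speed ratio. As a rough illustration of what those quantities mean, the sketch below computes the standard diarization error rate (missed speech, false alarm, and speaker confusion, divided by total reference speech time) and a simple speedup ratio. The function names and all numbers are hypothetical, chosen for illustration only. They are not taken from SDBench or from the paper's results, and SDBench's own metric implementation may differ (for example in collar or overlap handling).

```python
def diarization_error_rate(missed, false_alarm, confusion, total_reference):
    """Standard DER: summed error durations over total reference speech time.

    All arguments are durations in seconds. Hypothetical helper, not SDBench's API.
    """
    if total_reference <= 0:
        raise ValueError("total_reference must be positive")
    return (missed + false_alarm + confusion) / total_reference


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate runs than the baseline."""
    if candidate_seconds <= 0:
        raise ValueError("candidate_seconds must be positive")
    return baseline_seconds / candidate_seconds


# Illustrative numbers only; these are NOT results from the paper.
der = diarization_error_rate(
    missed=30.0, false_alarm=12.0, confusion=18.0, total_reference=600.0
)
ratio = speedup(baseline_seconds=120.0, candidate_seconds=30.0)
print(f"DER: {der:.1%}, speedup: {ratio:.1f}x")
```

Fixing one definition like this across every system and dataset is what makes the reported numbers comparable; changing the denominator or dropping an error component would shift every score.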

