SD | AI | AS | Jul 22, 2025

SDBench: A Comprehensive Benchmark Suite for Speaker Diarization

Eduardo Pacheco, Atila Orhon, Berkin Durmus, Blaise Munyampirwa, Andrey Leonov

arXiv:2507.16136v2 | 1 citation | h-index: 2 | Has Code | INTERSPEECH

Originality: Synthesis-oriented

AI Analysis

SDBench gives researchers and practitioners in speech processing a standardized tool for comparing speaker diarization systems more reliably, though the contribution is incremental because it builds on existing datasets and methods.

The authors address inconsistent evaluation in speaker diarization by introducing SDBench, a benchmark suite that integrates 13 datasets and enables reproducible analysis. They demonstrate its usefulness by developing SpeakerKit, which runs 9.6x faster than Pyannote v3 with comparable error rates.

Even state-of-the-art speaker diarization systems exhibit high variance in error rates across different datasets that represent numerous use cases and domains. Furthermore, comparing across systems requires careful application of best practices such as dataset splits and metric definitions to allow for apples-to-apples comparison. We propose SDBench (Speaker Diarization Benchmark), an open-source benchmark suite that integrates 13 diverse datasets with built-in tooling for consistent and fine-grained analysis of speaker diarization performance for various on-device and server-side systems. SDBench enables reproducible evaluation and easy integration of new systems over time. To demonstrate the efficacy of SDBench, we built SpeakerKit, an inference efficiency-focused system based on Pyannote v3. SDBench enabled rapid execution of ablation studies that led to SpeakerKit being 9.6x faster than Pyannote v3 while achieving comparable error rates. We benchmark 6 state-of-the-art systems including Deepgram, AWS Transcribe, and Pyannote AI API, revealing important trade-offs between accuracy and speed.
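
As a rough illustration of the two quantities the abstract trades off, error rate and speed, the sketch below computes a diarization error rate (DER) from its standard definition and a relative speedup factor. It assumes DER is the error rate in question, which the abstract does not state explicitly. The class and function names are hypothetical and are not part of SDBench or SpeakerKit, and the numbers in the usage lines are made-up inputs, not results from the paper.

```python
from dataclasses import dataclass


@dataclass
class DiarizationErrors:
    """Error durations in seconds for one audio file (hypothetical container)."""

    missed_speech: float           # reference speech the system did not detect
    false_alarm: float             # system speech where the reference has none
    speaker_confusion: float       # speech attributed to the wrong speaker
    total_reference_speech: float  # total speech duration in the reference


def diarization_error_rate(errors: DiarizationErrors) -> float:
    """Standard DER: summed error durations over total reference speech."""
    if errors.total_reference_speech <= 0:
        raise ValueError("total_reference_speech must be positive")
    error_time = (
        errors.missed_speech + errors.false_alarm + errors.speaker_confusion
    )
    return error_time / errors.total_reference_speech


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times faster the candidate runs than the baseline."""
    if candidate_seconds <= 0:
        raise ValueError("candidate_seconds must be positive")
    return baseline_seconds / candidate_seconds


if __name__ == "__main__":
    # Illustrative inputs only (assumed values, not taken from the paper).
    example = DiarizationErrors(
        missed_speech=6.0,
        false_alarm=3.0,
        speaker_confusion=9.0,
        total_reference_speech=120.0,
    )
    print(f"DER: {diarization_error_rate(example):.2%}")
    print(f"Speedup: {speedup(96.0, 10.0):.1f}x")
```

Reporting both numbers per dataset, rather than one averaged score, is what makes the accuracy and speed trade-off between systems visible.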
