BER: Balanced Error Rate For Speaker Diarization
This work offers a more balanced evaluation metric for speaker diarization, which matters to researchers and practitioners in speech processing. The contribution is incremental, since it builds on existing metrics.
The authors address the evaluation of speaker diarization by proposing a Balanced Error Rate (BER). Existing metrics such as DER and JER tend to overlook errors in short segments and, in the case of DER, errors for speakers who talk less; BER is designed to cover duration, segment, and speaker-weighted errors in a single measure.
Diarization Error Rate (DER) is the primary metric for evaluating diarization performance, but it faces a dilemma: errors in short utterances or segments tend to be overwhelmed by errors in longer ones. Short segments, e.g., "yes" or "no," still carry semantic information. DER also overlooks errors for speakers who talk less. Although the Jaccard Error Rate (JER) balances errors across speakers, it suffers from the same dilemma. Since duration error, segment error, and speaker-weighted error together constitute a complete diarization evaluation, we propose a Balanced Error Rate (BER) to evaluate speaker diarization. First, we propose a segment-level error rate (SER) that uses connected sub-graphs and an adaptive IoU threshold to obtain accurate segment matching. Second, to evaluate diarization in a unified way, we adopt a speaker-specific harmonic mean between duration and segment errors, followed by a speaker-weighted average. Third, we analyze our metric using a modularized system, EEND, and a multi-modal method on real datasets. SER and BER are publicly available at https://github.com/X-LANCE/BER.
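To make the combination step concrete, the following is a minimal Python sketch of a temporal IoU between two segments, a per-speaker harmonic mean of a duration error and a segment error, and an average over speakers. It is an illustration only, not the released implementation: the function names, the input layout, and the equal speaker weights are assumptions, and the sketch omits the connected sub-graph matching and the adaptive IoU threshold described above.

```python
from typing import Dict, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def iou(a: Segment, b: Segment) -> float:
    """Temporal intersection-over-union of two segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def harmonic_mean(x: float, y: float) -> float:
    """Harmonic mean of two non-negative values (0.0 if both are 0)."""
    return 2.0 * x * y / (x + y) if (x + y) > 0 else 0.0


def balanced_error(per_speaker: Dict[str, Tuple[float, float]]) -> float:
    """Average of per-speaker harmonic means.

    per_speaker maps a speaker label to a hypothetical pair
    (duration_error, segment_error), each in [0, 1].
    Assumption: every speaker gets equal weight.
    """
    if not per_speaker:
        raise ValueError("at least one speaker is required")
    scores = [harmonic_mean(d, s) for d, s in per_speaker.values()]
    return sum(scores) / len(scores)


# Hypothetical error values for two speakers.
errors = {"spk_A": (0.5, 0.5), "spk_B": (0.2, 0.4)}
score = balanced_error(errors)
```

In this sketch each speaker contributes equally to the final score regardless of total speaking time, which is the property that keeps speakers who talk less from being drowned out. A plain harmonic mean returns 0 whenever either input is 0, so this simplified version should not be read as reproducing the published metric's behavior in that case.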