Mar 28, 2022

Investigation of Different Calibration Methods for Deep Speaker Embedding based Verification Systems

arXiv:2203.15106v1 | 3 citations | h-index: 16
Originality: Synthesis-oriented
AI Analysis

This work addresses the calibration problem for speaker verification systems, which is crucial for reliability under unknown acoustic conditions. However, it is incremental: it compares existing and recently proposed methods without introducing a fundamentally new approach.

The paper compares several calibration methods for deep speaker embedding verification systems. It finds that calibration poses no serious problems when in-domain development data are available; otherwise, a trade-off emerges between calibration quality and threshold-free performance. Adaptive s-norm generally stabilizes score distributions and improves performance, while the novel methods show limited score stabilization on some datasets.
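Adaptive s-norm standardizes a raw trial score against cohort statistics computed separately for the enrollment and test sides, using only the top-N cohort scores most similar to each side, and averages the two z-normalized values: s_norm = 0.5 * ((s - mu_e) / sigma_e + (s - mu_t) / sigma_t). The following is a minimal pure-Python sketch, assuming cosine scoring between embeddings and a cohort of impostor embeddings; the function names and the default `top_n=300` are hypothetical, and the paper's actual cohort size and scoring backend are not given here.

```python
import math
from statistics import mean, pstdev


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_n_stats(embedding, cohort, top_n):
    """Mean and (population) std of the top_n highest cohort scores."""
    scores = sorted((cosine(embedding, c) for c in cohort), reverse=True)[:top_n]
    return mean(scores), pstdev(scores)


def adaptive_snorm(enroll, test, cohort, top_n=300, eps=1e-8):
    """Symmetric adaptive s-norm of the raw cosine score.

    top_n is a hypothetical default; eps guards against a zero std.
    """
    raw = cosine(enroll, test)
    mu_e, sd_e = top_n_stats(enroll, cohort, top_n)
    mu_t, sd_t = top_n_stats(test, cohort, top_n)
    return 0.5 * ((raw - mu_e) / (sd_e + eps) + (raw - mu_t) / (sd_t + eps))


if __name__ == "__main__":
    cohort = [[0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
    print(adaptive_snorm([1.0, 0.0], [1.0, 0.0], cohort, top_n=2))
```

Selecting only the closest cohort members is what makes the normalization "adaptive": the statistics reflect the impostors most confusable with each side of the trial rather than the whole cohort.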

Deep speaker embedding extractors have become the state of the art in speaker verification. However, the calibration of verification scores produced by such systems often receives little attention. Poor score calibration leads to serious issues, especially under unknown acoustic conditions, even when the speaker verification system is strong in terms of threshold-free metrics. This paper investigates several score calibration methods: a classical approach based on a logistic regression model; the recently presented magnitude estimation network MagnetO, which uses activations from the pooling layer of the trained deep speaker extractor; and a generalization of this approach based on separate neural networks for scale and offset prediction. An additional focus of this research is to estimate the impact of score normalization on the calibration performance of the system. The results demonstrate that there are no serious problems if in-domain development data are used for calibration tuning. Otherwise, a trade-off arises between good calibration performance and threshold-free system quality. In most cases, adaptive s-norm helps to stabilize score distributions and improve system performance. Meanwhile, some experiments show that the novel approaches have limits in score stabilization on several datasets.
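The classical logistic regression approach calibrates by fitting an affine map llr = a * s + b from raw scores s to log-likelihood ratios, using development trials with known target/non-target labels. Below is a minimal pure-Python sketch, assuming class-balanced weighting (effective target prior 0.5) and Newton's method with a tiny ridge term for numerical safety; `fit_linear_calibration` and its parameters are hypothetical names, and this is not the paper's exact training setup.

```python
import math


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def fit_linear_calibration(tar, non, iters=30, ridge=1e-6):
    """Fit llr = a * score + b by class-balanced logistic regression.

    Targets and non-targets each carry total weight 0.5, so the fitted
    log-odds can be read as a log-likelihood ratio at prior 0.5.
    """
    data = [(s, 1.0, 0.5 / len(tar)) for s in tar]
    data += [(s, 0.0, 0.5 / len(non)) for s in non]
    a, b = 0.0, 0.0
    for _ in range(iters):
        # Gradient and Hessian of the weighted negative log-likelihood
        # (the ridge term lightly penalizes the scale a).
        ga, gb = ridge * a, 0.0
        haa, hab, hbb = ridge, 0.0, 0.0
        for s, y, w in data:
            p = _sigmoid(a * s + b)
            r = w * (p - y)
            ga += r * s
            gb += r
            h = w * p * (1.0 - p)
            haa += h * s * s
            hab += h * s
            hbb += h
        det = haa * hbb - hab * hab
        # Newton step: solve the 2x2 system H @ [da, db] = [ga, gb].
        a -= (hbb * ga - hab * gb) / det
        b -= (haa * gb - hab * ga) / det
    return a, b


if __name__ == "__main__":
    a, b = fit_linear_calibration([2.0, 1.0, -0.5], [-2.0, -1.0, 0.5])
    print(a, b)
```

A new trial is then calibrated as `llr = a * s + b`. As we read the abstract, the scale and offset networks generalize this by predicting a and b per trial from extractor activations instead of fitting one global pair.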
