Remarks on Optimal Scores for Speaker Recognition
This work provides a theoretical foundation for speaker recognition scores. The contribution is incremental in that it formalizes existing empirical methods.
The paper establishes a theory of optimal scores for speaker recognition, showing that minimum Bayes risk decisions can be based on a normalized likelihood score. Under a linear Gaussian model, this score is equivalent to the PLDA likelihood ratio, and scores based on cosine distance and Euclidean distance approximate it under some conditions.
In this article, we first establish the theory of optimal scores for speaker recognition. Our analysis shows that the minimum Bayes risk (MBR) decisions for both the speaker identification and speaker verification tasks can be based on a normalized likelihood (NL). When the underlying generative model is linear Gaussian, the NL score is mathematically equivalent to the PLDA likelihood ratio, and the empirical scores based on cosine distance and Euclidean distance can be seen as approximations of this linear Gaussian NL score under some conditions. We then discuss several properties of the NL score and demonstrate them with a simple simulation experiment.
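The equivalence between the NL score and the PLDA likelihood ratio can be checked numerically. The following is a minimal sketch, not the authors' implementation. It assumes an isotropic linear Gaussian model in which a speaker mean is drawn from N(0, eps2 * I) and each observation from N(mean, sig2 * I), and it assumes the NL score is defined as p(x | enrollment) / p(x). The function names and the parameters `eps2` and `sig2` are hypothetical and chosen only for illustration.

```python
import numpy as np


def gauss_logpdf(x, mean, var):
    """Log density of an isotropic Gaussian N(mean, var * I) at x."""
    x = np.asarray(x, dtype=float)
    diff = x - mean
    return -0.5 * (x.size * np.log(2 * np.pi * var) + diff @ diff / var)


def log_nl(x, enroll, eps2, sig2):
    """Log NL score: log p(x | enroll) - log p(x) (assumed definition).

    enroll holds one or more enrollment vectors of a single speaker.
    """
    enroll = np.atleast_2d(np.asarray(enroll, dtype=float))
    n = enroll.shape[0]
    xbar = enroll.mean(axis=0)
    # Posterior over the speaker mean given the enrollment vectors.
    post_mean = (n * eps2 / (n * eps2 + sig2)) * xbar
    post_var = eps2 * sig2 / (n * eps2 + sig2)
    # Predictive density of x for this speaker vs. marginal density of x.
    log_pred = gauss_logpdf(x, post_mean, sig2 + post_var)
    log_marg = gauss_logpdf(x, 0.0, eps2 + sig2)
    return log_pred - log_marg


def log_plda_lr(x, e, eps2, sig2):
    """Two-vector PLDA log likelihood ratio under the same assumed model.

    Same-speaker hypothesis: x and e are jointly Gaussian and share a mean.
    Different-speaker hypothesis: x and e are independent.
    """
    x = np.asarray(x, dtype=float)
    e = np.asarray(e, dtype=float)
    total = eps2 + sig2
    cov = np.array([[total, eps2], [eps2, total]])  # per-dimension joint covariance
    inv = np.linalg.inv(cov)
    logdet = np.log(np.linalg.det(cov))
    pairs = np.stack([x, e], axis=1)  # shape (d, 2)
    quad = np.einsum("di,ij,dj->", pairs, inv, pairs)
    log_same = -0.5 * (x.size * (2 * np.log(2 * np.pi) + logdet) + quad)
    log_diff = gauss_logpdf(x, 0.0, total) + gauss_logpdf(e, 0.0, total)
    return log_same - log_diff


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    eps2, sig2, dim = 4.0, 0.25, 16
    mu = rng.normal(0.0, np.sqrt(eps2), size=dim)       # speaker mean
    e = mu + rng.normal(0.0, np.sqrt(sig2), size=dim)   # enrollment vector
    x = mu + rng.normal(0.0, np.sqrt(sig2), size=dim)   # test vector
    print(np.isclose(log_nl(x, e, eps2, sig2), log_plda_lr(x, e, eps2, sig2)))
```

With a single enrollment vector the two functions return the same value up to floating-point error, because p(x, e) / (p(x) p(e)) simplifies to p(x | e) / p(x). The `log_nl` function also accepts several enrollment vectors, in which case the posterior over the speaker mean tightens as their number grows.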