SDLG  Oct 20, 2015

Max-margin Metric Learning for Speaker Recognition

arXiv:1510.05940v2 · 12 citations
Originality: Incremental advance
AI Analysis

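This work addresses two weaknesses of PLDA-based speaker recognition: its Gaussian assumption over speaker vectors, which may not hold in practice, and a training objective that is only indirectly related to the verification task. It represents an incremental improvement of interest to researchers and practitioners.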

The paper tackles these limitations of PLDA by proposing a max-margin metric learning approach that directly optimizes the discrimination of target speakers from imposters, achieving comparable or better performance on the SRE08 core test.

Probabilistic linear discriminant analysis (PLDA) is a popular normalization approach for the i-vector model and has delivered state-of-the-art performance in speaker recognition. A potential problem of the PLDA model, however, is that it essentially assumes Gaussian distributions over speaker vectors, which is not always true in practice. Additionally, its objective function is not directly related to the goal of the task, i.e., discriminating true speakers from imposters. In this paper, we propose a max-margin metric learning approach to address these problems. It learns a linear transform with the criterion that the margin between target and imposter trials is maximized. Experiments conducted on the SRE08 core test show that, compared to PLDA, the new approach obtains comparable or even better performance, even though scoring is simply a cosine computation.
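A minimal NumPy sketch of the idea described in the abstract: learn a linear transform W so that cosine scores of target trials exceed those of imposter trials by a margin, then score with a plain cosine in the transformed space. The pairwise hinge loss, full-batch gradient descent, the synthetic data generator, and all function names and hyperparameters (`margin=0.2`, `lr`, `steps`) are assumptions chosen for illustration; they are not the paper's exact formulation, data, or training setup.

```python
import numpy as np


def cosine_scores(W, X, Y):
    """Cosine similarity between W @ x_j and W @ y_j for every trial j (rows of X, Y)."""
    A = X @ W.T  # (n, k) transformed enrollment vectors
    B = Y @ W.T  # (n, k) transformed test vectors
    return np.sum(A * B, axis=1) / (np.linalg.norm(A, axis=1) * np.linalg.norm(B, axis=1))


def cosine_grad(W, X, Y, weights):
    """Gradient w.r.t. W of sum_j weights[j] * cos(W x_j, W y_j)."""
    A, B = X @ W.T, Y @ W.T
    na = np.linalg.norm(A, axis=1, keepdims=True)
    nb = np.linalg.norm(B, axis=1, keepdims=True)
    s = np.sum(A * B, axis=1, keepdims=True) / (na * nb)
    dA = B / (na * nb) - s * A / na**2  # ds/da for each trial
    dB = A / (na * nb) - s * B / nb**2  # ds/db for each trial
    w = weights[:, None]
    # chain rule: a = W x, so ds/dW = (ds/da) x^T, summed over trials
    return (w * dA).T @ X + (w * dB).T @ Y


def pairwise_hinge(W, tgt, imp, margin):
    """Mean hinge loss over all (target, imposter) trial pairs, and its gradient (assumed loss)."""
    s_t = cosine_scores(W, *tgt)
    s_i = cosine_scores(W, *imp)
    h = margin - s_t[:, None] + s_i[None, :]  # (n_tgt, n_imp) margin violations
    viol = h > 0
    loss = np.mean(np.maximum(h, 0.0))
    # d(loss)/dW = (1/P) * sum over violated pairs of (-ds_t/dW + ds_i/dW)
    wt = -viol.sum(axis=1) / viol.size
    wi = viol.sum(axis=0) / viol.size
    grad = cosine_grad(W, *tgt, wt) + cosine_grad(W, *imp, wi)
    return loss, grad


def train(tgt, imp, k, margin=0.2, lr=0.2, steps=300, seed=0):
    """Full-batch gradient descent on W, initialised near the identity (i.e. plain cosine)."""
    rng = np.random.default_rng(seed)
    d = tgt[0].shape[1]
    W = np.eye(k, d) + 0.01 * rng.standard_normal((k, d))
    for _ in range(steps):
        _, grad = pairwise_hinge(W, tgt, imp, margin)
        W -= lr * grad
    return W


def make_trials(rng, n_spk=30, n_trials=200, d=10):
    """Hypothetical toy i-vector-like data: speaker identity in the first half of the
    dimensions, large session noise in the second half."""
    spk_std = np.r_[np.ones(d // 2), np.zeros(d - d // 2)]
    ses_std = np.r_[0.3 * np.ones(d // 2), 2.0 * np.ones(d - d // 2)]
    means = rng.standard_normal((n_spk, d)) * spk_std

    def utt(s):  # one utterance per speaker index, with fresh session noise
        return means[s] + rng.standard_normal((len(s), d)) * ses_std

    a = rng.integers(n_spk, size=n_trials)
    b = (a + rng.integers(1, n_spk, size=n_trials)) % n_spk  # always a different speaker
    return (utt(a), utt(a)), (utt(a), utt(b))  # target trials, imposter trials


def pairwise_accuracy(W, tgt, imp):
    """Fraction of (target, imposter) pairs where the target trial scores higher."""
    s_t = cosine_scores(W, *tgt)
    s_i = cosine_scores(W, *imp)
    return np.mean(s_t[:, None] > s_i[None, :])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tgt, imp = make_trials(rng)
    W0 = np.eye(10)  # identity transform = raw cosine scoring
    W = train(tgt, imp, k=10)
    print("cosine only:", pairwise_accuracy(W0, tgt, imp))
    print("learned W  :", pairwise_accuracy(W, tgt, imp))
```

Initialising W at the identity makes the starting point plain cosine scoring, so any improvement comes from the learned transform. On this toy data the transform should learn to down-weight the noisy dimensions; note the sketch evaluates on its own training trials, whereas a real comparison would use held-out trials such as the SRE08 core test.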

