Fast variational Bayes for heavy-tailed PLDA applied to i-vectors and x-vectors
This work addresses the computational cost of heavy-tailed PLDA backends in speaker recognition systems that use i-vectors and x-vectors, offering an incremental improvement over existing methods.
The paper tackles the computational inefficiency of heavy-tailed PLDA (HT-PLDA) in speaker recognition by introducing a fast variational Bayes generative training algorithm. HT-PLDA without length normalization is known to give accuracy similar to Gaussian PLDA with length normalization; the paper compares the old and new backends on SRE'10, SRE'16, and SITW.
The standard state-of-the-art backend for text-independent speaker recognizers that use i-vectors or x-vectors is Gaussian PLDA (G-PLDA), assisted by a Gaussianization step involving length normalization. G-PLDA can be trained with either generative or discriminative methods. It has long been known that heavy-tailed PLDA (HT-PLDA), applied without length normalization, gives similar accuracy, but at considerable extra computational cost. We have recently introduced a fast scoring algorithm for a discriminatively trained HT-PLDA backend. This paper extends that work by introducing a fast, variational Bayes, generative training algorithm. We compare old and new backends, with and without length normalization, with i-vectors and x-vectors, on SRE'10, SRE'16 and SITW.
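For readers unfamiliar with the length normalization step mentioned above, the following is a minimal sketch of the general idea, not the authors' implementation. It assumes the common recipe of centering each embedding and scaling it to unit Euclidean norm. The function name `length_normalize` and its `mean` parameter are hypothetical, and the whitening step that often accompanies length normalization in practice is omitted.

```python
import math

def length_normalize(vectors, mean=None):
    """Center each vector and scale it to unit Euclidean norm.

    Illustrative sketch only: `vectors` is a list of equal-length lists of
    floats standing in for i-vectors or x-vectors.
    """
    dim = len(vectors[0])
    if mean is None:
        # Assumption: with no mean supplied, estimate it from the input itself.
        mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    normalized = []
    for v in vectors:
        centered = [x - m for x, m in zip(v, mean)]
        norm = math.sqrt(sum(x * x for x in centered))
        # Leave an all-zero vector unchanged to avoid division by zero.
        normalized.append([x / norm for x in centered] if norm > 0 else centered)
    return normalized

# Usage: project two toy 2-dimensional embeddings onto the unit sphere.
toy = length_normalize([[3.0, 4.0], [0.0, 0.0]], mean=[0.0, 0.0])
```

After this step every non-zero embedding has unit length, which is what makes the Gaussian assumption of G-PLDA more reasonable; an HT-PLDA backend is applied to the embeddings without it.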