ML · LG · Mar 24, 2018

Fast variational Bayes for heavy-tailed PLDA applied to i-vectors and x-vectors

arXiv:1803.09153v10.0033 citations
AI Analysis

This work addresses the computational cost of heavy-tailed PLDA backends in speaker recognition systems that use i-vectors and x-vectors, offering an incremental improvement over existing methods.

The paper tackles the computational inefficiency of heavy-tailed PLDA (HT-PLDA) in speaker recognition by introducing a fast variational Bayes generative training algorithm, achieving accuracy similar to that of Gaussian PLDA with length normalization on SRE'10, SRE'16, and SITW.

The standard state-of-the-art backend for text-independent speaker recognizers that use i-vectors or x-vectors is Gaussian PLDA (G-PLDA), assisted by a Gaussianization step involving length normalization. G-PLDA can be trained with either generative or discriminative methods. It has long been known that heavy-tailed PLDA (HT-PLDA), applied without length normalization, gives similar accuracy, but at considerable extra computational cost. We have recently introduced a fast scoring algorithm for a discriminatively trained HT-PLDA backend. This paper extends that work by introducing a fast, variational Bayes, generative training algorithm. We compare old and new backends, with and without length normalization, with i-vectors and x-vectors, on SRE'10, SRE'16 and SITW.
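The length normalization step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the embeddings (i-vectors or x-vectors) are stored as rows of a NumPy array and that normalization means scaling each row to unit Euclidean norm. The function name `length_normalize`, the `eps` guard, and the toy data are hypothetical and not taken from the paper. Practical pipelines often center and whiten the embeddings first, which this sketch omits.

```python
import numpy as np


def length_normalize(vectors, eps=1e-12):
    """Scale each row vector to unit Euclidean length.

    `eps` is an illustrative guard against division by zero for
    all-zero rows; it is an assumption, not part of the paper.
    """
    vectors = np.asarray(vectors, dtype=float)
    # One norm per row, kept as a column so it broadcasts over features.
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.maximum(norms, eps)


# Toy 2-dimensional "embeddings" (real i-vectors and x-vectors have
# hundreds of dimensions).
embeddings = np.array([[3.0, 4.0], [0.5, 0.0], [-2.0, 2.0]])
normalized = length_normalize(embeddings)
```

After this step every embedding lies on the unit sphere, which is the Gaussianization effect that G-PLDA relies on. Per the abstract, HT-PLDA is applied without this step.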
