Improving Short Utterance PLDA Speaker Verification using SUV Modelling and Utterance Partitioning Approach
This work addresses the degradation of speaker verification accuracy on short utterances, an incremental improvement for speech processing applications.
The paper tackles short utterance speaker verification by partitioning long enrollment utterances into multiple short segments and averaging their i-vectors, which improves Gaussian PLDA performance. Combined with short utterance variance modeling, this approach achieves relative improvements of 9% and 16% in equal error rate on NIST 2008 and 2010 benchmarks.
This paper analyses short utterance probabilistic linear discriminant analysis (PLDA) speaker verification with utterance partitioning and short utterance variance (SUV) modelling approaches. Experimental studies have found that, instead of using a single long utterance as enrolment data, partitioning the long enrolment utterance into multiple short utterances and using the average of the short utterance i-vectors as enrolment data improves Gaussian PLDA (GPLDA) speaker verification. This is because short utterance i-vectors contain speaker, session and utterance variations, and the utterance partitioning approach compensates for the utterance variation. Subsequently, SUV-PLDA is also studied with the utterance partitioning approach. The utterance partitioning-based SUV-GPLDA system shows relative improvements of 9% and 16% in equal error rate (EER) for the NIST 2008 and NIST 2010 truncated 10sec-10sec evaluation conditions, as the utterance partitioning approach compensates for the utterance variation and the SUV modelling approach compensates for the mismatch between full-length development data and short-length evaluation data.
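The enrolment step described above (partition a long utterance, extract one vector per segment, average the vectors) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the contiguous equal-size split, and in particular `toy_embedding` are assumptions made for illustration: `toy_embedding` is a hypothetical stand-in that simply averages frame features, whereas a real system would call a trained i-vector extractor at that point.

```python
import numpy as np


def partition_frames(frames: np.ndarray, num_segments: int) -> list:
    """Split a (num_frames, feat_dim) feature matrix into contiguous segments.

    Assumption: segments are contiguous and near-equal in length; the paper's
    abstract does not specify how the partitioning is done.
    """
    if num_segments < 1 or num_segments > len(frames):
        raise ValueError("num_segments must be between 1 and the number of frames")
    return np.array_split(frames, num_segments, axis=0)


def toy_embedding(segment: np.ndarray) -> np.ndarray:
    """Hypothetical placeholder for an i-vector extractor.

    It returns the mean feature vector of the segment. A real system would
    compute an i-vector from the segment instead.
    """
    return segment.mean(axis=0)


def enrolment_vector(frames: np.ndarray, num_segments: int,
                     extractor=toy_embedding) -> np.ndarray:
    """Average the per-segment vectors to form a single enrolment vector."""
    segments = partition_frames(frames, num_segments)
    vectors = np.stack([extractor(segment) for segment in segments])
    return vectors.mean(axis=0)
```

Calling `enrolment_vector(frames, 3)` on a 12-frame feature matrix would split it into three 4-frame segments and return the mean of the three segment vectors. With the linear toy extractor and equal-length segments, this average coincides with the vector of the unpartitioned utterance; that would not be expected to hold for a real i-vector extractor, which is the case the paper studies.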