Domain adaptation based Speaker Recognition on Short Utterances
This work addresses speaker recognition on short utterances, a problem that matters for applications such as voice assistants. The contribution is incremental, however, because it builds on existing PLDA and IDV methods.
The paper investigates how speaker verification performance degrades on short utterances. For full-length utterances, in-domain PLDA improves EER and DCF by more than 28% over out-domain PLDA, but this gain shrinks as utterances get shorter. The paper also introduces a modified IDV compensation method that improves out-domain PLDA by 26% and 14% when SWB and NIST data, respectively, are used for S normalization; these gains likewise shrink with shorter utterances.
This paper explores how in-domain and out-domain probabilistic linear discriminant analysis (PLDA) speaker verification behave when enrolment and verification utterance lengths are reduced. Experiments show that when full-length utterances are used for evaluation, the in-domain PLDA approach yields more than 28% improvement in equal error rate (EER) and detection cost function (DCF) values over the out-domain PLDA approach, and that when short utterances are used for evaluation, the performance gain of in-domain speaker verification reduces at an increasing rate. A novel modified inter-dataset variability (IDV) compensation is used to compensate for the mismatch between in- and out-domain data. IDV-compensated out-domain PLDA shows 26% and 14% improvement over out-domain PLDA speaker verification when SWB and NIST data, respectively, are used for S normalization. When the evaluation utterance length is reduced, the performance gain from IDV compensation also reduces, because the i-vectors of short evaluation utterances vary more due to phonetic content than due to the dataset mismatch between in- and out-domain data.
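The improvements quoted above are relative reductions in EER and DCF. The following is a minimal Python sketch of how an EER and a relative improvement could be computed from trial scores. It is not the authors' evaluation code: the function names `equal_error_rate` and `relative_improvement`, the threshold-sweep approximation, and all scores and error rates in the usage lines are assumptions made purely for illustration.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Hypothetical helper: returns the error rate at the threshold where the
    false-acceptance and false-rejection rates are closest to each other.
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, nontarget]))
    # False rejection: target trials scoring below the threshold.
    frr = np.array([(target < t).mean() for t in thresholds])
    # False acceptance: non-target trials scoring at or above the threshold.
    far = np.array([(nontarget >= t).mean() for t in thresholds])
    idx = np.argmin(np.abs(far - frr))
    return 0.5 * (far[idx] + frr[idx])


def relative_improvement(baseline, system):
    """Relative reduction of an error metric (EER or DCF), in percent."""
    return 100.0 * (baseline - system) / baseline


# Made-up scores, not taken from the paper.
eer = equal_error_rate([2.0, 3.0, 4.0, 5.0], [0.0, 1.0, 1.5, 2.5])
print(f"EER = {eer:.2f}")

# Made-up error rates: a baseline of 4.0 reduced to 3.0.
print(f"relative improvement = {relative_improvement(4.0, 3.0):.1f}%")
```

With real trial lists, the same two functions would be applied once to the out-domain PLDA scores and once to the in-domain or IDV-compensated scores, and the two EERs compared through `relative_improvement`.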