An improved uncertainty propagation method for robust i-vector based speaker recognition
This work addresses robust speaker recognition in noisy environments; it is incremental, building on existing uncertainty propagation methods.
The paper tackles the performance degradation of speaker recognition systems on distorted speech. It proposes a complete uncertainty propagation method that models uncertainty both in the Baum-Welch statistics and in the i-vector derivation, reducing the equal error rate (EER) on the NIST-SRE corpus by 4% relative to a conventional i-vector baseline.
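Because the result is stated as a relative EER reduction, a small sketch of both quantities may help. It assumes higher scores mean "same speaker" and approximates the EER by sweeping every observed score as a decision threshold and taking the point where the miss rate and false-alarm rate are closest; production toolkits usually interpolate the ROC curve instead. The function names and the example scores are hypothetical, not taken from the paper.

```python
from typing import Sequence


def equal_error_rate(target: Sequence[float], nontarget: Sequence[float]) -> float:
    """Approximate EER: try each observed score as a threshold and return the mean
    of the miss and false-alarm rates at the threshold where they are closest."""
    best_gap, best_eer = float("inf"), 1.0
    for t in sorted(set(target) | set(nontarget)):
        miss = sum(s < t for s in target) / len(target)        # target trials rejected
        fa = sum(s >= t for s in nontarget) / len(nontarget)   # impostor trials accepted
        if abs(miss - fa) < best_gap:
            best_gap, best_eer = abs(miss - fa), (miss + fa) / 2.0
    return best_eer


def relative_improvement(baseline_eer: float, proposed_eer: float) -> float:
    """Relative EER reduction; 0.04 corresponds to '4% relative'."""
    return (baseline_eer - proposed_eer) / baseline_eer
```

For instance, with hypothetical EERs of 10.0% for a baseline and 9.6% for a proposed system, `relative_improvement(0.10, 0.096)` is about 0.04, that is, a 4% relative reduction even though the absolute difference is only 0.4 percentage points.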
The performance of automatic speaker recognition systems degrades when facing distorted speech data containing additive noise and/or reverberation. Statistical uncertainty propagation has been introduced as a promising paradigm to address this challenge. Several uncertainty propagation methods have since been proposed to compensate for noise and reverberation in i-vectors in the context of speaker recognition. They have achieved promising results on small datasets such as YOHO and Wall Street Journal, but little or no improvement on the larger, highly variable NIST Speaker Recognition Evaluation (SRE) corpus. In this paper, we propose a complete uncertainty propagation method, whereby we model the effect of uncertainty both in the computation of unbiased Baum-Welch statistics and in the derivation of the posterior expectation of the i-vector. We conduct experiments on NIST-SRE data mixed with real domestic noise and reverberation from the CHiME-2 corpus, then preprocessed by multichannel speech enhancement. The proposed method reduces the equal error rate (EER) by 4% relative to a conventional i-vector based speaker verification baseline. By contrast, previous uncertainty propagation methods degrade performance in this setting.
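The abstract does not give the equations, so the following is only a minimal sketch of the general idea of propagating feature uncertainty into i-vector extraction, not the paper's "unbiased Baum-Welch statistics" formulation. It assumes diagonal UBM covariances and a diagonal per-frame uncertainty covariance (for example, estimated by the speech enhancement front end), and it uses the common uncertainty-decoding recipe of adding each frame's uncertainty to the UBM covariances, both when computing the component occupation probabilities and when accumulating the i-vector precision and linear terms. With zero uncertainty it reduces to the standard i-vector posterior mean, (I + sum_c N_c T_c' Sigma_c^-1 T_c)^-1 sum_c T_c' Sigma_c^-1 F_c, with F_c the centered first-order statistics. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def log_gauss_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of x under a Gaussian with diagonal covariance `var`."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - mi) ** 2 / v
        for xi, mi, v in zip(x, mean, var)
    )


def uncertain_posteriors(x, unc, weights, means, variances) -> Vector:
    """Occupation probabilities gamma_c, scoring x against N(m_c, Sigma_c + Sigma_t),
    i.e. each UBM variance inflated by the frame's (assumed diagonal) uncertainty."""
    logs = [
        math.log(w) + log_gauss_diag(x, m, [v + u for v, u in zip(var, unc)])
        for w, m, var in zip(weights, means, variances)
    ]
    top = max(logs)  # log-sum-exp normalisation for numerical stability
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        y[r] = (m[r][n] - sum(m[r][k] * y[k] for k in range(r + 1, n))) / m[r][r]
    return y


def ivector_with_uncertainty(frames, uncertainties, weights, means, variances, T) -> Vector:
    """Posterior mean of the i-vector with per-frame uncertainty added to the UBM
    covariances. T[c] is the D x R block of the total variability matrix for component c."""
    R = len(T[0][0])
    prec = [[1.0 if i == j else 0.0 for j in range(R)] for i in range(R)]  # standard normal prior
    lin = [0.0] * R
    for x, unc in zip(frames, uncertainties):
        gammas = uncertain_posteriors(x, unc, weights, means, variances)
        for g, mean, var, Tc in zip(gammas, means, variances, T):
            for d, (xd, md, vd, ud) in enumerate(zip(x, mean, var, unc)):
                inv = g / (vd + ud)  # gamma_c / (sigma_cd^2 + uncertainty_td)
                row = Tc[d]
                for i in range(R):
                    lin[i] += inv * row[i] * (xd - md)
                    for j in range(R):
                        prec[i][j] += inv * row[i] * row[j]
    return solve(prec, lin)
```

In a one-dimensional toy case (one Gaussian with zero mean and unit variance, T = 1, a single frame at x = 2), the sketch gives an i-vector of 1.0 with no uncertainty and 2/3 with unit uncertainty: the more uncertain the frame, the less it pulls the i-vector away from the prior mean of zero.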