AS LG SD ML · Sep 17, 2018

Generative x-vectors for text-independent speaker verification

Longting Xu, Rohan Kumar Das, Emre Yılmaz, Jichen Yang, Haizhou Li

arXiv:1809.06798v15.916 citations

Originality: Incremental advance

AI Analysis

This work addresses speaker verification for security and authentication applications, offering an incremental improvement over existing fusion methods.

The authors propose generative x-vectors, which combine complementary information from i-vector and x-vector systems. On the NIST SRE 2010 dataset, the approach performs considerably better than the baseline i-vector and x-vector systems, and it outperforms their fusion on long-duration utterances.

Speaker verification (SV) systems using deep neural network embeddings, the so-called x-vector systems, are becoming popular because they outperform i-vector systems. Fusing the two systems improves performance further, benefiting from both the discriminatively trained x-vectors and the generative i-vectors, which capture distinct speaker characteristics. In this paper, we propose a novel method for incorporating the complementary information of i-vectors and x-vectors, called the generative x-vector. The generative x-vector utilizes a transformation model learned from the i-vector and x-vector representations of the background data. Canonical correlation analysis is applied to derive this transformation model, which is later used to transform the standard x-vectors of the enrollment and test segments into the corresponding generative x-vectors. The SV experiments performed on the NIST SRE 2010 dataset demonstrate that the system using generative x-vectors provides considerably better performance than the baseline i-vector and x-vector systems. Furthermore, the generative x-vectors outperform the fusion of i-vector and x-vector systems for long-duration utterances, while yielding comparable results for short-duration utterances.
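
The transformation step described in the abstract can be illustrated with a minimal sketch of canonical correlation analysis in NumPy. This is not the authors' implementation: the function names (`fit_cca_projection`, `to_generative_xvector`), the ridge term `reg`, the number of retained components, and the choice to keep only the x-vector-side projection are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def _inv_sqrt(mat, eps=1e-8):
    """Inverse square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.maximum(vals, eps)  # guard against tiny or negative eigenvalues
    return (vecs / np.sqrt(vals)) @ vecs.T


def fit_cca_projection(ivecs, xvecs, n_components, reg=1e-3):
    """Learn a CCA projection for x-vectors from paired background data.

    ivecs: (n, d_i) background i-vectors; xvecs: (n, d_x) background
    x-vectors, row-aligned so that row k of each comes from the same
    utterance. `reg` is a hypothetical ridge term added for stability.
    Returns the x-vector mean, the projection matrix (d_x, n_components),
    and the canonical correlations.
    """
    mu_i = ivecs.mean(axis=0)
    mu_x = xvecs.mean(axis=0)
    ic = ivecs - mu_i
    xc = xvecs - mu_x
    n = ic.shape[0]

    # Regularized within-set covariances and the cross-covariance.
    s_ii = ic.T @ ic / (n - 1) + reg * np.eye(ic.shape[1])
    s_xx = xc.T @ xc / (n - 1) + reg * np.eye(xc.shape[1])
    s_xi = xc.T @ ic / (n - 1)

    # Whiten both sides, then take the SVD of the whitened cross-covariance.
    w_x = _inv_sqrt(s_xx)
    w_i = _inv_sqrt(s_ii)
    u, s, _ = np.linalg.svd(w_x @ s_xi @ w_i, full_matrices=False)

    proj_x = w_x @ u[:, :n_components]
    return mu_x, proj_x, s[:n_components]


def to_generative_xvector(xvecs, mu_x, proj_x):
    """Map standard x-vectors into the learned CCA subspace."""
    return (xvecs - mu_x) @ proj_x


if __name__ == "__main__":
    # Synthetic stand-ins for background embeddings (not real speech data).
    rng = np.random.default_rng(0)
    latent = rng.standard_normal((500, 3))
    bg_ivecs = latent @ rng.standard_normal((3, 6)) + 0.1 * rng.standard_normal((500, 6))
    bg_xvecs = latent @ rng.standard_normal((3, 8)) + 0.1 * rng.standard_normal((500, 8))

    mu_x, proj_x, corrs = fit_cca_projection(bg_ivecs, bg_xvecs, n_components=3)
    test_embeddings = to_generative_xvector(bg_xvecs[:5], mu_x, proj_x)
    print(test_embeddings.shape, corrs)
```

In this sketch the projection is fitted once on background data and then applied to enrollment and test x-vectors alike, so no i-vector needs to be extracted at verification time. Whether the paper scores the projected vectors with PLDA or another back-end is not stated in the abstract and is left out here.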

