AS, LG, SD, ML · Sep 17, 2018

Generative x-vectors for text-independent speaker verification

arXiv:1809.06798v1 · 16 citations
Originality: Incremental advance
AI Analysis

This work addresses speaker verification for security and authentication applications, offering an incremental improvement over existing fusion methods.

The authors propose generative x-vectors, which combine complementary information from i-vector and x-vector systems. On the NIST SRE 2010 dataset, the resulting system performs considerably better than the baseline i-vector and x-vector systems, and it outperforms their fusion on long-duration utterances.

Speaker verification (SV) systems using deep neural network embeddings, the so-called x-vector systems, are becoming popular because they outperform i-vector systems. Fusing the two systems improves performance further, benefiting from both the discriminatively trained x-vectors and the generative i-vectors, which capture distinct speaker characteristics. In this paper, we propose a novel method, called the generative x-vector, that incorporates the complementary information of i-vectors and x-vectors. The generative x-vector uses a transformation model learned from the i-vector and x-vector representations of the background data. Canonical correlation analysis is applied to derive this transformation model, which is then used to transform the standard x-vectors of the enrollment and test segments into the corresponding generative x-vectors. SV experiments on the NIST SRE 2010 dataset demonstrate that the system using generative x-vectors performs considerably better than the baseline i-vector and x-vector systems. Furthermore, the generative x-vectors outperform the fusion of i-vector and x-vector systems for long-duration utterances, while yielding comparable results for short-duration utterances.
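
The transformation step described in the abstract can be illustrated with a minimal sketch of canonical correlation analysis (CCA) in NumPy. This is not the authors' implementation. The function names (`fit_cca`, `to_generative_xvector`), the ridge term `eps`, the embedding dimensions, and the synthetic "background" data are all assumptions made for illustration, since the abstract does not specify them. The sketch learns the x-vector-side CCA projection from paired background embeddings and applies it to x-vectors.

```python
import numpy as np

def inv_sqrt(cov, eps=1e-6):
    """Inverse square root of a symmetric covariance matrix.
    eps is a small ridge term (an assumption) for numerical stability."""
    vals, vecs = np.linalg.eigh(cov + eps * np.eye(cov.shape[0]))
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def fit_cca(xvecs, ivecs, n_components):
    """Learn the x-vector-side CCA projection from paired background data.
    xvecs: (N, d_x) x-vectors, ivecs: (N, d_i) i-vectors of the same segments."""
    mu_x, mu_i = xvecs.mean(axis=0), ivecs.mean(axis=0)
    xc, ic = xvecs - mu_x, ivecs - mu_i
    n = xvecs.shape[0]
    cxx = xc.T @ xc / (n - 1)   # x-vector covariance
    cii = ic.T @ ic / (n - 1)   # i-vector covariance
    cxi = xc.T @ ic / (n - 1)   # cross-covariance
    wx, wi = inv_sqrt(cxx), inv_sqrt(cii)
    # Singular values of the whitened cross-covariance are the
    # canonical correlations, returned in descending order.
    u, s, _ = np.linalg.svd(wx @ cxi @ wi)
    proj_x = wx @ u[:, :n_components]
    return mu_x, proj_x, s[:n_components]

def to_generative_xvector(xvecs, mu_x, proj_x):
    """Project (enrollment or test) x-vectors with the learned transformation."""
    return (xvecs - mu_x) @ proj_x

# Synthetic stand-in for background data: both embedding types share
# a 4-dimensional latent "speaker" factor plus independent noise.
rng = np.random.default_rng(0)
latent = rng.standard_normal((500, 4))
ivecs = latent @ rng.standard_normal((4, 10)) + 0.1 * rng.standard_normal((500, 10))
xvecs = latent @ rng.standard_normal((4, 8)) + 0.1 * rng.standard_normal((500, 8))

mu_x, proj_x, corrs = fit_cca(xvecs, ivecs, n_components=4)
gen = to_generative_xvector(xvecs, mu_x, proj_x)
print("canonical correlations:", corrs)
print("generative x-vector shape:", gen.shape)
```

In a real system, the same `mu_x` and `proj_x` learned on background data would presumably be applied to enrollment and test x-vectors before scoring; the scoring backend is not stated in the abstract and is left out here.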

