AS · SD · SP · Jan 14, 2020

Gaussian speaker embedding learning for text-independent speaker verification

arXiv:2001.04585v1 (1 citation)
AI Analysis

This work addresses a key issue in speaker verification for applications like security and authentication, but it appears incremental as it builds on the dominant x-vector/PLDA framework.

The paper addresses how to extract x-vectors suited to a PLDA backend in text-independent speaker verification. It proposes a Gaussian noise constrained network (GNCN) trained with multi-task learning, and the authors report that the method is effective on the SITW database.

The x-vector maps segments of arbitrary duration to vectors of fixed dimension using a deep neural network. Combined with the probabilistic linear discriminant analysis (PLDA) backend, x-vector/PLDA has become the dominant framework in text-independent speaker verification. Nevertheless, how to extract an x-vector appropriate for the PLDA backend remains a key problem. In this paper, we propose a Gaussian noise constrained network (GNCN) to extract the x-vector, which adopts a multi-task learning strategy in which the primary task classifies the speakers and the auxiliary task simply fits Gaussian noise. Experiments are carried out using the SITW database. The results demonstrate the effectiveness of our proposed method.
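The multi-task objective described in the abstract can be illustrated with a minimal sketch. The abstract does not give the loss functions, so everything specific here is an assumption: softmax cross-entropy for the speaker classification task, mean squared error against freshly drawn N(0, 1) noise for the auxiliary task, and a hypothetical weighting parameter `aux_weight`. The function names are illustrative and do not come from the paper.

```python
import math
import random


def cross_entropy(logits, label):
    """Softmax cross-entropy for one example (primary task: speaker classification)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def gaussian_fit_loss(aux_output, noise):
    """Mean squared error between an auxiliary output and a Gaussian noise target.

    Assumption: the paper's auxiliary loss may differ from plain MSE.
    """
    return sum((a - n) ** 2 for a, n in zip(aux_output, noise)) / len(noise)


def multitask_loss(logits, label, aux_output, rng, aux_weight=0.1):
    """Primary loss plus a weighted auxiliary loss (aux_weight is hypothetical)."""
    noise = [rng.gauss(0.0, 1.0) for _ in aux_output]  # assumed N(0, 1) target
    primary = cross_entropy(logits, label)
    auxiliary = gaussian_fit_loss(aux_output, noise)
    return primary + aux_weight * auxiliary, primary, auxiliary


rng = random.Random(0)  # seeded so the run is reproducible
total, primary, auxiliary = multitask_loss(
    logits=[2.0, 0.5, -1.0], label=0, aux_output=[0.0, 0.0, 0.0, 0.0], rng=rng
)
print(f"total={total:.4f} primary={primary:.4f} auxiliary={auxiliary:.4f}")
```

In a real system, `logits` and `aux_output` would come from two heads on a shared network, and the combined loss would be minimized by gradient descent. This sketch only shows how the two terms might be combined.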

