SDLGAS · Apr 7, 2019

VAE-based regularization for deep speaker embedding

arXiv:1904.03617v1 · 21 citations
Originality: Incremental advance
AI Analysis

This work addresses a specific bottleneck in speaker recognition systems used in applications such as authentication. It is an incremental advance because it builds on existing VAE and PLDA techniques.

The paper tackles the problem that non-Gaussian deep speaker embeddings degrade PLDA scoring in speaker recognition. It proposes a VAE-based regularization method that maps the embeddings into a latent space where they are closer to Gaussian, making them better suited to PLDA scoring.

Deep speaker embedding has achieved state-of-the-art performance in speaker recognition. A potential problem is that these embedded vectors (called "x-vectors") are not Gaussian, which degrades performance with the widely used PLDA back-end scoring. In this paper, we propose a regularization approach based on the Variational Auto-Encoder (VAE). This model transforms x-vectors into a latent space where the mapped latent codes are more Gaussian, and hence more suitable for PLDA scoring.
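As a rough illustration of why a VAE pushes latent codes toward a Gaussian, the snippet below sketches the two ingredients of a standard VAE objective: the reparameterized sample z = mu + sigma * eps and the KL divergence from a diagonal Gaussian posterior to a standard normal prior. This is a minimal sketch under stated assumptions, not the paper's method: it assumes a diagonal Gaussian posterior, an N(0, I) prior, and a squared-error reconstruction term, and the function names are hypothetical. The encoder and decoder that would produce `mu`, `log_var`, and `x_recon` from an x-vector are omitted, and the paper's actual architecture and loss weighting are not described here.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1) per dimension (reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions.

    This is the regularizer that pulls each latent code's distribution toward a standard Gaussian.
    """
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def vae_loss(x, x_recon, mu, log_var, beta=1.0):
    """Hypothetical per-sample loss: squared-error reconstruction plus beta-weighted KL term."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon + beta * kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy posterior parameters for a 2-dimensional latent code (illustrative values only).
    mu, log_var = [0.5, -1.0], [0.0, math.log(0.25)]
    z = reparameterize(mu, log_var, rng)
    print("sampled z:", z)
    print("KL to N(0, I):", kl_to_standard_normal(mu, log_var))
```

The KL term is zero only when `mu` is all zeros and every variance is one, so minimizing it alongside reconstruction trades off fidelity to the input x-vector against how closely the latent codes follow a standard Gaussian.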
