SD LG AS Apr 7, 2019

VAE-based regularization for deep speaker embedding

arXiv:1904.03617v18.821 citations

Originality: Incremental advance

AI Analysis

This work addresses a specific bottleneck in speaker recognition systems for applications like authentication, but it is incremental as it builds on existing VAE and PLDA techniques.

The paper tackles the problem that non-Gaussian deep speaker embeddings degrade PLDA scoring performance in speaker recognition. It proposes a VAE-based regularization method that maps embeddings into a more Gaussian latent space, making them better suited to PLDA scoring.

Deep speaker embedding has achieved state-of-the-art performance in speaker recognition. A potential problem is that these embedded vectors (called `x-vectors') are not Gaussian, which causes performance degradation with the widely used PLDA back-end scoring. In this paper, we propose a regularization approach based on the Variational Auto-Encoder (VAE). This model transforms x-vectors to a latent space where the mapped latent codes are more Gaussian, and hence more suitable for PLDA scoring.
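
The abstract does not give the model's architecture or exact loss, so the following is only a minimal sketch of the standard VAE objective, under stated assumptions: a diagonal-Gaussian encoder, a unit-variance Gaussian decoder, and a standard normal prior. The function names (`kl_to_standard_normal`, `reparameterize`, `vae_loss`) are hypothetical and are not taken from the paper. The KL term is what pulls latent codes toward a Gaussian shape, which is the property the abstract says helps PLDA scoring.

```python
import math
import random


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1) (reparameterization trick)."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, log_var)
    ]


def vae_loss(x, x_recon, mu, log_var):
    """Negative ELBO up to an additive constant.

    Assumes a unit-variance Gaussian decoder, so the reconstruction term
    reduces to half the squared error between x and its reconstruction.
    """
    recon = 0.5 * sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon + kl_to_standard_normal(mu, log_var)


# Toy usage: a 2-dimensional "embedding" and a 1-dimensional latent code.
rng = random.Random(0)
z = reparameterize([0.0], [0.0], rng)  # one sample from N(0, 1)
loss = vae_loss([1.0, 2.0], [0.0, 2.0], [1.0], [0.0])
```

In a real system the encoder and decoder would be neural networks trained on x-vectors, and the latent mean would typically be the code passed to PLDA; that choice is an assumption here, not something the abstract states.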
