SD, LG, ML · May 25, 2017

Investigation of Using VAE for i-Vector Speaker Verification

arXiv:1705.09185v1 · 4 citations
Originality: Synthesis-oriented
AI Analysis

This is an incremental improvement for speaker verification systems, potentially enhancing accuracy in applications like security or voice assistants.

The paper tackles speaker verification by investigating a VAE-based system for i-vector speaker recognition, showing that it provides speaker embeddings and performs close to diagonal PLDA on NIST SRE 2010 data.

A new system for i-vector speaker recognition based on a variational autoencoder (VAE) is investigated. The VAE is a promising approach for developing accurate deep nonlinear generative models of complex data. Experiments show that the VAE provides a speaker embedding and can be trained effectively in an unsupervised manner. A log-likelihood ratio (LLR) estimate for the VAE is developed, and experiments on NIST SRE 2010 data demonstrate its correctness. Additionally, we show that the performance of the VAE-based system in the i-vector space is close to that of diagonal PLDA. Several interesting results are also observed in experiments with $β$-VAE. In particular, we found that for $β\ll 1$ the VAE can be trained to capture the features of complex input data distributions effectively, which is hard to achieve with the standard VAE ($β=1$).
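
To make the role of $β$ concrete, here is a minimal NumPy sketch of the commonly used $β$-VAE training objective: a reconstruction term plus $β$ times the KL divergence between the encoder's diagonal Gaussian posterior and a standard normal prior. This is an illustration under stated assumptions, not the paper's implementation: the unit-variance Gaussian decoder and the function names `gaussian_kl` and `beta_vae_loss` are assumptions made here, and the abstract does not specify the exact loss the authors used.

```python
import numpy as np


def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for each row of the batch."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=1)


def beta_vae_loss(x, x_recon, mu, log_var, beta=1.0):
    """Batch-averaged negative beta-weighted ELBO (additive constants dropped).

    Assumption: the decoder is a unit-variance Gaussian, so the reconstruction
    term reduces to half the squared error between x and x_recon.
    """
    recon = 0.5 * np.sum((x - x_recon) ** 2, axis=1)
    kl = gaussian_kl(mu, log_var)
    return float(np.mean(recon + beta * kl))


# Toy batch: perfect reconstruction, posterior mean shifted away from the prior.
x = np.zeros((4, 3))
mu = np.ones((4, 2))
log_var = np.zeros((4, 2))

loss_standard = beta_vae_loss(x, x, mu, log_var, beta=1.0)   # standard VAE
loss_small_beta = beta_vae_loss(x, x, mu, log_var, beta=0.1)  # beta << 1
```

With $β=1$ the sketch reduces to the standard VAE objective. Choosing $β\ll 1$ down-weights the KL penalty, so the encoder is penalized less for moving its posterior away from the prior.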

