AS SD · Aug 7, 2020

Disentangled speaker and nuisance attribute embedding for robust speaker verification

Woo Hyun Kang, Sung Hwan Mun, Min Hyun Han, Nam Soo Kim

arXiv:2008.03024v1 · 6.66 citations

Originality: Incremental advance

AI Analysis

This work addresses robustness issues in speaker verification for applications like security and authentication, but it is incremental as it builds on existing deep learning-based embedding methods.

The paper tackles performance degradation in speaker verification under varying conditions, such as different recording devices and emotional states. It proposes a fully supervised training method that disentangles speaker embeddings from nuisance attributes, and reports robustness to channel and emotional variability on the RSR2015 and VoxCeleb1 datasets.

In recent years, various deep learning-based embedding methods have been proposed and have shown impressive performance in speaker verification. However, like most classical embedding techniques, deep learning-based methods are known to suffer severe performance degradation when dealing with speech samples under different conditions (e.g., recording devices, emotional states). In this paper, we propose a novel fully supervised training method for extracting a speaker embedding vector disentangled from the variability caused by nuisance attributes. The proposed framework was compared with conventional deep learning-based embedding methods on the RSR2015 and VoxCeleb1 datasets. Experimental results show that the proposed approach can extract speaker embeddings robust to channel and emotional variability.
