AS SD · Mar 14, 2018

Speaker Verification using Convolutional Neural Networks

arXiv:1803.05427v2 · 26 citations

Originality: Incremental advance

AI Analysis

This work addresses speaker verification, a key problem in biometric security and voice authentication, with an incremental improvement over existing methods.

The paper tackles speaker verification by developing a novel Convolutional Neural Network architecture that captures speaker information while discarding non-speaker details, and it demonstrates that this method outperforms traditional approaches that create speaker models directly from background models.

In this paper, a novel Convolutional Neural Network architecture is developed for speaker verification that captures speaker-related information while discarding non-speaker information. In the training phase, the network is trained to distinguish between different speaker identities in order to create the background model. A crucial step is then creating the speaker models. Most previous approaches create speaker models by averaging the speaker representations provided by the background model. We overcome this limitation by further fine-tuning the trained model within a Siamese framework, generating a discriminative feature space that distinguishes between same and different speakers regardless of their identity. This provides a mechanism that simultaneously captures speaker-related information and creates robustness to within-speaker variations. The proposed method is shown to outperform traditional verification methods that create speaker models directly from the background model.
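The contrast drawn in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the cosine scoring, and the use of a contrastive loss with a margin of 1.0 are all assumptions chosen for illustration, since the abstract names only the Siamese framework and not its loss. The first two functions show the averaging baseline (a speaker model as the mean of utterance embeddings), and the third shows one common pairwise objective that a Siamese fine-tuning stage might minimize.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    x = np.asarray(x, dtype=float)
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def enroll_by_averaging(embeddings):
    """Baseline speaker model: mean of the normalized utterance embeddings.

    embeddings: array of shape [n_utterances, dim], assumed to come from
    the background model (hypothetical input for this sketch).
    """
    return l2_normalize(np.mean(l2_normalize(embeddings), axis=0))

def cosine_score(speaker_model, test_embedding):
    """Verification score: cosine similarity between model and test embedding."""
    return float(np.dot(l2_normalize(speaker_model), l2_normalize(test_embedding)))

def contrastive_loss(emb_a, emb_b, same_speaker, margin=1.0):
    """Pairwise contrastive loss (an assumed choice, not confirmed by the abstract).

    Same-speaker pairs are pulled together; different-speaker pairs are
    pushed at least `margin` apart in Euclidean distance.
    """
    emb_a = np.asarray(emb_a, dtype=float)
    emb_b = np.asarray(emb_b, dtype=float)
    y = np.asarray(same_speaker, dtype=float)  # 1 = same speaker, 0 = different
    d = np.linalg.norm(emb_a - emb_b, axis=-1)
    pos = y * d ** 2
    neg = (1.0 - y) * np.maximum(0.0, margin - d) ** 2
    return float(np.mean(pos + neg))
```

In the baseline, a test utterance would be accepted when `cosine_score` exceeds a tuned threshold. The pairwise loss depends only on whether two utterances share a speaker, not on which speaker it is, which matches the abstract's "regardless of their identity".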

