SD LG, May 29, 2017

On Residual CNN in text-dependent speaker verification task

Egor Malykh, Sergey Novoselov, Oleg Kudashev

arXiv:1705.10134v2


AI Analysis

This work addresses text-dependent speaker verification, a task relevant to security and authentication. It is incremental: it builds on existing deep learning methods, and the proposed system does not surpass the baseline on its own.

The authors tackle text-dependent speaker verification by applying a residual CNN to spectrograms, achieving a 5.23% equal error rate (EER) on the RSR2015 evaluation part. Fusing the proposed system with the baseline improves on the best individual system by 18% relative.

Deep learning approaches are still not very common in the speaker verification field. We investigate the possibility of using a deep residual convolutional neural network with spectrograms as input features in the text-dependent speaker verification task. Although we were not able to surpass the baseline system in quality, we achieved quite good results for such a new approach, obtaining a 5.23% EER (equal error rate) on the RSR2015 evaluation part. Fusion of the baseline and proposed systems outperformed the best individual system by 18% relative.
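
The two figures quoted above are an equal error rate and a relative improvement. As a minimal sketch of how such numbers are typically computed, the Python below estimates the equal error rate from lists of target and impostor trial scores and computes a relative error reduction. The function names `equal_error_rate` and `relative_improvement`, the threshold sweep, and all scores in the usage lines are illustrative assumptions; none of them come from the paper, which does not describe its scoring code here.

```python
import numpy as np


def equal_error_rate(target_scores, impostor_scores):
    """Approximate the equal error rate with a simple threshold sweep.

    Assumption: a trial is accepted when its score is >= the threshold.
    The result is the mean of the false-accept and false-reject rates at
    the threshold where the two are closest.
    """
    target = np.asarray(target_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    # Every observed score is a candidate threshold.
    thresholds = np.sort(np.concatenate([target, impostor]))
    # False-accept rate: fraction of impostor trials accepted.
    far = np.array([(impostor >= t).mean() for t in thresholds])
    # False-reject rate: fraction of target trials rejected.
    frr = np.array([(target < t).mean() for t in thresholds])
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0


def relative_improvement(best_single_error, fused_error):
    """Relative error reduction of a fused system over the best single system."""
    return (best_single_error - fused_error) / best_single_error


# Hypothetical scores, for illustration only (not data from the paper).
toy_eer = equal_error_rate([0.9, 0.8, 0.7, 0.3], [0.6, 0.4, 0.2, 0.1])
# Hypothetical error rates chosen so the reduction is 18% relative.
toy_gain = relative_improvement(0.10, 0.082)
```

Note that "18% relative" describes a ratio of error rates, not an absolute drop of 18 percentage points.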
