SDLG, May 29, 2017

On Residual CNN in text-dependent speaker verification task

arXiv:1705.10134v2
Originality: Synthesis-oriented
AI Analysis

This work addresses speaker verification for security and authentication. It is incremental: it builds on existing deep learning methods, and the proposed system on its own does not surpass the baseline.

The authors tackled text-dependent speaker verification by applying a residual CNN to spectrograms, achieving a 5.23% EER on the RSR2015 evaluation part. Fusing the proposed system with the baseline outperformed the best individual system by 18% relative.

Deep learning approaches are still not very common in the speaker verification field. We investigate the use of a deep residual convolutional neural network with spectrograms as input features in the text-dependent speaker verification task. Although we were not able to surpass the baseline system in quality, we achieved quite good results for such a new approach, obtaining a 5.23% EER on the RSR2015 evaluation part. Fusion of the baseline and proposed systems outperformed the best individual system by 18% relative.
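The error rate quoted above is presumably the equal error rate (EER), the operating point at which the false acceptance and false rejection rates coincide. As a minimal sketch of how such a figure and a relative improvement are typically computed, the Python below sweeps a threshold over trial scores. The function names and the toy scores are hypothetical illustrations, not the authors' evaluation code or data.

```python
import numpy as np

def equal_error_rate(target_scores, impostor_scores):
    """Approximate EER by sweeping the threshold over all observed scores."""
    target = np.asarray(target_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, impostor]))
    # False rejection rate: fraction of target trials scoring below the threshold.
    frr = np.array([(target < t).mean() for t in thresholds])
    # False acceptance rate: fraction of impostor trials at or above the threshold.
    far = np.array([(impostor >= t).mean() for t in thresholds])
    # Take the threshold where the two rates are closest and average them.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0

def relative_improvement(reference_eer, new_eer):
    """Relative EER reduction of new_eer with respect to reference_eer."""
    return (reference_eer - new_eer) / reference_eer

# Toy scores (made up for illustration only).
toy_target = [0.9, 0.8, 0.7, 0.3]
toy_impostor = [0.6, 0.4, 0.2, 0.1]
toy_eer = equal_error_rate(toy_target, toy_impostor)
print(f"toy EER: {toy_eer:.2%}")
```

Under this definition, an "18% relative" gain means the fused system's EER is 0.82 times that of the best individual system, not 18 percentage points lower.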
