On Residual CNN in text-dependent speaker verification task
This work addresses speaker verification for security and authentication applications. It is incremental: it builds on existing deep learning methods, and the proposed system on its own does not surpass the baseline.
The authors tackle text-dependent speaker verification by applying a residual CNN to spectrograms, achieving a 5.23% EER (equal error rate) on the RSR2015 evaluation part. Fusing the proposed system with the baseline improved on the best individual system by 18% relative.
Deep learning approaches are still not very common in the speaker verification field. We investigate the possibility of using a deep residual convolutional neural network with spectrograms as input features in the text-dependent speaker verification task. Although we were not able to surpass the baseline system in quality, we achieved quite good results for such a new approach, obtaining a 5.23% EER (equal error rate) on the RSR2015 evaluation part. Fusion of the baseline and proposed systems outperformed the best individual system by 18% relative.
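
To make the two reported quantities concrete, here is a minimal sketch in Python. It assumes the error rate quoted above is the equal error rate that is standard in speaker verification, i.e. the operating point where the false-acceptance rate equals the false-rejection rate. The function names, the threshold sweep, and all scores and numbers below are illustrative assumptions, not the authors' evaluation code or data.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Approximate EER by sweeping a threshold over all observed scores.

    Hypothetical helper: returns the mean of FAR and FRR at the threshold
    where the two rates are closest to each other.
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best_gap, eer = None, None
    for t in thresholds:
        # FRR: fraction of genuine (target) trials rejected at threshold t.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # FAR: fraction of impostor trials accepted at threshold t.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_improvement(best_single_eer, fused_eer):
    """Relative EER reduction of a fused system over the best single system."""
    return (best_single_eer - fused_eer) / best_single_eer


# Toy scores, made up for illustration only (higher score = more likely target).
targets = [0.6, 0.4, 0.8, 0.9]
impostors = [0.5, 0.1, 0.2, 0.3]
print(equal_error_rate(targets, impostors))

# Made-up EER values chosen only to show what "18% relative" means.
print(relative_improvement(10.0, 8.2))
```

A relative improvement is measured against the best single system's error rate, so an 18% relative gain means the fused error rate is 0.82 times that value, not 18 percentage points lower.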