ASSD
May 9, 2018

Speaker Recognition using Deep Belief Networks

arXiv:1805.08865v1
14 citations
Originality: Incremental advance
AI Analysis

This work addresses speaker recognition for audio processing applications, presenting an incremental improvement over existing methods.

The paper tackles speaker recognition by using deep belief networks (DBNs) to learn short-term spectral features from speech signals and combining them with MFCC features. It reports a recognition accuracy of 0.95 on the ELSDSR dataset, compared to 0.90 with MFCC features alone.

Short-term spectral features such as mel-frequency cepstral coefficients (MFCCs) have previously been deployed in state-of-the-art speaker recognition systems; however, less attention has been paid to short-term spectral features that generative learning models can learn from speech signals. Higher-dimensional encoders such as deep belief networks (DBNs) could improve performance in speaker recognition tasks by better modelling the statistical structure of sound waves. In this paper, we use short-term spectral features learnt by a DBN, augmented with MFCC features, to perform speaker recognition. Using our features, we achieved a recognition accuracy of 0.95 on the ELSDSR dataset, compared to 0.90 when using standalone MFCC features.
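The feature augmentation described above can be pictured with a minimal sketch. It assumes that "augmented" means per-frame concatenation of the MFCC vector with the top-layer activations of the DBN, which the abstract does not spell out. The layer sizes, the random placeholder weights, and the helper names (`make_layer`, `dbn_features`, `augment`) are all hypothetical; a real DBN would learn its weights layer by layer from speech data, and nothing here reproduces the paper's actual configuration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_layer(n_in, n_out, rng):
    """Hypothetical placeholder weights; a trained DBN would learn these."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dbn_features(frame, layers):
    """Upward pass: each layer maps its input to hidden-unit activation probabilities."""
    activations = frame
    for weights, biases in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations


def augment(mfcc, learned):
    """Concatenate the MFCC vector with the DBN-derived vector for one frame (assumed scheme)."""
    return list(mfcc) + list(learned)


rng = random.Random(0)
# Stand-ins for real data: a 20-bin short-term spectrum and 13 MFCCs for one frame.
spectral_frame = [rng.random() for _ in range(20)]
mfcc_frame = [rng.gauss(0.0, 1.0) for _ in range(13)]

# Two stacked layers (20 -> 16 -> 8); the sizes are illustrative only.
layers = [make_layer(20, 16, rng), make_layer(16, 8, rng)]
learned = dbn_features(spectral_frame, layers)
combined = augment(mfcc_frame, learned)

print(len(combined))  # 13 MFCCs + 8 learned features = 21
```

In a full system, the combined vectors for each frame would be passed to a classifier; the abstract does not say which one, so that step is left out.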

