SD · Oct 11, 2018

Novel Cascaded Gaussian Mixture Model-Deep Neural Network Classifier for Speaker Identification in Emotional Talking Environments

Ismail Shahin, Ali Bou Nassif, Shibani Hamsa

arXiv:1810.04908v1 · 57 citations

Originality: Incremental advance

AI Analysis

This paper addresses speaker identification for applications in noisy emotional settings, but the advance is incremental because it combines existing techniques.

The paper tackles speaker identification in emotional talking environments by proposing a cascaded Gaussian Mixture Model-Deep Neural Network classifier, which outperforms classical methods such as MLP and SVM on Arabic and English datasets and achieves performance similar to human subjective assessment.

This research presents an effective approach to enhancing text-independent speaker identification performance in emotional talking environments, based on a novel classifier called the cascaded Gaussian Mixture Model-Deep Neural Network (GMM-DNN). Our work focuses on proposing, implementing, and evaluating this classifier for speaker identification in emotional talking environments. The results show that the cascaded GMM-DNN classifier improves speaker identification performance across various emotions on two distinct speech databases: the Emirati speech database (an Arabic dataset from the United Arab Emirates) and the Speech Under Simulated and Actual Stress (SUSAS) English dataset. The proposed classifier outperforms classical classifiers such as the Multilayer Perceptron (MLP) and Support Vector Machine (SVM) on each dataset. The speaker identification performance attained with the cascaded GMM-DNN is similar to that obtained from subjective assessment by human listeners.
