Jan 22, 2018

Identifying Speakers Using Their Emotion Cues

arXiv:1801.07054v1
27 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses speaker identification for audio processing applications, but it is incremental as it builds on existing methods like HMMs and SPHMMs.

The paper tackles speaker identification by incorporating emotional cues. Its two-stage recognizer achieves an average identification performance of 79.92%, an improvement of 8.34 percentage points over the 71.58% of a one-stage recognizer.

This paper formulates a new speaker identification approach that uses knowledge of the emotional content of speaker information. The proposed approach is based on a two-stage recognizer that integrates an emotion recognizer and a speaker recognizer into a single recognizer, and it employs both Hidden Markov Models (HMMs) and Suprasegmental Hidden Markov Models (SPHMMs) as classifiers. The experiments consider six emotions: neutral, angry, sad, happy, disgust, and fear. Our results show that the average speaker identification performance of the proposed two-stage recognizer is 79.92%, a significant improvement over the 71.58% achieved by a one-stage recognizer. These results are close to those achieved in subjective evaluation by human listeners.
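The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that stage one picks the most likely emotion and that stage two then picks the most likely speaker among models trained for that emotion; the abstract does not spell out this detail. The names `gaussian_scorer` and `identify_two_stage`, the toy Gaussian scorers standing in for trained HMMs/SPHMMs, and all numbers are hypothetical.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

# A scorer maps an observation sequence to a log-likelihood.
Scorer = Callable[[Sequence[float]], float]


def gaussian_scorer(mean: float, std: float = 1.0) -> Scorer:
    """Toy stand-in for a trained HMM/SPHMM (assumption): i.i.d. Gaussian log-likelihood."""
    def score(obs: Sequence[float]) -> float:
        norm = math.log(std * math.sqrt(2.0 * math.pi))
        return sum(-0.5 * ((x - mean) / std) ** 2 - norm for x in obs)
    return score


def identify_two_stage(
    obs: Sequence[float],
    emotion_models: Dict[str, Scorer],
    speaker_models: Dict[str, Dict[str, Scorer]],
) -> Tuple[str, str]:
    """Return (emotion, speaker) using a two-stage cascade."""
    # Stage 1: choose the emotion whose model scores the utterance highest.
    emotion = max(emotion_models, key=lambda e: emotion_models[e](obs))
    # Stage 2: choose the speaker among the models for that emotion only
    # (emotion-specific speaker models are an assumption of this sketch).
    candidates = speaker_models[emotion]
    speaker = max(candidates, key=lambda s: candidates[s](obs))
    return emotion, speaker


# Hypothetical models: two emotions and two speakers, with made-up means.
emotion_models = {"neutral": gaussian_scorer(0.0), "angry": gaussian_scorer(5.0)}
speaker_models = {
    "neutral": {"spk1": gaussian_scorer(-0.5), "spk2": gaussian_scorer(0.5)},
    "angry": {"spk1": gaussian_scorer(4.5), "spk2": gaussian_scorer(5.5)},
}

result = identify_two_stage([5.4, 5.6, 5.2], emotion_models, speaker_models)
print(result)  # ('angry', 'spk2')
```

A one-stage recognizer, by contrast, would score the utterance against every speaker model without first conditioning on the emotion.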

