SDAS Mar 31, 2018

Speaker Verification in Emotional Talking Environments based on Three-Stage Framework

arXiv:1804.00155v1, 8 citations
Originality: Incremental advance
AI Analysis

This work addresses speaker verification challenges in emotional speech for applications like security or voice assistants, but it is incremental as it builds on existing methods by adding cascaded stages.

The paper tackles degraded speaker verification in emotional talking environments by proposing a three-stage framework that sequentially uses gender and emotion identification before verification, showing that combining both cues outperforms using either alone or none, with performance close to human subjective assessment.

This work introduces, implements, and evaluates a three-stage speaker verification framework intended to improve the degraded speaker verification performance observed in emotional talking environments. The framework comprises three cascaded stages: a gender identification stage, followed by an emotion identification stage, followed by a speaker verification stage. It has been assessed on two distinct and independent emotional speech datasets: our collected dataset and the Emotional Prosody Speech and Transcripts dataset. Our results demonstrate that speaker verification based on both gender cues and emotion cues outperforms verification based on gender cues only, on emotion cues only, and on neither. The average speaker verification performance achieved with the proposed framework is very similar to that attained by human listeners in subjective assessment.
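The cascade described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not name the classifiers or features used, so the nearest-centroid scoring, the `verify` function, the model dictionaries, and the threshold below are all hypothetical stand-ins chosen only to show how each stage narrows the models used by the next.

```python
from typing import Dict, Sequence, Tuple

Vector = Sequence[float]


def score(x: Vector, centroid: Vector) -> float:
    """Toy similarity (an assumption): negative squared Euclidean distance."""
    return -sum((a - b) ** 2 for a, b in zip(x, centroid))


def argmax_label(x: Vector, models: Dict[str, Vector]) -> str:
    """Return the label whose model scores highest for x."""
    return max(models, key=lambda label: score(x, models[label]))


def verify(
    x: Vector,
    claimed_speaker: str,
    gender_models: Dict[str, Vector],
    emotion_models: Dict[str, Dict[str, Vector]],
    speaker_models: Dict[Tuple[str, str], Vector],
    threshold: float,
) -> Tuple[str, str, bool]:
    # Stage 1: gender identification.
    gender = argmax_label(x, gender_models)
    # Stage 2: emotion identification, using only that gender's emotion models.
    emotion = argmax_label(x, emotion_models[gender])
    # Stage 3: verification against the claimed speaker's model for that emotion.
    claimed_model = speaker_models.get((claimed_speaker, emotion))
    if claimed_model is None:
        return gender, emotion, False  # no enrolled model: reject the claim
    accepted = score(x, claimed_model) >= threshold
    return gender, emotion, accepted


# Hypothetical two-dimensional "features" and models, for illustration only.
gender_models = {"male": [0.0, 0.0], "female": [10.0, 10.0]}
emotion_models = {
    "male": {"neutral": [0.0, 0.0], "angry": [2.0, 2.0]},
    "female": {"neutral": [10.0, 10.0], "angry": [12.0, 12.0]},
}
speaker_models = {("spk1", "angry"): [2.0, 2.5], ("spk1", "neutral"): [0.0, 0.5]}

result = verify([2.0, 2.4], "spk1", gender_models, emotion_models,
                speaker_models, threshold=-0.5)
```

In this sketch an error in an early stage propagates to the later ones, which is a general property of cascaded designs; how the paper handles such errors is not stated in the abstract.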

