AS SD Oct 8, 2020

Emotion Invariant Speaker Embeddings for Speaker Identification with Emotional Speech

arXiv:2010.03909v1
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of speaker identification for systems that must handle emotional variation in speech. The advance is incremental because it builds on existing i-vector methods.

The paper tackles speaker identification with emotional speech by creating emotion invariant speaker embeddings, achieving an absolute accuracy improvement of 2.6% over an average speaker model framework.

The emotional state of a speaker has a significant effect on speech production and can make speech deviate from that produced in a neutral state. This makes identifying speakers across different emotions a challenging task, because speaker models are generally trained on neutral speech. In this work, we propose to overcome this problem by creating emotion invariant speaker embeddings. We learn an extractor network that maps test embeddings with different emotions, obtained from an i-vector based system, to an emotion invariant space. The resulting test embeddings are thus emotion invariant and compensate for the mismatch between emotional states. The studies are conducted using four emotion classes from the IEMOCAP database. We obtain an absolute improvement of 2.6% in speaker identification accuracy using emotion invariant speaker embeddings over an average speaker model based framework with different emotions.
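
The mapping idea in the abstract can be illustrated with a small sketch. This is not the paper's extractor network: a ridge-regularized linear map stands in for it, random vectors stand in for i-vector embeddings, and a fixed distortion plus offset stands in for the effect of emotion. All names (`fit_linear_extractor`, `apply_extractor`, `identify`, `simulate`) and all parameter values are assumptions made for illustration only.

```python
import numpy as np

def add_bias(x):
    """Append a constant column so the linear map can learn an offset."""
    return np.hstack([x, np.ones((x.shape[0], 1))])

def fit_linear_extractor(emotional, targets, ridge=1e-2):
    """Ridge least squares: find W with add_bias(emotional) @ W close to targets.
    A linear stand-in for a learned extractor network (assumption)."""
    x = add_bias(emotional)
    gram = x.T @ x + ridge * np.eye(x.shape[1])
    return np.linalg.solve(gram, x.T @ targets)

def apply_extractor(emotional, weights):
    """Map emotional embeddings into the target (neutral-like) space."""
    return add_bias(emotional) @ weights

def identify(embeddings, speaker_models):
    """Pick the speaker model with the highest cosine similarity."""
    a = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    b = speaker_models / np.linalg.norm(speaker_models, axis=1, keepdims=True)
    return np.argmax(a @ b.T, axis=1)

# Synthetic stand-in data (hypothetical; not IEMOCAP, not real i-vectors).
rng = np.random.default_rng(0)
n_speakers, dim, n_per_speaker = 5, 16, 40
speaker_models = rng.normal(size=(n_speakers, dim))           # neutral models
distortion = np.eye(dim) + 0.3 * rng.normal(size=(dim, dim))  # assumed emotion effect
offset = 2.0 * rng.normal(size=dim)                           # assumed emotion shift

def simulate(n_per):
    """Draw emotional embeddings: distorted, shifted, noisy speaker models."""
    labels = np.repeat(np.arange(n_speakers), n_per)
    noise = 0.1 * rng.normal(size=(labels.size, dim))
    return speaker_models[labels] @ distortion + offset + noise, labels

emo_train, y_train = simulate(n_per_speaker)
emo_test, y_test = simulate(n_per_speaker)

# Fit the map on training embeddings, then apply it to held-out test embeddings.
weights = fit_linear_extractor(emo_train, speaker_models[y_train])
mapped_test = apply_extractor(emo_test, weights)

acc_before = np.mean(identify(emo_test, speaker_models) == y_test)
acc_after = np.mean(identify(mapped_test, speaker_models) == y_test)
print(f"accuracy before mapping: {acc_before:.2f}")
print(f"accuracy after mapping:  {acc_after:.2f}")
```

The sketch only shows the general shape of the approach: test embeddings are mapped into a space that matches the neutral speaker models before scoring. The numbers it prints come from synthetic data and say nothing about the 2.6% result reported in the paper.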

