SD · Oct 25, 2014

On the use of Stress information in Speech for Speaker Recognition

arXiv:1410.6905v1 · 4 citations
Originality: Incremental advance
AI Analysis

This paper addresses the robustness of speaker recognition in applications where speakers may be under stress, but the advance is incremental because it builds on existing feature-based methods.

The paper tackles the problem of speaker recognition performance degradation under stress by proposing to use inherent stress-in-speech information as additional cues, resulting in the identification of PAD (pitch, amplitude, duration) features that are unique to a speaker's speaking style.

The performance of a speaker recognition system decreases when the speaker is under stress or emotion. In this paper we explore and identify a mechanism that enables the use of inherent stress-in-speech, or speaking style, information present in a person's speech as additional cues for speaker recognition. We quantify the inherent stress present in the speech of a speaker mainly using three features, namely pitch, amplitude and duration (together called PAD). We experimentally observe that the PAD vectors of similar phones in different words of a speaker are close to each other in the three-dimensional (PAD) space, confirming that the way a speaker stresses different syllables in their speech is unique to them. We therefore propose the use of the PAD-based speaking style of a speaker as an additional feature for speaker recognition applications.
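The closeness claim in the abstract can be illustrated with a small sketch. This is not the authors' method: the PAD values below are invented for illustration, the z-score scaling and Euclidean distance are assumptions, and the helper names (`zscore`, `within_distance`, `between_distance`) are hypothetical. It only shows one way to check whether PAD vectors of the same phone cluster more tightly within a speaker than across speakers.

```python
import math
from itertools import combinations, product
from statistics import mean, pstdev


def zscore(vectors):
    """Scale each PAD dimension to zero mean and unit variance.

    Assumption: pitch, amplitude and duration have different units,
    so they are standardized before distances are compared.
    """
    columns = list(zip(*vectors))
    mus = [mean(c) for c in columns]
    sigmas = [pstdev(c) or 1.0 for c in columns]  # guard against zero spread
    return [
        tuple((x - mu) / sd for x, mu, sd in zip(v, mus, sigmas))
        for v in vectors
    ]


def within_distance(vectors):
    """Mean Euclidean distance over all pairs inside one group."""
    return mean(math.dist(a, b) for a, b in combinations(vectors, 2))


def between_distance(group_a, group_b):
    """Mean Euclidean distance over all cross-group pairs."""
    return mean(math.dist(a, b) for a, b in product(group_a, group_b))


# Hypothetical PAD vectors (pitch in Hz, amplitude, duration in ms) for the
# same phone taken from three different words per speaker. Invented data.
speaker_a = [(120.0, 0.60, 80.0), (118.0, 0.62, 84.0), (123.0, 0.58, 79.0)]
speaker_b = [(210.0, 0.35, 140.0), (205.0, 0.38, 150.0), (214.0, 0.33, 138.0)]

scaled = zscore(speaker_a + speaker_b)
a_scaled, b_scaled = scaled[:3], scaled[3:]

within_a = within_distance(a_scaled)
within_b = within_distance(b_scaled)
between = between_distance(a_scaled, b_scaled)

print(f"within A: {within_a:.3f}")
print(f"within B: {within_b:.3f}")
print(f"between A and B: {between:.3f}")
```

If a speaker's PAD vectors for a phone are tightly grouped, the within-speaker distances should be clearly smaller than the between-speaker distance, which is the property that would make PAD usable as an additional speaker cue.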

