On the use of Stress information in Speech for Speaker Recognition
This work addresses the robustness of speaker recognition in applications where speakers may be under stress; the contribution is incremental, as it builds on existing feature-based methods.
The paper tackles the problem of speaker recognition performance degradation under stress by proposing to use inherent stress-in-speech information as additional cues, resulting in the identification of PAD (pitch, amplitude, duration) features that are unique to a speaker's speaking style.
The performance of a speaker recognition system decreases when the speaker is under stress or emotion. In this paper we explore and identify a mechanism that enables the use of inherent stress-in-speech, or speaking style, information present in a person's speech as additional cues for speaker recognition. We quantify the inherent stress present in a speaker's speech mainly using three features, namely pitch, amplitude and duration (together called PAD). We experimentally observe that the PAD vectors of similar phones in different words of a speaker are close to each other in the three-dimensional (PAD) space, confirming that the way a speaker stresses different syllables in their speech is unique to them. We therefore propose the use of the PAD-based speaking style of a speaker as an additional feature for speaker recognition applications.
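The closeness observation above can be illustrated with a minimal sketch. It is not the paper's implementation: the `PAD` class, the helper functions, the z-score normalisation, the Euclidean distance and the units are all assumptions made for illustration, and the numeric values are invented placeholders rather than measured data. The sketch compares how tightly one speaker's PAD vectors for the same phone cluster against the distance between two speakers' centroids.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PAD:
    """Hypothetical container for one phone's PAD measurements."""
    pitch: float      # assumed: mean F0 over the phone, in Hz
    amplitude: float  # assumed: RMS amplitude, arbitrary units
    duration: float   # assumed: phone duration, in seconds


def normalise(vectors):
    """Z-score each dimension so no single feature dominates the distance."""
    cols = list(zip(*((v.pitch, v.amplitude, v.duration) for v in vectors)))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 when a dimension has zero variance.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [tuple((x - m) / s for x, m, s in zip(row, means, stds))
            for row in zip(*cols)]


def mean_pairwise_distance(points):
    """Average Euclidean distance over all unordered pairs of points."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    pairs = [(a, b) for i, a in enumerate(points) for b in points[i + 1:]]
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(c) / len(c) for c in zip(*points))


# Invented placeholder values: the same phone taken from three different
# words, for two hypothetical speakers.
speaker_a = [PAD(120.0, 0.50, 0.080), PAD(122.0, 0.52, 0.085), PAD(118.0, 0.49, 0.078)]
speaker_b = [PAD(180.0, 0.30, 0.120), PAD(183.0, 0.31, 0.125), PAD(178.0, 0.29, 0.118)]

# Normalise over the pooled data, then split back per speaker.
pooled = normalise(speaker_a + speaker_b)
points_a, points_b = pooled[:len(speaker_a)], pooled[len(speaker_a):]

within_a = mean_pairwise_distance(points_a)
within_b = mean_pairwise_distance(points_b)
between = math.dist(centroid(points_a), centroid(points_b))

print(f"within A: {within_a:.3f}  within B: {within_b:.3f}  between: {between:.3f}")
```

If a speaker's PAD vectors are speaker-specific in the sense described above, the within-speaker spread should be small relative to the between-speaker distance. Pooled normalisation is used here only so that pitch in Hz does not swamp duration in seconds; any other scaling or distance measure could be substituted.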