Improved I-vector-based Speaker Recognition for Utterances with Speaker Generated Non-speech sounds
This work addresses the robustness of speaker recognition for conversational speech containing non-speech sounds, but the contribution is incremental, as it builds on existing i-vector methods.
The paper tackled the problem of speaker recognition in conversational speech containing non-speech sounds such as laughter by analyzing how including laughter in training affects an i-vector-based system. The results showed that including laughter during training improved overall performance, particularly on speech-laugh segments.
Conversational speech not only contains several variants of neutral speech but is also prominently interlaced with several speaker-generated non-speech sounds such as laughter and breath. A robust speaker recognition system should be capable of recognizing a speaker irrespective of these variations in their speech. An understanding of whether the speaker-specific information represented by these variations is similar or not helps build a good speaker recognition system. In this paper, the speaker variations captured by a speaker's neutral speech are analyzed by considering the speech-laugh (a variant of neutral speech) and laughter (non-speech) sounds of the speaker. We study an i-vector-based speaker recognition system trained only on neutral speech and evaluate its performance on speech-laugh and laughter. Further, we analyze the effect of including laughter sounds during training of an i-vector-based speaker recognition system. Our experimental results show that the inclusion of laughter sounds during training seems to provide complementary speaker-specific information, which results in an overall improved performance of the speaker recognition system, especially on utterances with speech-laugh segments.
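The evaluation described above, scoring test utterances against enrolled speakers and reporting performance separately for neutral speech, speech-laugh, and laughter, can be sketched roughly as follows. This is a minimal illustration and not the paper's pipeline: it assumes closed-set identification with cosine scoring, neither of which the abstract specifies, and the function names, speaker labels, and 3-dimensional vectors are hypothetical toy values standing in for real i-vectors.

```python
import math
from collections import defaultdict


def cosine_score(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify(test_vec, enrolled):
    """Return the enrolled speaker whose vector scores highest (assumed closed set)."""
    return max(enrolled, key=lambda spk: cosine_score(test_vec, enrolled[spk]))


def accuracy_by_condition(trials, enrolled):
    """Identification accuracy per condition.

    trials: iterable of (true_speaker, condition, vector) tuples.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for speaker, condition, vec in trials:
        totals[condition] += 1
        if identify(vec, enrolled) == speaker:
            hits[condition] += 1
    return {cond: hits[cond] / totals[cond] for cond in totals}


# Hypothetical toy data: made-up vectors, not real i-vectors or results.
enrolled = {
    "spk_a": [1.0, 0.1, 0.0],
    "spk_b": [0.0, 1.0, 0.2],
}
trials = [
    ("spk_a", "neutral", [0.9, 0.2, 0.0]),
    ("spk_b", "neutral", [0.1, 0.8, 0.3]),
    ("spk_a", "speech-laugh", [0.6, 0.5, 0.1]),
    ("spk_b", "laughter", [0.7, 0.6, 0.0]),
]

results = accuracy_by_condition(trials, enrolled)
for condition, acc in results.items():
    print(f"{condition}: {acc:.2f}")
```

Reporting accuracy per condition rather than as a single pooled figure is what makes it possible to see whether a change in the training data helps speech-laugh or laughter utterances specifically.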