SD CL NE AS

Nov 15, 2017

Human and Machine Speaker Recognition Based on Short Trivial Events

Miao Zhang, Xiaofei Kang, Yanqing Wang, Lantian Li, Zhiyuan Tang, Haisheng Dai, Dong Wang

arXiv:1711.05443v3, 7.45 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses forensic examination needs by enabling speaker identification from disguised speech, but it is incremental as it applies an existing method to a new type of data.

The paper tackles speaker recognition from short trivial events such as coughs and laughs, which are usually ignored because they are short and unclear. Using a deep feature learning technique, it reports acceptable equal error rates (EERs) on events lasting only 0.2-0.5 seconds.

Trivial events such as coughs, laughs and sniffs are ubiquitous in human-to-human conversations. Compared to regular speech, these events are usually short and unclear, so they are generally regarded as not speaker discriminative and are largely ignored by present speaker recognition research. However, trivial events are highly valuable in some circumstances, such as forensic examination: they are less subject to intentional change, so they can be used to discover the genuine speaker behind disguised speech. In this paper, we collect a trivial event speech database that involves 75 speakers and 6 types of events, and report preliminary speaker recognition results on this database, by both human listeners and machines. In particular, the deep feature learning technique recently proposed by our group is used to analyze and recognize the trivial events, which leads to acceptable equal error rates (EERs) despite the extremely short durations (0.2-0.5 seconds) of these events. Comparing different types of events, 'hmm' seems to be more speaker discriminative than the others.
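For readers unfamiliar with the metric, the EER is the operating point at which the false acceptance rate (impostor trials accepted) equals the false rejection rate (genuine trials rejected). The following is a minimal sketch of how an EER could be estimated from verification scores. The function name and the example scores are hypothetical and are not taken from the paper, and the simple threshold sweep is an assumption for illustration, not the authors' scoring pipeline.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by trying every observed score as a threshold.

    target_scores: scores of trials where the claimed speaker is genuine.
    impostor_scores: scores of trials where the claimed speaker is wrong.
    Higher scores are assumed to mean "more likely the same speaker".
    """
    if not target_scores or not impostor_scores:
        raise ValueError("need at least one target and one impostor score")

    best_gap, eer = None, None
    for threshold in sorted(set(target_scores) | set(impostor_scores)):
        # False acceptance: an impostor trial scores at or above the threshold.
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        # False rejection: a genuine trial scores below the threshold.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Hypothetical scores, for illustration only.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(genuine, impostor))
```

With only a handful of trials the two error rates may never coincide exactly, so this sketch reports the average of the two at the closest threshold; evaluation toolkits typically interpolate along the error curve instead.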
