SDCL Aug 7, 2016

Incorporation of Speech Duration Information in Score Fusion of Speaker Recognition Systems

arXiv:1608.02272v1 (1 citation)
Originality: Incremental advance
AI Analysis

This work addresses speech duration variability in speaker recognition systems, an incremental advance for applications with short recordings.

The paper tackles the performance degradation that short speech durations cause in speaker verification systems. It investigates the effect of duration on three state-of-the-art systems and applies score fusion methods; the resulting technique performed significantly better than baseline score fusion methods.

In recent years, identity-vector (i-vector) based speaker verification (SV) systems have become very successful. Nevertheless, environmental noise and speech duration variability still significantly degrade the performance of these systems. In many real-life applications, recordings are very short; as a result, the extracted i-vectors cannot reliably represent the attributes of the speaker. Here, we investigate the effect of speech duration on the performance of three state-of-the-art speaker recognition systems. In addition, using a variety of available score fusion methods, we investigate how score fusion lets these speaker verification techniques benefit from the performance differences between methods under different enrollment and test speech duration conditions. The resulting fusion technique performed significantly better than the baseline score fusion methods.
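The abstract does not spell out how duration enters the fusion, so the following is only a minimal sketch of one common way to do it, not the authors' method: a logistic-regression fusion whose inputs are the per-system scores plus the log of the enrollment and test durations as side information. The function names (`features`, `fuse`, `train_fusion`), the choice of log-duration features, and the toy trial data are all assumptions made for illustration.

```python
import math

# Hypothetical toy trials: ([scores from 3 systems], enroll_sec, test_sec, label),
# where label 1 = target (same speaker) and 0 = non-target. Values are invented.
TRIALS = [
    ([1.2, 0.8, 1.0], 60.0, 30.0, 1),
    ([0.3, 0.9, 0.4], 5.0, 3.0, 1),
    ([0.9, 0.2, 0.7], 30.0, 10.0, 1),
    ([0.1, 0.6, 0.2], 3.0, 2.0, 1),
    ([-1.1, -0.7, -0.9], 60.0, 30.0, 0),
    ([-0.2, -0.8, -0.3], 5.0, 3.0, 0),
    ([-0.8, -0.1, -0.6], 30.0, 10.0, 0),
    ([0.0, -0.5, -0.1], 3.0, 2.0, 0),
]


def features(scores, enroll_dur, test_dur):
    """Bias, per-system scores, and log-durations (assumed side information)."""
    return [1.0, *scores, math.log(enroll_dur), math.log(test_dur)]


def fuse(weights, scores, enroll_dur, test_dur):
    """Fused score: a weighted sum of the features (a log-odds-like value)."""
    x = features(scores, enroll_dur, test_dur)
    return sum(w * xi for w, xi in zip(weights, x))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def mean_log_loss(weights, trials):
    """Average binary cross-entropy of the fused scores over all trials."""
    total = 0.0
    for scores, enroll_dur, test_dur, label in trials:
        p = sigmoid(fuse(weights, scores, enroll_dur, test_dur))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return total / len(trials)


def train_fusion(trials, lr=0.05, epochs=500):
    """Fit fusion weights by batch gradient descent on the logistic loss."""
    n_feat = len(features(*trials[0][:3]))
    weights = [0.0] * n_feat
    for _ in range(epochs):
        grad = [0.0] * n_feat
        for scores, enroll_dur, test_dur, label in trials:
            x = features(scores, enroll_dur, test_dur)
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - label
            for i, xi in enumerate(x):
                grad[i] += err * xi / len(trials)
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


weights = train_fusion(TRIALS)
fused = [fuse(weights, s, e, t) for s, e, t, _ in TRIALS]
```

A baseline fusion in this sketch would simply drop the two log-duration features. A richer variant might make the per-system weights themselves depend on duration, so that a system that degrades on short recordings is down-weighted there; whether the paper does this is not stated in the abstract.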
