Novel Quality Metric for Duration Variability Compensation in Speaker Verification using i-Vectors
This work addresses a crucial challenge in speaker verification technology: improving accuracy with short test utterances. It is incremental, however, because it builds on existing i-vector systems.
The paper tackles the problem of poor recognition accuracy in speaker verification with short-duration speech by proposing a new quality metric for i-vectors, showing considerable improvement in performance on the NIST SRE 2008 corpus.
Automatic speaker verification (ASV) is the process of recognizing persons using their voice as a biometric. ASV systems achieve high recognition performance when a sufficient amount of speech from matched conditions is available. One of the crucial challenges of ASV technology is to improve recognition performance with short speech segments. In the short-duration condition, the model parameters are poorly estimated because of inadequate speech information, and this results in poor recognition accuracy even with the state-of-the-art i-vector based ASV system. We hypothesize that taking the estimation quality into account during recognition would help improve ASV performance. This can be incorporated as a quality measure during the fusion of ASV systems. This paper investigates a new quality measure for the i-vector representation of speech utterances, computed directly from Baum-Welch statistics. The proposed metric is then used as a quality measure during the fusion of ASV systems. In experiments with the NIST SRE 2008 corpus, we show that including the proposed quality metric yields considerable improvement in speaker verification performance. The results also indicate the potential of the proposed method in real-world scenarios with short test utterances.
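The abstract does not spell out the proposed metric, so the following is only an illustrative sketch of the overall pipeline, not the authors' method. It assumes a diagonal-covariance UBM and computes the zeroth-order (N_c) and first-order (F_c) Baum-Welch statistics that an i-vector extractor consumes. From the zeroth-order statistics it then derives a hypothetical quality proxy: the UBM-weighted MAP adaptation coefficients N_c / (N_c + r), where the relevance factor r = 16 is assumed (a value commonly used in GMM-UBM MAP adaptation). Finally, it applies a linear score fusion with an added quality term. The function names, the proxy itself, the product form of the quality term, and all fusion weights are hypothetical. In practice such weights would be trained, for example by logistic regression on development trials.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian evaluated at frame x."""
    ll = 0.0
    for xi, mi, vi in zip(x, mean, var):
        ll += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return ll


def baum_welch_stats(frames, weights, means, variances):
    """Zeroth-order (N) and uncentered first-order (F) Baum-Welch statistics w.r.t. a UBM."""
    num_comp, dim = len(weights), len(means[0])
    N = [0.0] * num_comp
    F = [[0.0] * dim for _ in range(num_comp)]
    for x in frames:
        # Per-component log joint: log w_c + log N(x; mu_c, Sigma_c)
        logp = [math.log(w) + log_gaussian_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        peak = max(logp)  # log-sum-exp trick for numerical stability
        total = peak + math.log(sum(math.exp(lp - peak) for lp in logp))
        for c, lp in enumerate(logp):
            gamma = math.exp(lp - total)  # posterior of component c for this frame
            N[c] += gamma
            for d in range(dim):
                F[c][d] += gamma * x[d]
    return N, F


def quality_from_stats(N, weights, relevance=16.0):
    """HYPOTHETICAL quality proxy in [0, 1): UBM-weighted MAP adaptation coefficients.

    Grows with the amount of speech and with how evenly it covers the UBM components.
    This is an illustrative stand-in, not the metric proposed in the paper.
    """
    return sum(w * n / (n + relevance) for w, n in zip(weights, N))


def fuse_scores(scores, q_enroll, q_test, score_weights, bias, quality_weight):
    """Linear fusion of subsystem scores plus a quality term (all weights hypothetical)."""
    fused = bias + sum(a * s for a, s in zip(score_weights, scores))
    # Assumed quality function: product of enrollment-side and test-side quality
    return fused + quality_weight * q_enroll * q_test
```

For a trial, one would compute the statistics for the enrollment and test utterances, derive a quality value for each, and pass both to `fuse_scores` together with the subsystem scores. Because a longer utterance accumulates larger N_c, it receives a higher proxy quality. The fusion can then learn how much to adjust scores from short, poorly estimated trials.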