SD MM AS · Nov 17, 2017

A Double Joint Bayesian Approach for J-Vector Based Text-dependent Speaker Verification

arXiv:1711.06434v1 · 4 citations
Originality: Incremental advance
AI Analysis

This work addresses speaker verification for security applications, but it is incremental as it builds on existing j-vector and joint Bayesian methods.

The paper tackles text-dependent speaker verification with short-duration speech by generalizing the joint Bayesian approach to model the multi-faceted information in j-vectors. It reports 0.02% EER for both the impostor wrong and impostor correct cases on the RSR2015 dataset.
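The equal error rate (EER) quoted above is the operating point where the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how an EER could be estimated from trial scores; the function name `equal_error_rate` and the convention that higher scores favor acceptance are assumptions made for illustration, not details taken from the paper.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER (as a fraction in [0, 1]) from two lists of trial scores.

    Assumption: a higher score means the trial is more likely a genuine match.
    """
    # Candidate thresholds are the distinct observed scores.
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best_gap, eer = None, None
    for t in thresholds:
        # False rejection rate: target trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance rate: impostor trials scored at or above the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer


# Hypothetical scores, for illustration only.
print(equal_error_rate([2.0, 3.0, 4.0], [-1.0, 0.0, 1.0]))
print(equal_error_rate([1.0, 3.0], [0.0, 2.0]))
```

On this scale, an EER of 0.02% corresponds to a return value of 0.0002.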

The j-vector has proven very effective for text-dependent speaker verification with short-duration speech. However, current state-of-the-art back-end classifiers, e.g. the joint Bayesian model, cannot make full use of such deep features. In this paper, we generalize the standard joint Bayesian approach to model the multi-faceted information in the j-vector explicitly and jointly. In our generalization, the j-vector is modeled as the output of a generative Double Joint Bayesian (DoJoBa) model, which contains several kinds of latent variables. With DoJoBa, we can explicitly build a model that combines multiple heterogeneous kinds of information from the j-vectors. In the verification step, we calculate the likelihood describing whether or not two j-vectors have consistent labels. On the public RSR2015 data corpus, the experimental results show that our approach achieves 0.02% EER for both the impostor wrong and impostor correct cases.
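For orientation, the standard joint Bayesian model that the abstract generalizes can be written as below. This is the usual single-label formulation, shown as background only and not the DoJoBa model itself. The symbols $S_\mu$, $S_\varepsilon$, $H_I$ and $H_E$ are notation assumed here, not taken from this page.

```latex
% Standard joint Bayesian model (single label), background only.
% A feature x is the sum of an identity variable and a within-class variation.
x = \mu + \varepsilon, \qquad
\mu \sim \mathcal{N}(0, S_\mu), \quad
\varepsilon \sim \mathcal{N}(0, S_\varepsilon)

% Covariance of the stacked pair [x_1; x_2] under the
% same-label hypothesis H_I and the different-label hypothesis H_E.
\Sigma_I = \begin{bmatrix} S_\mu + S_\varepsilon & S_\mu \\ S_\mu & S_\mu + S_\varepsilon \end{bmatrix},
\qquad
\Sigma_E = \begin{bmatrix} S_\mu + S_\varepsilon & 0 \\ 0 & S_\mu + S_\varepsilon \end{bmatrix}

% Verification score: log-likelihood ratio of the two hypotheses.
r(x_1, x_2) = \log
\frac{\mathcal{N}\!\left([x_1; x_2] \mid 0, \Sigma_I\right)}
     {\mathcal{N}\!\left([x_1; x_2] \mid 0, \Sigma_E\right)}
```

According to the abstract, DoJoBa uses several kinds of latent variables rather than a single identity variable; their exact form is not given in this summary.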
