A. K. Sarkar

CL, Nov 19, 2016

Incorporating Pass-Phrase Dependent Background Models for Text-Dependent Speaker Verification

A. K. Sarkar, Zheng-Hua Tan

In this paper, we propose pass-phrase dependent background models (PBMs) for text-dependent (TD) speaker verification (SV) to integrate the pass-phrase identification process into the conventional TD-SV system. A PBM is derived from a text-independent background model through adaptation using the utterances of a particular pass-phrase. During training, pass-phrase specific target speaker models are derived from the corresponding PBM using the training data for the respective target model. During testing, the best PBM is first selected for the test utterance in the maximum likelihood (ML) sense, and the selected PBM is then used for the log likelihood ratio (LLR) calculation with respect to the claimant model. The proposed method thus incorporates the pass-phrase identification step in the LLR calculation, which is not considered in conventional standalone TD-SV systems. The performance of the proposed method is compared to conventional text-independent background model based TD-SV systems using the Gaussian mixture model (GMM)-universal background model (UBM), hidden Markov model (HMM)-UBM, or i-vector paradigms. In addition, we consider two approaches to building PBMs: speaker-independent and speaker-dependent. We show that the proposed method significantly reduces the error rates of text-dependent speaker verification for the non-target types target-wrong and impostor-wrong, while maintaining TD-SV performance comparable to the conventional system when impostors speak a correct utterance. Experiments are conducted on the RedDots challenge and the RSR2015 databases, which consist of short utterances.
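The test-time scoring described in the abstract (pick the best PBM in the ML sense, then compute the LLR against the claimant model) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes diagonal-covariance GMMs stored as `(weights, means, variances)` tuples, per-frame averaged log-likelihoods, and hypothetical names `gmm_avg_loglik` and `pbm_llr` that do not come from the paper.

```python
import numpy as np

def gmm_avg_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D); weights: (M,); means, variances: (M, D).
    """
    diff = frames[:, None, :] - means[None, :, :]                 # (T, M, D)
    log_comp = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_comp = log_comp + np.log(weights)                         # (T, M)
    # Numerically stable log-sum-exp over mixture components.
    peak = log_comp.max(axis=1, keepdims=True)
    frame_ll = peak[:, 0] + np.log(np.exp(log_comp - peak).sum(axis=1))
    return float(frame_ll.mean())

def pbm_llr(frames, claimant, pbms):
    """Select the best PBM by maximum likelihood, then score the claimant.

    claimant: (weights, means, variances) of the claimed speaker model.
    pbms: dict mapping a pass-phrase label to its (weights, means, variances).
    Returns (llr, selected_label).
    """
    scores = {label: gmm_avg_loglik(frames, *params)
              for label, params in pbms.items()}
    best = max(scores, key=scores.get)          # ML selection of the PBM
    llr = gmm_avg_loglik(frames, *claimant) - scores[best]
    return llr, best
```

Because the denominator is the best-matching PBM rather than a single text-independent model, an utterance of the wrong pass-phrase tends to score well against some other PBM, which lowers the LLR for the claimed speaker and pass-phrase.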

SD, May 12, 2016

Sub-vector Extraction and Cascade Post-Processing for Speaker Verification Using MLLR Super-vectors

A. K. Sarkar, C. Barras, V. B. Le et al.

In this paper, we propose a speaker-verification system based on maximum likelihood linear regression (MLLR) super-vectors, in which speakers are characterized by m-vectors. These vectors are obtained by a uniform segmentation of the speaker MLLR super-vector using an overlapped sliding window. We consider three approaches for MLLR transformation, based on the conventional 1-best automatic transcription, on the lattice word transcription, or on a simple global universal background model (UBM). Session variability compensation is performed in a post-processing module with probabilistic linear discriminant analysis (PLDA) or the eigen factor radial (EFR). Alternatively, we propose a cascade post-processing for the MLLR super-vector based speaker-verification system. In this case, the m-vectors or MLLR super-vectors are first projected onto a lower-dimensional vector space generated by linear discriminant analysis (LDA). Next, PLDA session variability compensation and scoring are applied to the reduced-dimensional vectors. This approach combines the advantages of both techniques and makes the estimation of PLDA parameters easier. Experimental results on telephone conversations of the NIST 2008 and 2010 speaker recognition evaluations (SRE) indicate that the proposed m-vector system performs significantly better than the conventional system based on the full MLLR super-vectors. Cascade post-processing further reduces the error rate in all cases. Finally, we present the results of fusion with a standard i-vector system in both the feature and the score domain, demonstrating that the m-vector system is both competitive with and complementary to it.
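The m-vector extraction step (uniform segmentation of the super-vector with an overlapped sliding window) can be sketched as below. This is an illustration under stated assumptions, not the authors' code: the function name `extract_m_vectors`, the `window` and `overlap` parameters, and the choice to drop a trailing segment shorter than the window are all hypothetical, since the abstract gives no sizes or boundary handling.

```python
import numpy as np

def extract_m_vectors(supervector, window, overlap):
    """Cut a 1-D super-vector into equal-length, overlapping sub-vectors.

    window: length of each sub-vector (assumed parameter).
    overlap: number of elements shared by consecutive sub-vectors.
    A tail shorter than `window` is dropped (an assumption of this sketch).
    """
    supervector = np.asarray(supervector)
    step = window - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than window")
    starts = range(0, len(supervector) - window + 1, step)
    return np.stack([supervector[s:s + window] for s in starts])
```

For example, a 10-dimensional super-vector with `window=4` and `overlap=2` yields four sub-vectors starting at offsets 0, 2, 4, and 6.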
