SD · AS · Oct 28, 2017

Investigation of Frame Alignments for GMM-based Digit-prompted Speaker Verification

arXiv:1710.10436v4 · 2 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the security of speaker verification systems used in applications such as authentication. It is incremental, building on existing alignment and scoring techniques.

The paper investigates frame alignment methods for GMM-based digit-prompted speaker verification and finds that DNN-based alignments perform on par with HMM-based ones. It also introduces a KL divergence based scoring method that rejects incorrect pass-phrases, improving security without significant computational cost.

Frame alignments can be computed by different methods in GMM-based speaker verification. By incorporating a phonetic Gaussian mixture model (PGMM), we are able to compare the performance of alignments extracted from deep neural networks (DNN) and from the conventional hidden Markov model (HMM) in digit-prompted speaker verification. Based on the different characteristics of these two alignments, we present a novel content verification method that improves system security without much computational overhead. Our experiments on the RSR2015 Part-3 digit-prompted task show that the DNN-based alignment performs on par with the HMM alignment. The results also demonstrate the effectiveness of the proposed Kullback-Leibler (KL) divergence based scoring in rejecting speech with incorrect pass-phrases.
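The abstract does not give the scoring formula, so the following is only a minimal sketch of how a KL divergence based content check could work, not the authors' method. It assumes each alignment yields a per-frame posterior distribution over the same set of classes, that the two sequences have equal length, and that a large average divergence between them signals a wrong pass-phrase. The function names, the toy posteriors, and the threshold value are all hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-10):
    """KL(p || q) for two discrete distributions over the same classes.

    eps guards against log(0) and division by zero; terms with p_i == 0
    contribute nothing, by the usual convention.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def mean_frame_kl(post_a, post_b):
    """Average per-frame KL divergence between two posterior sequences."""
    if not post_a or len(post_a) != len(post_b):
        raise ValueError("posterior sequences must be non-empty and equal in length")
    total = sum(kl_divergence(p, q) for p, q in zip(post_a, post_b))
    return total / len(post_a)


def accept_content(post_a, post_b, threshold):
    """Accept the utterance when the two alignments agree closely enough.

    threshold is a hypothetical tuning parameter, not a value from the paper.
    """
    return mean_frame_kl(post_a, post_b) <= threshold


# Toy posteriors over three classes for two frames (illustrative values only).
dnn_post = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1]]
hmm_post_match = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]     # alignments agree
hmm_post_mismatch = [[0.05, 0.05, 0.9], [0.8, 0.1, 0.1]]  # alignments disagree

print(accept_content(dnn_post, hmm_post_match, threshold=0.5))
print(accept_content(dnn_post, hmm_post_mismatch, threshold=0.5))
```

In this toy setup the agreeing pair gives a much smaller mean divergence than the disagreeing pair, so a single threshold separates them. In practice the threshold would have to be tuned on held-out data.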
