SDAS Apr 13, 2018

Speaker Embedding Extraction with Phonetic Information

arXiv:1804.04862v2, 75 citations
Originality: Incremental advance
AI Analysis

This work addresses speaker verification for speech processing applications, presenting an incremental improvement over existing methods.

The paper tackles speaker verification by incorporating phonetic information into x-vector speaker embedding extraction, using two methods: phonetic vectors and multi-task learning. On the Fisher dataset, the best system improves on the original x-vector approach by 20% in EER and by 15% in both minDCF08 and minDCF10.

Speaker embeddings achieve promising results on many speaker verification tasks. Phonetic information, as an important component of speech, is rarely considered in the extraction of speaker embeddings. In this paper, we introduce phonetic information to the speaker embedding extraction based on the x-vector architecture. Two methods using phonetic vectors and multi-task learning are proposed. On the Fisher dataset, our best system outperforms the original x-vector approach by 20% in EER, and by 15%, 15% in minDCF08 and minDCF10, respectively. Experiments conducted on NIST SRE10 further demonstrate the effectiveness of the proposed methods.
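The EER figure quoted above can be made concrete with a small sketch. The code below is illustrative only and is not the authors' evaluation code: it approximates the equal error rate by sweeping a threshold over a handful of made-up trial scores, and it assumes that "outperforms by 20%" refers to a relative reduction against the baseline. The function names and all numbers are hypothetical.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over every observed score."""
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, nontarget]))
    # False rejection rate: fraction of target trials scored below the threshold.
    frr = np.array([(target < t).mean() for t in thresholds])
    # False acceptance rate: fraction of non-target trials at or above the threshold.
    far = np.array([(nontarget >= t).mean() for t in thresholds])
    # The EER is the operating point where the two error rates are closest.
    idx = int(np.argmin(np.abs(far - frr)))
    return float((far[idx] + frr[idx]) / 2.0)


def relative_improvement(baseline, proposed):
    """Relative reduction of an error metric (assumed meaning of 'by 20%')."""
    return (baseline - proposed) / baseline


# Hypothetical scores, not taken from the paper.
example_eer = equal_error_rate([0.9, 0.8, 0.7, 0.3], [0.6, 0.4, 0.2, 0.1])

# Hypothetical EER values: a drop from 5.0% to 4.0% is a 20% relative gain.
example_gain = relative_improvement(5.0, 4.0)
```

Under this reading, a baseline EER of 5.0% falling to 4.0% would count as a 20% improvement, even though the absolute change is only one percentage point.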
