SD CL, Jun 22, 2017

Cross-lingual Speaker Verification with Deep Feature Learning

arXiv:1706.07861v1, 16 citations
Originality: Incremental advance
AI Analysis

This paper addresses the problem of language mismatch in speaker verification systems and offers a robust solution for cross-lingual applications, though it appears incremental because it builds on prior deep learning methods.

The paper tackles performance degradation in speaker verification caused by language mismatch. It applies a previously proposed deep learning model for feature extraction and shows that the resulting system outperforms the i-vector system by a large margin in cross-lingual scenarios, such as training in English with enrollment and test in Chinese or Uyghur.

Existing speaker verification (SV) systems often suffer from performance degradation when there is a language mismatch between model training, speaker enrollment, and test. A major cause of this degradation is that most existing SV methods rely on a probabilistic model to infer the speaker factor, so any significant change in the distribution of the speech signal affects the inference. Recently, we proposed a deep learning model that learns to extract the speaker factor with a deep neural network (DNN). With this feature learning, an SV system can be constructed with a very simple back-end model. In this paper, we investigate the robustness of the feature-based SV system in situations with language mismatch. Our experiments were conducted on a complex cross-lingual scenario in which the model was trained in English and enrollment and test were in Chinese or Uyghur. The experiments demonstrated that the feature-based system outperformed the i-vector system by a large margin, particularly with language mismatch between enrollment and test.
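
To illustrate what a "very simple back-end model" on top of learned speaker features could look like, here is a minimal sketch. The abstract does not say which back-end the authors used, so everything here is an assumption: frame-level features are averaged into one vector per utterance, and enrollment and test vectors are compared by cosine similarity. The function names and the threshold value are hypothetical, and the frame-level features are taken as given (in the paper they would come from the DNN).

```python
import math

def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def unit(vector):
    """Scale a vector to unit length so a dot product equals the cosine."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in vector]

def utterance_vector(frame_features):
    """Assumed pooling: average the frame-level features of one utterance."""
    return unit(mean_vector(frame_features))

def enroll(utterances):
    """Build a speaker model by averaging the enrollment utterance vectors."""
    return unit(mean_vector([utterance_vector(u) for u in utterances]))

def cosine_score(speaker_model, test_frames):
    """Cosine similarity between the speaker model and a test utterance."""
    test_vector = utterance_vector(test_frames)
    return sum(a * b for a, b in zip(speaker_model, test_vector))

def accept(speaker_model, test_frames, threshold=0.5):
    """Accept the identity claim if the score reaches a hypothetical threshold."""
    return cosine_score(speaker_model, test_frames) >= threshold
```

Because this back-end has no trained parameters, nothing in it is fitted to the training language; any language dependence would sit in the feature extractor. In practice the threshold would be tuned on held-out trials rather than fixed in advance.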

