SD, Aug 3, 2017

Recursive Whitening Transformation for Speaker Recognition on Language Mismatched Condition

arXiv:1708.01232v2, 5 citations
AI Analysis

This paper addresses language mismatch in speaker recognition, which matters for applications such as multilingual systems, but the contribution is incremental because it builds on existing whitening techniques.

The paper tackles performance degradation in speaker recognition due to language mismatches by proposing a recursive whitening transformation method, which is validated on non-English speaker recognition tasks using a state-of-the-art system and shows effectiveness compared to prior studies.

Recently in speaker recognition, performance degradation due to the channel domain mismatched condition has been actively addressed. However, the mismatch arising from language has yet to be sufficiently addressed. This paper proposes an approach that employs a recursive whitening transformation to mitigate the language mismatched condition. The proposed method is based on multiple whitening transformations, which are intended to remove un-whitened residual components in the dataset associated with i-vector length normalization. The experiments were conducted on the Speaker Recognition Evaluation 2016 trials, whose task is non-English speaker recognition using a development dataset consisting of both a large-scale out-of-domain (English) dataset and an extremely low-quantity in-domain (non-English) dataset. For performance comparison, we develop a state-of-the-art system using a deep neural network and bottleneck features, which is based on a phonetically aware model. From the experimental results, along with other prior studies, the effectiveness of the proposed method on the language mismatched condition is validated.
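As a rough illustration of the idea in the abstract, the sketch below alternates whitening and length normalization for several passes. It is not the authors' implementation: the function names, the ZCA-style whitening, the choice of `reference` vectors used to estimate the statistics, and the `n_passes` parameter are all assumptions made for illustration.

```python
import numpy as np


def fit_whitener(x, eps=1e-8):
    """Estimate the mean and a symmetric (ZCA-style) whitening matrix from rows of x."""
    mean = x.mean(axis=0)
    cov = np.cov(x - mean, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)
    # Clamp small eigenvalues so the inverse square root stays finite.
    inv_sqrt = 1.0 / np.sqrt(np.maximum(eigval, eps))
    w = eigvec @ np.diag(inv_sqrt) @ eigvec.T
    return mean, w


def length_normalize(x):
    """Scale every row vector to unit Euclidean length."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def recursive_whitening(reference, vectors, n_passes=2):
    """Repeat whitening followed by length normalization (hypothetical sketch).

    Each pass re-estimates the whitening statistics from the already
    transformed reference set, so residual components left after the
    previous length normalization are whitened again.
    """
    ref, out = reference, vectors
    for _ in range(n_passes):
        mean, w = fit_whitener(ref)
        ref = length_normalize((ref - mean) @ w)
        out = length_normalize((out - mean) @ w)
    return out
```

Here `reference` stands for whatever set of i-vectors is used to estimate the whitening statistics (for example, the small in-domain set), and `vectors` for the i-vectors to be transformed; both are arrays with one vector per row.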

