SD AI AS · Sep 24, 2024

Disentangling Age and Identity with a Mutual Information Minimization Approach for Cross-Age Speaker Verification

Fengrun Zhang, Wangjin Zhou, Yiming Liu, Wang Geng, Yahui Shan, Chen Zhang

arXiv:2409.15974v1 · 2.7 · h-index: 3

Originality: Incremental advance

AI Analysis

This paper addresses the poor performance of speaker verification systems under aging effects, a domain-specific problem in speech processing.

The paper tackles the problem of cross-age speaker verification by proposing a disentangled representation learning framework that minimizes mutual information between age- and identity-related embeddings, resulting in age-invariant speaker embeddings; it outperforms other methods on multiple Cross-Age test sets of Vox-CA.

There has been increasing research interest in cross-age speaker verification (CASV). However, existing speaker verification systems perform poorly in CASV because aging causes large individual differences in voice. In this paper, we propose a disentangled representation learning framework for CASV based on mutual information (MI) minimization. In our method, a backbone model is trained to disentangle the identity- and age-related embeddings from speaker information, and an MI estimator is trained to minimize the correlation between the age- and identity-related embeddings, resulting in age-invariant speaker embeddings. Furthermore, by using the age gaps between positive and negative samples, we propose an aging-aware MI minimization loss function that lets the backbone model focus more on vocal changes across large age gaps. Experimental results show that the proposed method outperforms other methods on multiple Cross-Age test sets of Vox-CA.
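The abstract does not say which MI estimator or weighting scheme is used, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a contrastive log-ratio style upper bound on MI, a hypothetical linear Gaussian predictor `predict_age_emb` standing in for the MI estimator network, and a hypothetical weighting in which each positive/negative pair is scaled by its normalized age gap. All function names, the unit variance, and the weighting formula are assumptions made for illustration.

```python
import math


def log_gaussian(y, mean, var=1.0):
    """Log-density of y under an isotropic Gaussian N(mean, var * I)."""
    d = len(y)
    sq = sum((yi - mi) ** 2 for yi, mi in zip(y, mean))
    return -0.5 * (sq / var + d * math.log(2 * math.pi * var))


def predict_age_emb(identity_emb, weight):
    """Hypothetical linear estimator: mean of q(age_emb | identity_emb)."""
    return [sum(w * x for w, x in zip(row, identity_emb)) for row in weight]


def aging_aware_mi_bound(id_embs, age_embs, ages, weight):
    """Age-gap-weighted contrastive log-ratio estimate of an MI upper bound.

    For sample i, (identity_i, age_emb_i) is the positive pair and
    (identity_i, age_emb_j), j != i, are negative pairs. Each log-ratio
    term is weighted by |age_i - age_j| / max_gap (an assumed scheme),
    so pairs with large age gaps contribute more to the loss.
    """
    n = len(id_embs)
    # Fall back to 1.0 when all ages are equal to avoid dividing by zero.
    max_gap = max(abs(a - b) for a in ages for b in ages) or 1.0
    total, weight_sum = 0.0, 0.0
    for i in range(n):
        mean = predict_age_emb(id_embs[i], weight)
        pos = log_gaussian(age_embs[i], mean)
        for j in range(n):
            if j == i:
                continue
            neg = log_gaussian(age_embs[j], mean)
            w = abs(ages[i] - ages[j]) / max_gap
            total += w * (pos - neg)
            weight_sum += w
    return total / weight_sum if weight_sum else 0.0


# Toy data: three utterances with 2-D embeddings and speaker ages in years.
identity = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ages = [20, 40, 60]
eye = [[1.0, 0.0], [0.0, 1.0]]

# Age embeddings fully predictable from identity: the estimate is positive.
entangled = aging_aware_mi_bound(identity, identity, ages, eye)

# Age embeddings that carry no identity information: the estimate is zero.
constant_age = [[0.5, 0.5]] * 3
disentangled = aging_aware_mi_bound(identity, constant_age, ages, eye)
```

In a training loop of this kind, the estimator parameters (`weight` here) would typically be fitted to predict the age embedding well, while the backbone would be updated to lower the bound. Whether the paper alternates these steps or uses a different schedule is not stated in the abstract.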
