SD | AI | CV | MM | May 27, 2025

VoxAging: Continuously Tracking Speaker Aging with a Large-Scale Longitudinal Dataset in English and Mandarin

arXiv:2505.21445v1 | 2 citations | h-index: 5 | INTERSPEECH
Originality: Synthesis-oriented
AI Analysis

This paper addresses the shortage of longitudinal data for speaker aging research, which benefits speech technology developers. The contribution is incremental, however, because it centers on dataset creation and analysis rather than on a new method.

The authors tackle the adverse effect of speaker aging on verification systems by introducing VoxAging, a large-scale longitudinal dataset of 293 speakers tracked for up to 17 years. They analyze how aging affects speaker verification systems and how factors such as age group and gender influence speaker aging.

The performance of speaker verification systems is adversely affected by speaker aging. However, due to challenges in data collection, particularly the lack of sustained and large-scale longitudinal data for individuals, research on speaker aging remains difficult. In this paper, we present VoxAging, a large-scale longitudinal dataset collected from 293 speakers (226 English speakers and 67 Mandarin speakers) over several years, with the longest time span reaching 17 years (approximately 900 weeks). For each speaker, the data were recorded at weekly intervals. We studied the phenomenon of speaker aging and its effects on advanced speaker verification systems, analyzed individual speaker aging processes, and explored the impact of factors such as age group and gender on speaker aging research.

