VoxAging: Continuously Tracking Speaker Aging with a Large-Scale Longitudinal Dataset in English and Mandarin
This work addresses the scarcity of longitudinal data for speaker aging research and benefits developers of speech technology, but it is incremental because its contribution is mainly dataset creation and analysis.
The authors tackled the degradation that speaker aging causes in speaker verification systems by introducing VoxAging, a large-scale longitudinal dataset of 293 speakers tracked for up to 17 years. They then analyzed how aging affects these systems and how factors such as age group and gender influence it.
The performance of speaker verification systems is adversely affected by speaker aging. However, research on speaker aging remains difficult because of challenges in data collection, particularly the lack of sustained, large-scale longitudinal data for individual speakers. In this paper, we present VoxAging, a large-scale longitudinal dataset collected from 293 speakers (226 English speakers and 67 Mandarin speakers) over several years, with the longest time span reaching 17 years (approximately 900 weeks). For each speaker, the data were recorded at weekly intervals. We studied the phenomenon of speaker aging and its effects on advanced speaker verification systems, analyzed individual speaker aging processes, and explored how factors such as age group and gender affect speaker aging.
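The abstract does not spell out the evaluation protocol. One common way to quantify how aging affects a verification system is to score enrollment/test trial pairs and report the equal error rate (EER) as a function of the time gap between the two sessions. Weekly recordings make weeks a natural unit for that gap; a 17-year span is about 17 × 52 = 884 weeks. The sketch below illustrates this idea under stated assumptions. The trial tuple format `(score, is_target, gap_weeks)`, the 52-week buckets, the function names `compute_eer` and `eer_by_time_gap`, and the toy scores are all hypothetical. They are not taken from VoxAging or its tooling.

```python
from collections import defaultdict


def compute_eer(scores, labels):
    """Approximate the equal error rate by sweeping every observed score as a threshold.

    A trial is accepted when score >= threshold. The EER is taken at the threshold
    where the false-accept and false-reject rates are closest, averaging the two.
    """
    targets = [s for s, y in zip(scores, labels) if y]
    impostors = [s for s, y in zip(scores, labels) if not y]
    if not targets or not impostors:
        raise ValueError("need both target and impostor trials")
    best = None
    for t in sorted(set(scores)):
        far = sum(s >= t for s in impostors) / len(impostors)  # impostors wrongly accepted
        frr = sum(s < t for s in targets) / len(targets)       # targets wrongly rejected
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)
    return best[1]


def eer_by_time_gap(trials, bucket_weeks=52):
    """Group trials by enrollment-to-test gap (in weeks) and compute one EER per bucket.

    trials: iterable of (score, is_target, gap_weeks) tuples (hypothetical format).
    Returns {bucket_start_week: eer}, ordered by increasing gap.
    """
    buckets = defaultdict(lambda: ([], []))
    for score, is_target, gap_weeks in trials:
        bucket_scores, bucket_labels = buckets[gap_weeks // bucket_weeks]
        bucket_scores.append(score)
        bucket_labels.append(is_target)
    return {k * bucket_weeks: compute_eer(s, l) for k, (s, l) in sorted(buckets.items())}


# Toy scores for illustration only (not VoxAging results).
toy_trials = [
    (0.90, True, 10), (0.85, True, 20), (0.20, False, 15), (0.10, False, 30),     # under 1 year apart
    (0.60, True, 420), (0.30, True, 430), (0.40, False, 440), (0.20, False, 450), # about 8 years apart
]
print(eer_by_time_gap(toy_trials))
```

With these toy scores, target and impostor scores separate cleanly for short gaps but overlap for long gaps, so the long-gap bucket has a higher EER. A real study would also control for session conditions, language, age group, and gender, which this sketch ignores.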