VoxKnesset: A Large-Scale Longitudinal Hebrew Speech Dataset for Aging Speaker Modeling
This provides a valuable dataset for researchers in speech processing and computational linguistics to study aging effects, though it is incremental as it focuses on a specific language and domain.
The authors tackled the challenge of modeling aging voices in speech processing by introducing VoxKnesset, a large-scale longitudinal Hebrew speech dataset spanning 2009-2025 with ~2,300 hours from 393 speakers, and found that speaker verification error rates increased from 2.15% to 4.58% over 15 years, while cross-sectional age prediction models failed but longitudinal ones captured meaningful aging signals.
Speech processing systems face a fundamental challenge: the human voice changes with age, yet few datasets support rigorous longitudinal evaluation. We introduce VoxKnesset, an open-access dataset of ~2,300 hours of Hebrew parliamentary speech spanning 2009-2025, comprising 393 speakers with recording spans of up to 15 years. Each segment includes aligned transcripts and verified demographic metadata from official parliamentary records. We benchmark modern speech embeddings (WavLM-Large, ECAPA-TDNN, Wav2Vec2-XLSR-1B) on age prediction and speaker verification under longitudinal conditions. Speaker verification EER rises from 2.15\% to 4.58\% over 15 years for the strongest model, and cross-sectionally trained age regressors fail to capture within-speaker aging, while longitudinally trained models recover a meaningful temporal signal. We publicly release the dataset and pipeline to support aging-robust speech systems and Hebrew speech processing.