CL · AI · Feb 17, 2025

Demographic Attributes Prediction from Speech Using WavLM Embeddings

arXiv:2502.12007v1 · 4 citations · h-index: 47 · CISS
Originality: Incremental advance
AI Analysis

This work addresses demographic feature prediction for applications in language learning, accessibility, and digital forensics, but it is incremental: it builds on existing pretrained models and reports improvements in accuracy and error metrics.

This paper tackles the problem of predicting demographic attributes such as age, gender, native language, education, and country from speech using WavLM embeddings, achieving a Mean Absolute Error of 4.94 for age prediction and over 99.81% accuracy for gender classification.

This paper introduces a general classifier based on WavLM features to infer demographic characteristics, such as age, gender, native language, education, and country, from speech. Demographic feature prediction plays a crucial role in applications like language learning, accessibility, and digital forensics, enabling more personalized and inclusive technologies. Leveraging pretrained models for embedding extraction, the proposed framework identifies key acoustic and linguistic features associated with demographic attributes, achieving a Mean Absolute Error (MAE) of 4.94 for age prediction and over 99.81% accuracy for gender classification across various datasets. Our system improves upon existing models by up to relative 30% in MAE and up to relative 10% in accuracy and F1 scores across tasks, leveraging a diverse range of datasets and large pretrained models to ensure robustness and generalizability. This study offers new insights into speaker diversity and provides a strong foundation for future research in speech-based demographic profiling.
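
The abstract reports age prediction as a Mean Absolute Error and its gains as relative improvements over earlier models. As a minimal sketch of how those two quantities are usually computed (the function names and all numbers below are illustrative assumptions, not the paper's code or data):

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_reduction(baseline_error, new_error):
    """Fraction by which new_error is lower than baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical ages, chosen only to illustrate the arithmetic.
true_ages = [23, 35, 41, 58]
pred_ages = [27, 31, 47, 52]

mae = mean_absolute_error(true_ages, pred_ages)  # absolute errors: 4, 4, 6, 6
print(f"MAE: {mae:.2f}")

# Hypothetical baseline and new error values, not taken from the paper.
print(f"Relative reduction: {relative_reduction(10.0, 7.0):.0%}")
```

Under this reading, "up to relative 30% in MAE" means the new error is at most 30% lower than the baseline error, not 30 points lower in absolute terms.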
