SD LG AS ML · Feb 23, 2022

Towards Speaker Age Estimation with Label Distribution Learning

arXiv:2202.11424v1 · 27 citations
Originality: Incremental advance
AI Analysis

This work improves speaker age estimation for applications like biometrics or speech analysis, but it is incremental as it builds on existing label distribution learning methods.

The paper tackles speaker age estimation by addressing label ambiguity through label distribution learning, converting age labels into distributions and combining classification and regression approaches, resulting in a 10% reduction in mean absolute error on a real-world dataset.

Existing methods for speaker age estimation usually treat it as a multi-class classification or a regression problem. However, precise age identification remains a challenge due to label ambiguity, i.e., utterances from the same person at adjacent ages are often indistinguishable. To address this, we utilize the ambiguous information among the age labels, convert each age label into a discrete label distribution, and leverage the label distribution learning (LDL) method to fit the data. For each audio sample, our method produces an age distribution of its speaker, and on top of the distribution we also perform two other tasks: age prediction and age uncertainty minimization. Therefore, our method naturally combines the age classification and regression approaches, which enhances its robustness. We conduct experiments on the public NIST SRE08-10 dataset and a real-world dataset, which show that our method outperforms baseline methods by a relatively large margin, yielding a 10% reduction in mean absolute error (MAE) on the real-world dataset.
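
The abstract's pipeline (age label to discrete distribution, then a distribution fit plus age prediction and uncertainty minimization) can be illustrated with a minimal sketch. The abstract does not give the actual formulation, so everything specific here is an assumption: the discretized Gaussian target, the 0 to 100 age range, the KL-divergence fit term, the expected-value age prediction, the variance as the uncertainty term, and the weights `w_l1` and `w_var`. All function names are hypothetical.

```python
import numpy as np

# Assumed age bins; the paper's actual range is not stated in the abstract.
AGES = np.arange(0, 101)


def age_to_distribution(age, sigma=2.0, ages=AGES):
    """Turn a scalar age into a discrete label distribution.

    Assumption: a discretized Gaussian centred on the true age, so that
    neighbouring ages receive non-zero probability mass.
    """
    logits = -0.5 * ((ages - age) / sigma) ** 2
    p = np.exp(logits - logits.max())
    return p / p.sum()


def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    e = np.exp(z - z.max())
    return e / e.sum()


def ldl_loss(logits, age, sigma=2.0, w_l1=1.0, w_var=0.05, ages=AGES, eps=1e-12):
    """Combine three terms computed from one predicted age distribution.

    1. KL divergence between the target and predicted distributions
       (the classification-style distribution fit).
    2. Absolute error between the expected age and the true age
       (the regression-style age prediction).
    3. Variance of the predicted distribution (an uncertainty penalty).
    The weights w_l1 and w_var are illustrative values, not from the paper.
    """
    target = age_to_distribution(age, sigma, ages)
    pred = softmax(logits)
    kl = np.sum(target * (np.log(target + eps) - np.log(pred + eps)))
    expected_age = np.sum(pred * ages)
    variance = np.sum(pred * (ages - expected_age) ** 2)
    total = kl + w_l1 * abs(expected_age - age) + w_var * variance
    return total, expected_age, variance


if __name__ == "__main__":
    # Stand-in for a network's output over the age bins.
    rng = np.random.default_rng(0)
    fake_logits = rng.normal(size=AGES.size)
    loss, pred_age, var = ldl_loss(fake_logits, age=30)
    print(f"loss={loss:.3f} expected_age={pred_age:.2f} variance={var:.2f}")
```

In this sketch a single softmax output feeds all three terms, which is one way to read the abstract's claim that the method combines the classification and regression approaches.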
