CLAIJun 10, 2025

Employing self-supervised learning models for cross-linguistic child speech maturity classification

arXiv:2506.08999v13 citationsh-index: 23Has CodeINTERSPEECH
Originality Incremental advance
AI Analysis

This addresses the challenge of improving speech technology for child speech, which is hindered by small datasets, by providing a robust cross-linguistic solution.

The study tackled the problem of classifying child vocalizations by applying state-of-the-art transformer models to a novel, large-scale dataset called SpeechMaturity, achieving classification accuracy comparable to humans and outperforming previous models.

Speech technology systems struggle with many downstream tasks for child speech due to small training corpora and the difficulties that child speech pose. We apply a novel dataset, SpeechMaturity, to state-of-the-art transformer models to address a fundamental classification task: identifying child vocalizations. Unlike previous corpora, our dataset captures maximally ecologically-valid child vocalizations across an unprecedented sample, comprising children acquiring 25+ languages in the U.S., Bolivia, Vanuatu, Papua New Guinea, Solomon Islands, and France. The dataset contains 242,004 labeled vocalizations, magnitudes larger than previous work. Models were trained to distinguish between cry, laughter, mature (consonant+vowel), and immature speech (just consonant or vowel). Models trained on the dataset outperform state-of-the-art models trained on previous datasets, achieved classification accuracy comparable to humans, and were robust across rural and urban settings.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes