Context-aware child-directed speech detection from long-form recordings

Théo Charlot, Tarek Kunze, Kaveri K. Sheth, Alejandrina Cristia, Marvin Lavechin

arXiv:2606.0113448.1

AI Analysis

This work improves automated analysis of children's language environments for developmental researchers, but the gains are incremental over existing methods.

The authors fine-tuned self-supervised models on multilingual child-centered recordings and incorporated the context surrounding each utterance to improve child-directed speech detection, achieving an absolute gain of 13.8 percentage points in average F1-score over processing utterances in isolation.
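The difference between an absolute and a relative gain in average F1-score can be made concrete with a small calculation. The sketch below assumes that "average F1-score" means the unweighted (macro) mean of per-class F1 over addressee labels; the labels `"CDS"` and `"ADS"` and the toy predictions are hypothetical and are not data or results from the paper.

```python
from typing import Dict, List, Sequence


def per_class_f1(y_true: Sequence[str], y_pred: Sequence[str], labels: List[str]) -> Dict[str, float]:
    """F1 for each label, treating that label as the positive class."""
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str], labels: List[str]) -> float:
    """Unweighted mean of per-class F1 (assumed meaning of 'average F1')."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)


# Hypothetical toy labels: child-directed (CDS) vs adult-directed (ADS) speech.
labels = ["CDS", "ADS"]
y_true = ["CDS", "CDS", "ADS", "ADS"]
isolated = ["CDS", "ADS", "ADS", "CDS"]   # toy predictions without context
with_ctx = ["CDS", "CDS", "ADS", "CDS"]   # toy predictions with context

f1_iso = macro_f1(y_true, isolated, labels)
f1_ctx = macro_f1(y_true, with_ctx, labels)
absolute_gain_points = 100 * (f1_ctx - f1_iso)    # difference in percentage points
relative_gain_pct = 100 * (f1_ctx - f1_iso) / f1_iso  # difference relative to baseline
print(f"absolute: {absolute_gain_points:.1f} points, relative: {relative_gain_pct:.1f}%")
```

An absolute gain is the plain difference between the two scores, so a 13.8-point gain means the same thing whatever the baseline was, whereas a relative gain depends on the baseline score.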

Automatically distinguishing child-directed speech from adult-directed speech in long-form recordings is key to scalable analyses of children's language environments. Existing approaches process utterances in isolation and have been evaluated primarily on English. We address these gaps along three dimensions. First, we fine-tune and evaluate six self-supervised models on a multilingual dataset covering 182 children, showing that in-domain pre-training on child-centered recordings substantially outperforms pre-training on adult speech. Second, we demonstrate that incorporating the context surrounding each utterance markedly improves classification, yielding an absolute gain of 13.8 percentage points in average F1-score. Third, we evaluate our model in a realistic end-to-end pipeline, from adult speech detection to addressee classification, showing that performance drops under automatic segmentation but still consistently exceeds that of a rule-based baseline.
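The abstract does not specify how surrounding context is supplied to the model. One plausible scheme, sketched here purely as an assumption, pads each target utterance with a fixed number of seconds on both sides (clipped to the recording boundaries) and builds a frame-level mask so that the classifier knows which part of the window it must label. `Segment`, `context_window`, `target_mask`, `context_s`, and the 50 frames-per-second rate are all hypothetical illustrations, not the authors' implementation.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Segment:
    """A hypothetical utterance segment, in seconds from the start of the recording."""
    onset: float
    offset: float


def context_window(target: Segment, duration: float, context_s: float) -> Tuple[float, float]:
    """Extend the target by context_s seconds on each side, clipped to [0, duration]."""
    start = max(0.0, target.onset - context_s)
    end = min(duration, target.offset + context_s)
    return start, end


def target_mask(target: Segment, window: Tuple[float, float], frame_rate: float = 50.0) -> List[int]:
    """0/1 flag per frame of the window; 1 where the frame centre lies inside the target."""
    start, end = window
    n_frames = int(math.ceil((end - start) * frame_rate))
    mask = []
    for i in range(n_frames):
        centre = start + (i + 0.5) / frame_rate  # time of this frame's centre
        mask.append(1 if target.onset <= centre < target.offset else 0)
    return mask


# Toy example: a 2.5 s utterance in a 60 s recording, with 2 s of context per side.
utterance = Segment(onset=10.0, offset=12.5)
window = context_window(utterance, duration=60.0, context_s=2.0)
mask = target_mask(utterance, window)
print(window, len(mask), sum(mask))
```

Under this scheme, the audio between `window[0]` and `window[1]` would be passed to the fine-tuned self-supervised encoder, and the mask would be used to pool only the target frames before classification. Padding by a fixed duration keeps the input length bounded; other designs, such as including a fixed number of neighbouring utterances, are equally plausible readings of the abstract.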
