AS LG SD | Jun 4, 2025

Fifteen Years of Child-Centered Long-Form Recordings: Promises, Resources, and Remaining Challenges to Validity

Loann Peurey, Marvin Lavechin, Tarek Kunze, Manel Khentout, Lucas Gautheron, Emmanuel Dupoux, Alejandrina Cristia

arXiv:2506.11075v1 | 3.33 citations | h-index: 12 | INTERSPEECH

Originality: Synthesis-oriented

AI Analysis

The paper addresses data quality issues in the automated analysis of child language recordings, a concern for both researchers and clinicians. Its contribution is incremental: it builds on existing knowledge without introducing new methods.

The paper reviews the use of long-form audio recordings from child-worn devices in child language research, highlighting their potential for high validity but noting challenges in automated analysis due to errors and data volume. It proposes troubleshooting metrics and practical strategies to improve data quality and interpretation.

Audio recordings collected with a child-worn device are a fundamental tool in child language research. Long-form recordings collected over whole days promise to capture children's input and production with minimal observer bias, and therefore high validity. The sheer volume of resulting data necessitates automated analysis to extract relevant metrics for researchers and clinicians. This paper summarizes collective knowledge on this technique, providing entry points to existing resources. We also highlight various sources of error that threaten the accuracy of automated annotations and the interpretation of resulting metrics. To address these, we propose potential troubleshooting metrics to help users assess data quality. While a fully automated quality control system is not feasible, we outline practical strategies for researchers to improve data collection and contextualize their analyses.
