CL · Sep 28, 2022

Data-driven Parsing Evaluation for Child-Parent Interactions

arXiv:2209.13778v148 citations · h-index: 9
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of parsing naturalistic child speech for researchers in computational linguistics and child language development, but it is incremental as it builds on existing datasets and methods.

The authors created a large syntactic dependency treebank for child-parent speech interactions and used it to evaluate how well state-of-the-art dependency parsers, trained on written texts, perform on this spoken data and how parser performance relates to child developmental stages.

We present a syntactic dependency treebank for naturalistic child and child-directed speech in English (MacWhinney, 2000). Our annotations largely follow the guidelines of the Universal Dependencies project (UD; Zeman et al., 2022), with detailed extensions for lexical and syntactic structures unique to conversational speech (as opposed to written text). Compared to existing UD-style spoken treebanks, as well as other dependency corpora of child-parent interactions specifically, our dataset is much larger (N of utterances = 44,744; N of words = 233,907) and contains speech from a total of 10 children covering a wide age range (18-66 months). With this dataset, we ask: (1) How well do state-of-the-art dependency parsers, tailored to the written domain, perform on the speech of different interlocutors in spontaneous conversations? (2) What is the relationship between parser performance and the developmental stage of the child? To address these questions, in ongoing work, we are conducting thorough dependency parser evaluations using both graph-based and transition-based parsers with different hyperparameter settings, trained on three different types of out-of-domain written text: news, tweets, and learner data.
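The abstract does not say which metric the evaluation uses. Dependency parsers are commonly scored with unlabeled and labeled attachment scores (UAS and LAS), so the sketch below assumes those. It shows one way per-group scores (for example by speaker role or child age band) might be computed to address the two questions above. The function names, the group labels, and the `(head, deprel)` token representation are hypothetical and are not taken from the paper or its repository.

```python
from collections import defaultdict


def attachment_counts(gold, pred):
    """Count head and head+label matches for one utterance.

    gold, pred: lists of (head_index, deprel) tuples, one per token,
    in the same token order. This representation is an assumption.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted parses must have the same length")
    # UAS counts tokens whose predicted head matches the gold head.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS additionally requires the dependency label to match.
    las_hits = sum(g == p for g, p in zip(gold, pred))
    return uas_hits, las_hits, len(gold)


def scores_by_group(utterances):
    """Aggregate token-level UAS/LAS per group.

    utterances: iterable of (group_key, gold, pred), where group_key is a
    hypothetical label such as a speaker role or an age band.
    Returns {group_key: (uas, las)}.
    """
    counts = defaultdict(lambda: [0, 0, 0])
    for key, gold, pred in utterances:
        uas_hits, las_hits, n_tokens = attachment_counts(gold, pred)
        counts[key][0] += uas_hits
        counts[key][1] += las_hits
        counts[key][2] += n_tokens
    return {
        key: (uas_hits / n_tokens, las_hits / n_tokens)
        for key, (uas_hits, las_hits, n_tokens) in counts.items()
        if n_tokens > 0
    }
```

Scores are micro-averaged over tokens within each group, so longer utterances carry more weight. Averaging per utterance instead would be a different design choice and may matter for very short child utterances.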

Code Implementations: 1 repo
