UD-English-CHILDES: A Collected Resource of Gold and Silver Universal Dependencies Trees for Child Language Interactions
The treebank provides a consistent resource for computational and linguistic research on child language, but the contribution is incremental: it harmonizes existing annotations rather than introducing new methods.
The paper addresses the lack of a standardized dependency treebank for child language interactions by creating UD-English-CHILDES, the first officially released Universal Dependencies treebank derived from CHILDES data, comprising over 48K gold-standard sentences and 1M silver-standard sentences.
CHILDES is a widely used resource of transcribed child and child-directed speech. This paper introduces UD-English-CHILDES, the first officially released Universal Dependencies (UD) treebank derived from previously dependency-annotated CHILDES data, which we harmonize to follow unified annotation principles. The gold-standard trees encompass utterances sampled from 11 children and their caregivers, totaling over 48K sentences (236K tokens). We validate these gold-standard annotations under the UD v2 framework and provide an additional 1M silver-standard sentences, offering a consistent resource for computational and linguistic research.