CL · Mar 1, 2024

PoTeC: A German Naturalistic Eye-tracking-while-reading Corpus

arXiv:2403.00506v1 · 16 citations · h-index: 8 · Has Code · Behavior Research Methods
Originality: Synthesis-oriented
AI Analysis

This paper provides a new dataset for studying expert and non-expert reading strategies in German. The contribution is incremental: it extends existing eye-tracking corpus methods to a specific language and domain.

The researchers created PoTeC, a German naturalistic eye-tracking-while-reading corpus with data from 75 participants reading 12 scientific texts, featuring a 2x2x2 factorial design to compare domain-experts and novices. They made the corpus and preprocessing code publicly available on GitHub.

The Potsdam Textbook Corpus (PoTeC) is a naturalistic eye-tracking-while-reading corpus containing data from 75 participants reading 12 scientific texts. PoTeC is the first naturalistic eye-tracking-while-reading corpus that contains eye-movements from domain-experts as well as novices in a within-participant manipulation: It is based on a 2x2x2 fully-crossed factorial design which includes the participants' level of study and the participants' discipline of study as between-subject factors and the text domain as a within-subject factor. The participants' reading comprehension was assessed by a series of text comprehension questions and their domain knowledge was tested by text-independent background questions for each of the texts. The materials are annotated for a variety of linguistic features at different levels. We envision PoTeC to be used for a wide range of studies including but not limited to analyses of expert and non-expert reading strategies. The corpus and all the accompanying data at all stages of the preprocessing pipeline and all code used to preprocess the data are made available via GitHub: https://github.com/DiLi-Lab/PoTeC.
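As a rough illustration of the 2x2x2 fully-crossed design described above, the following Python sketch enumerates the design cells. The factor names follow the abstract; the level labels (`level_A`, `discipline_A`, `domain_A`, and so on) are hypothetical placeholders, not the labels used in the corpus, and the helper functions are not part of the PoTeC code base.

```python
from itertools import product

# Factor names follow the abstract; all level labels are hypothetical placeholders.
BETWEEN_SUBJECT = {
    "level_of_study": ("level_A", "level_B"),
    "discipline_of_study": ("discipline_A", "discipline_B"),
}
WITHIN_SUBJECT = {
    "text_domain": ("domain_A", "domain_B"),
}


def design_cells():
    """Return every cell of the fully-crossed design as a dict."""
    factors = {**BETWEEN_SUBJECT, **WITHIN_SUBJECT}
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


def participant_groups():
    """Return the groups defined by the between-subject factors only."""
    names = list(BETWEEN_SUBJECT)
    return [dict(zip(names, combo)) for combo in product(*BETWEEN_SUBJECT.values())]


cells = design_cells()
groups = participant_groups()
print(len(cells), "design cells;", len(groups), "participant groups")
# prints: 8 design cells; 4 participant groups
```

Because the text domain is a within-subject factor, each of the four participant groups contributes data to both text-domain cells, which is what yields eight cells from four groups.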

Code Implementations: 1 repo
