WebQAmGaze: A Multilingual Webcam Eye-Tracking-While-Reading Dataset
This dataset enables low-cost, diverse data collection for explainable computational language processing, although the contribution is incremental because it builds on existing webcam-based methods.
To address the lack of low-cost, multilingual eye-tracking data for reading, the authors created WebQAmGaze, a dataset of webcam eye-tracking data from 600 participants reading in four languages. The webcam measures correlate moderately to strongly with those from a commercial eye-tracker, and fixation duration on relevant text spans predicts answer correctness.
We present WebQAmGaze, a multilingual low-cost eye-tracking-while-reading dataset, designed as the first webcam-based eye-tracking corpus of reading to support the development of explainable computational language processing models. WebQAmGaze includes webcam eye-tracking data from 600 participants of a wide age range naturally reading English, German, Spanish, and Turkish texts. Each participant performs two reading tasks of five texts each, a normal reading task and an information-seeking task, followed by a comprehension question. We compare the collected webcam data to high-quality eye-tracking recordings. The results show a moderate to strong correlation between the eye movement measures obtained with the webcam and those obtained with a commercial eye-tracking device. When validating the data, we find that longer fixation duration on relevant text spans accurately indicates correctness when answering the corresponding questions. This dataset advances webcam-based reading studies and opens avenues to low-cost and diverse data collection. WebQAmGaze is useful for studying the cognitive processes behind question-answering and for applying these insights to computational models of language understanding.
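The comparison between webcam and commercial eye-tracker measures can be illustrated with a minimal sketch. It assumes a Pearson correlation over per-word total reading times; the abstract does not state which coefficient or which eye movement measure was used, so both are assumptions here. The function name `pearson_r` and all numeric values are hypothetical and are not taken from the dataset.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-word total reading times in milliseconds for one text,
# aligned word by word (illustrative values only, not from WebQAmGaze).
webcam_trt = [210.0, 0.0, 340.0, 180.0, 95.0, 400.0]
tracker_trt = [230.0, 40.0, 310.0, 150.0, 120.0, 420.0]

r = pearson_r(webcam_trt, tracker_trt)
print(f"Pearson r = {r:.2f}")
```

In practice the two recordings would first have to be mapped to the same word-level areas of interest before any such comparison is meaningful.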