LAST at CMCL 2021 Shared Task: Predicting Gaze Data During Reading with a Gradient Boosting Decision Tree Approach
This work addresses gaze prediction during reading, a task of interest to computational linguistics. Its contribution is incremental, however, because it applies an existing method (gradient boosting decision trees) to a new dataset.
The authors tackled the problem of predicting eye-tracking data during reading by optimizing a LightGBM model with lexical and psychometric features, achieving the best performance on two out of five measures and ranking first in the CMCL 2021 Shared Task, outperforming all deep-learning systems.
A LightGBM model fed with target word lexical characteristics and features obtained from word frequency lists, psychometric data and bigram association measures was optimized for the 2021 CMCL Shared Task on Eye-Tracking Data Prediction. It obtained the best performance of all teams on two of the five eye-tracking measures to be predicted, which allowed it to rank first on the official challenge criterion and to outperform all deep-learning-based systems participating in the challenge.
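
To make the kind of input described above more concrete, the following is a minimal sketch of how per-word features might be assembled before being passed to a gradient boosting model. It is not the authors' pipeline. The function names (`build_counts`, `pmi`, `word_features`), the toy corpus, the choice of pointwise mutual information as the bigram association measure, and the convention of scoring unseen bigrams as 0.0 are all assumptions made for illustration. The abstract does not say which association measures or frequency lists were used, and psychometric features are omitted here.

```python
import math
from collections import Counter


def build_counts(sentences):
    """Count unigrams and adjacent-word bigrams in tokenized sentences."""
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def pmi(w1, w2, unigrams, bigrams):
    """Pointwise mutual information (base 2) of the bigram (w1, w2).

    Assumption for this sketch: a bigram never seen in the counts
    gets a neutral score of 0.0 instead of negative infinity.
    """
    joint = bigrams[(w1, w2)]
    if joint == 0:
        return 0.0
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    p_joint = joint / n_bi
    p_w1 = unigrams[w1] / n_uni
    p_w2 = unigrams[w2] / n_uni
    return math.log2(p_joint / (p_w1 * p_w2))


def word_features(tokens, unigrams, bigrams):
    """Build one feature dict per target word in a sentence."""
    rows = []
    for i, word in enumerate(tokens):
        prev = tokens[i - 1] if i > 0 else None
        rows.append({
            # Lexical characteristic of the target word.
            "length": len(word),
            # Frequency feature, add-one smoothed so unseen words map to 0.
            "log_freq": math.log(unigrams[word] + 1),
            # Association between the previous word and the target word.
            "pmi_prev": pmi(prev, word, unigrams, bigrams) if prev is not None else 0.0,
        })
    return rows


# Hypothetical toy corpus, used only to exercise the functions.
toy_corpus = [["the", "cat", "sat"], ["the", "cat", "ran"]]
toy_unigrams, toy_bigrams = build_counts(toy_corpus)
toy_rows = word_features(["the", "cat", "sat"], toy_unigrams, toy_bigrams)
```

Each resulting row could then serve as one training example, with one regression model fitted per eye-tracking measure. In practice the counts would come from large external frequency lists rather than from the task data itself.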