CL · Feb 29, 2024

Evaluating Webcam-based Gaze Data as an Alternative for Human Rationale Annotations

Stephanie Brandl, Oliver Eberle, Tiago Ribeiro, Anders Søgaard, Nora Hollenstein

arXiv:2402.19133v1 · 24.183 citations · h-index: 10 · Has Code · LREC

Originality: Incremental advance

AI Analysis

The paper addresses the time-consuming and often biased process of manually annotating rationales, offering NLP researchers a more efficient way to evaluate explainability methods. The advance is incremental, since it builds on existing eye-tracking and explainability methods.

The paper investigates whether webcam-based eye-tracking data can serve as an alternative to manual rationale annotations for evaluating explainability methods in NLP. It finds that gaze data provides valuable linguistic insights and ranks explainability methods comparably to human rationales.

Rationales in the form of manually annotated input spans usually serve as ground truth when evaluating explainability methods in NLP. They are, however, time-consuming and often biased by the annotation process. In this paper, we debate whether human gaze, in the form of webcam-based eye-tracking recordings, poses a valid alternative when evaluating importance scores. We evaluate the additional information provided by gaze data, such as total reading times, gaze entropy, and decoding accuracy with respect to human rationale annotations. We compare WebQAmGaze, a multilingual dataset for information-seeking QA, with attention and explainability-based importance scores for 4 different multilingual Transformer-based language models (mBERT, distil-mBERT, XLMR, and XLMR-L) and 3 languages (English, Spanish, and German). Our pipeline can easily be applied to other tasks and languages. Our findings suggest that gaze data offers valuable linguistic insights that could be leveraged to infer task difficulty and further show a comparable ranking of explainability methods to that of human rationales.
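
The abstract names gaze entropy and a comparison between gaze and model importance scores, but this page does not give their definitions. The following is a minimal sketch under stated assumptions, not the authors' pipeline: it treats per-token total reading times as an unnormalized distribution, computes its Shannon entropy, and measures agreement with a model's importance scores using Spearman rank correlation. All function names and the example values are hypothetical and chosen for illustration only.

```python
import math


def normalize(values):
    """Scale non-negative values so they sum to 1."""
    total = sum(values)
    if total <= 0:
        raise ValueError("values must have a positive sum")
    return [v / total for v in values]


def entropy(distribution):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(p * math.log2(p) for p in distribution if p > 0)


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# Hypothetical per-token values for a five-token input (not data from the paper).
reading_times_ms = [120.0, 0.0, 340.0, 210.0, 90.0]
model_importance = [0.10, 0.02, 0.45, 0.30, 0.13]

gaze_entropy = entropy(normalize(reading_times_ms))
agreement = spearman(reading_times_ms, model_importance)
print(f"gaze entropy (bits): {gaze_entropy:.3f}")
print(f"Spearman correlation: {agreement:.3f}")
```

Under these assumptions, low entropy would indicate that reading time concentrates on a few tokens, and a high correlation would indicate that the model's importance scores order tokens similarly to the reader's gaze. Ranking explainability methods, as the abstract describes, would repeat such a comparison per method and sort by the resulting score.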
