CLERA: A Unified Model for Joint Cognitive Load and Eye Region Analysis in the Wild
This work addresses the need for efficient, customizable eye-tracking models for human-computer interaction applications. The contribution is incremental in that it builds on existing methods for eye analysis.
The authors address non-intrusive, real-time analysis of eye dynamics for monitoring visual attention and mental state in human-computer interaction. They propose CLERA, a unified model for joint cognitive load and eye region analysis, which outperforms prior work on tasks such as cognitive load estimation and eye landmark detection.
Non-intrusive, real-time analysis of the dynamics of the eye region allows us to monitor humans' visual attention allocation and estimate their mental state during the performance of real-world tasks, which can potentially benefit a wide range of human-computer interaction (HCI) applications. While commercial eye-tracking devices have been frequently employed, the difficulty of customizing these devices places unnecessary constraints on the exploration of more efficient, end-to-end models of eye dynamics. In this work, we propose CLERA, a unified model for Cognitive Load and Eye Region Analysis, which achieves precise keypoint detection and spatiotemporal tracking in a joint-learning framework. Our method demonstrates significant efficiency and outperforms prior work on tasks including cognitive load estimation, eye landmark detection, and blink estimation. We also introduce a large-scale dataset of 30k human faces with joint pupil, eye-openness, and landmark annotation, which aims to support future HCI research on human factors and eye-related analysis.