Interpreting Radiologist's Intention from Eye Movements in Chest X-ray Diagnosis
This work addresses the challenge of modeling radiologists' intent-driven gaze patterns in medical image diagnosis. It is an incremental improvement over existing methods, which fail to capture the intentions underlying those patterns.
The paper tackles the problem of interpreting radiologists' diagnostic intentions from eye movements during chest X-ray analysis. It introduces RadGazeIntent, a deep learning model that predicts which findings radiologists are examining at specific moments and that outperforms baseline methods across three intention-labeled datasets.
Radiologists rely on eye movements to navigate and interpret medical images. A trained radiologist possesses knowledge about the potential diseases that may be present in the images and, when searching, follows a mental checklist to locate them using their gaze. This is a key observation, yet existing models fail to capture the underlying intent behind each fixation. In this paper, we introduce a deep learning-based approach, RadGazeIntent, designed to model this behavior: having an intention to find something and actively searching for it. Our transformer-based architecture processes both the temporal and spatial dimensions of gaze data, transforming fine-grained fixation features into coarse, meaningful representations of diagnostic intent to interpret radiologists' goals. To capture the nuances of radiologists' varied intention-driven behaviors, we process existing medical eye-tracking datasets to create three intention-labeled subsets: RadSeq (Systematic Sequential Search), RadExplore (Uncertainty-driven Exploration), and RadHybrid (Hybrid Pattern). Experimental results demonstrate RadGazeIntent's ability to predict which findings radiologists are examining at specific moments, outperforming baseline methods across all intention-labeled datasets.
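To make the task framing concrete, the following is a minimal sketch of the input and output it implies: a sequence of fixations, one finding label per fixation, and a way to collapse those fine-grained labels into coarser intent segments. It is not the RadGazeIntent architecture. The `Fixation` fields, the helper names `intent_segments` and `per_fixation_accuracy`, the finding names, and the toy values are all assumptions made for illustration, and the accuracy measure is a generic one rather than the metric reported in the paper.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Fixation:
    """One gaze fixation on the image (all fields are illustrative)."""
    x: float            # normalized horizontal position in [0, 1]
    y: float            # normalized vertical position in [0, 1]
    duration_ms: float  # how long the gaze rested at this point


def intent_segments(labels):
    """Collapse per-fixation labels into (label, start, end) runs, end exclusive."""
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((label, start, start + length))
        start += length
    return segments


def per_fixation_accuracy(predicted, reference):
    """Fraction of fixations whose predicted label matches the reference label."""
    if len(predicted) != len(reference):
        raise ValueError("label sequences must have the same length")
    if not reference:
        return 0.0
    hits = sum(p == r for p, r in zip(predicted, reference))
    return hits / len(reference)


# Hypothetical toy scanpath: four fixations, each with one intention label.
scanpath = [
    Fixation(0.30, 0.40, 220.0),
    Fixation(0.32, 0.42, 180.0),
    Fixation(0.65, 0.55, 300.0),
    Fixation(0.66, 0.58, 260.0),
]
reference = ["cardiomegaly", "cardiomegaly", "pleural effusion", "pleural effusion"]
predicted = ["cardiomegaly", "pleural effusion", "pleural effusion", "pleural effusion"]

print(intent_segments(reference))
# [('cardiomegaly', 0, 2), ('pleural effusion', 2, 4)]
print(per_fixation_accuracy(predicted, reference))
# 0.75
```

In this framing, a model such as the one described would take the place of the hand-written `predicted` list, mapping each fixation in the scanpath to a finding label.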