Where and What: Driver Attention-based Object Detection
This work addresses a gap in autonomous driving technology by combining pixel-level and object-level attention prediction, though the contribution is incremental, since it builds on existing driver attention datasets and methods.
The paper tackles the problem of predicting both where drivers look and what objects they focus on by integrating an attention prediction module into a pretrained object detection framework. It achieves performance competitive with the state of the art on both pixel-level and object-level attention prediction at a substantially lower computational cost (75.3 fewer GFLOPs).
Human drivers use their attentional mechanisms to focus on critical objects and make decisions while driving. Because human attention can be inferred from gaze data, capturing and analyzing gaze information has emerged in recent years as a way to benefit autonomous driving technology. Previous works in this context have primarily aimed at predicting "where" human drivers look and provide no information about "what" objects drivers focus on. Our work bridges the gap between pixel-level and object-level attention prediction. Specifically, we propose to integrate an attention prediction module into a pretrained object detection framework and predict attention in a grid-based style. Furthermore, critical objects are recognized based on the predicted attended-to areas. We evaluate our proposed method on two driver attention datasets, BDD-A and DR(eye)VE. Our framework achieves performance competitive with the state of the art in attention prediction at both the pixel level and the object level, while being far more efficient in computation (75.3 fewer GFLOPs).
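The step from a grid-based attention map to a set of critical objects can be illustrated with a minimal sketch. The abstract does not specify how boxes are scored, so the rule used here (take the maximum attention value among the grid cells a bounding box overlaps, then apply a threshold) is an assumption for illustration, as are the function names, the `threshold` value, and the box format `(x1, y1, x2, y2)` in pixels.

```python
import numpy as np


def object_attention_scores(grid, boxes, image_size):
    """Score each box by the highest attention value among the grid cells it overlaps.

    grid:       (Gh, Gw) array of attention values, one per grid cell
    boxes:      iterable of (x1, y1, x2, y2) pixel coordinates (assumed format)
    image_size: (height, width) of the image in pixels
    """
    gh, gw = grid.shape
    height, width = image_size
    cell_h, cell_w = height / gh, width / gw

    scores = []
    for x1, y1, x2, y2 in boxes:
        # Map pixel coordinates to inclusive grid-cell index ranges.
        c0 = int(np.clip(np.floor(x1 / cell_w), 0, gw - 1))
        c1 = int(np.clip(np.ceil(x2 / cell_w) - 1, c0, gw - 1))
        r0 = int(np.clip(np.floor(y1 / cell_h), 0, gh - 1))
        r1 = int(np.clip(np.ceil(y2 / cell_h) - 1, r0, gh - 1))
        # Assumed scoring rule: maximum attention inside the covered cells.
        scores.append(float(grid[r0:r1 + 1, c0:c1 + 1].max()))
    return np.array(scores)


def critical_objects(grid, boxes, image_size, threshold=0.5):
    """Return indices of boxes whose score reaches the (hypothetical) threshold."""
    scores = object_attention_scores(grid, boxes, image_size)
    return [i for i, s in enumerate(scores) if s >= threshold]


# Toy example: a 4x4 attention grid over a 400x400 image, one attended cell.
grid = np.zeros((4, 4))
grid[1, 2] = 0.9
boxes = [(210, 110, 290, 190),  # lies inside the attended cell
         (0, 0, 90, 90)]        # lies in an unattended corner
scores = object_attention_scores(grid, boxes, (400, 400))
attended = critical_objects(grid, boxes, (400, 400))
```

In this toy case only the first box is flagged as attended. Using the mean instead of the maximum would penalize large boxes that contain a small attended region, which is one reason a maximum is a plausible choice here, but the choice remains an assumption of this sketch.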