CV · AI · Jan 25, 2025

Towards Robust Unsupervised Attention Prediction in Autonomous Driving

Mengshi Qi, Xiaoyang Bi, Pengfei Zhu, Huadong Ma

arXiv:2501.15045v23.6 · h-index: 8 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the critical need for safe and efficient attention prediction in autonomous driving, offering an unsupervised solution that reduces labeling costs and enhances robustness, though it is incremental in building on existing methods.

The paper tackles robust, unsupervised prediction of attention regions for self-driving systems, targeting the domain gap to natural scenes, camera corruption, and central bias. It reports performance matching or surpassing fully supervised state-of-the-art methods, reducing relative corruption degradation by 58.8% (KLD) and 52.8% (CC) and improving central bias robustness by 12.4% (KLD) and 11.4% (CC).

Robustly predicting attention regions of interest for self-driving systems is crucial for driving safety but presents significant challenges due to the labor-intensive nature of obtaining large-scale attention labels and the domain gap between self-driving scenarios and natural scenes. These challenges are further exacerbated by complex traffic environments, including camera corruption under adverse weather, noise interferences, and central bias from long-tail distributions. To address these issues, we propose a robust unsupervised attention prediction method. An Uncertainty Mining Branch refines predictions by analyzing commonalities and differences across multiple pre-trained models on natural scenes, while a Knowledge Embedding Block bridges the domain gap by incorporating driving knowledge to adaptively enhance pseudo-labels. Additionally, we introduce RoboMixup, a novel data augmentation method that improves robustness against corruption through soft attention and dynamic augmentation, and mitigates central bias by integrating random cropping into Mixup as a regularizer. To systematically evaluate robustness in self-driving attention prediction, we introduce the DriverAttention-C benchmark, comprising over 100k frames across three subsets: BDD-A-C, DR(eye)VE-C, and DADA-2000-C. Our method achieves performance equivalent to or surpassing fully supervised state-of-the-art approaches on three public datasets and the proposed robustness benchmark, reducing relative corruption degradation by 58.8% and 52.8%, and improving central bias robustness by 12.4% and 11.4% in KLD and CC metrics, respectively. Code and data are available at https://github.com/zaplm/DriverAttention.
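The abstract reports results in KLD and CC. As a rough illustration of what those two scores measure, here is a minimal sketch using the definitions commonly found in the saliency-prediction literature: KL divergence between maps normalized to sum to 1 (lower is better), and the Pearson correlation coefficient between standardized maps (higher is better). The function names `kld` and `cc`, the `eps` constant, and the toy arrays are assumptions made for this example; the authors' evaluation code in the linked repository may differ in detail.

```python
import numpy as np

EPS = 1e-7  # small constant to avoid division by zero and log(0); an assumed value


def kld(pred, gt, eps=EPS):
    """KL divergence of the ground-truth map from the predicted map (lower is better)."""
    p = np.asarray(pred, dtype=np.float64)
    q = np.asarray(gt, dtype=np.float64)
    # Treat each non-negative map as a probability distribution over pixels.
    p = p / (p.sum() + eps)
    q = q / (q.sum() + eps)
    return float(np.sum(q * np.log(eps + q / (p + eps))))


def cc(pred, gt, eps=EPS):
    """Pearson correlation coefficient between two attention maps (higher is better)."""
    p = np.asarray(pred, dtype=np.float64)
    q = np.asarray(gt, dtype=np.float64)
    # Standardize each map to zero mean and unit variance, then average the product.
    p = (p - p.mean()) / (p.std() + eps)
    q = (q - q.mean()) / (q.std() + eps)
    return float(np.mean(p * q))


if __name__ == "__main__":
    # Toy 2x2 "attention maps", purely illustrative.
    gt_map = np.array([[0.1, 0.2], [0.3, 0.4]])
    pred_map = np.array([[0.4, 0.3], [0.2, 0.1]])
    print("KLD:", kld(pred_map, gt_map))
    print("CC:", cc(pred_map, gt_map))
```

A prediction identical to the ground truth gives a KLD near 0 and a CC near 1, so the robustness figures quoted above can be read as changes in these two scores between clean and corrupted inputs.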
