SEER-VAR: Semantic Egocentric Environment Reasoner for Vehicle Augmented Reality
This work explores LLM-based AR recommendation in egocentric driving, where comparable systems are lacking, and offers an incremental improvement for vehicle AR applications.
The paper tackles dynamic egocentric vehicle-based augmented reality (AR) by introducing SEER-VAR, a framework that unifies semantic decomposition, context-aware SLAM branches, and LLM-driven recommendations. The framework achieves robust spatial alignment and perceptually coherent AR rendering across varied environments. User studies show improved perceived scene understanding, overlay relevance, and driver ease.
We present SEER-VAR, a novel framework for egocentric vehicle-based augmented reality (AR) that unifies semantic decomposition, Context-Aware SLAM Branches (CASB), and LLM-driven recommendation. Unlike existing systems that assume static or single-view settings, SEER-VAR dynamically separates cabin and road scenes via depth-guided vision-language grounding. Two SLAM branches track egocentric motion, one for each context, while a GPT-based module generates context-aware overlays such as dashboard cues and hazard alerts. To support evaluation, we introduce EgoSLAM-Drive, a real-world dataset featuring synchronized egocentric views, 6DoF ground-truth poses, and AR annotations across diverse driving scenarios. Experiments demonstrate that SEER-VAR achieves robust spatial alignment and perceptually coherent AR rendering across varied environments. Because this is one of the first works to explore LLM-based AR recommendation in egocentric driving, comparable systems are lacking; we address this gap through structured prompting and detailed user studies. Results show that SEER-VAR enhances perceived scene understanding, overlay relevance, and driver ease, providing an effective foundation for future research in this direction. The code and dataset will be released as open source.
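The cabin/road separation and structured prompting described above can be illustrated with a small sketch. This is not the SEER-VAR implementation: it assumes object labels and per-object depths are already available from some grounding step, and it replaces depth-guided vision-language grounding with a single fixed depth threshold. The names `Detection`, `split_by_context`, `build_prompt`, and the 1.5 m value of `CABIN_MAX_DEPTH_M` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical threshold in metres: objects at or below this depth are
# treated as part of the cabin, everything farther as part of the road scene.
CABIN_MAX_DEPTH_M = 1.5


@dataclass
class Detection:
    label: str      # object name, assumed to come from a grounding step
    depth_m: float  # representative depth of the object, in metres


def split_by_context(
    detections: List[Detection],
    cabin_max_depth_m: float = CABIN_MAX_DEPTH_M,
) -> Tuple[List[Detection], List[Detection]]:
    """Partition detections into (cabin, road) using a fixed depth threshold."""
    cabin = [d for d in detections if d.depth_m <= cabin_max_depth_m]
    road = [d for d in detections if d.depth_m > cabin_max_depth_m]
    return cabin, road


def build_prompt(cabin: List[Detection], road: List[Detection]) -> str:
    """Assemble a structured text prompt listing the objects in each context."""
    cabin_labels = ", ".join(d.label for d in cabin) or "none"
    road_labels = ", ".join(d.label for d in road) or "none"
    return (
        "Context: egocentric driving view.\n"
        f"Cabin objects: {cabin_labels}\n"
        f"Road objects: {road_labels}\n"
        "Task: suggest at most one AR overlay per context."
    )


if __name__ == "__main__":
    detections = [
        Detection("steering wheel", 0.6),
        Detection("pedestrian", 12.0),
        Detection("dashboard", 0.9),
    ]
    cabin, road = split_by_context(detections)
    print(build_prompt(cabin, road))
```

In a full system, each partition would feed its own SLAM branch and the prompt would be sent to the language model; both steps are omitted here because the abstract does not specify their interfaces.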