Trust Through Transparency: Explainable Social Navigation for Autonomous Mobile Robots via Vision-Language Models
For developers and users of service robots, this work addresses the need for transparent human-robot interaction, though the improvement is incremental over existing explainability methods.
This paper presents a multimodal explainability module that integrates vision-language models and heat maps so that autonomous robots can articulate their navigation decisions in natural language. In user studies (n=30), a majority of participants preferred real-time explanations, indicating improved trust.
Service and assistive robots are increasingly deployed in dynamic social environments; however, ensuring transparent and explainable interactions remains a significant challenge. This paper presents a multimodal explainability module that integrates vision-language models and heat maps to improve transparency during navigation. The proposed system enables robots to perceive, analyze, and articulate their observations through natural language summaries. In user studies (n=30), a majority of participants preferred real-time explanations, indicating improved trust and understanding. We validated our experiments through confusion matrix analysis, which assessed the level of agreement with human expectations. Our experimental and simulation results highlight the effectiveness of explainability in autonomous navigation for enhancing trust and interpretability.
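As a rough illustration of the confusion matrix analysis mentioned above, the following minimal sketch tabulates how often a robot's decision matches what a human observer expected. The label set (`yield`, `proceed`, `reroute`), the function names, and the sample data are all hypothetical and chosen only for illustration; they are not taken from the study, and the authors' actual procedure may differ.

```python
from collections import Counter

def confusion_matrix(expected, predicted, labels):
    """Rows are human-expected labels, columns are robot-reported labels."""
    counts = Counter(zip(expected, predicted))
    return [[counts[(e, p)] for p in labels] for e in labels]

def agreement_rate(matrix):
    """Fraction of cases on the diagonal, i.e. robot matches expectation."""
    total = sum(sum(row) for row in matrix)
    agree = sum(matrix[i][i] for i in range(len(matrix)))
    return agree / total if total else 0.0

# Hypothetical label set and made-up example data (not from the paper).
labels = ["yield", "proceed", "reroute"]
human = ["yield", "yield", "proceed", "reroute", "proceed", "yield"]
robot = ["yield", "proceed", "proceed", "reroute", "proceed", "yield"]

matrix = confusion_matrix(human, robot, labels)
rate = agreement_rate(matrix)

for label, row in zip(labels, matrix):
    print(f"{label:>8}: {row}")
print(f"agreement: {rate:.2f}")
```

Off-diagonal cells show where the robot's behavior diverged from what observers expected, which is the kind of mismatch an explanation module would aim to reduce or at least make understandable.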