From Obstacles to Etiquette: Robot Social Navigation with VLM-Informed Path Selection
This work addresses the challenge of enabling robots to navigate human environments without violating social norms. The contribution is incremental: it builds on existing methods by adding social reasoning.
The paper tackles social robot navigation by integrating geometric planning with contextual social reasoning: a fine-tuned vision-language model selects a socially optimized path from geometrically feasible candidates. In experiments across four contexts, the method achieves the best overall performance, with the lowest personal space violation duration, minimal pedestrian-facing time, and no social zone intrusions.
Navigating socially in human environments requires more than satisfying geometric constraints, as collision-free paths may still interfere with ongoing activities or conflict with social norms. Addressing this challenge calls for analyzing interactions between agents and incorporating common-sense reasoning into planning. This paper presents a social robot navigation framework that integrates geometric planning with contextual social reasoning. The system first extracts obstacles and human dynamics to generate geometrically feasible candidate paths, then uses a fine-tuned vision-language model (VLM) to evaluate these paths against contextually grounded social expectations and select a socially optimized path for the controller. This task-specific VLM distills social reasoning from large foundation models into a smaller, more efficient model, allowing the framework to adapt in real time across diverse human-robot interaction contexts. Experiments in four social navigation contexts show that the method achieves the best overall performance, with the lowest personal space violation duration, minimal pedestrian-facing time, and no social zone intrusions. Project page: https://path-etiquette.github.io
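The generate-then-select structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the fine-tuned VLM is replaced here by a hand-written proxy scorer that counts time spent inside a pedestrian's personal space, and every name (`Human`, `personal_space_violation_time`, `select_path`), the 1.2 m radius, the 0.5 s time step, and the constant-velocity pedestrian model are assumptions made only for illustration.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Path = List[Point]


@dataclass
class Human:
    position: Point  # metres, at time 0
    velocity: Point  # metres per second, assumed constant (illustrative model)


def personal_space_violation_time(
    path: Sequence[Point],
    humans: Sequence[Human],
    dt: float = 0.5,      # assumed time between consecutive waypoints
    radius: float = 1.2,  # assumed personal-space radius in metres
) -> float:
    """Seconds the robot spends within `radius` of any human.

    Waypoint k is assumed to be reached at time k * dt, and each human is
    extrapolated linearly from its initial position and velocity.
    """
    total = 0.0
    for k, (rx, ry) in enumerate(path):
        t = k * dt
        for h in humans:
            hx = h.position[0] + h.velocity[0] * t
            hy = h.position[1] + h.velocity[1] * t
            if math.hypot(rx - hx, ry - hy) < radius:
                total += dt
                break  # count each waypoint at most once
    return total


def select_path(
    candidates: Sequence[Path],
    humans: Sequence[Human],
    scorer: Callable[[Sequence[Point], Sequence[Human]], float] = personal_space_violation_time,
) -> Path:
    """Return the candidate with the lowest score (ties go to the earliest).

    `scorer` is a stand-in for the paper's VLM-based evaluation.
    """
    return min(candidates, key=lambda p: scorer(p, humans))


# Two geometrically feasible candidates around one stationary pedestrian.
humans = [Human(position=(2.0, 0.0), velocity=(0.0, 0.0))]
straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
detour = [(0.0, 0.0), (1.0, 1.5), (2.0, 2.0), (3.0, 1.5), (4.0, 0.0)]

chosen = select_path([straight, detour], humans)
print("detour" if chosen is detour else "straight")
```

Both candidates are collision-free in the sense that a planner could produce them, yet the straight path passes through the pedestrian's personal space while the detour does not, so the selector prefers the detour. In the paper this judgment comes from the VLM and its contextual social reasoning rather than from a fixed distance threshold.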