CV · Mar 16

Trajectory-Diversity-Driven Robust Vision-and-Language Navigation

Jiangyang Li, Cong Wan, SongLin Dong, Chenhao Ding, Qiang Wang, Zhiheng Ma, Yihong Gong

arXiv:2603.15370 · 24.53 citations · h-index: 16

Predicted impact: top 26% in CV · last 90 days
Originality: Highly original

AI Analysis

This work addresses the robustness of VLN agents navigating photo-realistic environments. It represents a strong, specific gain rather than a foundational breakthrough.

The paper tackles limited generalization and poor robustness in Vision-and-Language Navigation (VLN) by proposing NavGRPO, a reinforcement learning framework that improves SPL by +3.0% and +1.71% in unseen environments on the R2R and REVERIE benchmarks, and by +14.89% over the baseline under extreme early-stage perturbations.

Vision-and-Language Navigation (VLN) requires agents to navigate photo-realistic environments following natural language instructions. Current methods predominantly rely on imitation learning, which suffers from limited generalization and poor robustness to execution perturbations. We present NavGRPO, a reinforcement learning framework that learns goal-directed navigation policies through Group Relative Policy Optimization. By exploring diverse trajectories and optimizing via within-group performance comparisons, our method enables agents to distinguish effective strategies beyond expert paths without requiring additional value networks. Built on ScaleVLN, NavGRPO achieves superior robustness on R2R and REVERIE benchmarks with +3.0% and +1.71% SPL improvements in unseen environments. Under extreme early-stage perturbations, we demonstrate +14.89% SPL gain over the baseline, confirming that goal-directed RL training builds substantially more robust navigation policies. Code and models will be released.
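The within-group comparison described in the abstract can be illustrated with a minimal sketch. It assumes that each sampled trajectory is scored with a scalar reward and that advantages are obtained by normalizing rewards against the group's own mean and standard deviation, which is the usual Group Relative Policy Optimization recipe and the reason no separate value network is needed. Using per-episode SPL as the reward is an assumption made here for illustration; the abstract does not state the paper's actual reward design. The function names `spl` and `group_relative_advantages` and the rollout values are hypothetical.

```python
import statistics


def spl(success: bool, shortest_len: float, path_len: float) -> float:
    """Per-episode Success weighted by Path Length.

    Assumes shortest_len > 0. Returns 0 for a failed episode, otherwise the
    ratio of the shortest-path length to the length actually travelled.
    """
    if not success:
        return 0.0
    return shortest_len / max(path_len, shortest_len)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of rollouts for the same instruction.

    Each advantage is (reward - group mean) / (group std + eps), so a
    trajectory is judged only relative to its siblings in the group.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical group of four rollouts: (success, shortest length, path length).
rollouts = [
    (True, 10.0, 10.0),   # reached the goal along the shortest path
    (True, 10.0, 20.0),   # reached the goal with a detour
    (False, 10.0, 8.0),   # stopped early
    (False, 10.0, 15.0),  # wandered and failed
]
rewards = [spl(*r) for r in rollouts]
advantages = group_relative_advantages(rewards)

for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.2f} advantage={advantage:+.3f}")
```

In this sketch the efficient success receives the largest positive advantage, the detour a smaller positive one, and both failures the same negative value, so a policy-gradient update would favor effective trajectories even when they differ from the expert path.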
