RO · Jun 2

GN0: Toward a Unified Paradigm for Generation, Evaluation, and Policy Learning in Visual-Language Navigation

arXiv:2606.03682 · 84.9 · h-index: 13
AI Analysis

For embodied navigation researchers, this work provides a comprehensive framework spanning data, simulation, benchmarking, and learning that improves generalization and long-horizon capabilities. It is incremental, however, because it combines existing techniques (3DGS, RL, DAgger) rather than introducing a fundamentally new paradigm.

The paper tackles data scarcity and limited generalization in Vision-and-Language Navigation (VLN) by introducing a unified paradigm (GN0) that includes a large-scale dataset (GN-Matrix), a high-fidelity simulation platform based on 3D Gaussian Splatting, a BEV-based benchmark (GN-Bench) with dynamic avatars, and an RL-driven navigation model (BAE). GN0 outperforms state-of-the-art VLN methods on GN-Bench and VLN-CE.

Embodied navigation connects intelligent agents with the physical world and is fundamental for general robotic intelligence. Limited availability and quality of navigation data have constrained Vision-and-Language Navigation (VLN) systems' generalization and long-horizon capabilities. To address this, we curate diverse 3D scenes and develop an automated pipeline for large-scale navigation data, resulting in the GN-Matrix dataset. Building on a 3D Gaussian Splatting (3DGS) engine, we introduce a high-fidelity simulation platform supporting interactive roaming and collision-aware navigation. We further propose GN-Bench, the first BEV-based benchmark incorporating dynamic 3DGS avatars for human-robot interaction evaluation. To leverage the simulator, we develop an RL-driven navigation foundation model, Break and Establish (BAE). After supervised learning, DAgger exposes the model to rollout-induced states, breaking narrow expert-centric distributions and enabling downstream RL exploration. This unified VLN paradigm integrates map-based and map-free tasks, including instruction following, human following, and goal navigation. GN-BAE formalizes high-fidelity 3DGS-rendered Bird's Eye View representations as compact memory, unlocking latent spatial reasoning in VLMs. Extensive evaluations on GN-Bench and VLN-CE show that GN0 outperforms state-of-the-art VLN methods. Overall, GN-Matrix offers a unified framework spanning data, simulation, and learning, advancing embodied navigation in research and industrial applications.
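The abstract's description of DAgger (supervised learning first, then exposure to rollout-induced states) can be illustrated with a toy sketch. This is not the paper's BAE implementation: the 1-D corridor, the tabular learner, and every name here (`expert_action`, `fit`, `rollout`, `dagger`, `GOAL`, `SIZE`, `HORIZON`) are hypothetical and chosen only to show the generic DAgger loop, in which the learner's own rollouts are relabeled by an expert and aggregated into the training set.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy setting: a 1-D corridor of SIZE cells with a fixed goal cell.
GOAL, SIZE, HORIZON = 7, 10, 20

def expert_action(state):
    """Toy expert: step toward the goal (+1 or -1), or 0 once on it."""
    return (GOAL > state) - (GOAL < state)

def step(state, action):
    """Move along the corridor, clamping at the walls."""
    return min(max(state + action, 0), SIZE - 1)

def fit(dataset):
    """Tabular stand-in for supervised learning: majority label per state."""
    votes = defaultdict(Counter)
    for s, a in dataset:
        votes[s][a] += 1
    return {s: c.most_common(1)[0][0] for s, c in votes.items()}

def act(policy, state, rng):
    """Learner action; states never seen in training get a random move."""
    if state in policy:
        return policy[state]
    return rng.choice([-1, 1])

def rollout(policy, start, rng, beta):
    """Run the mixture policy (expert with prob. beta) and return visited states."""
    s, visited = start, []
    for _ in range(HORIZON):
        visited.append(s)
        a = expert_action(s) if rng.random() < beta else act(policy, s, rng)
        s = step(s, a)
    return visited

def dagger(iterations=5, seed=0):
    rng = random.Random(seed)
    # Stage 1: expert demonstrations from one start cell (a narrow distribution).
    dataset = [(s, expert_action(s)) for s in rollout({}, 0, rng, beta=1.0)]
    policy = fit(dataset)
    # Stage 2: roll out the learner, relabel its states with the expert, aggregate.
    for i in range(iterations):
        beta = 0.5 ** (i + 1)  # rely less on the expert each iteration
        for start in range(SIZE):
            for s in rollout(policy, start, rng, beta):
                dataset.append((s, expert_action(s)))
        policy = fit(dataset)
    return policy

if __name__ == "__main__":
    learned = dagger()
    print(sorted(learned.items()))
```

In this sketch the expert-only demonstrations never visit the cells beyond the goal, so the initial policy has no label for them. The aggregation loop adds those cells once the learner's rollouts start from or wander into them, which is the "breaking narrow expert-centric distributions" effect the abstract attributes to DAgger. The downstream RL stage is out of scope here.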
