Yunzi Wu

RO · h-index 1 · 3 papers · 1 citation · Novelty 38% · AI Score 37

3 Papers

84.9 · RO · Jun 2
GN0: Toward a Unified Paradigm for Generation, Evaluation, and Policy Learning in Visual-Language Navigation

Xinhai Li, Xiaotao Zhang, Yuehao Huang et al.

Embodied navigation connects intelligent agents with the physical world and is fundamental to general robotic intelligence. The limited availability and quality of navigation data have constrained the generalization and long-horizon capabilities of Vision-and-Language Navigation (VLN) systems. To address this, we curate diverse 3D scenes and develop an automated pipeline for generating large-scale navigation data, resulting in the GN-Matrix dataset. Building on a 3D Gaussian Splatting (3DGS) engine, we introduce a high-fidelity simulation platform that supports interactive roaming and collision-aware navigation. We further propose GN-Bench, the first Bird's Eye View (BEV)-based benchmark to incorporate dynamic 3DGS avatars for evaluating human-robot interaction. To leverage the simulator, we develop an RL-driven navigation foundation model, Break and Establish (BAE). After supervised learning, DAgger exposes the model to states induced by its own rollouts, breaking the narrow expert-centric state distribution and enabling downstream RL exploration. This unified VLN paradigm integrates map-based and map-free tasks, including instruction following, human following, and goal navigation. GN-BAE formulates high-fidelity 3DGS-rendered BEV representations as a compact memory, unlocking latent spatial reasoning in VLMs. Extensive evaluations on GN-Bench and VLN-CE show that GN0 outperforms state-of-the-art VLN methods. Overall, GN-Matrix offers a unified framework spanning data, simulation, and learning, advancing embodied navigation in both research and industrial applications.
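
The DAgger step mentioned in this abstract can be illustrated with a small, self-contained sketch. It is not the GN0/BAE training code, which the abstract does not describe. Everything here is a hypothetical stand-in: a 1-D corridor with a goal cell, an expert that always steps toward the goal, a tabular majority-vote learner, and random "slips" that push the agent into states the expert never demonstrated. The loop follows the standard DAgger recipe in a simplified form (pure learner rollouts, no expert-mixing coefficient): clone the expert, roll out the learner, have the expert label the states the learner actually visits, aggregate, and retrain.

```python
import random
from collections import Counter, defaultdict
from typing import Callable, List, Tuple

GOAL = 10        # hypothetical goal cell on a 1-D corridor
NUM_CELLS = 21   # cells 0..20; every rollout starts at cell 0

Policy = Callable[[int], int]
Dataset = List[Tuple[int, int]]


def expert(state: int) -> int:
    """Hypothetical expert: step toward the goal, stay once it is reached."""
    if state < GOAL:
        return 1
    if state > GOAL:
        return -1
    return 0


def step(state: int, action: int, rng: random.Random, slip: float) -> int:
    """Move by `action`; with probability `slip` a disturbance adds a push of +/-2 cells."""
    nxt = state + action
    if rng.random() < slip:
        nxt += rng.choice((-2, 2))
    return min(max(nxt, 0), NUM_CELLS - 1)


def fit(dataset: Dataset) -> Policy:
    """Tabular learner: majority-vote action for every state seen in the data."""
    votes = defaultdict(Counter)
    for state, action in dataset:
        votes[state][action] += 1
    table = {s: c.most_common(1)[0][0] for s, c in votes.items()}
    # Unseen states fall back to "stay" (0), which can strand the agent off-distribution.
    return lambda s: table.get(s, 0)


def rollout(policy: Policy, rng: random.Random, horizon: int = 30, slip: float = 0.0) -> List[int]:
    """Execute `policy` from cell 0 and return every state it visits."""
    visited, state = [], 0
    for _ in range(horizon):
        visited.append(state)
        state = step(state, policy(state), rng, slip)
    return visited


def dagger(iterations: int = 5, episodes: int = 5, slip: float = 0.2, seed: int = 0) -> Tuple[Policy, Dataset]:
    rng = random.Random(seed)
    # 1) Supervised stage: behavior cloning on one clean (slip-free) expert demonstration.
    dataset: Dataset = [(s, expert(s)) for s in rollout(expert, rng)]
    policy = fit(dataset)
    for _ in range(iterations):
        for _ in range(episodes):
            # 2) Roll out the learner, so the data reflects states the learner itself reaches.
            visited = rollout(policy, rng, slip=slip)
            # 3) Ask the expert to label those states and aggregate with all earlier data.
            dataset += [(s, expert(s)) for s in visited]
        # 4) Retrain on the aggregated dataset.
        policy = fit(dataset)
    return policy, dataset


if __name__ == "__main__":
    policy, dataset = dagger()
    print("states covered:", sorted({s for s, _ in dataset}))
```

Because unseen states default to "stay", the purely cloned policy can be stranded wherever a slip pushes it past the goal; labeling the learner-visited states with the expert and retraining removes that failure mode, which is the sense in which DAgger breaks the expert-centric state distribution.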

78.5 · RO · Apr 14
DeCoNav: Dialog enhanced Long-Horizon Collaborative Vision-Language Navigation

Sunyao Zhou, Yunzi Wu, Tianhang Wang et al.

Long-horizon collaborative vision-language navigation (VLN) is critical for multi-robot systems that must accomplish complex tasks beyond the capability of a single agent. CoNavBench took a first step by introducing the first collaborative long-horizon VLN benchmark, with relay-style multi-robot tasks, a collaboration taxonomy, and graph-grounded generation and evaluation that model handoffs and rendezvous in shared environments. However, existing benchmarks and evaluations often do not enforce strictly synchronized dual-robot rollouts on a shared world timeline, and they typically rely on static coordination policies that cannot adapt when new cross-agent evidence emerges. We present Dialog enhanced Long-Horizon Collaborative Vision-Language Navigation (DeCoNav), a decentralized framework that couples event-triggered dialogue with dynamic task allocation and replanning for real-time, adaptive coordination. In DeCoNav, robots exchange compact semantic states through dialogue, without a central controller. When informative events such as new evidence, uncertainty, or conflicts arise, dialogue is triggered to reassign subgoals dynamically and replan under synchronized execution. On DeCoNavBench, which comprises 1,213 tasks across 176 HM3D scenes, DeCoNav improves the both-success rate (BSR) by 69.2%, demonstrating the effectiveness of dialogue-driven, dynamically reallocated planning for multi-robot collaboration.
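
The abstract reports its headline result as a gain in the both-success rate (BSR) but does not define the metric. The sketch below assumes the reading suggested by the name: an episode counts toward BSR only when both robots succeed. It also includes a relative-gain helper, because the abstract does not say whether the 69.2% figure is a relative or an absolute improvement; the helper covers the relative reading only. `EpisodeOutcome`, `both_success_rate`, and `relative_improvement` are hypothetical names, not part of DeCoNavBench.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class EpisodeOutcome:
    """Hypothetical per-episode record for a two-robot task."""
    robot_a_success: bool
    robot_b_success: bool


def both_success_rate(episodes: Sequence[EpisodeOutcome]) -> float:
    """Fraction of episodes in which BOTH robots succeed (assumed BSR definition)."""
    if not episodes:
        raise ValueError("need at least one episode")
    both = sum(e.robot_a_success and e.robot_b_success for e in episodes)
    return both / len(episodes)


def relative_improvement(baseline: float, new: float) -> float:
    """Relative gain (new - baseline) / baseline: one possible reading of 'improves by X%'."""
    if baseline <= 0:
        raise ValueError("baseline must be positive for a relative gain")
    return (new - baseline) / baseline
```

Storing the two per-robot flags separately, rather than a single success bit, keeps single-robot success rates recoverable from the same records.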

HC · Oct 15, 2024
Generative AI's aggregated knowledge versus web-based curated knowledge

Ted Selker, Yunzi Wu

This paper explores which kinds of questions are best served by generative AI (GenAI), whose large language models (LLMs) aggregate and package knowledge, and when traditional curated, web-sourced search results serve users better. An experiment comparing product searches using ChatGPT, the Google search engine, or both helped us understand what makes generated responses so compelling. The experiment showed that GenAI can speed up some explorations and decisions. We describe how search can deepen the testing of facts, logic, and context, and we show where existing and emerging knowledge paradigms support knowledge exploration in different ways. Our search probes showed the value that curated web search provides for very specific, less widely known knowledge, whereas GenAI excelled at bringing together knowledge on broad, relatively well-known topics. The relative value of curated and aggregated knowledge for different kinds of knowledge reflects different user goals. We developed a taxonomy to distinguish when users are best served by each of these two approaches.