RO, LG | Oct 28, 2023

Bird's Eye View Based Pretrained World model for Visual Navigation

arXiv:2310.18847v2 | 1 citation | h-index: 7
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of transferring navigation models from simulation to reality for robotics, though the contribution appears incremental because it fuses existing components.

The paper tackles sim2real transfer for visual navigation by using Bird's Eye View (BEV) representations as an intermediary. The system is trained entirely in a simulator and transfers zero-shot to the real world; its effectiveness is shown through deployment on a differential drive robot.

Sim2Real transfer has gained popularity because it allows models trained in inexpensive simulators to be transferred to the real world. This paper presents a novel system that fuses the components of a traditional World Model into a robust system, trained entirely within a simulator, that transfers zero-shot to the real world. To facilitate transfer, we use an intermediary representation based on Bird's Eye View (BEV) images. Our robot learns to navigate in a simulator by first learning to translate complex First-Person View (FPV) RGB images into BEV representations, then learning to navigate using those representations. Later, when tested in the real world, the robot uses the perception model to translate FPV RGB images into the embeddings learned by the FPV-to-BEV translator, which the downstream policy can use directly. The incorporation of state-checking modules using Anchor images and a Mixture Density LSTM not only interpolates uncertain and missing observations but also enhances the robustness of the model in the real world. We trained the model using data from a differential drive robot in the CARLA simulator. The effectiveness of our methodology is shown through the deployment of trained models onto a real-world differential drive robot. Lastly, we release a comprehensive codebase, dataset and models for training and deployment (https://sites.google.com/view/value-explicit-pretraining).
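
The abstract does not spell out how the state-checking step works, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that observed embeddings are snapped to the nearest anchor embedding when they are close enough, and that a missing or uncertain observation is replaced by the mean of a Gaussian mixture predicted by the memory model. The function names, the Euclidean distance metric, and the `max_dist` threshold are all hypothetical.

```python
import math

def mixture_mean(weights, means):
    """Per-dimension expected value of a Gaussian mixture (weights need not be normalized)."""
    total = sum(weights)
    dim = len(means[0])
    return [sum(w * m[d] for w, m in zip(weights, means)) / total for d in range(dim)]

def nearest_anchor(embedding, anchors):
    """Return (index, distance) of the anchor closest to `embedding` (Euclidean)."""
    dists = [math.dist(embedding, a) for a in anchors]
    idx = min(range(len(anchors)), key=dists.__getitem__)
    return idx, dists[idx]

def checked_state(observed, anchors, weights, means, max_dist=0.5):
    """Choose the embedding handed to the policy (hypothetical state check).

    observed: embedding from the perception model, or None if the frame is missing.
    weights, means: mixture parameters predicted by the memory model (assumed given).
    """
    if observed is not None:
        idx, dist = nearest_anchor(observed, anchors)
        if dist <= max_dist:
            # Observation is trusted: snap it to the closest anchor embedding.
            return anchors[idx]
    # Missing or uncertain observation: fall back to the predicted mixture mean,
    # then snap that prediction to the closest anchor.
    predicted = mixture_mean(weights, means)
    idx, _ = nearest_anchor(predicted, anchors)
    return anchors[idx]
```

In this sketch, a frame that lies far from every anchor is treated the same way as a dropped frame, which is one simple way to make the policy input robust to perception errors after zero-shot transfer.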

