CV · Aug 17, 2023

Language-enhanced RNR-Map: Querying Renderable Neural Radiance Field maps with natural language

arXiv:2308.08854v1 · 10 citations · h-index: 45
Originality: Incremental advance
AI Analysis

This work addresses the challenge of querying 3D environments with natural language for robotics or AR/VR applications, representing an incremental improvement over existing methods.

The paper tackles visual navigation by adding language capabilities to a neural radiance field map, which enables natural language queries for object search without extra labeled data. The approach is evaluated on single- and multi-object searches.

We present Le-RNR-Map, a Language-enhanced Renderable Neural Radiance map for Visual Navigation with natural language query prompts. The recently proposed RNR-Map employs a grid structure comprising latent codes positioned at each pixel. These latent codes, which are derived from image observation, enable: i) image rendering given a camera pose, since they are converted to Neural Radiance Field; ii) image navigation and localization with astonishing accuracy. On top of this, we enhance RNR-Map with CLIP-based embedding latent codes, allowing natural language search without additional label data. We evaluate the effectiveness of this map in single and multi-object searches. We also investigate its compatibility with a Large Language Model as an "affordance query resolver". Code and videos are available at https://intelligolabs.github.io/Le-RNR-Map/
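The language query described in the abstract can be pictured as a similarity lookup over a grid of per-cell embeddings. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the `(H, W, D)` grid layout, the function name `query_map`, and the random vectors standing in for CLIP embeddings are all hypothetical. In a real pipeline the grid and the prompt embedding would come from CLIP's image and text encoders.

```python
from __future__ import annotations

import numpy as np


def query_map(lang_grid: np.ndarray, text_emb: np.ndarray) -> tuple[tuple[int, int], np.ndarray]:
    """Return the best-matching (row, col) cell and the full similarity heatmap.

    lang_grid: (H, W, D) array, one embedding per map cell (hypothetical layout).
    text_emb:  (D,) embedding of the natural language prompt.
    """
    # Normalise both sides so the dot product equals cosine similarity.
    grid = lang_grid / (np.linalg.norm(lang_grid, axis=-1, keepdims=True) + 1e-8)
    text = text_emb / (np.linalg.norm(text_emb) + 1e-8)
    heatmap = grid @ text  # shape (H, W)
    best = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return (int(best[0]), int(best[1])), heatmap


# Toy data standing in for real CLIP embeddings (assumption for illustration).
rng = np.random.default_rng(0)
H, W, D = 8, 8, 16
lang_grid = rng.normal(size=(H, W, D))
chair_emb = rng.normal(size=D)
# Plant a cell whose embedding is close to the query, as if a chair were observed there.
lang_grid[2, 5] = chair_emb + 0.05 * rng.normal(size=D)

cell, heatmap = query_map(lang_grid, chair_emb)
print(cell)
```

For a multi-object search, one plausible extension is to run one query per object name and keep the best cell for each, or to threshold the heatmap rather than take a single maximum.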

Code Implementations: 1 repo
