RO | CV | LG | Mar 12, 2024

Learning Generalizable Feature Fields for Mobile Manipulation

arXiv:2403.07563v2 | 49 citations | h-index: 16 | IROS
AI Analysis

The paper addresses a key challenge in robotics, making robot operation more versatile and efficient, though the contribution appears incremental because it builds on existing neural field and CLIP techniques.

The paper tackles the problem of building a unified representation for both navigation and manipulation in mobile robotics. It introduces GeFF, a generalizable neural feature field that runs in real time and outperforms point-based baselines in runtime and storage-accuracy trade-offs.

An open problem in mobile manipulation is how to represent objects and scenes in a unified manner so that robots can use both for navigation and manipulation. The latter requires capturing intricate geometry while understanding fine-grained semantics, whereas the former involves capturing the complexity inherent at an expansive physical scale. In this work, we present GeFF (Generalizable Feature Fields), a scene-level generalizable neural feature field that acts as a unified representation for both navigation and manipulation that performs in real-time. To do so, we treat generative novel view synthesis as a pre-training task, and then align the resulting rich scene priors with natural language via CLIP feature distillation. We demonstrate the effectiveness of this approach by deploying GeFF on a quadrupedal robot equipped with a manipulator. We quantitatively evaluate GeFF's ability for open-vocabulary object-/part-level manipulation and show that GeFF outperforms point-based baselines in runtime and storage-accuracy trade-offs, with qualitative examples of semantics-aware navigation and articulated object manipulation.
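To make the open-vocabulary idea in the abstract concrete, here is a minimal sketch of how a language-aligned feature field might be queried. It is not the authors' implementation. The function names, the threshold, and the toy 3-dimensional vectors are all assumptions for illustration: in a real system the per-point features would come from the feature field and the text embedding from a CLIP text encoder. The sketch only shows the matching step, which scores each 3D point by the cosine similarity between its feature and the query embedding.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def query_feature_field(point_features, text_embedding, threshold=0.5):
    """Return (point index, score) pairs that match the query, best first.

    point_features: one feature vector per 3D point (hypothetical stand-in
                    for the output of a language-aligned feature field).
    text_embedding: embedding of the text query (hypothetical stand-in
                    for a CLIP text embedding).
    threshold:      assumed cut-off on cosine similarity.
    """
    scored = [
        (i, cosine_similarity(f, text_embedding))
        for i, f in enumerate(point_features)
    ]
    hits = [(i, s) for i, s in scored if s >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy data: points 0 and 1 lie close to the query direction, 2 and 3 do not.
point_features = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
text_embedding = [1.0, 0.0, 0.0]

matches = query_feature_field(point_features, text_embedding)
print([index for index, _ in matches])
```

The matched point indices could then be passed to a downstream planner, for example as a grasp target for manipulation or as a goal region for navigation.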

