RO | AI | Apr 24, 2023

USA-Net: Unified Semantic and Affordance Representations for Robot Memory

arXiv:2304.12164v2 | 16 citations | h-index: 35
Originality: Highly original
AI Analysis

This work addresses the suboptimal performance that results when robotic navigation systems handle scene geometry and semantics in separate pipelines, and it offers a novel approach to open-ended instruction following.

The paper tackles the problem of enabling robots to follow open-ended instructions by developing USA-Net, a method that unifies semantic and affordance representations in a differentiable map, resulting in trajectories that are 5-10% shorter and 10-30% closer to goal queries than comparable planners.

In order for robots to follow open-ended instructions like "go open the brown cabinet over the sink", they require an understanding of both the scene geometry and the semantics of their environment. Robotic systems often handle these through separate pipelines, sometimes using very different representation spaces, which can be suboptimal when the two objectives conflict. In this work, we present USA-Net, a simple method for constructing a world representation that encodes both the semantics and spatial affordances of a scene in a differentiable map. This allows us to build a gradient-based planner which can navigate to locations in the scene specified using open-ended vocabulary. We use this planner to consistently generate trajectories which are both 5-10% shorter and 10-30% closer to our goal query in CLIP embedding space than paths from comparable grid-based planners which don't leverage gradient information. To our knowledge, this is the first end-to-end differentiable planner that optimizes for both semantics and affordance in a single implicit map. Code and visuals are available at our website: https://usa.bolte.cc/
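
To make the idea of planning by gradient descent over a differentiable map more concrete, the following toy sketch may help. It is not USA-Net's implementation. The learned implicit map is replaced by hand-written stand-ins: a single circular obstacle plays the role of the affordance signal, and a squared distance to a fixed point plays the role of the CLIP similarity to the goal query. All names, weights, and scene values (`OBSTACLE_CENTER`, `SEMANTIC_PEAK`, `w_obs`, `w_sem`, `plan`) are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy scene (not from the paper).
OBSTACLE_CENTER = (0.5, -0.1)  # circle the robot should not enter
OBSTACLE_RADIUS = 0.3
SEMANTIC_PEAK = (1.0, 0.0)     # where the stand-in similarity field peaks


def gradient(path, w_obs=50.0, w_sem=10.0):
    """Gradient of the combined toy cost with respect to every waypoint."""
    grad = [[0.0, 0.0] for _ in path]
    # Length term: sum of squared segment lengths.
    for i in range(len(path) - 1):
        dx = path[i + 1][0] - path[i][0]
        dy = path[i + 1][1] - path[i][1]
        grad[i][0] -= 2 * dx
        grad[i][1] -= 2 * dy
        grad[i + 1][0] += 2 * dx
        grad[i + 1][1] += 2 * dy
    # Affordance stand-in: w_obs * penetration^2 for waypoints inside the circle.
    for i, (x, y) in enumerate(path):
        ox, oy = x - OBSTACLE_CENTER[0], y - OBSTACLE_CENTER[1]
        dist = math.hypot(ox, oy)
        penetration = OBSTACLE_RADIUS - dist
        if penetration > 0 and dist > 1e-9:
            grad[i][0] -= 2 * w_obs * penetration * ox / dist
            grad[i][1] -= 2 * w_obs * penetration * oy / dist
    # Semantic stand-in: w_sem * squared distance from the last waypoint to the peak.
    grad[-1][0] += 2 * w_sem * (path[-1][0] - SEMANTIC_PEAK[0])
    grad[-1][1] += 2 * w_sem * (path[-1][1] - SEMANTIC_PEAK[1])
    return grad


def plan(start, goal_guess=SEMANTIC_PEAK, n_points=12, steps=3000, lr=0.01):
    """Refine a straight-line initial path by plain gradient descent."""
    path = [
        [start[0] + (goal_guess[0] - start[0]) * i / (n_points - 1),
         start[1] + (goal_guess[1] - start[1]) * i / (n_points - 1)]
        for i in range(n_points)
    ]
    for _ in range(steps):
        grad = gradient(path)
        for i in range(1, n_points):  # waypoint 0 (the start) stays fixed
            path[i][0] -= lr * grad[i][0]
            path[i][1] -= lr * grad[i][1]
    return path


if __name__ == "__main__":
    for x, y in plan((0.0, 0.0)):
        print(f"{x:.3f} {y:.3f}")
```

Because all three terms are differentiable in the waypoint coordinates, a single optimizer trades off path length, obstacle avoidance, and closeness to the goal at once. A grid-based planner, by contrast, would search over discrete cells without using this gradient information.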

