CV · AI · LG · RO · Jun 21, 2022

A Simple Approach for Visual Rearrangement: 3D Mapping and Semantic Search

arXiv:2206.13396v2 · 12 citations · h-index: 119
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of efficiently rearranging objects from visual input alone for embodied agents, and it reports a large, concrete performance gain on a single benchmark.

The paper tackles visual room rearrangement for embodied agents with a simple method that raises the correct-rearrangement rate on the AI2-THOR Rearrangement Challenge from 0.53% to 16.56%, while using only 2.7% as many environment samples as prior methods.
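To put the reported numbers in perspective, the short calculation below derives the absolute gain, the relative gain, and the implied sample ratio from the three figures quoted above. The function name `summarize_gains` is hypothetical, and the derived values are plain arithmetic on the reported percentages, not additional results from the paper.

```python
from typing import Tuple


def summarize_gains(
    baseline_pct: float, proposed_pct: float, sample_fraction: float
) -> Tuple[float, float, float]:
    """Derive simple comparisons from the reported figures.

    Returns (absolute gain in percentage points,
             relative gain as a multiple of the baseline,
             how many times more samples the prior methods used).
    """
    absolute_gain = proposed_pct - baseline_pct
    relative_gain = proposed_pct / baseline_pct
    sample_ratio = 1.0 / sample_fraction
    return absolute_gain, relative_gain, sample_ratio


# Figures quoted in the summary: 0.53% -> 16.56%, with 2.7% as many samples.
absolute, relative, ratio = summarize_gains(0.53, 16.56, 0.027)
print(f"absolute gain: {absolute:.2f} percentage points")
print(f"relative gain: {relative:.1f}x the baseline")
print(f"prior methods used about {ratio:.0f}x as many samples")
```

Read this way, the method's correct-rearrangement rate is roughly 31 times the baseline, yet about 83% of rearrangements are still not completed correctly.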

Physically rearranging objects is an important capability for embodied agents. Visual room rearrangement evaluates an agent's ability to rearrange objects in a room to a desired goal based solely on visual input. We propose a simple yet effective method for this problem: (1) search for and map which objects need to be rearranged, and (2) rearrange each object until the task is complete. Our approach consists of an off-the-shelf semantic segmentation model, voxel-based semantic map, and semantic search policy to efficiently find objects that need to be rearranged. On the AI2-THOR Rearrangement Challenge, our method improves on current state-of-the-art end-to-end reinforcement learning-based methods that learn visual rearrangement policies from 0.53% correct rearrangement to 16.56%, using only 2.7% as many samples from the environment.
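The map-then-compare idea in the abstract can be illustrated with a minimal sketch. It assumes the agent already has 3D points labelled by a segmentation model, votes them into a voxel grid, and flags object classes whose occupied voxels differ between the goal and current layouts. The names `build_semantic_map`, `find_moved_objects`, and `VOXEL_SIZE`, the 0.25 m resolution, and the set-difference criterion are all illustrative assumptions, not the authors' implementation.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Voxel = Tuple[int, int, int]
Point = Tuple[float, float, float]

VOXEL_SIZE = 0.25  # metres per voxel edge; illustrative value, not from the paper


def build_semantic_map(
    labeled_points: Iterable[Tuple[Point, str]], voxel_size: float = VOXEL_SIZE
) -> Dict[Voxel, str]:
    """Vote each labelled 3D point into a voxel and keep the majority label."""
    votes: Dict[Voxel, Counter] = defaultdict(Counter)
    for (x, y, z), label in labeled_points:
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        votes[key][label] += 1
    return {voxel: counts.most_common(1)[0][0] for voxel, counts in votes.items()}


def voxels_by_label(semantic_map: Dict[Voxel, str]) -> Dict[str, Set[Voxel]]:
    """Group occupied voxels by their semantic label."""
    grouped: Dict[str, Set[Voxel]] = defaultdict(set)
    for voxel, label in semantic_map.items():
        grouped[label].add(voxel)
    return grouped


def find_moved_objects(
    goal_map: Dict[Voxel, str], current_map: Dict[Voxel, str]
) -> List[str]:
    """Return labels whose occupied voxels differ between the two maps."""
    goal = voxels_by_label(goal_map)
    current = voxels_by_label(current_map)
    labels = goal.keys() | current.keys()
    return sorted(
        label for label in labels
        if goal.get(label, set()) != current.get(label, set())
    )


# Toy example: the mug has moved about 1.5 m along x; the book has not moved.
goal_map = build_semantic_map([
    ((0.10, 0.90, 0.10), "mug"),
    ((0.15, 0.90, 0.12), "mug"),
    ((2.00, 0.50, 1.00), "book"),
])
current_map = build_semantic_map([
    ((1.60, 0.90, 0.10), "mug"),
    ((2.00, 0.50, 1.00), "book"),
])
print(find_moved_objects(goal_map, current_map))  # ['mug']
```

Comparing per-class voxel sets keeps the sketch short, but it cannot distinguish multiple instances of the same class and is sensitive to segmentation noise; a practical system would need instance matching and some tolerance.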

