CV, CL, LG, IV | Dec 18, 2019

ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language

arXiv:1912.08830v3 | 579 citations
Originality: Highly original
AI Analysis

This work addresses the challenge of locating objects in 3D scenes from natural language queries, with applications such as robotics and augmented reality. It defines a new task rather than an incremental improvement on an existing one.

The paper tackles 3D object localization in RGB-D scans from natural language descriptions. It proposes ScanRefer, which learns a fused descriptor from 3D object proposals and sentence embeddings to regress the target object's bounding box. It also introduces a dataset of 51,583 descriptions of 11,046 objects from 800 ScanNet scenes.

We introduce the task of 3D object localization in RGB-D scans using natural language descriptions. As input, we assume a point cloud of a scanned 3D scene along with a free-form description of a specified target object. To address this task, we propose ScanRefer, learning a fused descriptor from 3D object proposals and encoded sentence embeddings. This fused descriptor correlates language expressions with geometric features, enabling regression of the 3D bounding box of a target object. We also introduce the ScanRefer dataset, containing 51,583 descriptions of 11,046 objects from 800 ScanNet scenes. ScanRefer is the first large-scale effort to perform object localization via natural language expression directly in 3D.
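
The fusion idea in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation. It assumes that each object proposal already has a geometric feature vector and that the description has already been encoded into a sentence embedding. The layer sizes, the random weights, and the function names (`fuse`, `score`, `localize`) are hypothetical. For simplicity, the sketch returns the box of the highest-scoring proposal instead of regressing box parameters.

```python
import math
import random


def relu(x):
    return x if x > 0.0 else 0.0


def fuse(proposal_feat, sentence_emb):
    """Concatenate one proposal's geometric features with the sentence embedding."""
    return list(proposal_feat) + list(sentence_emb)


def score(fused, hidden_w, hidden_b, out_w):
    """Hypothetical two-layer scorer: the ReLU layer lets language and geometry interact."""
    hidden = [
        relu(sum(f * w for f, w in zip(fused, row)) + b)
        for row, b in zip(hidden_w, hidden_b)
    ]
    return sum(h * w for h, w in zip(hidden, out_w))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def localize(proposal_feats, proposal_boxes, sentence_emb, params):
    """Return (box, confidences) for the proposal that best matches the sentence."""
    hidden_w, hidden_b, out_w = params
    logits = [
        score(fuse(p, sentence_emb), hidden_w, hidden_b, out_w)
        for p in proposal_feats
    ]
    conf = softmax(logits)
    best = max(range(len(conf)), key=lambda i: conf[i])
    return proposal_boxes[best], conf


# Toy data: all sizes and values are made up for illustration.
rng = random.Random(0)
K, DP, DS, H = 4, 3, 2, 5  # proposals, proposal dim, sentence dim, hidden dim
proposal_feats = [[rng.uniform(-1, 1) for _ in range(DP)] for _ in range(K)]
# Each box is (cx, cy, cz, dx, dy, dz): centre and extent, an assumed layout.
proposal_boxes = [[float(i), 0.0, 0.0, 1.0, 1.0, 1.0] for i in range(K)]
sentence_emb = [rng.uniform(-1, 1) for _ in range(DS)]
params = (
    [[rng.uniform(-1, 1) for _ in range(DP + DS)] for _ in range(H)],
    [0.0] * H,
    [rng.uniform(-1, 1) for _ in range(H)],
)

box, conf = localize(proposal_feats, proposal_boxes, sentence_emb, params)
print("confidences:", [round(c, 3) for c in conf])
print("selected box:", box)
```

In a trained model the weights would be learned from the description and box annotations, and the proposal features and sentence embedding would come from learned encoders. Here they are random placeholders, so the selected box carries no meaning beyond showing the data flow.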

Code Implementations | 3 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
