CV · Aug 23, 2021

ODAM: Object Detection, Association, and Mapping using Posed RGB Video

arXiv:2108.10165v1 · 31 citations
Originality: Incremental advance
AI Analysis

This work addresses 3D scene understanding for applications in Augmented Reality and Robotics, offering an incremental advance in object detection and mapping techniques.

The paper tackles 3D object localization and mapping from posed RGB videos. It presents ODAM, which uses a deep learning front-end to detect objects in each frame and a graph neural network to associate them with a global object-based map, then optimizes the objects' bounding volumes under multi-view constraints. On ScanNet, the authors report a significant improvement over existing RGB-only methods.

Localizing objects and estimating their extent in 3D is an important step towards high-level 3D scene understanding, which has many applications in Augmented Reality and Robotics. We present ODAM, a system for 3D Object Detection, Association, and Mapping using posed RGB videos. The proposed system relies on a deep learning front-end to detect 3D objects from a given RGB frame and associate them to a global object-based map using a graph neural network (GNN). Based on these frame-to-model associations, our back-end optimizes object bounding volumes, represented as super-quadrics, under multi-view geometry constraints and the object scale prior. We validate the proposed system on ScanNet where we show a significant improvement over existing RGB-only methods.
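The abstract says object bounding volumes are represented as super-quadrics. As a rough illustration of why that representation is flexible, the sketch below evaluates the standard textbook super-quadric inside-outside function in the object's canonical frame. The function name, argument names, and parameter values are assumptions made for this example; the paper's exact parameterization and optimization are not reproduced here.

```python
def superquadric_inside_outside(point, scale, eps1, eps2):
    """Evaluate the standard super-quadric inside-outside function.

    point: (x, y, z) in the object's canonical (axis-aligned, centered) frame.
    scale: (a1, a2, a3) semi-axis lengths along x, y, z.
    eps1, eps2: shape exponents; 1.0 gives an ellipsoid, values near 0 give a box-like shape.
    Returns a value < 1 inside the volume, 1 on the surface, > 1 outside.
    """
    # Normalize by the semi-axes; abs() keeps fractional powers real-valued.
    x, y, z = (abs(p) / a for p, a in zip(point, scale))
    xy_term = (x ** (2.0 / eps2) + y ** (2.0 / eps2)) ** (eps2 / eps1)
    return xy_term + z ** (2.0 / eps1)


# Illustrative (hypothetical) query: a point near the corner of a unit volume.
corner = (0.9, 0.9, 0.9)
unit = (1.0, 1.0, 1.0)
as_ellipsoid = superquadric_inside_outside(corner, unit, 1.0, 1.0)
as_boxlike = superquadric_inside_outside(corner, unit, 0.1, 0.1)
```

With the same scale, the corner point falls outside the ellipsoid (`as_ellipsoid` is above 1) but inside the box-like shape (`as_boxlike` is below 1), which shows how two extra shape parameters let one primitive cover both rounded and box-like objects.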

Code Implementations: 1 repo
