CV | May 11, 2020

FroDO: From Detections to 3D Objects

arXiv:2005.05125v1 | 79 citations
Originality: Incremental advance
AI Analysis

FroDO addresses the need for object-oriented maps in scene understanding, with applications such as robotics and augmented reality. It represents an incremental improvement built on a novel hybrid approach.

The paper tackles the problem of accurate 3D reconstruction of object instances from RGB video by introducing FroDO, a method that infers object location, pose, and shape in a coarse-to-fine manner, achieving state-of-the-art results on datasets like Pix3D, Redwood-OS, and ScanNet.

Object-oriented maps are important for scene understanding since they jointly capture geometry and semantics, and allow individual instantiation and meaningful reasoning about objects. We introduce FroDO, a method for accurate 3D reconstruction of object instances from RGB video that infers object location, pose and shape in a coarse-to-fine manner. Key to FroDO is to embed object shapes in a novel learnt space that allows seamless switching between sparse point cloud and dense DeepSDF decoding. Given an input sequence of localized RGB frames, FroDO first aggregates 2D detections to instantiate a category-aware 3D bounding box per object. A shape code is then regressed using an encoder network, before shape and pose are optimized further under the learnt shape priors using sparse and dense shape representations. The optimization uses multi-view geometric, photometric and silhouette losses. We evaluate on real-world datasets, including Pix3D, Redwood-OS, and ScanNet, for single-view, multi-view, and multi-object reconstruction.
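
The abstract's first step, aggregating 2D detections from localized frames into a 3D object location, can be illustrated with a small sketch. This is not FroDO's actual procedure, which the abstract does not spell out. It is a generic least-squares ray triangulation, assuming known intrinsics `K` and world-to-camera poses `(R, t)` with the convention `x_cam = R @ x_world + t`. The function names `backproject_ray`, `triangulate_center`, and `project` are hypothetical helpers introduced only for this example.

```python
import numpy as np

def backproject_ray(K, R, t, pixel):
    """Return (origin, unit direction) of the world-frame ray through `pixel`.

    Assumed convention (not from the paper): x_cam = R @ x_world + t.
    """
    origin = -R.T @ t  # camera centre in world coordinates
    d_cam = np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0])
    d_world = R.T @ d_cam
    return origin, d_world / np.linalg.norm(d_world)

def triangulate_center(rays):
    """Point minimising the summed squared distance to all rays."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for origin, direction in rays:
        # Projector onto the plane orthogonal to the ray direction.
        P = np.eye(3) - np.outer(direction, direction)
        A += P
        b += P @ origin
    # Requires at least two non-parallel rays, otherwise A is singular.
    return np.linalg.solve(A, b)

def project(K, R, t, X):
    """Pinhole projection of world point X to pixel coordinates."""
    x = K @ (R @ X + t)
    return x[:2] / x[2]
```

In use, each frame's 2D detection centre would be passed through `backproject_ray`, and the resulting rays handed to `triangulate_center` to obtain a coarse 3D location. Estimating the extent and orientation of a category-aware bounding box, as the abstract describes, would need more than this sketch provides.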

