CV · RO · Jun 22, 2018

Visual-Inertial Object Detection and Mapping

arXiv:1806.08498v2 · 22 citations
AI Analysis

This work addresses on-line object detection and mapping for robotics or augmented reality applications. Its contribution is incremental: it integrates deep detection networks with a nonlinear filter.

The paper tackles the problem of populating an unknown environment with models of previously seen objects using monocular video and inertial sensors, resulting in a system that returns sparse point clouds for unrecognized regions and detailed object models with poses in a Euclidean frame.

We present a method to populate an unknown environment with models of previously seen objects, placed in a Euclidean reference frame that is inferred causally and on-line using monocular video along with inertial sensors. The system we implement returns a sparse point cloud for the regions of the scene that are visible but not recognized as a previously seen object, and a detailed object model and its pose in the Euclidean frame otherwise. The system includes bottom-up and top-down components, whereby deep networks trained for detection provide likelihood scores for object hypotheses provided by a nonlinear filter, whose state serves as memory. Additional networks provide likelihood scores for edges, complementing the detection networks, which are trained to be invariant to small deformations. We test our algorithm on existing datasets, and also introduce the VISMA dataset, which provides ground truth pose, point-cloud map, and object models, along with time-stamped inertial measurements.
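
The interplay between the filter's hypotheses (top-down) and the detector's likelihood scores (bottom-up) can be illustrated with a toy reweighting step. This is a minimal sketch under strong assumptions, not the paper's implementation: hypotheses are reduced to 2D positions rather than full object poses, a Gaussian on distance stands in for the detection network's score, and the names `detection_likelihood` and `update_weights` are hypothetical.

```python
import math


def detection_likelihood(hypothesis, detection, sigma=0.5):
    """Hypothetical stand-in for a detection network's score.

    Assumption: a Gaussian on the distance between a hypothesized 2D
    object position and an observed detection.
    """
    dx = hypothesis[0] - detection[0]
    dy = hypothesis[1] - detection[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))


def update_weights(hypotheses, weights, detection):
    """Reweight filter hypotheses by their likelihood scores (Bayes rule)."""
    scored = [
        w * detection_likelihood(h, detection)
        for h, w in zip(hypotheses, weights)
    ]
    total = sum(scored)
    if total == 0.0:
        # No hypothesis is supported by the evidence: keep the prior.
        return list(weights)
    return [s / total for s in scored]


# Three object-position hypotheses carried by the filter, equally weighted.
hypotheses = [(0.0, 0.0), (1.0, 0.0), (3.0, 3.0)]
weights = [1.0 / 3.0] * 3

# One bottom-up observation; the hypothesis closest to it gains weight.
posterior = update_weights(hypotheses, weights, detection=(0.9, 0.1))
best = hypotheses[max(range(len(posterior)), key=posterior.__getitem__)]
```

In this toy setting the weights play the role of the filter state acting as memory: they persist across frames and are refined each time a new score arrives.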
