CV · Dec 9, 2020

MOLTR: Multiple Object Localisation, Tracking, and Reconstruction from Monocular RGB Videos

arXiv:2012.05360v2 · 25 citations
AI Analysis

MOLTR addresses the problem of building object-centric maps with semantic and geometric information for future robotic and AR/VR applications, offering an incremental improvement over existing methods.

This paper introduces MOLTR, a system that performs online multiple object localization, tracking, and reconstruction from monocular RGB videos and camera poses. It localizes objects with a monocular 3D detector, tracks their motion states with a Bayesian filter, and refines their shapes by fusing learned shape codes. It outperforms previous approaches on indoor and outdoor benchmark datasets.
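The abstract says each object's motion state is tracked by a multiple model Bayesian filter, but it does not give the models or their equations. As a minimal sketch, assuming just two hypothetical motion models (static and constant-velocity) over a 1D position, the code below runs one Kalman filter per model and updates the posterior probability of each model from how well it predicted each measurement. This is a plain multiple-model estimator without the state mixing of an interacting multiple model (IMM) filter. `ModelFilter`, `motion_status`, and all noise values are illustrative assumptions, not MOLTR's implementation.

```python
import math

# Hypothetical two-model motion-status sketch (not MOLTR's code).
# State is [position, velocity]; measurements observe position only (H = [1, 0]).


def matmul(a, b):
    """Multiply two small dense matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


class ModelFilter:
    """Kalman filter for one assumed motion model."""

    def __init__(self, F, q, z0, r):
        self.F, self.Q, self.r = F, [[q, 0.0], [0.0, q]], r
        self.x = [z0, 0.0]                       # start at first measurement, zero velocity
        self.P = [[r, 0.0], [0.0, 1.0]]          # velocity is initially very uncertain

    def step(self, z):
        """Predict, update with measurement z, and return the log-likelihood of z."""
        F = self.F
        x = [F[0][0] * self.x[0] + F[0][1] * self.x[1],
             F[1][0] * self.x[0] + F[1][1] * self.x[1]]
        FPFt = matmul(matmul(F, self.P), transpose(F))
        P = [[FPFt[i][j] + self.Q[i][j] for j in range(2)] for i in range(2)]
        y = z - x[0]                             # innovation
        s = P[0][0] + self.r                     # innovation variance
        k = [P[0][0] / s, P[1][0] / s]           # Kalman gain
        self.x = [x[0] + k[0] * y, x[1] + k[1] * y]
        self.P = [[P[i][j] - k[i] * P[0][j] for j in range(2)] for i in range(2)]
        return -0.5 * (math.log(2 * math.pi * s) + y * y / s)


STATIC = [[1.0, 0.0], [0.0, 0.0]]   # position held, velocity forced to zero
MOVING = [[1.0, 1.0], [0.0, 1.0]]   # constant velocity with dt = 1


def motion_status(zs, r=0.01):
    """Return posterior model probabilities after processing measurements zs."""
    filters = {"static": ModelFilter(STATIC, 1e-4, zs[0], r),
               "moving": ModelFilter(MOVING, 1e-2, zs[0], r)}
    log_p = {name: math.log(0.5) for name in filters}   # uniform prior over models
    for z in zs[1:]:
        for name, f in filters.items():
            log_p[name] += f.step(z)                     # Bayes: prior times likelihood
        m = max(log_p.values())                          # log-sum-exp normalisation
        norm = m + math.log(sum(math.exp(v - m) for v in log_p.values()))
        log_p = {name: v - norm for name, v in log_p.items()}
    return {name: math.exp(v) for name, v in log_p.items()}


# Usage: a jittering but stationary object, and an object moving at 0.5 units/frame.
print(motion_status([1.0 + 0.05 * (-1) ** t for t in range(11)]))
print(motion_status([0.5 * t for t in range(11)]))
```

The probabilities are kept in log space because a model that badly mispredicts a moving object can produce likelihoods small enough to underflow to zero; normalising with log-sum-exp avoids dividing by zero.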

Semantic-aware reconstruction is more advantageous than geometric-only reconstruction for future robotic and AR/VR applications because it represents not only where things are, but also what things are. Object-centric mapping is the task of building an object-level reconstruction in which objects are separate, meaningful entities that convey both geometric and semantic information. In this paper, we present MOLTR, a solution to object-centric mapping that uses only monocular image sequences and camera poses. It localises, tracks, and reconstructs multiple objects online as an RGB camera captures video of its surroundings. Given a new RGB frame, MOLTR first applies a monocular 3D detector to localise objects of interest and extract their shape codes, which represent object shapes in a learned embedding space. After data association, detections are merged into existing objects in the map. The motion state (i.e., kinematics and motion status) of each object is tracked by a multiple model Bayesian filter, and each object's shape is progressively refined by fusing multiple shape codes. We evaluate localisation, tracking, and reconstruction on benchmark datasets for indoor and outdoor scenes and show superior performance over previous approaches.
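The per-frame map update in the abstract (associate new detections with existing objects, merge them, and refine each object's shape by fusing shape codes) can be sketched as follows. Every specific here is an assumption for illustration: the abstract does not state the association cost or the fusion rule, so this sketch uses greedy nearest-neighbour matching on 3D centroids, gated by distance and restricted to the same class label, and fuses shape codes with a running weighted mean. `MapObject`, `Detection`, `associate`, `integrate`, and `gate` are hypothetical names.

```python
import math
from dataclasses import dataclass

# Hypothetical object-centric map update (not MOLTR's code).


@dataclass
class Detection:
    label: str          # semantic class from the detector
    centroid: tuple     # 3D position (x, y, z)
    code: list          # shape code in a learned embedding space


@dataclass
class MapObject:
    label: str
    centroid: tuple
    code: list
    weight: float = 0.0  # accumulated fusion weight

    def fuse(self, code, w=1.0):
        """Running weighted mean of shape codes (an assumed fusion rule)."""
        self.weight += w
        alpha = w / self.weight
        self.code = [c + alpha * (n - c) for c, n in zip(self.code, code)]


def associate(objects, detections, gate=1.0):
    """Greedily match detections to same-class objects by centroid distance."""
    pairs = sorted(
        (math.dist(o.centroid, d.centroid), oi, di)
        for oi, o in enumerate(objects)
        for di, d in enumerate(detections)
        if o.label == d.label
    )
    used_o, used_d, matches = set(), set(), []
    for dist, oi, di in pairs:
        if dist > gate:
            break                                # remaining pairs are all farther away
        if oi not in used_o and di not in used_d:
            matches.append((oi, di))
            used_o.add(oi)
            used_d.add(di)
    unmatched = [di for di in range(len(detections)) if di not in used_d]
    return matches, unmatched


def integrate(objects, detections, gate=1.0):
    """Merge matched detections into the map; start new objects for the rest."""
    matches, unmatched = associate(objects, detections, gate)
    for oi, di in matches:
        det = detections[di]
        objects[oi].centroid = det.centroid      # placeholder for a motion-filter update
        objects[oi].fuse(det.code)
    for di in unmatched:
        det = detections[di]
        obj = MapObject(det.label, det.centroid, [0.0] * len(det.code))
        obj.fuse(det.code)                       # first fusion copies the code exactly
        objects.append(obj)
    return objects


# Usage: a chair seen in two frames, and a table that first appears in frame 2.
objects = []
integrate(objects, [Detection("chair", (0.0, 0.0, 0.0), [1.0, 3.0])])
integrate(objects, [Detection("chair", (0.1, 0.0, 0.0), [3.0, 5.0]),
                    Detection("table", (2.0, 0.0, 0.0), [0.0, 0.0])])
print([(o.label, o.code) for o in objects])
```

Greedy matching is a simple stand-in for optimal assignment such as the Hungarian algorithm, and creating a new object for each unmatched detection is a common tracking convention assumed here rather than a detail given in the abstract.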
