Semantic Mapping with Simultaneous Object Detection and Localization
This work addresses the challenge of accurate semantic mapping for robotics applications, representing an incremental improvement over existing methods.
The paper tackles semantic mapping by simultaneously detecting objects and localizing their 6-DOF poses with a method called Contextual Temporal Mapping (CT-Map), which models inference as a Conditional Random Field with contextual and temporal potentials. On a robot equipped with an RGB-D sensor, the method demonstrates improved object detection and pose estimation compared to baseline methods.
We present a filtering-based method for semantic mapping that simultaneously detects objects and localizes their 6 degree-of-freedom poses. For our method, called Contextual Temporal Mapping (or CT-Map), we represent the semantic map as a belief over object classes and poses across an observed scene. Inference for the semantic mapping problem is then modeled in the form of a Conditional Random Field (CRF). CT-Map is a CRF that considers two forms of relationship potentials, which account for contextual relations between objects and temporal consistency of object poses, as well as a measurement potential on observations. A particle filtering algorithm is then proposed to perform inference in the CT-Map model. We demonstrate the efficacy of the CT-Map method with a Michigan Progress Fetch robot equipped with an RGB-D sensor. Our results demonstrate that the particle-filtering-based inference of CT-Map provides improved object detection and pose estimation with respect to baseline methods that treat observations as independent samples of a scene.
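To make the filtering idea concrete, the following is a minimal sketch of one particle filter loop for a single object, assuming a one-dimensional pose in place of the full 6 degree-of-freedom pose. It is not the CT-Map implementation: the Gaussian forms of the temporal and measurement potentials, the noise parameters, and all function names (`gaussian`, `particle_filter_step`, `estimate`, `run_demo`) are assumptions made for illustration, and the contextual potential between objects and the belief over object classes are omitted entirely.

```python
import math
import random


def gaussian(x, mu, sigma):
    """Unnormalized Gaussian, used here as a stand-in potential (assumption)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)


def particle_filter_step(particles, observation, rng,
                         motion_sigma=0.05, meas_sigma=0.2):
    """One predict / weight / resample step over 1-D pose hypotheses."""
    # Temporal potential (assumed Gaussian): the pose changes little between
    # frames, so each hypothesis is diffused with small noise.
    moved = [p + rng.gauss(0.0, motion_sigma) for p in particles]

    # Measurement potential (assumed Gaussian): weight each hypothesis by its
    # agreement with the current observation.
    weights = [gaussian(observation, p, meas_sigma) for p in moved]
    if sum(weights) == 0.0:
        # Every hypothesis is far from the observation: fall back to uniform.
        weights = [1.0] * len(moved)

    # Resample in proportion to weight so the set keeps a fixed size.
    return rng.choices(moved, weights=weights, k=len(moved))


def estimate(particles):
    """Point estimate of the pose: the mean of the particle set."""
    return sum(particles) / len(particles)


def run_demo(seed=0, true_pose=1.0, steps=20, n_particles=500):
    """Track a static object from noisy synthetic observations."""
    rng = random.Random(seed)
    particles = [rng.uniform(-2.0, 4.0) for _ in range(n_particles)]
    for _ in range(steps):
        observation = true_pose + rng.gauss(0.0, 0.2)  # synthetic sensor noise
        particles = particle_filter_step(particles, observation, rng)
    return estimate(particles)


if __name__ == "__main__":
    print(f"estimated pose: {run_demo():.3f}")
```

Because each step conditions on the previous particle set rather than on the current observation alone, the estimate accumulates evidence across frames. This is the contrast the abstract draws with baselines that treat observations as independent samples of a scene.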