Dual Quadrics from Object Detection Bounding Boxes as Landmark Representations in SLAM
This work addresses the problem of enabling robots to build semantic maps with object-level landmarks. The contribution is incremental in that it adapts existing computer vision techniques for robotics applications.
The paper tackles the challenge of integrating semantic object-level landmarks into SLAM. It derives a formulation that uses dual quadrics as 3D landmark representations, shows that 2D bounding boxes from object detection can directly constrain the quadric parameters, and demonstrates joint estimation of robot pose and quadrics in factor graph SLAM with a monocular camera.
Research in Simultaneous Localization And Mapping (SLAM) is increasingly moving towards richer world representations involving objects and high-level features that enable a semantic model of the world for robots, potentially leading to a more meaningful set of robot-world interactions. Many of these advances are grounded in state-of-the-art computer vision techniques primarily developed in the context of image-based benchmark datasets, leaving several challenges to be addressed in adapting them for use in robotics. In this paper, we derive a formulation for SLAM that uses dual quadrics as 3D landmark representations, and show how 2D bounding boxes (such as those typically obtained from visual object detection systems) can directly constrain the quadric parameters. We demonstrate how to jointly estimate the robot pose and dual quadric parameters in factor-graph-based SLAM with a general perspective camera, and cover the use cases of a robot moving with a monocular camera with and without additional depth information.
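To illustrate how a bounding box can relate to quadric parameters, the following minimal sketch uses the standard projective-geometry relation C* = P Q* Pᵀ, which maps a dual quadric Q* (4x4) to a dual conic C* (3x3) under a camera matrix P (3x4), and then reads off the axis-aligned box tangent to that conic. It is an illustration under stated assumptions, not the implementation used in the paper: the helper names (`sphere_dual_quadric`, `project_dual_quadric`, `conic_bounding_box`) are hypothetical, the landmark is assumed to be a sphere, and the camera is assumed to sit at the origin with identity intrinsics.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Transpose a matrix given as nested lists."""
    return [list(row) for row in zip(*a)]


def sphere_dual_quadric(center, radius):
    """Dual quadric Q* = T diag(r^2, r^2, r^2, -1) T^T of a sphere.

    Hypothetical helper: a sphere is the simplest ellipsoid, chosen
    here only to keep the example short.
    """
    cx, cy, cz = center
    r2 = radius * radius
    T = [[1, 0, 0, cx], [0, 1, 0, cy], [0, 0, 1, cz], [0, 0, 0, 1]]
    D = [[r2, 0, 0, 0], [0, r2, 0, 0], [0, 0, r2, 0], [0, 0, 0, -1]]
    return matmul(matmul(T, D), transpose(T))


def project_dual_quadric(P, Q):
    """Project a 4x4 dual quadric to a 3x3 dual conic: C* = P Q* P^T."""
    return matmul(matmul(P, Q), transpose(P))


def conic_bounding_box(C):
    """Axis-aligned box (xmin, ymin, xmax, ymax) tangent to dual conic C.

    A vertical line x = c has coordinates l = (1, 0, -c); tangency
    l^T C l = 0 gives a quadratic in c. Horizontal lines are analogous.
    """
    dx = C[0][2] ** 2 - C[0][0] * C[2][2]
    dy = C[1][2] ** 2 - C[1][1] * C[2][2]
    if C[2][2] == 0 or dx < 0 or dy < 0:
        raise ValueError("conic has no finite axis-aligned tangent box")
    xs = sorted((C[0][2] + s * math.sqrt(dx)) / C[2][2] for s in (1, -1))
    ys = sorted((C[1][2] + s * math.sqrt(dy)) / C[2][2] for s in (1, -1))
    return (xs[0], ys[0], xs[1], ys[1])


# Assumed setup: camera at the origin, identity intrinsics, P = [I | 0].
P = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
Q = sphere_dual_quadric((0.0, 0.0, 5.0), 1.0)
predicted_box = conic_bounding_box(project_dual_quadric(P, Q))
print(predicted_box)
```

For a unit sphere 5 units in front of the camera, each side of the predicted box lies at ±1/√24 ≈ ±0.204 in normalized image coordinates, matching the tangent of the angle subtended by the sphere. In an estimation setting, a residual between such a predicted box and a detected box could serve as a measurement constraint on the quadric parameters; how exactly that residual is defined is left open here.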