MSL-RAPTOR: A 6DoF Relative Pose Tracker for Onboard Robotic Perception
This work provides a faster and more accurate method for onboard robotic perception, specifically for tracking rigid bodies with a monocular camera, which is beneficial for applications like drone-to-drone interaction.
This paper introduces MSL-RAPTOR, a monocular camera-based system for tracking the 6DoF relative pose of rigid bodies. It achieves performance comparable to RGB-D methods on the NOCS-REAL275 dataset and, when tracking a drone from another drone, is 3 times faster than the fastest comparable method while reducing median translation and rotation errors by 66% and 23%, respectively.
Determining the relative position and orientation of objects in an environment is a fundamental building block for a wide range of robotics applications. To accomplish this task efficiently in practical settings, a method must be fast, use common sensors, and generalize easily to new objects and environments. We present MSL-RAPTOR, a two-stage algorithm for tracking a rigid body with a monocular camera. The image is first processed by an efficient neural network-based front-end to detect new objects and track 2D bounding boxes between frames. The class label and bounding box are passed to the back-end, which updates the object's pose using an unscented Kalman filter (UKF). The measurement posterior is fed back to the 2D tracker to improve robustness. The object's class is identified so that a class-specific UKF can be used if custom dynamics and constraints are known. Adapting to track the pose of new classes only requires providing a trained 2D object detector or labeled 2D bounding box data, as well as the approximate size of the objects. The performance of MSL-RAPTOR is first verified on the NOCS-REAL275 dataset, achieving results comparable to RGB-D approaches despite not using depth measurements. When tracking a flying drone from onboard another drone, it outperforms the fastest comparable method in speed by a factor of 3, while reducing median translation and rotation errors by 66% and 23%, respectively.
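
To illustrate why the approximate object size is enough to relate a 6DoF pose to a 2D bounding box, the following is a minimal sketch of one possible measurement prediction for a UKF back-end. It is not taken from the paper: the function name predict_bbox, the pinhole intrinsics (fx, fy, cx, cy), the box-shaped object model, and the axis-aligned output box are all assumptions made for illustration. Given a candidate pose, such as a single UKF sigma point, it projects the eight corners of a box of the given size into the image and returns the enclosing 2D box, which could then be compared with the box reported by the front-end.

```python
from itertools import product

# Hypothetical helper, not the authors' implementation.
def predict_bbox(R, t, size, fx, fy, cx, cy):
    """Predict the 2D box (u_min, v_min, u_max, v_max) seen by a pinhole camera.

    R    : 3x3 rotation (nested lists), object frame -> camera frame
    t    : object position in the camera frame, metres, z pointing forward
    size : approximate object extents (x, y, z) in metres
    """
    half = [s / 2.0 for s in size]
    us, vs = [], []
    # Enumerate the 8 corners of the object-aligned 3D box.
    for signs in product((-1.0, 1.0), repeat=3):
        corner = [sg * h for sg, h in zip(signs, half)]
        # Rigid transform into the camera frame: p = R * corner + t
        p = [sum(R[i][j] * corner[j] for j in range(3)) + t[i] for i in range(3)]
        if p[2] <= 0.0:
            raise ValueError("box corner is behind the camera")
        # Pinhole projection to pixel coordinates.
        us.append(fx * p[0] / p[2] + cx)
        vs.append(fy * p[1] / p[2] + cy)
    return min(us), min(vs), max(us), max(vs)


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

if __name__ == "__main__":
    # Assumed example values: a 0.4 m x 0.4 m x 0.2 m object, 2 m ahead.
    box = predict_bbox(IDENTITY, (0.0, 0.0, 2.0), (0.4, 0.4, 0.2),
                       fx=500.0, fy=500.0, cx=320.0, cy=240.0)
    print(box)
```

In this sketch the predicted box shrinks as the object moves away from the camera, which is the geometric cue that lets a filter recover range from a monocular image once the object size is roughly known.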