Real-time 3D Pose Estimation with a Monocular Camera Using Deep Learning and Object Priors On an Autonomous Racecar
This work addresses real-time perception for autonomous vehicles, although the contribution appears incremental, since it combines existing deep learning techniques with object priors.
The authors tackled real-time 3D pose estimation of multiple objects from a monocular camera on an autonomous racecar. They report accurate position estimates up to 15 m, and the system runs on a low-powered Jetson TX2 aboard a vehicle travelling at up to 54 km/h.
We propose a complete pipeline that detects objects and simultaneously estimates the pose of multiple object instances from a single image. A novel "keypoint regression" scheme with a cross-ratio term is introduced that exploits prior information about the object's shape and size to regress and locate specific feature points. Further, a priori 3D information about the object is used to establish 2D-3D correspondences and accurately estimate object positions up to a distance of 15 m. A detailed discussion of the results and an in-depth analysis of the pipeline are presented. The pipeline runs efficiently on a low-powered Jetson TX2 and is deployed as part of the perception pipeline of a real-time autonomous vehicle reaching a top speed of 54 km/h.
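The cross-ratio of four collinear points is preserved by perspective projection, which is what makes it usable as a shape prior: if the object's geometry fixes the cross-ratio of four points along a straight edge, the projections of those points must have the same value. The abstract does not give the exact form of the cross-ratio term, so the sketch below assumes a simple squared-error penalty. The names `cross_ratio_loss`, `project`, the weight, and the camera intrinsics are all hypothetical, chosen only to illustrate the invariance.

```python
import math
from typing import Sequence, Tuple

Point = Sequence[float]  # a 2D pixel or a 3D point


def _dist(p: Point, q: Point) -> float:
    """Euclidean distance between two points of the same dimension."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def cross_ratio(a: Point, b: Point, c: Point, d: Point) -> float:
    """Cross-ratio (A, B; C, D) = (|AC| * |BD|) / (|BC| * |AD|).

    Assumes the four points are collinear and ordered A, B, C, D along the line.
    """
    return (_dist(a, c) * _dist(b, d)) / (_dist(b, c) * _dist(a, d))


def project(p: Point, f: float = 700.0, cx: float = 640.0, cy: float = 360.0) -> Tuple[float, float]:
    """Ideal pinhole projection with hypothetical intrinsics (no lens distortion)."""
    x, y, z = p
    return (f * x / z + cx, f * y / z + cy)


def cross_ratio_loss(keypoints: Sequence[Point], cr_prior: float, weight: float = 1.0) -> float:
    """Hypothetical penalty: squared gap between the keypoints' cross-ratio and the 3D prior."""
    return weight * (cross_ratio(*keypoints) - cr_prior) ** 2


# Four equally spaced points along a straight 3D edge in front of the camera.
edge_3d = [(0.2 + 0.1 * t, 0.5 - 0.2 * t, 4.0 + 1.5 * t) for t in range(4)]
cr_prior = cross_ratio(*edge_3d)  # equal spacing gives (2 * 2) / (1 * 3) = 4/3
pixels = [project(p) for p in edge_3d]
print(cr_prior, cross_ratio(*pixels))  # equal up to floating-point rounding
print(cross_ratio_loss(pixels, cr_prior))  # close to zero
```

Because the image-space cross-ratio matches the 3D value even though the pixel spacing is no longer uniform, a penalty of this kind constrains regressed keypoints without requiring the object's distance or orientation to be known.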