DirectShape: Direct Photometric Alignment of Shape Priors for Visual Vehicle Pose and Shape Estimation
This work addresses the challenge of 3D scene understanding for autonomous driving by improving vehicle pose and shape estimation, though the contribution is incremental in that it builds on prior shape alignment methods.
The paper tackles the problem of jointly estimating the 3D poses and shapes of vehicles from stereo images. It proposes a method that aligns shape priors directly on the images using photometric and silhouette terms, and it reports superior performance over the previous geometric approach as well as improved results when used to refine state-of-the-art deep learning detectors.
Scene understanding from images is a challenging problem encountered in autonomous driving. On the object level, while 2D methods have gradually evolved from computing simple bounding boxes to delivering finer-grained results like instance segmentations, the 3D family is still dominated by estimating 3D bounding boxes. In this paper, we propose a novel approach to jointly infer the 3D rigid-body poses and shapes of vehicles from a stereo image pair using shape priors. Unlike previous works that geometrically align shapes to point clouds from dense stereo reconstruction, our approach works directly on images by combining a photometric and a silhouette alignment term in the energy function. An adaptive sparse point selection scheme is proposed to efficiently measure the consistency with both terms. In experiments, we show superior performance of our method on 3D pose and shape estimation over the previous geometric approach and demonstrate that our method can also be applied as a refinement step to significantly boost the performance of several state-of-the-art deep-learning-based 3D object detectors. All related materials and demonstration videos are available on the project page https://vision.in.tum.de/research/vslam/direct-shape.
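To make the idea of an energy that combines a photometric and a silhouette alignment term more concrete, the following is a minimal sketch in Python. It is not the formulation used in the paper: the abstract only states that the two terms are combined. The Huber loss on intensity differences, the cross-entropy between projected shape occupancy and a segmentation mask, the weights `lambda_photo` and `lambda_sil`, and all function names are assumptions chosen for illustration. The inputs are assumed to be values already sampled at a sparse set of points.

```python
import numpy as np


def huber(r, delta=1.0):
    """Huber loss (assumed robust norm): quadratic near zero, linear in the tails."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r ** 2, delta * (a - 0.5 * delta))


def photometric_term(left_intensities, right_intensities, delta=1.0):
    """Mean robust intensity difference between corresponding points.

    Assumes the surface points of the current shape/pose hypothesis were
    already projected into both stereo images and the intensities sampled.
    """
    left = np.asarray(left_intensities, dtype=float)
    right = np.asarray(right_intensities, dtype=float)
    return float(np.mean(huber(left - right, delta)))


def silhouette_term(projected_occupancy, mask_probability, eps=1e-6):
    """Mean cross-entropy between projected shape occupancy and a mask.

    projected_occupancy: per-pixel value in [0, 1], how strongly the
        projected shape covers the pixel (hypothetical input).
    mask_probability: per-pixel foreground probability from a 2D
        segmentation (hypothetical input).
    """
    pi = np.clip(np.asarray(projected_occupancy, dtype=float), eps, 1.0 - eps)
    p = np.asarray(mask_probability, dtype=float)
    return float(np.mean(-(p * np.log(pi) + (1.0 - p) * np.log(1.0 - pi))))


def total_energy(left_intensities, right_intensities,
                 projected_occupancy, mask_probability,
                 lambda_photo=1.0, lambda_sil=1.0):
    """Weighted sum of the two alignment terms (weights are assumptions)."""
    e_photo = photometric_term(left_intensities, right_intensities)
    e_sil = silhouette_term(projected_occupancy, mask_probability)
    return lambda_photo * e_photo + lambda_sil * e_sil


if __name__ == "__main__":
    mask = np.array([1.0, 1.0, 0.0, 0.0])
    # Hypothesis that agrees with both images and the mask.
    good = total_energy([0.2, 0.8], [0.2, 0.8], [0.9, 0.9, 0.1, 0.1], mask)
    # Hypothesis that disagrees with both.
    bad = total_energy([0.2, 0.8], [0.9, 0.1], [0.1, 0.1, 0.9, 0.9], mask)
    print(good < bad)
```

In a full pipeline, an optimizer would adjust the pose and shape parameters to lower this energy, and the sparse point set would be chosen adaptively as the abstract describes. Both steps are outside the scope of this sketch.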