CV · Apr 20, 2020

Joint Spatial-Temporal Optimization for Stereo 3D Object Tracking

arXiv:2004.09305v1 · 21 citations
AI Analysis

This work addresses the problem of accurate and consistent 3D object tracking for autonomous driving, though it appears incremental as it builds on existing deep learning and optimization techniques.

The paper tackles the challenge of 3D object tracking from stereo images by combining deep learning for object understanding with geometric optimization for consistent motion estimation, achieving significant performance improvements over previous methods on the KITTI tracking dataset.

Directly learning the motion of multiple 3D objects from sequential images is difficult, while geometric bundle adjustment cannot localize the invisible object centroid. To benefit from both the powerful object understanding of deep neural networks and precise geometric modeling for consistent trajectory estimation, we propose a joint spatial-temporal optimization-based stereo 3D object tracking method. From the network, we detect corresponding 2D bounding boxes on adjacent images and regress an initial 3D bounding box. Dense object cues (local depth and local coordinates) associated with the object centroid are then predicted using a region-based network. Considering both instantaneous localization accuracy and motion consistency, our optimization models the relations between the object centroid and the observed cues in a joint spatial-temporal error function. All historic cues are summarized to contribute to the current estimate through a per-frame marginalization strategy, without repeated computation. Quantitative evaluation on the KITTI tracking dataset shows that our approach outperforms previous image-based 3D tracking methods by significant margins. We also report extensive results on multiple categories and larger datasets (KITTI raw and Argoverse Tracking) for future benchmarking.
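The idea of summarizing historic cues by per-frame marginalization can be illustrated with a deliberately simplified toy. The sketch below is not the paper's error function. It assumes a scalar centroid coordinate, linear Gaussian cues, and a random-walk motion term, and every name in it (`marginalize_previous`, `add_cues`, `track_sequential`, `solve_batch`, `cue_info`, `motion_info`) is hypothetical. Under those assumptions, folding the previous frame into a prior with a Schur complement gives the same current estimate as re-solving the full problem over all frames.

```python
# Toy illustration only: scalar state, linear Gaussian cues, random-walk motion.
# All names and the cost form are assumptions, not taken from the paper.

def marginalize_previous(lam_prev, eta_prev, motion_info):
    """Schur complement: fold the previous state into a prior on the current one.

    lam/eta are the information (inverse variance) and information vector.
    """
    denom = lam_prev + motion_info
    return motion_info * lam_prev / denom, motion_info * eta_prev / denom


def add_cues(lam, eta, cues, cue_info):
    """Add the current frame's cue observations to the information form."""
    for z in cues:
        lam += cue_info
        eta += cue_info * z
    return lam, eta


def track_sequential(frames, cue_info, motion_info):
    """Per-frame update; history is never revisited.

    Assumes the first frame contains at least one cue.
    """
    lam, eta = 0.0, 0.0
    estimates = []
    for cues in frames:
        lam, eta = marginalize_previous(lam, eta, motion_info)
        lam, eta = add_cues(lam, eta, cues, cue_info)
        estimates.append(eta / lam)
    return estimates


def solve_batch(frames, cue_info, motion_info):
    """Reference: solve the full least-squares problem over all frames at once."""
    n = len(frames)
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for t, cues in enumerate(frames):
        A[t][t] += cue_info * len(cues)
        b[t] += cue_info * sum(cues)
        if t > 0:  # motion-consistency term between frames t-1 and t
            A[t - 1][t - 1] += motion_info
            A[t][t] += motion_info
            A[t - 1][t] -= motion_info
            A[t][t - 1] -= motion_info
    # Gaussian elimination (no pivoting; the system is symmetric positive definite)
    for i in range(n):
        for j in range(i + 1, n):
            f = A[j][i] / A[i][i]
            for k in range(i, n):
                A[j][k] -= f * A[i][k]
            b[j] -= f * b[i]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (b[i] - s) / A[i][i]
    return x


# Made-up cue values for three frames of one object.
frames = [[10.2, 9.9, 10.1], [10.6, 10.4], [11.1, 10.9, 11.0]]
seq = track_sequential(frames, cue_info=4.0, motion_info=1.0)
batch = solve_batch(frames, cue_info=4.0, motion_info=1.0)
```

Here `seq[-1]` and `batch[-1]` agree up to floating-point error, which is the sense in which the marginalized prior carries all past cues forward. The earlier entries of `seq` differ from `batch` because the batch solve also lets later frames refine earlier states.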

