CV | AI | Dec 6, 2022

Objects as Spatio-Temporal 2.5D points

arXiv:2212.02755v2 | h-index: 21
Originality: Incremental advance
AI Analysis

This work addresses the challenge of reducing the supervision needed for 3D object localization in perception tasks, offering a more efficient solution for applications such as autonomous driving. It is an incremental advance, since it builds on existing center-point detectors.

The paper tackles the problem of estimating the 3D positions of objects in bird's eye view (BEV) without requiring 3D or BEV annotations at training time or LiDAR at query time. It proposes a weakly supervised method that jointly learns 2D object detection and scene depth prediction. On the KITTI tracking benchmark, the method achieves accuracy comparable to recent state-of-the-art approaches while being about 10 times more computationally efficient.

Determining accurate bird's eye view (BEV) positions of objects and tracks in a scene is vital for various perception tasks, including object interaction mapping and scenario extraction; however, the level of supervision required to accomplish this is extremely challenging to procure. We propose a lightweight, weakly supervised method to estimate the 3D position of objects by jointly learning to regress 2D object detections and the scene's depth in a single feed-forward pass of a network. Our proposed method extends a center-point based single-shot object detector and introduces a novel object representation where each object is modeled as a BEV point spatio-temporally, without needing any 3D or BEV annotations for training or LiDAR data at query time. The approach leverages readily available 2D object supervision along with LiDAR point clouds (used only during training) to jointly train a single network that learns to predict 2D object detections alongside the whole scene's depth, in order to spatio-temporally model object tracks as points in BEV. The proposed method is about 10x more computationally efficient than recent SOTA approaches while achieving comparable accuracy on the KITTI tracking benchmark.
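To make the idea of an object as a BEV point concrete, the following is a minimal sketch of how a detected 2D center and a predicted depth map could be combined into a BEV position. It assumes a standard pinhole camera model; the function name `center_to_bev`, the intrinsics `fx` and `cx`, and the toy depth map are all hypothetical and are not taken from the paper, which may lift detections to BEV differently.

```python
from typing import List, Tuple


def center_to_bev(
    u: float,
    v: float,
    depth_map: List[List[float]],
    fx: float,
    cx: float,
) -> Tuple[float, float]:
    """Back-project a 2D object center (u, v) to a BEV point (x, z).

    Assumes a pinhole camera: x = (u - cx) * z / fx, where z is the
    predicted depth at the center pixel. fx and cx are hypothetical
    intrinsics (focal length and principal point, in pixels).
    """
    # Read the predicted depth at the pixel nearest to the object center.
    z = depth_map[int(round(v))][int(round(u))]
    # Lateral offset in camera coordinates; the height axis is dropped in BEV.
    x = (u - cx) * z / fx
    return (x, z)


# Toy 3x4 depth map with a constant depth of 10 m (illustrative values only).
depth_map = [[10.0] * 4 for _ in range(3)]
bev = center_to_bev(u=3.0, v=1.0, depth_map=depth_map, fx=2.0, cx=1.0)
print(bev)
```

In this sketch the vertical image coordinate is only used to look up the depth value, since a BEV point keeps the lateral and forward coordinates and discards height.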
