CV · Aug 29, 2019

StarNet: Targeted Computation for Object Detection in Point Clouds

arXiv:1908.11069v3 · 128 citations
Originality: Highly original
AI Analysis

This paper addresses the problem of efficient and accurate object detection in sparse 3D data for autonomous vehicles, and it presents a novel method rather than an incremental improvement.

The paper tackles 3D object detection in LiDAR point clouds for self-driving cars by introducing StarNet, a point-based system that replaces convolutional processing with local information and sampling. On the Waymo Open Dataset it improves pedestrian detection by more than 7 absolute mAP over a competitive baseline while being more computationally efficient.

Detecting objects from LiDAR point clouds is an important component of self-driving car technology, as LiDAR provides high-resolution spatial information. Previous work on point-cloud 3D object detection has re-purposed convolutional approaches from traditional camera imagery. In this work, we present an object detection system called StarNet designed specifically to take advantage of the sparse and 3D nature of point cloud data. StarNet is entirely point-based, uses no global information, has data-dependent anchors, and uses sampling instead of learned region proposals. We demonstrate how this design leads to competitive or superior performance on the large Waymo Open Dataset and the KITTI detection dataset, as compared to convolutional baselines. In particular, we show how our detector can outperform a competitive baseline on Pedestrian detection on the Waymo Open Dataset by more than 7 absolute mAP while being more computationally efficient. We show how our redesign, namely using only local information and using sampling instead of learned proposals, leads to a significantly more flexible and adaptable system: we demonstrate how we can vary the computational cost of a single trained StarNet without retraining, and how we can target proposals towards areas of interest with priors and heuristics. Finally, we show how our design allows for incorporating temporal context by using detections from previous frames to target computation of the detector, which leads to further improvements in performance without additional computational cost.
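To make the idea of "sampling instead of learned proposals" more concrete, here is a minimal sketch of how proposal centers might be drawn from a point cloud and how the compute budget could be changed at inference time. It is not the authors' implementation. The abstract does not name the sampling scheme, so farthest point sampling is assumed here as one plausible choice. The function names, the radius, the per-region point cap, and the synthetic point cloud are all hypothetical.

```python
import math
import random


def farthest_point_sample(points, num_centers, seed=0):
    """Pick `num_centers` indices from `points` so the picks are spread out.

    Assumed sampling scheme for illustration; the abstract only says "sampling".
    """
    if not 0 < num_centers <= len(points):
        raise ValueError("num_centers must be in 1..len(points)")
    rng = random.Random(seed)
    chosen = [rng.randrange(len(points))]
    # Distance from every point to its nearest chosen center so far.
    nearest = [math.dist(p, points[chosen[0]]) for p in points]
    while len(chosen) < num_centers:
        # The next center is the point farthest from all chosen centers.
        idx = max(range(len(points)), key=nearest.__getitem__)
        chosen.append(idx)
        nearest = [min(d, math.dist(p, points[idx])) for d, p in zip(nearest, points)]
    return chosen


def gather_neighborhood(points, center, radius, max_points):
    """Return up to `max_points` points within `radius` of `center` (local input only)."""
    nearby = [p for p in points if math.dist(p, center) <= radius]
    return nearby[:max_points]


# Synthetic stand-in for a LiDAR sweep: 2,000 random (x, y, z) points.
rng = random.Random(42)
cloud = [(rng.uniform(-50, 50), rng.uniform(-50, 50), rng.uniform(-2, 2)) for _ in range(2000)]

# The same pipeline can run with a small or a large compute budget:
# only the number of sampled centers changes, and nothing is retrained.
for budget in (16, 64):
    centers = farthest_point_sample(cloud, budget)
    regions = [gather_neighborhood(cloud, cloud[i], radius=3.0, max_points=32) for i in centers]
    print(budget, "centers ->", sum(len(r) for r in regions), "points featurized")
```

Because each region is processed from its own local neighborhood, the cost in this sketch scales with the number of centers, which is the knob the abstract describes varying without retraining. Targeting computation with priors or previous-frame detections would amount to biasing which centers are chosen, which this sketch does not attempt.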

