CV | Aug 11, 2020

Rethinking Pseudo-LiDAR Representation

arXiv:2008.04582v1 | 230 citations | Has Code
AI Analysis

This work targets monocular/stereo 3D detection for autonomous driving. It clarifies why pseudo-LiDAR representations work and builds on that finding with an incremental but more general detector.

The paper investigates the underlying mechanism of pseudo-LiDAR representations in 3D detection, finding that coordinate transformation, not data representation, drives performance, and proposes PatchNet, an image-based CNN detector that outperforms existing pseudo-LiDAR methods on the KITTI dataset.

The recently proposed pseudo-LiDAR-based 3D detectors have greatly improved benchmark results on the monocular/stereo 3D detection task. However, the underlying mechanism remains obscure to the research community. In this paper, we perform an in-depth investigation and observe that the efficacy of the pseudo-LiDAR representation comes from the coordinate transformation, rather than from the data representation itself. Based on this observation, we design an image-based CNN detector named PatchNet, which is more generalized and can be instantiated as pseudo-LiDAR-based 3D detectors. Moreover, the pseudo-LiDAR data in our PatchNet is organized as an image representation, which means existing 2D CNN designs can be easily utilized for extracting deep features from input data and boosting 3D detection performance. We conduct extensive experiments on the challenging KITTI dataset, where the proposed PatchNet outperforms all existing pseudo-LiDAR-based counterparts. Code has been made available at: https://github.com/xinzhuma/patchnet.
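To make the "coordinate transformation" in the abstract concrete, here is a minimal sketch of back-projecting a depth map into camera-frame 3D coordinates while keeping the result laid out as an image-like grid. It assumes a standard pinhole camera model; the function name `depth_to_coord_map` and the intrinsics `fx`, `fy`, `cx`, `cy` are illustrative and are not taken from the PatchNet repository.

```python
def depth_to_coord_map(depth, fx, fy, cx, cy):
    """Back-project an H x W depth map into an H x W grid of (x, y, z).

    Assumes a pinhole camera: fx, fy are focal lengths in pixels and
    (cx, cy) is the principal point. All names here are hypothetical.
    """
    coords = []
    for v, row in enumerate(depth):          # v: pixel row index
        out_row = []
        for u, z in enumerate(row):          # u: pixel column index
            x = (u - cx) * z / fx            # horizontal offset in camera frame
            y = (v - cy) * z / fy            # vertical offset in camera frame
            out_row.append((x, y, z))        # depth is kept as the z coordinate
        coords.append(out_row)
    return coords


# Toy 2 x 3 depth map (values in metres) with made-up intrinsics.
depth = [[10.0, 10.0, 10.0],
         [20.0, 20.0, 20.0]]
coords = depth_to_coord_map(depth, fx=2.0, fy=2.0, cx=1.0, cy=0.5)
```

The output keeps the pixel grid of the input, so each pixel now carries three channels (x, y, z) instead of one depth value. That is the sense in which such data can be fed to an ordinary 2D CNN rather than a point-cloud network.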

Code Implementations: 1 repo
