CV · May 25, 2023

Learning Occupancy for Monocular 3D Object Detection

arXiv:2305.15694v1 · 47 citations · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of accurate 3D object detection from single images for autonomous driving, and it represents an incremental improvement over existing methods.

The paper tackles monocular 3D object detection by proposing OccupancyM3D, which learns occupancy in frustum and 3D space to improve feature extraction, achieving state-of-the-art results on KITTI and Waymo datasets with significant performance gains.

Monocular 3D detection is a challenging task due to the lack of accurate 3D information. Existing approaches typically rely on geometry constraints and dense depth estimates to facilitate the learning, but often fail to fully exploit the benefits of three-dimensional feature extraction in frustum and 3D space. In this paper, we propose OccupancyM3D, a method of learning occupancy for monocular 3D detection. It directly learns occupancy in frustum and 3D space, leading to more discriminative and informative 3D features and representations. Specifically, by using synchronized raw sparse LiDAR point clouds, we define the space status and generate voxel-based occupancy labels. We formulate occupancy prediction as a simple classification problem and design associated occupancy losses. Resulting occupancy estimates are employed to enhance original frustum/3D features. As a result, experiments on KITTI and Waymo open datasets demonstrate that the proposed method achieves a new state of the art and surpasses other methods by a significant margin. Codes and pre-trained models will be available at https://github.com/SPengLiang/OccupancyM3D.
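
The label-generation and classification steps described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `voxel_occupancy_labels` and `occupancy_bce_loss`, the grid parameters, and the optional `valid_mask` are all hypothetical. The sketch also assumes the simplest possible rule (a voxel is occupied if at least one LiDAR point falls inside it, and every other voxel is treated as a negative), which may differ from how the paper defines the space status.

```python
import numpy as np


def voxel_occupancy_labels(points, grid_min, voxel_size, grid_shape):
    """Mark each voxel that contains at least one LiDAR point as occupied (1).

    Assumption: voxels without any point are labelled 0. The paper's own
    definition of the space status is not reproduced here.
    """
    points = np.asarray(points, dtype=np.float64)
    grid_min = np.asarray(grid_min, dtype=np.float64)
    # Integer voxel index of every point along x, y, z.
    idx = np.floor((points - grid_min) / voxel_size).astype(np.int64)
    # Drop points that fall outside the grid.
    inside = np.all((idx >= 0) & (idx < np.asarray(grid_shape)), axis=1)
    idx = idx[inside]
    labels = np.zeros(grid_shape, dtype=np.float32)
    labels[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return labels


def occupancy_bce_loss(logits, labels, valid_mask=None, eps=1e-7):
    """Binary cross-entropy over voxels (occupancy as classification).

    valid_mask is a hypothetical hook for ignoring voxels whose status
    cannot be determined; pass None to use every voxel.
    """
    probs = 1.0 / (1.0 + np.exp(-logits))
    probs = np.clip(probs, eps, 1.0 - eps)
    loss = -(labels * np.log(probs) + (1.0 - labels) * np.log(1.0 - probs))
    if valid_mask is None:
        return float(loss.mean())
    return float(loss[valid_mask.astype(bool)].mean())


# Toy example: three points, a 2 x 2 x 2 grid of 1 m voxels.
points = np.array([[0.5, 0.5, 0.5], [1.5, 0.2, 0.1], [9.0, 9.0, 9.0]])
labels = voxel_occupancy_labels(points, (0.0, 0.0, 0.0), 1.0, (2, 2, 2))
loss = occupancy_bce_loss(np.zeros((2, 2, 2)), labels)
```

In the toy example the third point lies outside the grid and is discarded, so only two voxels are marked occupied. In a real pipeline the logits would come from the network's frustum or 3D feature volume rather than from a zero array.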

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
