CV · RO · Jun 5, 2023

Scene as Occupancy

Peking U
arXiv:2306.02851v3 · 260 citations · h-index: 87
Originality: Incremental advance
AI Analysis

This work addresses the need for precise perception in autonomous driving by providing a fine-grained scene representation. The contribution is incremental: it builds on existing occupancy concepts, adding a new benchmark and pipeline.

The paper tackles the problem of representing complex traffic scenes for autonomous driving. It introduces a 3D occupancy representation, which quantizes a scene into a structured grid with a semantic label per cell, and proposes OccNet to reconstruct this representation from multi-view images. The results show performance gains across tasks, such as a 15%-58% reduction in collision rate for motion planning.
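To make the representation concrete, here is a minimal Python sketch of quantizing semantically labeled 3D points into a sparse occupancy grid. The voxel size, grid origin and shape, integer label IDs, and the majority-vote rule per cell are all illustrative assumptions, not the parameters or ground-truth generation procedure used for OpenOcc; `voxelize` is a hypothetical helper.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical types for illustration; labels are arbitrary integer class IDs.
Point = Tuple[float, float, float, int]  # (x, y, z in metres, semantic label)
Voxel = Tuple[int, int, int]             # integer grid index (i, j, k)


def voxelize(points: Iterable[Point],
             voxel_size: float = 0.5,
             origin: Tuple[float, float, float] = (-50.0, -50.0, -5.0),
             shape: Tuple[int, int, int] = (200, 200, 16)) -> Dict[Voxel, int]:
    """Quantize labeled points into a sparse grid mapping voxel index -> label.

    Each occupied cell takes the most frequent label among its points;
    cells absent from the returned dict are treated as free space.
    """
    votes: Dict[Voxel, Counter] = defaultdict(Counter)
    for x, y, z, label in points:
        # Floor each coordinate to the index of the cell that contains it.
        idx = (int((x - origin[0]) // voxel_size),
               int((y - origin[1]) // voxel_size),
               int((z - origin[2]) // voxel_size))
        # Drop points that fall outside the grid bounds.
        if all(0 <= i < n for i, n in zip(idx, shape)):
            votes[idx][label] += 1
    # Majority vote per occupied cell.
    return {v: c.most_common(1)[0][0] for v, c in votes.items()}
```

With the defaults, the grid spans 100 m x 100 m x 8 m at 0.5 m resolution. A sparse dict keeps the sketch dependency-free; a practical pipeline would more likely store a dense array with an explicit "free" class.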

Human drivers can easily describe a complex traffic scene using their visual system, and such precise perception is essential for a driver's planning. To achieve this, a geometry-aware representation that quantizes the physical 3D scene into a structured grid map with a semantic label per cell, termed 3D Occupancy, would be desirable. Compared with bounding boxes, a key insight behind occupancy is that it can capture the fine-grained details of critical obstacles in the scene and thereby facilitate subsequent tasks. Prior and concurrent literature mainly concentrates on a single scene-completion task, whereas we argue that this occupancy representation may have a broader impact. In this paper, we propose OccNet, a multi-view vision-centric pipeline with a cascade and temporal voxel decoder to reconstruct 3D occupancy. At the core of OccNet is a general occupancy embedding that represents the 3D physical world. Such a descriptor can be applied to a wide span of driving tasks, including detection, segmentation and planning. To validate the effectiveness of this new representation and our proposed algorithm, we propose OpenOcc, the first dense, high-quality 3D occupancy benchmark built on top of nuScenes. Empirical experiments show evident performance gains across multiple tasks; for example, motion planning sees a collision rate reduction of 15%-58%, demonstrating the superiority of our method.
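One reason occupancy can help planning is that a candidate trajectory can be checked directly against occupied cells rather than against coarse boxes. The Python sketch below assumes a bird's-eye-view projection of occupied voxels, a square ego footprint, and the hypothetical helpers `bev_occupied` and `trajectory_collides`. It is not OccNet's planner or the paper's collision-rate metric.

```python
from typing import Iterable, Set, Tuple

# Hypothetical helpers for illustration; not OccNet's planning module.
Cell = Tuple[int, int]


def bev_occupied(voxels: Iterable[Tuple[int, int, int]]) -> Set[Cell]:
    """Project occupied 3D voxel indices onto a bird's-eye-view (i, j) grid."""
    return {(i, j) for i, j, _ in voxels}


def trajectory_collides(waypoints: Iterable[Tuple[float, float]],
                        occupied: Set[Cell],
                        cell_size: float = 0.5,
                        half_width: float = 1.0) -> bool:
    """Return True if a square ego footprint centred on any waypoint
    touches an occupied BEV cell.

    Coordinates are in metres; cell (i, j) covers
    [i * cell_size, (i + 1) * cell_size) x [j * cell_size, (j + 1) * cell_size).
    Footprint edges that land exactly on a cell boundary include that cell,
    so the test errs toward reporting a collision.
    """
    for x, y in waypoints:
        # Range of cell indices overlapped by the footprint around (x, y).
        i_min = int((x - half_width) // cell_size)
        i_max = int((x + half_width) // cell_size)
        j_min = int((y - half_width) // cell_size)
        j_max = int((y + half_width) // cell_size)
        for i in range(i_min, i_max + 1):
            for j in range(j_min, j_max + 1):
                if (i, j) in occupied:
                    return True
    return False
```

For example, with obstacle cells `{(10, 0), (10, 1)}` (x in [5.0, 5.5) m, y in [0, 1.0) m), a straight path along y = 0 m through x = 6 m collides, while the same path shifted to y = 4 m does not.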

Code implementations: 2 repos