CV, May 26, 2023

BEV-IO: Enhancing Bird's-Eye-View 3D Detection with Instance Occupancy

arXiv:2305.16829v2, 5 citations
Originality: Incremental advance
AI Analysis

This paper addresses a bottleneck in 3D object detection for autonomous driving by improving the geometric representation. The advance is incremental, since it builds on the existing BEV paradigm.

The paper tackles sparse 3D representations in bird's-eye-view detection by enhancing the BEV representation with instance occupancy information. The resulting method is reported to outperform state-of-the-art methods while increasing parameters by only 0.2% and GFLOPs by only 0.24%.

A popular approach for constructing bird's-eye-view (BEV) representation in 3D detection is to lift 2D image features onto the viewing frustum space based on explicitly predicted depth distribution. However, depth distribution can only characterize the 3D geometry of visible object surfaces but fails to capture their internal space and overall geometric structure, leading to sparse and unsatisfactory 3D representations. To mitigate this issue, we present BEV-IO, a new 3D detection paradigm to enhance BEV representation with instance occupancy information. At the core of our method is the newly-designed instance occupancy prediction (IOP) module, which aims to infer point-level occupancy status for each instance in the frustum space. To ensure training efficiency while maintaining representational flexibility, it is trained using the combination of both explicit and implicit supervision. With the predicted occupancy, we further design a geometry-aware feature propagation mechanism (GFP), which performs self-attention based on occupancy distribution along each ray in frustum and is able to enforce instance-level feature consistency. By integrating the IOP module with GFP mechanism, our BEV-IO detector is able to render highly informative 3D scene structures with more comprehensive BEV representations. Experimental results demonstrate that BEV-IO can outperform state-of-the-art methods while only adding a negligible increase in parameters (0.2%) and computational overhead (0.24% in GFLOPs).
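
As a rough illustration of the ideas in the abstract, the NumPy sketch below lifts 2D image features into a frustum using a per-pixel depth distribution, optionally adds an occupancy term so that bins behind the visible surface also receive features, and then runs a simple self-attention along each ray biased by occupancy. It is not the authors' implementation: the abstract does not say how depth and occupancy are fused or how the attention is parameterized, so the additive fusion, the log-occupancy attention bias, and the names `lift_features` and `propagate_along_rays` are all assumptions made for this example.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lift_features(feat, depth_logits, occ_logits=None):
    """Lift image features into the frustum.

    feat:         (C, H, W) image features
    depth_logits: (D, H, W) per-pixel scores over D depth bins
    occ_logits:   (D, H, W) per-bin occupancy scores, or None
    returns:      (C, D, H, W) frustum features
    """
    # Depth distribution: one categorical distribution per pixel (ray).
    weights = softmax(depth_logits, axis=0)
    if occ_logits is not None:
        # ASSUMPTION: additive fusion of depth and occupancy weights.
        weights = weights + sigmoid(occ_logits)
    # Outer product: each depth bin gets the pixel feature scaled by its weight.
    return feat[:, None, :, :] * weights[None, :, :, :]


def propagate_along_rays(frustum, occ, eps=1e-6):
    """Self-attention over the D bins of each ray, biased toward occupied bins.

    frustum: (C, D, H, W) frustum features
    occ:     (D, H, W) occupancy probabilities in [0, 1]
    returns: (C, D, H, W) propagated features
    """
    C = frustum.shape[0]
    x = np.transpose(frustum, (2, 3, 1, 0))        # (H, W, D, C)
    occ_t = np.transpose(occ, (1, 2, 0))           # (H, W, D)
    # Dot-product similarity between every pair of bins on the same ray.
    scores = np.einsum("hwdc,hwec->hwde", x, x) / np.sqrt(C)
    # ASSUMPTION: bias keys by log-occupancy so empty bins are attended less.
    scores = scores + np.log(occ_t + eps)[:, :, None, :]
    attn = softmax(scores, axis=-1)                # rows sum to 1 over keys
    out = np.einsum("hwde,hwec->hwdc", attn, x)    # (H, W, D, C)
    return np.transpose(out, (3, 2, 0, 1))         # (C, D, H, W)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.normal(size=(8, 4, 6))              # C=8, H=4, W=6
    depth_logits = rng.normal(size=(16, 4, 6))     # D=16
    occ_logits = rng.normal(size=(16, 4, 6))
    frustum = lift_features(feat, depth_logits, occ_logits)
    out = propagate_along_rays(frustum, sigmoid(occ_logits))
    print(frustum.shape, out.shape)
```

With depth weights alone, each ray's features sum back to the original pixel feature, so mass concentrates on the visible surface. The occupancy term is what lets bins inside an object carry features as well, which is the sparsity issue the abstract describes.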

