CV
Apr 8, 2024

MOSE: Boosting Vision-based Roadside 3D Object Detection with Scene Cues

arXiv:2404.05280v1
5 citations
h-index: 6
Originality: Incremental advance
AI Analysis

This work improves autonomous driving perception by addressing the occlusion and short perception range of vehicle-mounted cameras, though it is an incremental advance focused on a specific domain.

The paper tackles 3D object detection from roadside cameras by incorporating scene cues such as road surface height, and reports state-of-the-art performance that surpasses existing methods by a large margin on two benchmarks.

3D object detection based on roadside cameras is an additional way for autonomous driving to alleviate the challenges of occlusion and short perception range from vehicle cameras. Previous methods for roadside 3D object detection mainly focus on modeling the depth or height of objects, neglecting the stationarity of the cameras and the characteristic of inter-frame consistency. In this work, we propose a novel framework, namely MOSE, for MOnocular 3D object detection with Scene cuEs. The scene cues are frame-invariant, scene-specific features that are crucial for object localization and can be intuitively regarded as the height between the surface of the real road and the virtual ground plane. In the proposed framework, a scene cue bank is designed to aggregate scene cues from multiple frames of the same scene with a carefully designed extrinsic augmentation strategy. Then, a transformer-based decoder lifts the aggregated scene cues as well as the 3D position embeddings for 3D object localization, which boosts generalization ability in heterologous scenes. Extensive experimental results on two public benchmarks demonstrate the state-of-the-art performance of the proposed method, which surpasses the existing methods by a large margin.
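To make the role of road height concrete, here is a minimal geometric sketch. It is not the MOSE architecture. It only illustrates why the height of the real road above a virtual ground plane matters for monocular localization from a fixed camera. The function name `locate_on_road`, the world frame (z up, virtual ground plane at z = 0), and all numbers are assumptions chosen for illustration.

```python
def locate_on_road(cam_center, ray_dir, road_height=0.0):
    """Intersect a camera ray with a horizontal plane at z = road_height.

    cam_center:  (x, y, z) camera position in a world frame with z up
                 (hypothetical frame; virtual ground plane is z = 0).
    ray_dir:     (dx, dy, dz) direction of the pixel's viewing ray in
                 the same world frame.
    road_height: height of the real road surface above the virtual
                 ground plane, in metres.
    """
    cx, cy, cz = cam_center
    dx, dy, dz = ray_dir
    if abs(dz) < 1e-9:
        raise ValueError("ray is parallel to the ground plane")
    # Solve cz + t * dz = road_height for the ray parameter t.
    t = (road_height - cz) / dz
    if t <= 0:
        raise ValueError("intersection lies behind the camera")
    return (cx + t * dx, cy + t * dy, cz + t * dz)


# Roadside camera mounted 6 m above the virtual ground plane,
# looking along +y with a shallow downward ray.
camera = (0.0, 0.0, 6.0)
ray = (0.0, 1.0, -0.1)

flat = locate_on_road(camera, ray)                      # road assumed at z = 0
raised = locate_on_road(camera, ray, road_height=0.5)   # road is 0.5 m higher
```

With this shallow viewing ray, assuming a flat road when the real surface is 0.5 m higher moves the estimated ground position by about 5 m along the road. Because a roadside camera is stationary, such a height offset is the same in every frame of a scene, which is consistent with the abstract's description of scene cues as frame-invariant.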
