CV
Jul 25, 2023

HeightFormer: Explicit Height Modeling without Extra Data for Camera-only 3D Object Detection in Bird's Eye View

arXiv:2307.13510v3
27 citations
h-index: 15
Originality: Incremental advance
AI Analysis

This paper addresses camera-only 3D object detection for autonomous driving with a method that works with arbitrary camera setups and requires no additional sensors. The advance is incremental, since it builds on existing BEV representation approaches.

The paper tackles the construction of Bird's Eye View (BEV) representations for autonomous driving from camera data alone. It proposes HeightFormer, which explicitly models heights in BEV space without extra data such as LiDAR, and reports state-of-the-art performance among camera-only methods.

Vision-based Bird's Eye View (BEV) representation is an emerging perception formulation for autonomous driving. The core challenge is to construct BEV space with multi-camera features, which is a one-to-many ill-posed problem. Diving into all previous BEV representation generation methods, we found that most of them fall into two types: modeling depths in image views or modeling heights in the BEV space, mostly in an implicit way. In this work, we propose to explicitly model heights in the BEV space, which needs no extra data like LiDAR and can fit arbitrary camera rigs and types compared to modeling depths. Theoretically, we give proof of the equivalence between height-based methods and depth-based methods. Considering the equivalence and some advantages of modeling heights, we propose HeightFormer, which models heights and uncertainties in a self-recursive way. Without any extra data, the proposed HeightFormer could estimate heights in BEV accurately. Benchmark results show that the performance of HeightFormer achieves SOTA compared with those camera-only methods.
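
The claimed equivalence between height-based and depth-based modeling can be made concrete with a small geometric sketch. This is not the paper's proof. It only illustrates, under an assumed ideal pinhole camera with known calibration, that a BEV cell with a known height determines a pixel and a depth, and that the same pixel with that depth determines the BEV cell and its height. The function names, the intrinsics `K`, the extrinsics `R` and `t`, and the example point are all hypothetical values chosen for illustration.

```python
import numpy as np

def project_bev_point(K, R, t, xy, height):
    """Project a BEV cell (x, y) at a given height into the image.

    Returns the pixel (u, v) and the depth along the camera's optical axis.
    """
    p_world = np.array([xy[0], xy[1], height], dtype=float)
    p_cam = R @ p_world + t          # world frame -> camera frame
    depth = p_cam[2]                 # distance along the optical axis
    uv = (K @ p_cam)[:2] / depth     # perspective division
    return uv, depth

def unproject_pixel(K, R, t, uv, depth):
    """Lift a pixel with a known depth back to world coordinates (x, y, z)."""
    ray = np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0])
    p_cam = ray * depth              # point in the camera frame
    return R.T @ (p_cam - t)         # camera frame -> world frame

# Hypothetical calibration (assumption, not taken from the paper).
K = np.array([[1000.0, 0.0, 800.0],
              [0.0, 1000.0, 450.0],
              [0.0, 0.0, 1.0]])
# World axes: x forward, y left, z up. Camera axes: x right, y down, z forward.
R = np.array([[0.0, -1.0, 0.0],
              [0.0, 0.0, -1.0],
              [1.0, 0.0, 0.0]])
t = -R @ np.array([0.0, 0.0, 1.5])   # camera mounted 1.5 m above the ground

# Height-based direction: BEV cell (20 m ahead, 2 m left) at height 0.5 m.
uv, depth = project_bev_point(K, R, t, xy=(20.0, 2.0), height=0.5)

# Depth-based direction: the same pixel and depth recover x, y and the height.
recovered = unproject_pixel(K, R, t, uv, depth)
```

With calibration fixed, either quantity pins down the same 3D point in this sketch: `recovered` matches the original cell and height. The sketch says nothing about which quantity is easier to learn, which is the practical question the paper argues in favor of heights.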

