CV · Apr 17, 2025

RoPETR: Improving Temporal Camera-Only 3D Detection by Integrating Enhanced Rotary Position Embedding

arXiv:2504.12643v3 · 8 citations · h-index: 3
Originality: Synthesis-oriented
AI Analysis

This incremental improvement enhances temporal modeling for autonomous driving systems using camera data.

The paper identifies velocity estimation as a bottleneck in camera-only 3D detection and improves the StreamPETR framework to achieve a state-of-the-art NuScenes Detection Score of 70.86%.

This technical report introduces a targeted improvement to the StreamPETR framework, specifically aimed at enhancing velocity estimation, a critical factor influencing the overall NuScenes Detection Score (NDS). While StreamPETR exhibits strong 3D bounding box detection performance, as reflected by its high mean Average Precision (mAP), our analysis identified velocity estimation as a substantial bottleneck when evaluated on the NuScenes dataset. To overcome this limitation, we propose a customized positional embedding strategy tailored to enhance temporal modeling capabilities. Experimental evaluations conducted on the NuScenes test set demonstrate that our improved approach achieves a state-of-the-art NDS of 70.86% using the ViT-L backbone, setting a new benchmark for camera-only 3D object detection.
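To make concrete why velocity error weighs on the overall score, the sketch below computes NDS from mAP and the five mean true-positive errors, following the standard nuScenes definition NDS = (5 * mAP + sum(1 - min(1, mTP))) / 10. The function name `nds` and all numeric values are illustrative assumptions, not results from the paper.

```python
def nds(mean_ap, tp_errors):
    """nuScenes Detection Score from mAP and the five mean true-positive errors.

    tp_errors maps each error name (mATE, mASE, mAOE, mAVE, mAAE) to its value.
    Each error is clipped to 1 and converted to a score via 1 - error.
    """
    expected = {"mATE", "mASE", "mAOE", "mAVE", "mAAE"}
    if set(tp_errors) != expected:
        raise ValueError(f"expected keys {sorted(expected)}")
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors.values()]
    # mAP carries weight 5; each of the five TP scores carries weight 1.
    return (5.0 * mean_ap + sum(tp_scores)) / 10.0


# Hypothetical metric values, chosen only for illustration.
baseline = {"mATE": 0.5, "mASE": 0.25, "mAOE": 0.3, "mAVE": 0.3, "mAAE": 0.1}
improved = dict(baseline, mAVE=0.2)  # same detector, lower velocity error

print(nds(0.6, baseline), nds(0.6, improved))
```

With mAP held fixed, every 0.1 reduction in mAVE (while it stays below 1) adds 0.01 to NDS, so a detector with strong mAP can still gain on NDS through better velocity estimates alone.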
