CV · Mar 9

On the Feasibility and Opportunity of Autoregressive 3D Object Detection

arXiv:2603.07985v1
Predicted impact: top 40% in CV · last 90 days · Originality: highly original
AI Analysis

This work offers a viable, flexible alternative for LiDAR-based 3D object detection, simplifying the detector architecture by removing hand-crafted components such as anchor assignment and non-maximum suppression (NMS).

The paper introduces AutoReg3D, an autoregressive 3D object detector that recasts detection as sequence generation. It emits objects in a range-causal (near-to-far) order and achieves competitive performance on nuScenes without hand-crafted components such as anchors or NMS.
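To make "range-causal" concrete, here is a minimal Python sketch, assuming range is measured in the bird's-eye-view (x-y) plane from the sensor origin and approximating each object by a center and a half-width. The names `Box`, `range_causal_order`, `may_occlude`, and `occluders_precede` are hypothetical, and the occlusion test is a crude azimuth-overlap check, not anything taken from the paper. The sketch sorts objects near-to-far and checks that every object that could occlude another appears before it in the sequence.

```python
import math
from dataclasses import dataclass


@dataclass
class Box:
    # Hypothetical BEV box: center in metres (sensor frame), half-width, class label.
    x: float
    y: float
    half_width: float
    label: str

    @property
    def bev_range(self):
        return math.hypot(self.x, self.y)

    def azimuth_span(self):
        """Approximate angular interval (radians) the box covers as seen from the sensor."""
        center = math.atan2(self.y, self.x)
        half = math.atan2(self.half_width, self.bev_range)
        return center - half, center + half


def range_causal_order(boxes):
    """Sort near-to-far by BEV range, breaking ties by azimuth for a deterministic order."""
    return sorted(boxes, key=lambda b: (b.bev_range, math.atan2(b.y, b.x)))


def may_occlude(a, b):
    """True if `a` is nearer than `b` and their azimuth spans overlap (crude occlusion test)."""
    a_lo, a_hi = a.azimuth_span()
    b_lo, b_hi = b.azimuth_span()
    return a.bev_range < b.bev_range and a_lo < b_hi and b_lo < a_hi


def occluders_precede(sequence):
    """Check that every potential occluder of an object appears earlier in the sequence."""
    for i, obj in enumerate(sequence):
        for later in sequence[i + 1:]:
            if may_occlude(later, obj):
                return False
    return True


boxes = [
    Box(30.0, 4.0, 1.0, "car"),
    Box(5.0, -1.0, 0.4, "pedestrian"),
    Box(10.0, 1.5, 1.2, "truck"),
]
ordered = range_causal_order(boxes)
print([b.label for b in ordered])  # ['pedestrian', 'truck', 'car']
print(occluders_precede(ordered))  # True
print(occluders_precede(boxes))    # False: the truck may occlude the car but comes after it
```

Sorting by range guarantees the check passes, because a later object is never strictly nearer than an earlier one. The ±π azimuth wrap-around is ignored for brevity.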

LiDAR-based 3D object detectors typically rely on proposal heads with hand-crafted components such as anchor assignment and non-maximum suppression (NMS), which complicate training and limit extensibility. We present AutoReg3D, an autoregressive 3D detector that casts detection as sequence generation. Given point-cloud features, AutoReg3D emits objects in a range-causal (near-to-far) order and encodes each object as a short sequence of discrete tokens for its center, size, orientation, velocity, and class. This near-to-far ordering mirrors LiDAR geometry (near objects occlude far ones, but not vice versa), enabling straightforward teacher forcing during training and autoregressive decoding at test time. AutoReg3D is compatible with diverse point-cloud backbones and attains competitive nuScenes performance without anchors or NMS. Beyond parity, the sequential formulation unlocks language-model advances for 3D perception, including GRPO-style reinforcement learning for task-aligned objectives. These results position autoregressive decoding as a viable, flexible alternative for LiDAR-based detection and open a path to importing modern sequence-modeling tools into 3D perception.
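One way to picture the per-object token encoding is uniform binning of each continuous attribute into a shared numeric vocabulary, followed by a class token, with BOS/EOS tokens framing the scene. The sketch below assumes 256 bins, a ±54 m spatial range, four classes, and BEV-range ordering; none of these values, nor the helper names (`quantize`, `encode_scene`, `teacher_forcing_pair`, `decode_sequence`), come from the paper. Teacher forcing then reduces to feeding the model the ground-truth prefix and training it to predict the next token at every position.

```python
import math

# Illustrative vocabulary layout; bin count, ranges, and classes are assumptions.
NUM_BINS = 256
CLASSES = ["car", "truck", "pedestrian", "bicycle"]
FIELDS = ["x", "y", "z", "l", "w", "h", "yaw", "vx", "vy"]  # center, size, yaw, velocity
FIELD_RANGES = {
    "x": (-54.0, 54.0), "y": (-54.0, 54.0), "z": (-5.0, 3.0),
    "l": (0.0, 20.0), "w": (0.0, 5.0), "h": (0.0, 5.0),
    "yaw": (-math.pi, math.pi), "vx": (-20.0, 20.0), "vy": (-20.0, 20.0),
}
CLASS_OFFSET = NUM_BINS                # class tokens follow the numeric bin tokens
BOS = CLASS_OFFSET + len(CLASSES)      # start of sequence
EOS = BOS + 1                          # end of sequence: no more objects
TOKENS_PER_OBJECT = len(FIELDS) + 1    # nine attribute tokens plus one class token


def quantize(value, lo, hi):
    """Map a continuous value to a bin index in [0, NUM_BINS), clamping to [lo, hi]."""
    t = (min(max(value, lo), hi) - lo) / (hi - lo)
    return min(int(t * NUM_BINS), NUM_BINS - 1)


def dequantize(index, lo, hi):
    """Return the center value of a bin."""
    return lo + (index + 0.5) * (hi - lo) / NUM_BINS


def encode_object(obj):
    """Encode one object as its attribute tokens followed by its class token."""
    values = dict(obj)
    # Wrap yaw into [-pi, pi] so equivalent headings land in the same bin.
    values["yaw"] = math.atan2(math.sin(obj["yaw"]), math.cos(obj["yaw"]))
    tokens = [quantize(values[f], *FIELD_RANGES[f]) for f in FIELDS]
    tokens.append(CLASS_OFFSET + CLASSES.index(obj["label"]))
    return tokens


def encode_scene(objects):
    """Emit objects near-to-far by BEV range, framed by BOS and EOS."""
    seq = [BOS]
    for obj in sorted(objects, key=lambda o: math.hypot(o["x"], o["y"])):
        seq.extend(encode_object(obj))
    seq.append(EOS)
    return seq


def teacher_forcing_pair(seq):
    """Model inputs drop the last token; targets are the same sequence shifted left by one."""
    return seq[:-1], seq[1:]


def decode_sequence(tokens):
    """Parse a well-formed token sequence back into objects, stopping at EOS."""
    body = tokens[1:] if tokens and tokens[0] == BOS else list(tokens)
    if EOS in body:
        body = body[:body.index(EOS)]
    objects = []
    for start in range(0, len(body) - TOKENS_PER_OBJECT + 1, TOKENS_PER_OBJECT):
        chunk = body[start:start + TOKENS_PER_OBJECT]
        obj = {f: dequantize(t, *FIELD_RANGES[f]) for f, t in zip(FIELDS, chunk)}
        obj["label"] = CLASSES[chunk[-1] - CLASS_OFFSET]
        objects.append(obj)
    return objects


scene = [
    {"x": 20.0, "y": 3.0, "z": -0.5, "l": 4.5, "w": 1.9, "h": 1.6,
     "yaw": 0.3, "vx": 5.0, "vy": 0.0, "label": "car"},
    {"x": 6.0, "y": -2.0, "z": -0.8, "l": 0.7, "w": 0.7, "h": 1.8,
     "yaw": -1.2, "vx": 0.5, "vy": 0.2, "label": "pedestrian"},
]
seq = encode_scene(scene)
inputs, targets = teacher_forcing_pair(seq)
print(len(seq))                                    # 22 = BOS + 2 objects x 10 tokens + EOS
print([o["label"] for o in decode_sequence(seq)])  # ['pedestrian', 'car']
```

A practical decoder would likely restrict which tokens are legal at each position (numeric bins for attribute slots, class tokens at the end of each object, EOS only at object boundaries); this sketch simply assumes a well-formed sequence.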
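The abstract mentions GRPO-style reinforcement learning. In GRPO, several outputs are sampled for the same input, each is scored, and each sample's advantage is its reward normalized by the mean and standard deviation of its group, so no learned value function is needed. The sketch below assumes a hypothetical task reward (class-matched F1 under a 2 m BEV center-distance threshold, loosely in the spirit of nuScenes center-distance matching), uses the population standard deviation, and omits the KL penalty to a reference policy; the function names are illustrative and the paper's actual reward and objective may differ.

```python
import math
import statistics


def center_distance_f1(pred, gt, threshold=2.0):
    """Hypothetical task reward: F1 of class-matched detections within a BEV center-distance threshold.

    pred, gt: lists of (x, y, label). Greedy matching; each ground truth is used at most once.
    """
    if not pred and not gt:
        return 1.0
    unmatched = list(gt)
    tp = 0
    for px, py, plabel in pred:
        best, best_d = None, threshold
        for g in unmatched:
            d = math.hypot(px - g[0], py - g[1])
            if g[2] == plabel and d <= best_d:
                best, best_d = g, d
        if best is not None:
            unmatched.remove(best)
            tp += 1
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gt)
    return 2 * precision * recall / (precision + recall)


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: rewards normalized by the mean and std of their own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(logp_new, logp_old, advantage, clip_eps=0.2):
    """PPO-style clipped objective for one sample, averaged over its tokens (KL term omitted)."""
    total = 0.0
    for new, old in zip(logp_new, logp_old):
        ratio = math.exp(new - old)
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        total += min(ratio * advantage, clipped * advantage)
    return total / len(logp_new)


# A group of four sampled detection sets for one scene.
gt = [(6.0, -2.0, "pedestrian"), (20.0, 3.0, "car")]
group = [
    [(6.2, -2.1, "pedestrian"), (20.5, 3.0, "car")],  # both objects found
    [(6.2, -2.1, "pedestrian")],                      # car missed
    [(6.2, -2.1, "car"), (20.5, 3.0, "car")],         # pedestrian mislabeled
    [],                                               # nothing detected
]
rewards = [center_distance_f1(p, gt) for p in group]
advantages = group_relative_advantages(rewards)
print([round(r, 3) for r in rewards])  # [1.0, 0.667, 0.5, 0.0]
```

Because advantages are centered within each group, samples that beat their siblings are reinforced and the rest are suppressed, while a group whose samples all score the same contributes no learning signal.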

