Leveraging Transformer Decoder for Automotive Radar Object Detection
This work addresses object detection in automotive radar systems with a method that eliminates dense proposal generation and heuristic post-processing, though it appears incremental because it builds on existing Transformer and set-prediction ideas.
The paper tackles 3D radar object detection by proposing a Transformer-based architecture with a novel decoder and Pyramid Token Fusion, achieving significant improvements over state-of-the-art radar-only baselines on the RADDet dataset.
In this paper, we present a Transformer-based architecture for 3D radar object detection that uses a novel Transformer Decoder as the prediction head to directly regress 3D bounding boxes and class scores from radar feature representations. To bridge multi-scale radar features and the decoder, we propose Pyramid Token Fusion (PTF), a lightweight module that converts a feature pyramid into a unified, scale-aware token sequence. By formulating detection as a set prediction problem with learnable object queries and positional encodings, our design models long-range spatial-temporal correlations and cross-feature interactions. This approach eliminates dense proposal generation and heuristic post-processing such as extensive non-maximum suppression (NMS) tuning. We evaluate the proposed framework on the RADDet dataset, where it achieves significant improvements over state-of-the-art radar-only baselines.
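The idea of turning a feature pyramid into a single scale-aware token sequence can be illustrated with a small sketch. The abstract does not describe the internals of PTF, so everything below is an assumption made for illustration: the function name `pyramid_to_tokens`, the helper `make_level`, and the choice of marking scale by adding one embedding vector per pyramid level. It is not the authors' implementation, and it uses plain Python lists rather than a deep learning framework.

```python
from typing import List

# One pyramid level, stored as [height][width][channels].
Grid = List[List[List[float]]]


def make_level(height: int, width: int, channels: int, value: float) -> Grid:
    """Hypothetical helper: build a constant-valued feature map for the demo."""
    return [[[value] * channels for _ in range(width)] for _ in range(height)]


def pyramid_to_tokens(pyramid: List[Grid],
                      level_embeddings: List[List[float]]) -> List[List[float]]:
    """Flatten every pyramid level into tokens and tag each token with its level.

    Assumption: scale awareness is modelled by adding one embedding vector
    per level to every token of that level. The paper may do this differently.
    """
    if len(pyramid) != len(level_embeddings):
        raise ValueError("need exactly one level embedding per pyramid level")

    tokens: List[List[float]] = []
    for grid, level_emb in zip(pyramid, level_embeddings):
        for row in grid:
            for cell in row:
                if len(cell) != len(level_emb):
                    raise ValueError("channel count must match embedding size")
                # One spatial cell becomes one token of size `channels`.
                tokens.append([c + e for c, e in zip(cell, level_emb)])
    return tokens


# Three levels of decreasing resolution, two channels each.
pyramid = [
    make_level(4, 4, 2, 1.0),
    make_level(2, 2, 2, 1.0),
    make_level(1, 1, 2, 1.0),
]
# In a trained model these would be learned; fixed values keep the demo readable.
level_embeddings = [[0.0, 0.0], [10.0, 10.0], [20.0, 20.0]]

tokens = pyramid_to_tokens(pyramid, level_embeddings)
print(len(tokens))  # 21 tokens: 16 + 4 + 1
```

In a decoder-based detector, a sequence like `tokens` would serve as the memory that the learnable object queries attend to. Because every token carries its level marker, the decoder can in principle tell coarse features from fine ones even though they share one sequence.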