CV · Sep 15, 2025

Towards Foundational Models for Single-Chip Radar

Tianshu Huang, Akarsh Prabhakara, Chuhan Chen, Jay Karhade, Deva Ramanan, Matthew O'Toole, Anthony Rowe

arXiv:2509.12482v1 · 14.46 citations · h-index: 9

Originality: Highly original

AI Analysis

This addresses a key limitation for practitioners in automotive and indoor sensing by providing a generalizable model that reduces the need for task-specific training from scratch.

The paper tackles the problem of poor angular resolution in inexpensive single-chip mmWave radars. It collects what the authors believe is the largest available raw radar dataset (1M samples, 29 hours) and trains a foundational model, the Generalizable Radar Transformer (GRT), which predicts 3D occupancy and semantic segmentation with quality comparable to much higher-resolution sensors. GRT shows logarithmic data scaling of 20% per 10× increase in training data.
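The "20% per 10× data" scaling claim can be made concrete with a small model. The sketch below is an illustrative assumption, not the authors' evaluation code: the summary does not name the metric, so it treats the 20% as a relative gain over a reference dataset size that grows linearly in log10 of the sample count (additive per decade rather than compounding). The function name `log_scaling_gain` is hypothetical.

```python
import math


def log_scaling_gain(n_samples: float, n_ref: float, gain_per_decade: float = 0.20) -> float:
    """Hypothetical log-linear scaling model (an assumption, not from the paper).

    Returns the relative improvement over a model trained on `n_ref` samples,
    assuming the gain grows by `gain_per_decade` (20% by default) for every
    10x increase in training data.
    """
    if n_samples <= 0 or n_ref <= 0:
        raise ValueError("dataset sizes must be positive")
    # log10 counts how many decades (10x steps) separate the two dataset sizes.
    return gain_per_decade * math.log10(n_samples / n_ref)


# One decade (100K -> 1M samples) gives 0.20; two decades (1M -> 100M) give 0.40.
for n_ref, n in [(1e5, 1e6), (1e6, 1e8)]:
    print(f"{n_ref:.0e} -> {n:.0e} samples: +{log_scaling_gain(n, n_ref):.0%}")
```

Under this reading, each decade of data buys the same fixed gain, so the improvement per added sample shrinks as the dataset grows.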

mmWave radars are compact, inexpensive, and durable sensors that are robust to occlusions and work regardless of environmental conditions, such as weather and darkness. However, this comes at the cost of poor angular resolution, especially for inexpensive single-chip radars, which are typically used in automotive and indoor sensing applications. Although many have proposed learning-based methods to mitigate this weakness, no standardized foundational models or large datasets for mmWave radar have emerged, and practitioners have largely trained task-specific models from scratch using relatively small datasets. In this paper, we collect (to our knowledge) the largest available raw radar dataset with 1M samples (29 hours) and train a foundational model for 4D single-chip radar, which can predict 3D occupancy and semantic segmentation with quality that is typically only possible with much higher-resolution sensors. We demonstrate that our Generalizable Radar Transformer (GRT) generalizes across diverse settings, can be fine-tuned for different tasks, and shows logarithmic data scaling of 20% per 10× data. We also run extensive ablations on common design decisions, and find that using raw radar data significantly outperforms widely used lossy representations, equivalent to a 10× increase in training data. Finally, we roughly estimate that ≈100M samples (3000 hours) of data are required to fully exploit the potential of GRT.
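The abstract's dataset figures can be cross-checked with simple arithmetic. The sketch below assumes a constant average capture rate across the whole collection (the actual radar frame rate is not given here); `implied_rate_hz` and `hours_for` are hypothetical helper names. At 1M samples over 29 hours, the implied rate is about 9.6 samples per second, and 100M samples at that rate works out to roughly 2900 hours, in line with the ≈100M samples (3000 hours) estimate.

```python
# Figures quoted in the abstract.
SAMPLES_COLLECTED = 1_000_000  # 1M samples
HOURS_COLLECTED = 29           # 29 hours of capture


def implied_rate_hz(samples: float, hours: float) -> float:
    """Average samples per second, assuming a constant capture rate (an assumption)."""
    return samples / (hours * 3600.0)


def hours_for(samples: float, rate_hz: float) -> float:
    """Hours of capture needed to collect `samples` at a constant `rate_hz`."""
    return samples / rate_hz / 3600.0


rate = implied_rate_hz(SAMPLES_COLLECTED, HOURS_COLLECTED)
target_hours = hours_for(100_000_000, rate)  # the ~100M-sample estimate
print(f"implied rate: {rate:.1f} samples/s")          # about 9.6
print(f"hours for 100M samples: {target_hours:.0f}")  # about 2900
```

Because the rate cancels out, the estimate reduces to scaling the collected hours by the sample ratio: 29 hours × 100 = 2900 hours.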

