CVApr 28, 2022Code
TJ4DRadSet: A 4D Radar Dataset for Autonomous DrivingLianqing Zheng, Zhixiong Ma, Xichan Zhu et al.
The next-generation high-resolution automotive radar (4D radar) can provide additional elevation measurement and denser point clouds, which has great potential for 3D sensing in autonomous driving. In this paper, we introduce a dataset named TJ4DRadSet with 4D radar points for autonomous driving research. The dataset was collected in various driving scenarios, with a total of 7757 synchronized frames in 44 consecutive sequences, which are well annotated with 3D bounding boxes and track ids. We provide a 4D radar-based 3D object detection baseline for our dataset to demonstrate the effectiveness of deep learning methods for 4D radar point clouds. The dataset can be accessed via the following link: https://github.com/TJRadarLab/TJ4DRadSet.
CVJan 26, 2025Code
MetaOcc: Spatio-Temporal Fusion of Surround-View 4D Radar and Camera for 3D Occupancy Prediction with Dual Training StrategiesLong Yang, Lianqing Zheng, Wenjin Ai et al.
Robust 3D occupancy prediction is essential for autonomous driving, particularly under adverse weather conditions where traditional vision-only systems struggle. While the fusion of surround-view 4D radar and cameras offers a promising low-cost solution, effectively extracting and integrating features from these heterogeneous sensors remains challenging. This paper introduces MetaOcc, a novel multi-modal framework for omnidirectional 3D occupancy prediction that leverages both multi-view 4D radar and images. To address the limitations of directly applying LiDAR-oriented encoders to sparse radar data, we propose a Radar Height Self-Attention module that enhances vertical spatial reasoning and feature extraction. Additionally, a Hierarchical Multi-scale Multi-modal Fusion strategy is developed to perform adaptive local-global fusion across modalities and time, mitigating spatio-temporal misalignments and enriching fused feature representations. To reduce reliance on expensive point cloud annotations, we further propose a pseudo-label generation pipeline based on an open-set segmentor. This enables a semi-supervised strategy that achieves 90% of the fully supervised performance using only 50% of the ground truth labels, offering an effective trade-off between annotation cost and accuracy. Extensive experiments demonstrate that MetaOcc under full supervision achieves state-of-the-art performance, outperforming previous methods by +0.47 SC IoU and +4.02 mIoU on the OmniHD-Scenes dataset, and by +1.16 SC IoU and +1.24 mIoU on the SurroundOcc-nuScenes dataset. These results demonstrate the scalability and robustness of MetaOcc across sensor domains and training conditions, paving the way for practical deployment in real-world autonomous systems. Code and data are available at https://github.com/LucasYang567/MetaOcc.
CVJan 26, 2025
Doracamom: Joint 3D Detection and Occupancy Prediction with Multi-view 4D Radars and Cameras for Omnidirectional PerceptionLianqing Zheng, Jianan Liu, Runwei Guan et al.
3D object detection and occupancy prediction are critical tasks in autonomous driving, attracting significant attention. Despite the potential of recent vision-based methods, they encounter challenges under adverse conditions. Thus, integrating cameras with next-generation 4D imaging radar to achieve unified multi-task perception is highly significant, though research in this domain remains limited. In this paper, we propose Doracamom, the first framework that fuses multi-view cameras and 4D radar for joint 3D object detection and semantic occupancy prediction, enabling comprehensive environmental perception. Specifically, we introduce a novel Coarse Voxel Queries Generator that integrates geometric priors from 4D radar with semantic features from images to initialize voxel queries, establishing a robust foundation for subsequent Transformer-based refinement. To leverage temporal information, we design a Dual-Branch Temporal Encoder that processes multi-modal temporal features in parallel across BEV and voxel spaces, enabling comprehensive spatio-temporal representation learning. Furthermore, we propose a Cross-Modal BEV-Voxel Fusion module that adaptively fuses complementary features through attention mechanisms while employing auxiliary tasks to enhance feature quality. Extensive experiments on the OmniHD-Scenes, View-of-Delft (VoD), and TJ4DRadSet datasets demonstrate that Doracamom achieves state-of-the-art performance in both tasks, establishing new benchmarks for multi-modal 3D perception. Code and models will be publicly available.
CVDec 14, 2024
OmniHD-Scenes: A Next-Generation Multimodal Dataset for Autonomous DrivingLianqing Zheng, Long Yang, Qunshu Lin et al.
The rapid advancement of deep learning has intensified the need for comprehensive data for use by autonomous driving algorithms. High-quality datasets are crucial for the development of effective data-driven autonomous driving solutions. Next-generation autonomous driving datasets must be multimodal, incorporating data from advanced sensors that feature extensive data coverage, detailed annotations, and diverse scene representation. To address this need, we present OmniHD-Scenes, a large-scale multimodal dataset that provides comprehensive omnidirectional high-definition data. The OmniHD-Scenes dataset combines data from 128-beam LiDAR, six cameras, and six 4D imaging radar systems to achieve full environmental perception. The dataset comprises 1501 clips, each approximately 30-s long, totaling more than 450K synchronized frames and more than 5.85 million synchronized sensor data points. We also propose a novel 4D annotation pipeline. To date, we have annotated 200 clips with more than 514K precise 3D bounding boxes. These clips also include semantic segmentation annotations for static scene elements. Additionally, we introduce a novel automated pipeline for generation of the dense occupancy ground truth, which effectively leverages information from non-key frames. Alongside the proposed dataset, we establish comprehensive evaluation metrics, baseline models, and benchmarks for 3D detection and semantic occupancy prediction. These benchmarks utilize surround-view cameras and 4D imaging radar to explore cost-effective sensor solutions for autonomous driving applications. Extensive experiments demonstrate the effectiveness of our low-cost sensor configuration and its robustness under adverse conditions. Data will be released at https://www.2077ai.com/OmniHD-Scenes.