RODNet: Radar Object Detection Using Cross-Modal Supervision
This work addresses the need for reliable object detection in autonomous driving under adverse conditions such as bad weather. Its contribution is incremental, since it builds on existing radar and fusion methods.
The paper tackles object detection from radar data in driving scenarios, where radar is robust but its semantic information is hard to extract. It proposes RODNet, a deep radar object detection network trained with cross-modal supervision from camera-radar fusion, which achieves favorable detection performance without camera input.
Radar is usually more robust than the camera in severe driving scenarios, e.g., weak/strong lighting and bad weather. However, unlike RGB images captured by a camera, the semantic information in radar signals is noticeably difficult to extract. In this paper, we propose a deep radar object detection network (RODNet) to effectively detect objects purely from carefully processed radar frequency data in the format of range-azimuth frequency heatmaps (RAMaps). Three different 3D autoencoder-based architectures are introduced to predict the object confidence distribution from each snippet of the input RAMaps. The final detection results are then calculated using our post-processing method, called location-based non-maximum suppression (L-NMS). Instead of using burdensome human-labeled ground truth, we train the RODNet using annotations generated automatically by a novel 3D localization method using a camera-radar fusion (CRF) strategy. To train and evaluate our method, we build a new dataset, CRUW, containing synchronized videos and RAMaps in various driving scenarios. After intensive experiments, our RODNet shows favorable object detection performance without the presence of the camera.
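The abstract names L-NMS but does not define it, so the following is only a minimal sketch of what a location-based non-maximum suppression step over a confidence map could look like. Everything in it is an assumption made for illustration: the Gaussian similarity in `location_similarity`, the `scale`, `min_conf`, and `sim_threshold` parameters, the 8-neighbour peak search, and the restriction to a single class and a single frame. It is not the authors' implementation.

```python
import numpy as np


def location_similarity(p, q, scale=1.0):
    """Similarity in (0, 1] between two grid locations.

    Assumed Gaussian form for illustration only; the source text
    does not specify the similarity used by L-NMS.
    """
    diff = np.asarray(p, dtype=float) - np.asarray(q, dtype=float)
    return float(np.exp(-np.sum(diff ** 2) / (2.0 * scale ** 2)))


def find_peaks(conf_map, min_conf=0.3):
    """Return (confidence, (row, col)) for local maxima of a 2D map."""
    # Pad with -inf so border cells can be compared against a full 3x3 window.
    padded = np.pad(conf_map, 1, mode="constant", constant_values=-np.inf)
    peaks = []
    rows, cols = conf_map.shape
    for r in range(rows):
        for c in range(cols):
            value = conf_map[r, c]
            if value < min_conf:
                continue
            # The 3x3 window centred on (r, c) in the original map.
            window = padded[r:r + 3, c:c + 3]
            if value >= window.max():
                peaks.append((float(value), (r, c)))
    return peaks


def location_nms(peaks, sim_threshold=0.5, scale=2.0):
    """Greedy suppression: keep the most confident peak, then drop any
    remaining peak that is too similar in location to a kept one."""
    kept = []
    for conf, loc in sorted(peaks, key=lambda p: p[0], reverse=True):
        if all(location_similarity(loc, kept_loc, scale) < sim_threshold
               for _, kept_loc in kept):
            kept.append((conf, loc))
    return kept


# Hypothetical 16x16 confidence map (rows = range bins, cols = azimuth bins).
conf_map = np.zeros((16, 16))
conf_map[4, 4] = 0.9    # strong detection
conf_map[4, 6] = 0.7    # nearby weaker peak, likely the same object
conf_map[12, 10] = 0.8  # separate object far away

peaks = find_peaks(conf_map)
detections = location_nms(peaks)
print(detections)
```

In this toy map the weaker peak two cells away from the strongest one is suppressed, while the distant peak survives. The difference from box-based NMS is that suppression is decided from point locations rather than bounding-box overlap, which suits radar detections that have no natural box extent in the range-azimuth plane.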