36.9CVMar 28
Class-Distribution Guided Active Learning for 3D Occupancy Prediction in Autonomous DrivingWonjune Kim, In-Jae Lee, Sihwan Hwang et al.
3D occupancy prediction provides dense spatial understanding critical for safe autonomous driving. However, this task suffers from a severe class imbalance due to its volumetric representation, where safety-critical objects (bicycles, traffic cones, pedestrians) occupy minimal voxels compared to dominant backgrounds. Additionally, voxel-level annotation is costly, yet dedicating effort to dominant classes is inefficient. To address these challenges, we propose a class-distribution guided active learning framework for selecting training samples to annotate in autonomous driving datasets. Our approach combines three complementary criteria to select the training samples. Inter-sample diversity prioritizes samples whose predicted class distributions differ from those of the labeled set, intra-set diversity prevents redundant sampling within each acquisition cycle, and frequency-weighted uncertainty emphasizes rare classes by reweighting voxel-level entropy with inverse per-sample class proportions. We ensure evaluation validity by using a geographically disjoint train/validation split of Occ3D-nuScenes, which reduces train-validation overlap and mitigates potential map memorization. With only 42.4% labeled data, our framework reaches 26.62 mIoU, comparable to full supervision and outperforming active learning baselines at the same budget. We further validate generality on SemanticKITTI using a different architecture, demonstrating consistent effectiveness across datasets.
CVSep 6, 2025
CRAB: Camera-Radar Fusion for Reducing Depth Ambiguity in Backward Projection based View TransformationIn-Jae Lee, Sihwan Hwang, Youngseok Kim et al.
Recently, camera-radar fusion-based 3D object detection methods in bird's eye view (BEV) have gained attention due to the complementary characteristics and cost-effectiveness of these sensors. Previous approaches using forward projection struggle with sparse BEV feature generation, while those employing backward projection overlook depth ambiguity, leading to false positives. In this paper, to address the aforementioned limitations, we propose a novel camera-radar fusion-based 3D object detection and segmentation model named CRAB (Camera-Radar fusion for reducing depth Ambiguity in Backward projection-based view transformation), using a backward projection that leverages radar to mitigate depth ambiguity. During the view transformation, CRAB aggregates perspective view image context features into BEV queries. It improves depth distinction among queries along the same ray by combining the dense but unreliable depth distribution from images with the sparse yet precise depth information from radar occupancy. We further introduce spatial cross-attention with a feature map containing radar context information to enhance the comprehension of the 3D scene. When evaluated on the nuScenes open dataset, our proposed approach achieves a state-of-the-art performance among backward projection-based camera-radar fusion methods with 62.4\% NDS and 54.0\% mAP in 3D object detection.
CVMay 17, 2025
Technical Report for ICRA 2025 GOOSE 2D Semantic Segmentation Challenge: Boosting Off-Road Segmentation via Photometric Distortion and Exponential Moving AverageWonjune Kim, Lae-kyoung Lee, Su-Yong An
We report on the application of a high-capacity semantic segmentation pipeline to the GOOSE 2D Semantic Segmentation Challenge for unstructured off-road environments. Using a FlashInternImage-B backbone together with a UPerNet decoder, we adapt established techniques, rather than designing new ones, to the distinctive conditions of off-road scenes. Our training recipe couples strong photometric distortion augmentation (to emulate the wide lighting variations of outdoor terrain) with an Exponential Moving Average (EMA) of weights for better generalization. Using only the GOOSE training dataset, we achieve 88.8\% mIoU on the validation set.