CV RO · May 27

SAM-Enhanced Segmentation on Road Datasets: Balancing Critical Classes in Autonomous Driving

Toomas Tahves, Mauro Bellone, Junyi Gu, Raivo Sell

arXiv:2605.281365.2

AI Analysis

This work addresses the lack of pixel-level annotations in multi-modal driving datasets, offering a practical way for researchers to obtain segmentation labels with far less manual effort than annotating from scratch.

The authors developed a SAM-based annotation pipeline that generates dense pixel-level labels for the Zenseact Open Dataset (ZOD), enabling semantic segmentation research for autonomous driving. They report up to 48.1% mIoU on ZOD with CLFT-Hybrid and 77.5% mIoU on the Iseauto platform, and release code and annotations for reproducibility.
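The mIoU figures quoted here are means of per-class intersection-over-union. As a hedged sketch (not the authors' evaluation code), the metric can be computed from a confusion matrix with NumPy. The `ignore_index=255` convention and the choice to skip classes absent from both prediction and ground truth are assumptions; the paper may also accumulate counts over the whole test split rather than per frame.

```python
import numpy as np


def mean_iou(pred: np.ndarray, target: np.ndarray, num_classes: int,
             ignore_index: int = 255) -> float:
    """Mean IoU over classes present in prediction or ground truth.

    Assumes `pred` and `target` are integer label maps of equal shape with
    class ids in [0, num_classes); pixels labelled `ignore_index` in
    `target` are excluded (an assumed convention, not from the paper).
    """
    valid = target != ignore_index
    p = pred[valid].astype(np.int64)
    t = target[valid].astype(np.int64)
    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(t * num_classes + p, minlength=num_classes ** 2)
    conf = conf.reshape(num_classes, num_classes)
    tp = np.diag(conf)
    # Union = predicted-as-c + labelled-as-c - correctly-labelled-as-c.
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    present = union > 0
    return float((tp[present] / union[present]).mean())
```

For example, with `pred = [[0, 0], [1, 1]]` and `target = [[0, 1], [1, 1]]` the per-class IoUs are 1/2 and 2/3, giving an mIoU of about 0.583.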

Dense semantic segmentation is essential for autonomous driving, yet many multi-modal datasets lack pixel-level annotations. The Zenseact Open Dataset (ZOD) provides rich multi-sensor data but only bounding-box labels, limiting its use for segmentation research. Our primary contribution is a Segment Anything Model (SAM)-based annotation pipeline that produces dense, pixel-level annotations for ZOD by converting bounding boxes into semantic masks. In this pilot study, we process over 100,000 frames and manually curate a 2,300-frame subset (36% acceptance rate) to establish a reliable baseline. Using these annotations, we evaluate transformer-based CLFT and CNN-based DeepLabV3+ architectures across diverse weather conditions, achieving up to 48.1% mIoU with CLFT-Hybrid. To address extreme class imbalance, where pedestrians, cyclists, and signs constitute less than 1% of pixels, we explore specialized models targeting rare classes. We further validate the pipeline on the Iseauto autonomous-vehicle platform, achieving 77.5% mIoU, and show that SAM-derived representations transfer effectively across sensor configurations via bidirectional transfer learning. All code and annotations are released to support reproducible research.
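The core idea of the pipeline, turning ZOD bounding boxes into dense semantic masks with SAM box prompts, can be sketched as follows. This is a minimal illustration under stated assumptions, not the released pipeline: the mask generator is passed in as a hypothetical callable `segment_box` (with the official `segment_anything` package this would typically wrap `SamPredictor.predict(box=..., multimask_output=False)`), and painting larger boxes first, clipping each mask to its box, and using `background_id=0` are illustrative choices rather than details taken from the paper. The `fill_box` stand-in exists only so the sketch runs without a SAM checkpoint, and `class_pixel_fractions` shows how the under-1% pixel share of rare classes can be measured.

```python
from typing import Callable, Sequence, Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels; x1 and y1 exclusive


def box_area(box: Box) -> int:
    x0, y0, x1, y1 = box
    return max(0, x1 - x0) * max(0, y1 - y0)


def boxes_to_semantic_mask(
    image: np.ndarray,
    boxes: Sequence[Box],
    class_ids: Sequence[int],
    segment_box: Callable[[np.ndarray, Box], np.ndarray],  # hypothetical SAM wrapper
    background_id: int = 0,
) -> np.ndarray:
    """Turn box annotations into a single HxW semantic label map."""
    h, w = image.shape[:2]
    label_map = np.full((h, w), background_id, dtype=np.uint8)
    # Paint large boxes first so smaller overlapping objects (e.g. a pedestrian
    # in front of a vehicle) are written last and stay visible (assumed rule).
    order = sorted(range(len(boxes)), key=lambda i: box_area(boxes[i]), reverse=True)
    for i in order:
        x0, y0, x1, y1 = boxes[i]
        # Clip the box to the image so slicing stays in bounds.
        x0, y0 = max(0, x0), max(0, y0)
        x1, y1 = min(w, x1), min(h, y1)
        mask = segment_box(image, boxes[i]).astype(bool)
        # Discard mask pixels that leak outside the prompting box (assumed rule).
        inside = np.zeros((h, w), dtype=bool)
        inside[y0:y1, x0:x1] = True
        label_map[mask & inside] = class_ids[i]
    return label_map


def class_pixel_fractions(label_map: np.ndarray, num_classes: int) -> np.ndarray:
    """Fraction of pixels assigned to each class id, exposing class imbalance."""
    counts = np.bincount(label_map.ravel(), minlength=num_classes)
    return counts / counts.sum()


def fill_box(image: np.ndarray, box: Box) -> np.ndarray:
    """Trivial stand-in for SAM: the 'mask' is simply the whole box."""
    mask = np.zeros(image.shape[:2], dtype=bool)
    x0, y0, x1, y1 = box
    mask[max(0, y0):y1, max(0, x0):x1] = True
    return mask


if __name__ == "__main__":
    img = np.zeros((4, 4, 3), dtype=np.uint8)
    labels = boxes_to_semantic_mask(img, [(0, 0, 4, 4), (1, 1, 2, 2)], [1, 2], fill_box)
    print(class_pixel_fractions(labels, num_classes=3))
```

Running the `__main__` block paints a 4×4 frame with a large class-1 box and a one-pixel class-2 box. Because the smaller box is painted last, 15 pixels end up as class 1 and one as class 2, regardless of the order in which the boxes are listed.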

