CV, RO · May 27

SAM-Enhanced Segmentation on Road Datasets: Balancing Critical Classes in Autonomous Driving

arXiv:2605.28136
AI Analysis

This work addresses the lack of pixel-level annotations in multi-modal driving datasets, offering a practical route to segmentation labels for researchers who cannot hand-draw masks: existing bounding boxes are converted into masks automatically, and a subset is then curated manually.

The authors developed a SAM-based annotation pipeline that generates dense pixel-level labels for the Zenseact Open Dataset (ZOD), enabling semantic segmentation research for autonomous driving. They report up to 48.1% mIoU on ZOD with CLFT-Hybrid and 77.5% mIoU on the Iseauto platform, and release the code and annotations for reproducibility.

Dense semantic segmentation is essential for autonomous driving, yet many multi-modal datasets lack pixel-level annotations. The Zenseact Open Dataset (ZOD) provides rich multi-sensor data but only bounding-box labels, limiting its use for segmentation research. Our primary contribution is a Segment Anything Model (SAM)-based annotation pipeline that produces dense, pixel-level annotations for ZOD by converting bounding boxes into semantic masks. In this pilot study, we process over 100,000 frames and manually curate a 2,300-frame subset (36% acceptance rate) to establish a reliable baseline. Using these annotations, we evaluate transformer-based CLFT and CNN-based DeepLabV3+ architectures across diverse weather conditions, achieving up to 48.1% mIoU with CLFT-Hybrid. To address extreme class imbalance, where pedestrians, cyclists, and signs constitute less than 1% of pixels, we explore specialized models targeting rare classes. We further validate the pipeline on the Iseauto autonomous-vehicle platform, achieving 77.5% mIoU, and show that SAM-derived representations transfer effectively across sensor configurations via bidirectional transfer learning. All code and annotations are released to support reproducible research.
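The abstract reports results as mIoU and notes that rare classes cover less than 1% of pixels. The sketch below is a minimal illustration of why that combination matters, not the authors' evaluation code: the class ids, the toy label lists, and all function names are assumptions made for this example. It computes per-class IoU from a confusion matrix and averages it, so a missed rare class lowers the score sharply even when nearly every pixel is labeled correctly.

```python
from collections import Counter


def confusion_matrix(truth, pred, num_classes):
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(truth, pred):
        matrix[t][p] += 1
    return matrix


def per_class_iou(matrix):
    """IoU per class = TP / (TP + FP + FN); None if the class never occurs."""
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp
        fp = sum(matrix[r][c] for r in range(n)) - tp
        union = tp + fp + fn
        ious.append(tp / union if union else None)
    return ious


def mean_iou(ious):
    """Unweighted mean over the classes that occur in truth or prediction."""
    valid = [v for v in ious if v is not None]
    return sum(valid) / len(valid)


def pixel_share(truth, num_classes):
    """Fraction of ground-truth pixels belonging to each class."""
    counts = Counter(truth)
    return [counts[c] / len(truth) for c in range(num_classes)]


# Hypothetical toy data: flattened label maps with illustrative class ids
# 0 = background, 1 = vehicle, 2 = pedestrian (not the dataset's real ids).
truth = [0] * 90 + [1] * 9 + [2] * 1
pred = [0] * 90 + [1] * 9 + [0] * 1  # the single pedestrian pixel is missed

matrix = confusion_matrix(truth, pred, 3)
ious = per_class_iou(matrix)
accuracy = sum(matrix[c][c] for c in range(3)) / len(truth)

print("pixel share:", pixel_share(truth, 3))
print("per-class IoU:", ious)
print("pixel accuracy:", accuracy)
print("mIoU:", mean_iou(ious))
```

In this toy case pixel accuracy is 0.99, yet mIoU is about 0.66 because the pedestrian class contributes an IoU of 0 with the same weight as the other two classes. Since every class counts equally in the mean, improving rare classes is one way to raise mIoU, which is consistent with the abstract's interest in models specialized for them.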
