R2S100K: Road-Region Segmentation Dataset For Semi-Supervised Autonomous Driving in the Wild
This work addresses the lack of datasets for unstructured roadways in autonomous driving, though the contribution is incremental: it pairs existing segmentation methods with a new dataset and a sampling framework.
The authors tackle semantic road segmentation for autonomous driving by introducing R2S100K, a large-scale dataset of 100K images (14,000 labeled and 86,000 unlabeled), together with a sampling-based self-training framework; in their experiments, the approach improved generalizability and reduced labeling costs.
Semantic understanding of roadways is a key enabling factor for safe autonomous driving. However, existing autonomous driving datasets cover well-structured urban roads while ignoring unstructured roadways containing distress, potholes, water puddles, and various kinds of road patches (e.g., earthen and gravel). To this end, we introduce the Road Region Segmentation dataset (R2S100K), a large-scale dataset and benchmark for training and evaluating road segmentation on these challenging unstructured roadways. R2S100K comprises 100K images extracted from a large and diverse set of video sequences covering more than 1,000 km of roadways. Of these 100K privacy-respecting images, 14,000 have fine pixel-level labels of road regions, and the remaining 86,000 are unlabeled and can be leveraged through semi-supervised learning methods. Alongside the dataset, we present an Efficient Data Sampling (EDS)-based self-training framework that improves learning by leveraging unlabeled data. Our experimental results demonstrate that the proposed method significantly improves the generalizability of learning methods and reduces the labeling cost for semantic segmentation tasks. Our benchmark will be publicly available at https://r2s100k.github.io/ to facilitate future research.