CV | LG | Mar 15, 2018

Training of Convolutional Networks on Multiple Heterogeneous Datasets for Street Scene Semantic Segmentation

arXiv:1803.05675v2 | 67 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of leveraging diverse datasets for autonomous vehicle perception. It appears incremental, since it builds on existing hierarchical classifier concepts.

The paper trains convolutional networks on multiple heterogeneous street scene datasets for semantic segmentation. Compared to flat classifiers, it reports improvements in mean pixel accuracy of 13.0% for Cityscapes classes, 2.4% for Vistas classes, and 32.3% for GTSDB classes.

We propose a convolutional network with hierarchical classifiers for per-pixel semantic segmentation, which can be trained on multiple, heterogeneous datasets and exploit their semantic hierarchy. Our network is the first to be simultaneously trained on three different datasets from the intelligent vehicles domain, i.e. Cityscapes, GTSDB and Mapillary Vistas, and is able to handle different semantic levels of detail, class imbalances, and different annotation types, i.e. dense per-pixel and sparse bounding-box labels. We assess our hierarchical approach by comparing against flat, non-hierarchical classifiers, and we show improvements in mean pixel accuracy of 13.0% for Cityscapes classes, 2.4% for Vistas classes, and 32.3% for GTSDB classes. Our implementation achieves inference rates of 17 fps at a resolution of 520x706 for 108 classes running on a GPU.
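The abstract does not spell out how the hierarchical classifiers combine their outputs, so the following is only a minimal sketch of one common way a two-level hierarchy can produce per-pixel leaf-class predictions: a root classifier scores high-level classes, each subclassifier scores the children of one high-level class, and the leaf probability is taken as the product of the two. The `HIERARCHY` table, the class names, and the functions `leaf_probabilities` and `predict` are hypothetical illustrations, not the authors' implementation.

```python
import math

# Hypothetical two-level label hierarchy (illustrative names only).
# A root class with no children is itself a leaf.
HIERARCHY = {
    "road": [],
    "vehicle": ["car", "truck"],
    "traffic sign": ["speed limit", "stop"],
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def leaf_probabilities(root_logits, sub_logits):
    """Combine root and subclassifier outputs for a single pixel.

    root_logits: one logit per root class, in HIERARCHY order.
    sub_logits:  dict mapping a root class name to the logits of its
                 subclassifier (one logit per child).
    Assumption: P(leaf) = P(root) * P(leaf | root).
    """
    root_probs = softmax(root_logits)
    out = {}
    for name, p_root in zip(HIERARCHY, root_probs):
        children = HIERARCHY[name]
        if not children:
            out[name] = p_root
            continue
        child_probs = softmax(sub_logits[name])
        for child, p_child in zip(children, child_probs):
            out[child] = p_root * p_child
    return out


def predict(root_logits, sub_logits):
    """Return the most probable leaf class for a single pixel."""
    probs = leaf_probabilities(root_logits, sub_logits)
    return max(probs, key=probs.get)


if __name__ == "__main__":
    root = [0.0, 2.0, 0.0]
    subs = {"vehicle": [0.0, 3.0], "traffic sign": [1.0, 0.0]}
    print(predict(root, subs))
```

Under this assumed factorization, a subclassifier only needs labels for its own children, which is one way a dataset with coarse labels and a dataset with fine labels could both contribute training signal to the same network.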

Code Implementations: 2 repos
