CV · Sep 6, 2017

Learning Dilation Factors for Semantic Segmentation of Street Scenes

arXiv:1709.01956v1 · 8 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of balancing fine details and receptive fields in semantic segmentation for autonomous driving and urban analysis, though it is incremental as it builds on existing dilated convolution methods.

In convolutional neural networks for semantic segmentation of street scenes, dilation parameters are typically hand-tuned and fixed. The paper instead learns them adaptively per channel, which yields consistent improvements on datasets such as Cityscapes and CamVid.

Contextual information is crucial for semantic segmentation. However, finding the optimal trade-off between keeping desired fine details and at the same time providing sufficiently large receptive fields is nontrivial. This is even more so when objects or classes present in an image vary significantly in size. Dilated convolutions have proven valuable for semantic segmentation because they make it possible to increase the size of the receptive field without sacrificing image resolution. However, in current state-of-the-art methods, dilation parameters are hand-tuned and fixed. In this paper, we present an approach for learning dilation parameters adaptively per channel, consistently improving semantic segmentation results on street-scene datasets like Cityscapes and CamVid.
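To make the receptive-field argument concrete, the following is a minimal NumPy sketch of a 1-D dilated convolution in which each channel has its own real-valued dilation. It is an illustration only, not the paper's implementation: the function names (`sample_linear`, `dilated_conv1d`, `per_channel_dilated_conv1d`, `receptive_extent`) are hypothetical, and linear interpolation between neighbouring samples is an assumed way of letting the dilation take non-integer values so that it could, in principle, be adjusted by gradient descent.

```python
import numpy as np


def sample_linear(x, pos):
    """Linearly interpolate the 1-D signal x at a real-valued position.

    Positions outside the signal read as zero (zero padding).
    """
    lo = int(np.floor(pos))
    frac = pos - lo

    def at(i):
        return x[i] if 0 <= i < len(x) else 0.0

    return (1.0 - frac) * at(lo) + frac * at(lo + 1)


def dilated_conv1d(x, w, dilation):
    """Centered 1-D dilated correlation with same-size output.

    Tap j of the kernel reads the input at i + (j - center) * dilation,
    so a larger dilation spreads the taps further apart without
    downsampling the signal. The dilation may be fractional (assumption:
    fractional positions are resolved by linear interpolation).
    """
    k = len(w)
    center = (k - 1) / 2.0
    out = np.zeros(len(x))
    for i in range(len(x)):
        for j in range(k):
            out[i] += w[j] * sample_linear(x, i + (j - center) * dilation)
    return out


def per_channel_dilated_conv1d(x, w, dilations):
    """Apply the same kernel to every channel, each with its own dilation.

    x has shape (channels, length); dilations has one entry per channel.
    """
    return np.stack([dilated_conv1d(xc, w, d) for xc, d in zip(x, dilations)])


def receptive_extent(kernel_size, dilation):
    """Span of input positions covered by one dilated kernel."""
    return dilation * (kernel_size - 1) + 1


if __name__ == "__main__":
    x = np.tile(np.arange(8, dtype=float), (2, 1))  # two identical channels
    w = np.array([1.0, 0.0, -1.0])
    y = per_channel_dilated_conv1d(x, w, dilations=[2.0, 1.5])
    print(y[:, 3])
    print(receptive_extent(3, 2.0), receptive_extent(3, 1.5))
```

A 3-tap kernel covers 3 input positions at dilation 1 and 5 at dilation 2, while the output keeps the input's length. Giving each channel its own dilation lets different channels look at different spatial scales, which is the property the abstract relies on for scenes whose objects vary in size.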

Code Implementations: 1 repo
