Learning Dilation Factors for Semantic Segmentation of Street Scenes
This work addresses the trade-off between preserving fine details and providing large receptive fields in semantic segmentation for autonomous driving and urban analysis. The contribution is incremental, since it builds on existing dilated convolution methods.
The paper tackles the choice of dilation parameters in convolutional neural networks for semantic segmentation of street scenes. These parameters are typically hand-tuned and fixed; the paper instead learns them adaptively per channel, which yields consistent improvements on datasets such as Cityscapes and CamVid.
Contextual information is crucial for semantic segmentation. However, finding the optimal trade-off between preserving fine details and providing sufficiently large receptive fields is nontrivial, all the more so when the objects or classes present in an image vary significantly in size. Dilated convolutions have proven valuable for semantic segmentation because they increase the size of the receptive field without sacrificing image resolution. However, in current state-of-the-art methods, dilation parameters are hand-tuned and fixed. In this paper, we present an approach for learning dilation parameters adaptively per channel, consistently improving semantic segmentation results on street-scene datasets such as Cityscapes and CamVid.
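
To make the role of the dilation parameter concrete, the following is a minimal 1-D sketch in plain Python, not the implementation used in this work. It shows that a larger dilation widens the span covered by a kernel while the output keeps the input length, and that each channel can be given its own dilation. The function names (`effective_kernel_size`, `sample`, `dilated_conv1d`, `per_channel_dilated_conv`) are hypothetical. Handling real-valued dilations by linear interpolation between neighbouring samples is an assumption made here for illustration; the text above does not state how fractional dilations are realised.

```python
import math


def effective_kernel_size(kernel_size, dilation):
    """Span, in input samples, covered by one kernel with the given dilation."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def sample(signal, pos):
    """Read `signal` at a real-valued position by linear interpolation.

    Positions outside the signal contribute zero (zero padding).
    ASSUMPTION: interpolation is an illustrative way to support
    non-integer dilations; it is not taken from the text.
    """
    lo = math.floor(pos)
    frac = pos - lo

    def at(i):
        return signal[i] if 0 <= i < len(signal) else 0.0

    return (1.0 - frac) * at(lo) + frac * at(lo + 1)


def dilated_conv1d(signal, kernel, dilation):
    """Dilated cross-correlation whose output has the same length as the input.

    Kernel taps are spaced `dilation` samples apart and centred on each
    output position, so resolution is preserved for any dilation.
    """
    center = (len(kernel) - 1) / 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            acc += w * sample(signal, i + (j - center) * dilation)
        out.append(acc)
    return out


def per_channel_dilated_conv(channels, kernels, dilations):
    """Apply one kernel and one dilation value per channel (hypothetical helper)."""
    return [
        dilated_conv1d(sig, ker, dil)
        for sig, ker, dil in zip(channels, kernels, dilations)
    ]


if __name__ == "__main__":
    impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
    box = [1.0, 1.0, 1.0]
    # Same 3-tap kernel, increasingly wide footprint, same output length.
    for d in (1, 2, 1.5):
        print(d, effective_kernel_size(3, d), dilated_conv1d(impulse, box, d))
```

With an impulse as input, the output traces the kernel's footprint: dilation 1 touches three adjacent samples, dilation 2 touches samples two apart, and a fractional dilation spreads each outer tap over two neighbouring samples. Because the output then varies smoothly with the dilation value, a scheme of this kind could in principle be tuned by gradient-based training, which is one plausible reading of "learning" the parameter.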