Distance transform regression for spatially-aware deep semantic segmentation
This work addresses a common issue in semantic segmentation for computer vision applications, offering an incremental improvement with low computational overhead.
The paper tackles the problem of blurry boundaries and ill-segmented shapes in deep semantic segmentation by introducing a regularization technique based on distance transform regression, resulting in significant improvements over competitive baselines across various datasets and architectures.
Understanding visual scenes relies more and more on dense pixel-wise classification obtained via deep fully convolutional neural networks. However, due to the nature of the networks, predictions often suffer from blurry boundaries and ill-segmented shapes, fueling the need for post-processing. This work introduces a new semantic segmentation regularization based on the regression of a distance transform. After computing the distance transform on the label masks, we train a FCN in a multi-task setting in both discrete and continuous spaces by learning jointly classification and distance regression. This requires almost no modification of the network structure and adds a very low overhead to the training process. Learning to approximate the distance transform back-propagates spatial cues that implicitly regularizes the segmentation. We validate this technique with several architectures on various datasets, and we show significant improvements compared to competitive baselines.