Evidential fully convolutional network for semantic segmentation
This work addresses the classification of ambiguous pixels in semantic segmentation and represents an incremental improvement.
The paper tackles semantic segmentation with a hybrid architecture that combines a fully convolutional network with a Dempster-Shafer layer to handle ambiguous pixels and outliers, improving accuracy and calibration on three databases (Pascal VOC 2011, MIT-scene Parsing, and SIFT Flow).
We propose a hybrid architecture composed of a fully convolutional network (FCN) and a Dempster-Shafer layer for image semantic segmentation. In the so-called evidential FCN (E-FCN), an encoder-decoder architecture first extracts pixel-wise feature maps from an input image. A Dempster-Shafer layer then computes mass functions at each pixel location based on distances to prototypes. Finally, a utility layer performs semantic segmentation from mass functions and allows for imprecise classification of ambiguous pixels and outliers. We propose an end-to-end learning strategy for jointly updating the network parameters, which can make use of soft (imprecise) labels. Experiments using three databases (Pascal VOC 2011, MIT-scene Parsing and SIFT Flow) show that the proposed combination improves the accuracy and calibration of semantic segmentation by assigning confusing pixels to multi-class sets.
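To make the Dempster-Shafer step concrete, the following is a minimal NumPy sketch of how mass functions at one pixel could be computed from distances to prototypes and then pooled. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not give the support function, so the form alpha * exp(-gamma * d^2) is borrowed from distance-based evidential classifiers. The focal sets are assumed to be the singletons plus the whole frame Omega. All names (`prototype_masses`, `dempster_combine`, `pixel_mass`, `memberships`, `alpha`, `gamma`) are hypothetical.

```python
from functools import reduce

import numpy as np


def prototype_masses(x, prototypes, memberships, alpha, gamma):
    """Return one mass function per prototype for a pixel feature vector x.

    Assumed layout: each row holds K singleton masses followed by the
    mass assigned to the whole frame Omega (ignorance).
    """
    d2 = np.sum((prototypes - x) ** 2, axis=1)      # squared distances, shape (n,)
    s = alpha * np.exp(-gamma * d2)                 # assumed support of each prototype
    singletons = memberships * s[:, None]           # shape (n, K)
    omega = 1.0 - s                                 # remaining mass goes to Omega
    return np.hstack([singletons, omega[:, None]])  # shape (n, K + 1)


def dempster_combine(m1, m2):
    """Dempster's rule for mass functions whose focal sets are singletons and Omega."""
    k = m1.size - 1
    singles = m1[:k] * m2[:k] + m1[:k] * m2[k] + m1[k] * m2[:k]
    omega = m1[k] * m2[k]
    combined = np.append(singles, omega)
    # Dividing by the retained mass discards the conflicting mass.
    return combined / combined.sum()


def pixel_mass(x, prototypes, memberships, alpha, gamma):
    """Pool the evidence from all prototypes into one mass function."""
    masses = prototype_masses(x, prototypes, memberships, alpha, gamma)
    return reduce(dempster_combine, masses)


# Toy setup: two classes, one prototype each (illustrative values only).
prototypes = np.array([[0.0, 0.0], [4.0, 0.0]])
memberships = np.array([[1.0, 0.0], [0.0, 1.0]])
alpha = np.array([0.9, 0.9])   # kept below 1 so Omega always retains some mass
gamma = np.array([1.0, 1.0])

near_class0 = pixel_mass(np.array([0.0, 0.0]), prototypes, memberships, alpha, gamma)
outlier = pixel_mass(np.array([100.0, 100.0]), prototypes, memberships, alpha, gamma)
```

In this sketch, a pixel far from every prototype keeps almost all of its mass on Omega, which is the signal a downstream decision stage could use to flag outliers. The utility layer described in the abstract, which selects among single classes and multi-class sets, is not reproduced here.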