Improving Fully Convolution Network for Semantic Segmentation
This work aims to improve semantic segmentation accuracy for computer vision applications and represents an incremental advance over existing FCN methods.
The paper addresses the limitations of Fully Convolution Networks (FCN) for semantic segmentation by introducing an Improved Fully Convolution Network (IFCN) with a context network and dense skip connections, achieving state-of-the-art results on four datasets: ADE20K, Pascal Context, Pascal VOC 2012, and SUN-RGBD.
Fully Convolution Networks (FCN) have achieved great success in dense prediction tasks, including semantic segmentation. In this paper, we begin by examining the architectural limitations that keep FCN from serving as a strong segmentation network. We then present our Improved Fully Convolution Network (IFCN). In contrast to FCN, IFCN introduces a context network that progressively expands the receptive fields of the feature maps. In addition, dense skip connections are added so that the context network can be optimized effectively. More importantly, these dense skip connections enable IFCN to fuse context from a rich range of scales and so make reliable predictions. Empirically, these architectural modifications prove significant in improving segmentation performance. Without any contextual post-processing, IFCN significantly advances the state of the art on the ADE20K (ImageNet scene parsing), Pascal Context, Pascal VOC 2012, and SUN-RGBD segmentation datasets.
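To make the idea of progressively expanding receptive fields concrete, the following is a minimal sketch using the standard receptive-field recurrence for stacked convolutions. It is not the authors' implementation: the function name `receptive_fields` and the layer configuration (six 3x3 convolutions with stride 1) are assumptions chosen for illustration, since the abstract does not specify the context network's layers.

```python
def receptive_fields(layers, start_rf=1, start_jump=1):
    """Return the receptive field size (in input pixels) after each layer.

    layers: list of (kernel_size, stride, dilation) tuples.
    start_rf / start_jump: receptive field and pixel spacing of the
    feature map fed into the first layer.
    """
    rf, jump = start_rf, start_jump
    sizes = []
    for kernel, stride, dilation in layers:
        # A dilated kernel spans dilation * (kernel - 1) + 1 positions.
        effective_kernel = dilation * (kernel - 1) + 1
        # Each extra position widens the field by the current spacing.
        rf += (effective_kernel - 1) * jump
        # Striding increases the spacing between neighbouring outputs.
        jump *= stride
        sizes.append(rf)
    return sizes


# Hypothetical context network (assumed, not taken from the paper):
# six stacked 3x3 convolutions with stride 1 and no dilation.
context_layers = [(3, 1, 1)] * 6
context_rfs = receptive_fields(context_layers)
print(context_rfs)  # [3, 5, 7, 9, 11, 13]
```

Under these assumptions every layer sees a larger window than the one before it. Sizes are measured relative to the feature map entering the context network. Skip connections from each layer to the classifier would therefore expose several context scales at once, which is the intuition behind fusing context across scales.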