Semantic Segmentation with Reverse Attention
This work addresses semantic segmentation by proposing a reverse attention network (RAN) that learns both the representative features of each target class and the opposite concept, i.e., what is not associated with that class. The RAN achieves a state-of-the-art mIoU of 48.1% on the PASCAL-Context dataset, with further improvements reported on other datasets.
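Because the results are reported as mean intersection-over-union (mIoU), the sketch below shows how this metric is commonly computed. It accumulates a confusion matrix over labeled pixels, computes IoU_c = TP / (TP + FP + FN) for each class, and averages the per-class values. The function names are hypothetical. Two details are assumptions that vary between benchmark toolkits: unlabeled pixels are marked with `ignore_index=255`, and classes absent from both the ground truth and the prediction are skipped.

```python
import math
from typing import List, Sequence, Tuple


def confusion_matrix(gt: Sequence[int], pred: Sequence[int], num_classes: int,
                     ignore_index: int = 255) -> List[List[int]]:
    """Hypothetical helper: rows index the ground-truth class, columns the predicted class."""
    conf = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        if g == ignore_index:  # assumed convention: unlabeled pixels are not scored
            continue
        conf[g][p] += 1
    return conf


def mean_iou(conf: List[List[int]]) -> Tuple[List[float], float]:
    """Per-class IoU = TP / (TP + FP + FN); mIoU averages classes with a nonzero denominator."""
    n = len(conf)
    ious = []
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                       # ground truth is c, predicted as something else
        fp = sum(conf[r][c] for r in range(n)) - tp  # predicted c, ground truth is something else
        denom = tp + fp + fn
        ious.append(tp / denom if denom else float("nan"))
    present = [v for v in ious if not math.isnan(v)]  # skip classes absent from gt and pred
    if not present:
        return ious, float("nan")
    return ious, sum(present) / len(present)
```

As a usage example, flattened labels `gt = [0, 0, 1, 1, 2, 2]` and `pred = [0, 1, 1, 1, 2, 0]` give per-class IoUs of 1/3, 2/3 and 1/2, so the mIoU is 0.5. Dataset-level scores are usually obtained by summing the confusion matrix over all images before taking IoUs, rather than by averaging per-image mIoUs.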
Recent developments in fully convolutional neural networks enable efficient end-to-end learning of semantic segmentation. Traditionally, convolutional classifiers are trained to learn the representative semantic features of labeled objects. In this work, we propose a reverse attention network (RAN) architecture that also trains the network to capture the opposite concept (i.e., what is not associated with a target class). The RAN is a three-branch network that performs the direct, reverse and reverse-attention learning processes simultaneously. Extensive experiments demonstrate the effectiveness of the RAN for semantic segmentation. Built upon DeepLabv2-LargeFOV, the RAN achieves a state-of-the-art mIoU score of 48.1% on the challenging PASCAL-Context dataset. Significant performance improvements are also observed on the PASCAL-VOC, Person-Part, NYUDv2 and ADE20K datasets.
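The abstract names three branches (direct, reverse and reverse attention) but does not say how their outputs are combined. The NumPy sketch below illustrates one plausible forward-pass fusion, under stated assumptions. The direct branch produces class logits of shape (C, H, W). The reverse branch produces logits scoring "pixel is not class c". The reverse-attention map is taken as `sigmoid(-direct_logits)`, so it is large where the direct branch is unconfident. The fused logits are then `direct - attention * reverse`. This fusion rule and the function names are illustrative assumptions, not the paper's confirmed formulation, and training of the branches is not shown.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def reverse_attention_fuse(direct_logits: np.ndarray, reverse_logits: np.ndarray) -> np.ndarray:
    """Hypothetical fusion of three RAN-style branches (assumed rule, not the paper's verified formula).

    direct_logits:  (C, H, W) scores for "pixel belongs to class c"
    reverse_logits: (C, H, W) scores for "pixel does NOT belong to class c"
    """
    if direct_logits.shape != reverse_logits.shape:
        raise ValueError("branch outputs must have the same (C, H, W) shape")
    # Assumed reverse attention: near 1 where the direct score for class c is low,
    # near 0 where the direct branch is already confident.
    attention = sigmoid(-direct_logits)
    # Subtract reverse-branch evidence, weighted mostly toward uncertain regions.
    return direct_logits - attention * reverse_logits


def predict_labels(fused_logits: np.ndarray) -> np.ndarray:
    """Per-pixel class label = argmax over the class axis."""
    return np.argmax(fused_logits, axis=0)
```

For a single pixel where the direct branch is tied between two classes (both logits 0) and the reverse branch scores 2.0 against class 0, the attention is 0.5 for both classes. The fused logits become -1.0 and 0.0, so the pixel is assigned to class 1. This example shows how evidence about what a pixel is *not* can break ties that the direct branch alone leaves open.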