Instance-aware Semantic Segmentation via Multi-task Network Cascades
This work addresses a limitation of many semantic segmentation methods: they cannot distinguish individual object instances. It offers a faster and more accurate alternative to previous instance-aware systems.
The paper tackles instance-aware semantic segmentation, which requires identifying individual object instances. It proposes Multi-task Network Cascades, three networks that differentiate instances, estimate masks, and categorize objects while sharing convolutional features. The method achieves state-of-the-art accuracy on PASCAL VOC and takes 360 ms per image at test time with VGG-16, two orders of magnitude faster than previous systems.
Semantic segmentation research has recently witnessed rapid progress, but many leading methods are unable to identify object instances. In this paper, we present Multi-task Network Cascades for instance-aware semantic segmentation. Our model consists of three networks, respectively differentiating instances, estimating masks, and categorizing objects. These networks form a cascaded structure and are designed to share their convolutional features. We develop an algorithm for the nontrivial end-to-end training of this causal, cascaded structure. Our solution is a clean, single-step training framework and can be generalized to cascades that have more stages. We demonstrate state-of-the-art instance-aware semantic segmentation accuracy on PASCAL VOC. Meanwhile, our method takes only 360 ms to test an image using VGG-16, which is two orders of magnitude faster than previous systems for this challenging problem. As a by-product, our method also achieves compelling object detection results which surpass the competitive Fast/Faster R-CNN systems. The method described in this paper is the foundation of our submissions to the MS COCO 2015 segmentation competition, where we won 1st place.
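To make the causal, cascaded structure concrete, the following is a minimal toy sketch in Python with NumPy. It is not the authors' implementation: every function name, the pooling "backbone", and the thresholding and scoring rules are hypothetical stand-ins chosen only to show the data flow. The point it illustrates is that all three stages read the same shared features, that stage 2 depends on the boxes from stage 1, and that stage 3 depends on both the boxes and the masks.

```python
import numpy as np

# Seeded generator so the toy box proposals are repeatable.
rng = np.random.default_rng(0)

def shared_features(image):
    """Stand-in for the shared convolutional features: 2x2 average pooling."""
    h, w = image.shape
    trimmed = image[: h // 2 * 2, : w // 2 * 2]
    return trimmed.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def stage1_boxes(features, num_boxes=3, size=4):
    """Stage 1 (differentiate instances): class-agnostic boxes.
    Toy rule (assumption): fixed-size windows at random positions."""
    h, w = features.shape
    ys = rng.integers(0, h - size + 1, num_boxes)
    xs = rng.integers(0, w - size + 1, num_boxes)
    return [(int(y), int(x), int(y) + size, int(x) + size) for y, x in zip(ys, xs)]

def stage2_masks(features, boxes):
    """Stage 2 (estimate masks): one binary mask per box from stage 1.
    Toy rule (assumption): threshold the cropped features at their mean."""
    masks = []
    for y0, x0, y1, x1 in boxes:
        crop = features[y0:y1, x0:x1]
        masks.append(crop > crop.mean())
    return masks

def stage3_categories(features, boxes, masks, num_classes=3):
    """Stage 3 (categorize objects): one label per instance, computed from
    the features inside the mask, so it depends on stages 1 and 2."""
    labels = []
    for (y0, x0, y1, x1), mask in zip(boxes, masks):
        crop = features[y0:y1, x0:x1]
        masked_mean = (crop * mask).sum() / max(mask.sum(), 1)
        labels.append(int(masked_mean * num_classes) % num_classes)
    return labels

def cascade(image):
    """Run the three stages in order on one set of shared features."""
    feats = shared_features(image)
    boxes = stage1_boxes(feats)
    masks = stage2_masks(feats, boxes)
    labels = stage3_categories(feats, boxes, masks)
    return boxes, masks, labels

if __name__ == "__main__":
    image = np.random.default_rng(1).random((16, 16))
    boxes, masks, labels = cascade(image)
    for box, mask, label in zip(boxes, masks, labels):
        print(box, int(mask.sum()), label)
```

In the paper the stages are learned networks trained jointly end to end; here each stage is a fixed rule, so the sketch shows only the dependency order that makes the cascade causal, not the training algorithm.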