RefineMask: Towards High-Quality Instance Segmentation with Fine-Grained Features
This work addresses the need for more accurate instance segmentation in computer vision, particularly for applications that require precise object boundaries. It is incremental in that it builds on existing two-stage methods.
The paper tackles the problem of coarse masks in two-stage instance segmentation methods such as Mask R-CNN by proposing RefineMask, which incorporates fine-grained features in a multi-stage manner to produce high-quality masks. It reports gains of 2.6, 3.4, and 3.8 AP over Mask R-CNN on COCO, LVIS, and Cityscapes respectively, and establishes a new state-of-the-art on LVIS.
Two-stage methods for instance segmentation, e.g., Mask R-CNN, have achieved excellent performance recently. However, the segmented masks are still very coarse due to the downsampling operations in both the feature pyramid and the instance-wise pooling process, especially for large objects. In this work, we propose a new method called RefineMask for high-quality instance segmentation of objects and scenes, which incorporates fine-grained features during the instance-wise segmenting process in a multi-stage manner. By fusing more detailed information stage by stage, RefineMask consistently refines masks to high quality. RefineMask succeeds in segmenting hard cases, such as bent parts of objects that are over-smoothed by most previous methods, and outputs accurate boundaries. Without bells and whistles, RefineMask yields significant gains of 2.6, 3.4, and 3.8 AP over Mask R-CNN on the COCO, LVIS, and Cityscapes benchmarks respectively, at a small additional computational cost. Furthermore, our single-model result outperforms the winner of the LVIS Challenge 2020 by 1.3 points on the LVIS test-dev set and establishes a new state-of-the-art. Code will be available at https://github.com/zhanggang001/RefineMask.
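The claim that fixed-size instance-wise pooling hurts large objects most can be illustrated with simple arithmetic. The sketch below is not the authors' implementation. It assumes a 28x28 base mask resolution, as in a typical Mask R-CNN-style head, and assumes each refinement stage doubles the mask side length. Neither value is stated in the abstract, and the function names are hypothetical.

```python
from typing import List

# Illustrative arithmetic only. The resolutions used here are assumptions,
# not values taken from the abstract.


def pixels_per_mask_cell(object_side_px: int, mask_side: int) -> float:
    """Image pixels covered by one mask cell along one axis."""
    return object_side_px / mask_side


def stage_resolutions(base_side: int, num_refine_stages: int) -> List[int]:
    """Mask side length per stage, assuming 2x upsampling at each refinement stage."""
    return [base_side * 2 ** stage for stage in range(num_refine_stages + 1)]


if __name__ == "__main__":
    base = 28  # assumed fixed mask resolution of the baseline head

    # A fixed-size mask is coarser for larger objects.
    for side in (56, 224, 448):
        print(f"object {side}px -> {pixels_per_mask_cell(side, base)} px per mask cell")

    # Refining stage by stage shrinks the footprint of each mask cell.
    for res in stage_resolutions(base, 2):
        print(f"mask {res}x{res} -> {pixels_per_mask_cell(448, res)} px per mask cell")
```

Under these assumptions, a 448-pixel object predicted at 28x28 has each mask cell spanning 16 image pixels per axis, while two doubling stages reduce that to 4. This is the kind of boundary detail a multi-stage design can recover when finer features are fused at each stage.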