One-Shot Instance Segmentation
This work addresses flexible scene analysis, a problem of interest to computer vision researchers, and provides a first baseline for one-shot instance segmentation. The contribution is incremental in that it builds on the existing Mask R-CNN.
The paper tackles one-shot instance segmentation, in which a model must segment all instances of a novel object category in a scene given only one example image. It proposes Siamese Mask R-CNN, which extends Mask R-CNN with a Siamese backbone that encodes both the reference image and the scene. The model achieves strong baseline results on MS Coco, although targeting detection towards the reference category remains a noted challenge.
We tackle the problem of one-shot instance segmentation: Given an example image of a novel, previously unknown object category, find and segment all objects of this category within a complex scene. To address this challenging new task, we propose Siamese Mask R-CNN. It extends Mask R-CNN by a Siamese backbone encoding both reference image and scene, allowing it to target detection and segmentation towards the reference category. We demonstrate empirical results on MS Coco highlighting challenges of the one-shot setting: while transferring knowledge about instance segmentation to novel object categories works very well, targeting the detection network towards the reference category appears to be more difficult. Our work provides a first strong baseline for one-shot instance segmentation and will hopefully inspire further research into more powerful and flexible scene analysis algorithms. Code is available at: https://github.com/bethgelab/siamese-mask-rcnn
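To illustrate the idea of a Siamese backbone, the following Python sketch applies one shared encoder to both the reference image and the scene, then conditions the scene features on the reference. It is a toy under stated assumptions, not the released implementation: the per-pixel linear encoder, the matching rule (absolute difference to a globally average-pooled reference embedding, concatenated to the scene features), and all function names are hypothetical choices made for illustration.

```python
from typing import Callable, List

Vector = List[float]
FeatureMap = List[List[Vector]]  # height x width x channels


def make_shared_encoder(weights: List[Vector]) -> Callable[[FeatureMap], FeatureMap]:
    """Return a toy per-pixel linear encoder (hypothetical stand-in for a CNN backbone).

    The same weights are used for every input, which is what makes the
    backbone "Siamese".
    """
    def encode(image: FeatureMap) -> FeatureMap:
        return [
            [[sum(w * x for w, x in zip(row_w, pixel)) for row_w in weights]
             for pixel in row]
            for row in image
        ]
    return encode


def global_average_pool(features: FeatureMap) -> Vector:
    """Collapse a feature map into a single embedding vector."""
    pixels = [pixel for row in features for pixel in row]
    channels = len(pixels[0])
    return [sum(p[c] for p in pixels) / len(pixels) for c in range(channels)]


def match_to_reference(scene: FeatureMap, reference: Vector) -> FeatureMap:
    """Append |scene - reference| to each scene feature vector.

    Assumed matching rule: small differences mark locations that resemble
    the reference category.
    """
    return [
        [pixel + [abs(s - r) for s, r in zip(pixel, reference)] for pixel in row]
        for row in scene
    ]


def siamese_features(encode: Callable[[FeatureMap], FeatureMap],
                     reference_image: FeatureMap,
                     scene_image: FeatureMap) -> FeatureMap:
    """Encode both inputs with the shared encoder and condition the scene on the reference."""
    reference_embedding = global_average_pool(encode(reference_image))
    return match_to_reference(encode(scene_image), reference_embedding)


if __name__ == "__main__":
    encode = make_shared_encoder([[1.0, 0.0], [0.0, 2.0]])
    reference = [[[1.0, 1.0], [3.0, 1.0]]]  # 1 x 2 image, 2 channels
    scene = [[[2.0, 1.0], [0.0, 0.0]]]      # 1 x 2 image, 2 channels
    print(siamese_features(encode, reference, scene))
```

In a full pipeline, the conditioned feature map would then feed the detection and segmentation heads, so that proposals are steered towards locations resembling the reference.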