Unsupervised Discovery of the Long-Tail in Instance Segmentation Using Hierarchical Self-Supervision
This work addresses the high cost and limited generalization of supervised instance segmentation in new domains, though the contribution is incremental because it builds on existing self-supervised techniques.
The paper tackles the cost and poor generalization of supervised instance segmentation by proposing an unsupervised method that discovers long-tail categories using hierarchical self-supervision, achieving competitive results on LVIS compared with supervised and partially supervised approaches.
Instance segmentation is an active topic in computer vision that is usually solved with supervised learning over very large datasets of object-level masks. Obtaining such a dataset for a new domain can be very expensive and time-consuming. In addition, models trained on a fixed set of annotated categories do not generalize well to unseen objects. The goal of this paper is to propose a method that performs unsupervised discovery of long-tail categories in instance segmentation by learning instance embeddings of masked regions. Leveraging the rich relationships and hierarchical structure between objects in images, we propose self-supervised losses for learning mask embeddings. Trained on the COCO dataset without additional annotations of the long-tail objects, our model is able to discover novel objects that are more fine-grained than the common categories in COCO. We show that the model achieves competitive quantitative results on LVIS compared with supervised and partially supervised methods.
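The abstract does not give the form of the self-supervised losses, so the following is only a rough sketch of what "learning instance embeddings of masked regions" could look like, not the authors' formulation. It assumes that a mask embedding is the average of the feature vectors under the mask, and that a mask nested inside another mask acts as a positive in a margin-based triplet loss. The function names, the use of cosine distance, and the margin value are all hypothetical choices made for illustration.

```python
import math

def masked_average_pool(features, mask):
    """Average the feature vectors at pixels where mask is 1.

    features: H x W x D nested lists; mask: H x W nested lists of 0/1.
    (Assumed pooling scheme; the source does not specify one.)
    """
    dim = len(features[0][0])
    total = [0.0] * dim
    count = 0
    for feat_row, mask_row in zip(features, mask):
        for vec, m in zip(feat_row, mask_row):
            if m:
                count += 1
                for k in range(dim):
                    total[k] += vec[k]
    if count == 0:
        raise ValueError("mask covers no pixels")
    return [t / count for t in total]

def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss: zero once the positive is closer than the negative by at least the margin."""
    gap = cosine_distance(anchor, positive) - cosine_distance(anchor, negative)
    return max(0.0, gap + margin)

# Toy 2 x 2 feature map with 2-dimensional features.
features = [[[1.0, 0.0], [1.0, 0.0]],
            [[0.0, 1.0], [0.0, 1.0]]]
whole_mask = [[1, 1], [0, 0]]   # hypothetical object mask
part_mask = [[1, 0], [0, 0]]    # nested inside whole_mask, treated as positive
other_mask = [[0, 0], [1, 1]]   # unrelated region, treated as negative

anchor = masked_average_pool(features, whole_mask)
positive = masked_average_pool(features, part_mask)
negative = masked_average_pool(features, other_mask)
loss = triplet_loss(anchor, positive, negative)
```

In this toy setup the nested mask already shares the anchor's embedding, so the loss is zero; swapping the positive and negative roles produces a positive loss that training would then reduce.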