SimSAM: Simple Siamese Representations Based Semantic Affinity Matrix for Unsupervised Image Segmentation
This work addresses the problem of reducing annotation costs for image segmentation in computer vision. Its contribution is incremental, as it builds on existing self-supervised methods such as SimSiam and DINO-ViT.
The paper tackles unsupervised image segmentation by introducing SimSAM, a framework that computes a Semantic Affinity Matrix using non-contrastive self-supervised learning, achieving competitive results on object and semantic segmentation tasks without annotations.
Recent developments in self-supervised learning (SSL) have made it possible to learn data representations without the need for annotations. Inspired by the non-contrastive SSL approach SimSiam, we introduce a novel framework, SimSAM, to compute the Semantic Affinity Matrix, which plays a significant role in unsupervised image segmentation. Given an image, SimSAM first extracts features using a pre-trained DINO-ViT, then projects the features to predict the correlations of dense features in a non-contrastive way. We show applications of the Semantic Affinity Matrix in object segmentation and semantic segmentation tasks. Our code is available at https://github.com/chandagrover/SimSAM.
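As a rough illustration of the pipeline described above, the following plain-Python sketch projects per-patch features and forms a pairwise affinity matrix from them. It is not the authors' implementation: the random vectors stand in for DINO-ViT patch features, the single linear projection, the use of cosine similarity as the affinity measure, and all function names (`project`, `affinity_matrix`, `negative_cosine`) are assumptions made for illustration. The `negative_cosine` function shows the SimSiam-style loss term only as a value; actual training would also need a stop-gradient and an optimizer, which are omitted here.

```python
import math
import random


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def project(features, weight):
    """Linear projection: each D-dim feature becomes a K-dim vector.

    `weight` is a list of K rows, each of length D (hypothetical layout).
    """
    return [[sum(f * w for f, w in zip(feat, row)) for row in weight]
            for feat in features]


def affinity_matrix(features):
    """Pairwise cosine similarity between all patch features (N x N)."""
    unit = [l2_normalize(f) for f in features]
    return [[sum(a * b for a, b in zip(u, v)) for v in unit] for u in unit]


def negative_cosine(p, z):
    """SimSiam-style loss term: -cos(p, z).

    In training, z would be treated as a constant (stop-gradient);
    this sketch only evaluates the value.
    """
    p, z = l2_normalize(p), l2_normalize(z)
    return -sum(a * b for a, b in zip(p, z))


random.seed(0)
num_patches, feat_dim, proj_dim = 6, 8, 4

# Assumption: random vectors stand in for pre-trained DINO-ViT patch features.
features = [[random.gauss(0, 1) for _ in range(feat_dim)]
            for _ in range(num_patches)]
# Assumption: a single random linear layer stands in for the learned projection.
weight = [[random.gauss(0, 1) for _ in range(feat_dim)]
          for _ in range(proj_dim)]

projected = project(features, weight)
affinity = affinity_matrix(projected)
```

Under these assumptions, `affinity[i][j]` is high when patches `i` and `j` have similar projected features, so thresholding or clustering the rows is one plausible way such a matrix could feed object or semantic segmentation.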