CV, AI | May 12, 2024

Zero Shot Context-Based Object Segmentation using SLIP (SAM+CLIP)

Saaketh Koundinya Gundavarapu, Arushi Arora, Shreya Agarwal

arXiv:2405.07284v1 | 4 citations | h-index: 1

Originality: Synthesis-oriented

AI Analysis

For computer vision researchers, this is an incremental improvement that makes SAM more versatile for context-aware segmentation.

The paper tackles zero-shot object segmentation by combining SAM and CLIP into SLIP, enabling segmentation based on text prompts without prior training on specific classes, and demonstrates its effectiveness in experiments.

We present SLIP (SAM+CLIP), an enhanced architecture for zero-shot object segmentation. SLIP combines the Segment Anything Model (SAM) \cite{kirillov2023segment} with Contrastive Language-Image Pretraining (CLIP) \cite{radford2021learning}. By using CLIP to incorporate text prompts into SAM, SLIP enables object segmentation without prior training on specific classes or categories. We fine-tune CLIP on a Pokemon dataset so that it learns meaningful image-text representations. Our experiments show that SLIP can recognize and segment objects in images based on contextual information from text prompts. Integrating CLIP's text-image understanding into SAM thus extends the original architecture toward more versatile, context-aware object segmentation.
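The abstract does not spell out how a text prompt is matched to SAM's output. One plausible reading, assumed here purely for illustration, is that SAM proposes candidate masks, CLIP embeds the image region under each mask along with the text prompt, and the mask whose embedding is closest to the prompt is kept. The sketch below shows only that final selection step on toy vectors; `select_mask` and the example embeddings are hypothetical names and values, not taken from the paper, and no real SAM or CLIP model is called.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_mask(
    mask_embeddings: Sequence[Sequence[float]],
    text_embedding: Sequence[float],
) -> Tuple[int, List[float]]:
    """Return the index of the best-matching mask and all similarity scores.

    Assumption: each entry of mask_embeddings is an image embedding of the
    region covered by one candidate mask, and text_embedding lives in the
    same embedding space (as CLIP's image and text encoders provide).
    """
    if not mask_embeddings:
        raise ValueError("at least one candidate mask embedding is required")
    scores = [cosine_similarity(e, text_embedding) for e in mask_embeddings]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, scores


# Toy stand-ins for embeddings of three candidate masks and one text prompt.
mask_embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.6, 0.8, 0.0],
]
text_embedding = [0.0, 1.0, 0.0]

best_index, scores = select_mask(mask_embeddings, text_embedding)
print(best_index, scores)
```

In a real pipeline the toy vectors would be replaced by encoder outputs for each masked crop and for the prompt; whether SLIP selects masks exactly this way is not stated in the abstract.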

