CV · AI · Jan 23, 2024

ClipSAM: CLIP and SAM Collaboration for Zero-Shot Anomaly Segmentation

arXiv:2401.12665v2 · 105 citations · h-index: 13 · Neurocomputing
Originality: Incremental advance
AI Analysis

This work addresses precise anomaly segmentation without training data, a problem that matters for industrial inspection and safety applications. It is an incremental advance because it builds on existing foundation models.

The paper tackles zero-shot anomaly segmentation by proposing ClipSAM, a framework that combines CLIP and SAM to offset their individual limitations. The authors report that it achieves the best segmentation performance on the MVTec-AD and VisA datasets.

Recently, foundation models such as CLIP and SAM have shown promising performance on the task of Zero-Shot Anomaly Segmentation (ZSAS). However, both CLIP-based and SAM-based ZSAS methods still suffer from non-negligible drawbacks: 1) CLIP primarily focuses on global feature alignment across different inputs, leading to imprecise segmentation of local anomalous parts; 2) SAM tends to generate numerous redundant masks without proper prompt constraints, resulting in complex post-processing requirements. In this work, we propose a CLIP and SAM collaboration framework called ClipSAM for ZSAS. The insight behind ClipSAM is to employ CLIP's semantic understanding capability for anomaly localization and rough segmentation, and then to use that result as prompt constraints for SAM to refine the anomaly segmentation. In detail, we introduce a Unified Multi-scale Cross-modal Interaction (UMCI) module that lets language features interact with CLIP's visual features at multiple scales to reason about anomaly positions. Then, we design a Multi-level Mask Refinement (MMR) module, which uses the positional information as multi-level prompts for SAM to acquire hierarchical levels of masks and merges them. Extensive experiments validate the effectiveness of our approach, which achieves the best segmentation performance on the MVTec-AD and VisA datasets.
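The two-stage idea in the abstract (a rough localization map turned into prompts, then several candidate masks merged into one) can be illustrated with a small NumPy sketch. This is not the paper's UMCI or MMR implementation: the function names `prompts_from_score_map` and `merge_masks`, the fixed threshold, the top-k point selection, and the confidence-weighted average are all assumptions chosen for illustration, and no CLIP or SAM model is called.

```python
import numpy as np


def prompts_from_score_map(score_map, threshold=0.5, num_points=3):
    """Derive a box prompt and point prompts from a rough anomaly score map.

    Hypothetical helper: `score_map` stands in for a CLIP-derived rough
    segmentation with values in [0, 1]. Returns (box, points), where box is
    (x0, y0, x1, y1) and points is an array of (x, y) pixel coordinates.
    """
    binary = score_map >= threshold
    if not binary.any():
        # Nothing exceeds the threshold, so there is no region to prompt on.
        return None, np.empty((0, 2), dtype=int)

    # Tight bounding box around all above-threshold pixels.
    ys, xs = np.nonzero(binary)
    box = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))

    # The highest-scoring pixels serve as point prompts.
    top = np.argsort(score_map, axis=None)[::-1][:num_points]
    py, px = np.unravel_index(top, score_map.shape)
    points = np.stack([px, py], axis=1)
    return box, points


def merge_masks(masks, confidences):
    """Fuse candidate masks with a confidence-weighted average.

    Assumed fusion rule for illustration only; `masks` has shape (N, H, W)
    and `confidences` has shape (N,).
    """
    masks = np.asarray(masks, dtype=float)
    weights = np.asarray(confidences, dtype=float)
    weights = weights / weights.sum()
    return np.tensordot(weights, masks, axes=1)


if __name__ == "__main__":
    # Toy score map with one bright 3x3 region and a single peak pixel.
    score_map = np.zeros((8, 8))
    score_map[2:5, 3:6] = 0.9
    score_map[3, 4] = 1.0

    box, points = prompts_from_score_map(score_map)
    print("box prompt:", box)
    print("first point prompt:", points[0])

    # Stand-ins for masks a promptable segmenter might return per prompt.
    candidates = [score_map >= 0.5, score_map >= 0.95]
    fused = merge_masks(candidates, confidences=[0.8, 0.2])
    print("fused mask max:", fused.max())
```

In a real pipeline the box and points would be passed to a promptable segmenter such as SAM, and the candidate masks it returns would replace the stand-ins above. How the prompts are grouped into levels and how the resulting masks are weighted are design choices the abstract does not specify.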

Code Implementations: 1 repo