CV · AI · May 13, 2024

Investigating the Semantic Robustness of CLIP-based Zero-Shot Anomaly Segmentation

arXiv:2405.07969v1 · 3 citations · h-index: 3
Originality: Synthesis-oriented
AI Analysis

This work addresses reliable anomaly segmentation with pre-trained models under distribution shift. It is incremental, however, because it evaluates an existing method rather than proposing a new one.

The paper investigates how robust CLIP-based zero-shot anomaly segmentation is to semantic transformations (rotations, saturation shifts, and hue shifts). It finds that performance drops by up to 20% in area under the ROC curve and by up to 40% in area under the per-region overlap (PRO) curve.

Zero-shot anomaly segmentation using pre-trained foundation models is a promising approach because it yields effective algorithms without expensive, domain-specific training or fine-tuning. Ensuring that these methods work across varied environmental conditions and remain robust to distribution shifts is an open problem. We investigate the performance of the WinCLIP [14] zero-shot anomaly segmentation algorithm by perturbing test data with three semantic transformations: bounded angular rotations, bounded saturation shifts, and hue shifts. We empirically measure a lower performance bound by aggregating per-sample worst-case perturbations. We find that average performance drops by up to 20% in area under the ROC curve and by up to 40% in area under the per-region overlap curve. Performance degrades consistently across three CLIP backbones, regardless of model architecture or learning objective, which demonstrates the need for careful performance evaluation.
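The per-sample worst-case evaluation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. The function `score_fn` is a hypothetical stand-in for WinCLIP's pixel-level anomaly map. The color perturbations are parameterized here as a hue offset (a fraction of the color wheel) and a saturation scale factor. The per-sample metric is assumed to be per-image pixel ROC AUC. Bounded rotations and the per-region overlap metric are left out for brevity. All function and variable names are hypothetical.

```python
import colorsys
from typing import Callable, List, Sequence, Tuple

Pixel = Tuple[float, float, float]  # RGB, each channel in [0, 1]
Image = List[List[Pixel]]            # rows of pixels
Grid = List[List[float]]             # per-pixel values (scores or 0/1 labels)
ScoreFn = Callable[[Image], Grid]    # hypothetical stand-in for a zero-shot segmenter


def shift_hue_saturation(img: Image, d_hue: float, sat_scale: float) -> Image:
    """Rotate every pixel's hue by d_hue (fraction of the color wheel) and
    scale its saturation by sat_scale, clipping saturation to [0, 1]."""
    out = []
    for row in img:
        new_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r, g, b)
            h = (h + d_hue) % 1.0                  # hue is cyclic
            s = min(max(s * sat_scale, 0.0), 1.0)  # saturation is bounded
            new_row.append(colorsys.hsv_to_rgb(h, s, v))
        out.append(new_row)
    return out


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the Mann-Whitney statistic: the probability that an anomalous
    pixel outscores a normal one, ties counted as 1/2. Needs both classes present."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def flatten(grid: Grid) -> List[float]:
    return [x for row in grid for x in row]


def worst_case_auc(img: Image, mask: Grid, score_fn: ScoreFn,
                   perturbations: List[Tuple[float, float]]) -> float:
    """Lowest per-image pixel ROC AUC over a finite grid of (d_hue, sat_scale) values."""
    labels = [int(y) for y in flatten(mask)]
    return min(
        roc_auc(flatten(score_fn(shift_hue_saturation(img, dh, ss))), labels)
        for dh, ss in perturbations
    )


def empirical_lower_bound(dataset: List[Tuple[Image, Grid]], score_fn: ScoreFn,
                          perturbations: List[Tuple[float, float]]) -> float:
    """Mean of per-sample worst cases: never above the mean per-image AUC
    obtained by applying any single perturbation from the grid to every sample."""
    return sum(worst_case_auc(img, mask, score_fn, perturbations)
               for img, mask in dataset) / len(dataset)


# Toy usage: a hypothetical scorer that treats the red channel as the anomaly score.
def red_channel_scorer(img: Image) -> Grid:
    return [[r for r, _, _ in row] for row in img]


toy_img = [[(1.0, 0.0, 0.0), (0.2, 0.2, 0.8)],
           [(0.2, 0.2, 0.8), (0.2, 0.2, 0.8)]]
toy_mask = [[1, 0], [0, 0]]
clean = worst_case_auc(toy_img, toy_mask, red_channel_scorer, [(0.0, 1.0)])
worst = worst_case_auc(toy_img, toy_mask, red_channel_scorer, [(0.0, 1.0), (0.5, 1.0)])
# clean == 1.0; worst == 0.0 (a half-turn hue shift turns the red defect cyan)
```

Each sample keeps its own worst perturbation. As a result, the averaged figure can fall below the score under any single global perturbation, which makes it a pessimistic bound rather than a typical-case estimate.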
