CV · May 8, 2024

Dual-Image Enhanced CLIP for Zero-Shot Anomaly Detection

arXiv:2405.04782v1 · 7 citations · h-index: 8
Originality: Incremental advance
AI Analysis

This work addresses zero-shot anomaly detection in computer vision. It is an incremental improvement that enhances CLIP-based methods with visual references.

The paper tackles zero-shot anomaly detection with a Dual-Image Enhanced CLIP approach in which the two images of a pair serve as visual references for each other. This improves both anomaly classification and localization, and the reported results are comparable to state-of-the-art methods across various datasets.
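The idea of two images serving as visual references for each other can be illustrated with a minimal sketch. It assumes nearest-neighbour matching of patch embeddings, a common technique that is not necessarily the paper's exact formulation: a patch in one image is scored by how poorly it matches any patch in the other image. The random arrays stand in for CLIP patch embeddings, and `mutual_reference_maps` and the toy data are hypothetical.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def mutual_reference_maps(patches_a: np.ndarray, patches_b: np.ndarray):
    """Hypothetical helper: use each image as the visual reference for the other.

    patches_a: (N, D) patch embeddings of image A (assumed to come from a CLIP image encoder).
    patches_b: (M, D) patch embeddings of image B.
    Returns per-patch anomaly scores for A and B; higher means more anomalous.
    """
    a, b = l2_normalize(patches_a), l2_normalize(patches_b)
    sim = a @ b.T                    # (N, M) cosine similarity of every A patch to every B patch
    score_a = 1.0 - sim.max(axis=1)  # an A patch is normal if some B patch closely resembles it
    score_b = 1.0 - sim.max(axis=0)  # and symmetrically for B patches against A
    return score_a, score_b


# Toy data standing in for real embeddings: B holds A's patches in reverse order,
# except that B's patch 3 is replaced by an unrelated "defect" vector.
rng = np.random.default_rng(0)
img_a = rng.normal(size=(16, 8))
img_b = img_a[::-1].copy()
img_b[3] = rng.normal(size=8)

score_a, score_b = mutual_reference_maps(img_a, img_b)
print(int(np.argmax(score_b)))  # prints 3: the replaced patch has no close match in A
```

Taking the maximum over all patches of the reference, rather than comparing patches at the same position, tolerates misalignment between the two images. The trade-off is that a defect present in both images would find a close match and go unflagged, which is one reason text guidance remains useful alongside the visual reference.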

Image anomaly detection has long been a challenging task in computer vision. The advent of vision-language models, particularly CLIP-based frameworks, has opened new avenues for zero-shot anomaly detection. Recent studies have explored CLIP by aligning images with normal and abnormal text prompts. However, relying exclusively on textual guidance often falls short, which highlights the critical importance of additional visual references. In this work, we introduce a Dual-Image Enhanced CLIP approach that leverages a joint vision-language scoring system. Our method processes pairs of images, using each as a visual reference for the other and thereby enriching inference with visual context. This dual-image strategy markedly improves both anomaly classification and localization performance. We further strengthen the model with a test-time adaptation module that incorporates synthesized anomalies to refine localization. Our approach effectively exploits the potential of joint vision-language anomaly detection and achieves performance comparable to current state-of-the-art (SOTA) methods across various datasets.
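A joint vision-language score can be sketched as follows. The details are assumptions, not taken from the paper: the text score is a softmax over one "normal" and one "abnormal" prompt embedding with a temperature of 0.01; the text map and a visual map are fused by a weighted average with weight `alpha`; and the image-level (classification) score is the maximum of the fused map. All function names and the toy 4-D embeddings are hypothetical; real inputs would be CLIP patch and text embeddings.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max for numerical stability before exponentiating.
    z = np.exp(x - x.max(axis=axis, keepdims=True))
    return z / z.sum(axis=axis, keepdims=True)


def text_anomaly_map(patches, normal_text, abnormal_text, temperature=0.01):
    """Per-patch probability of the 'abnormal' prompt (assumed two-prompt setup).

    patches: (N, D) patch embeddings; normal_text, abnormal_text: (D,) prompt embeddings.
    """
    p = patches / np.linalg.norm(patches, axis=-1, keepdims=True)
    t = np.stack([normal_text, abnormal_text])
    t = t / np.linalg.norm(t, axis=-1, keepdims=True)
    logits = (p @ t.T) / temperature       # (N, 2): scaled similarity to each prompt
    return softmax(logits, axis=-1)[:, 1]  # column 1 = 'abnormal'


def joint_scores(text_map, visual_map, alpha=0.5):
    """Hypothetical fusion: weighted average of text-guided and image-guided maps."""
    joint = alpha * text_map + (1.0 - alpha) * visual_map
    return joint, float(joint.max())       # (localization map, classification score)


# Toy embeddings: axis 0 plays the "normal" prompt, axis 1 the "abnormal" prompt.
normal_text = np.array([1.0, 0.0, 0.0, 0.0])
abnormal_text = np.array([0.0, 1.0, 0.0, 0.0])
patches = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.1, 1.0, 0.0, 0.0],  # this patch resembles the abnormal prompt
    [1.0, 0.0, 0.2, 0.0],
])
visual_map = np.array([0.05, 0.10, 0.60, 0.20])  # hypothetical visual-reference scores

text_map = text_anomaly_map(patches, normal_text, abnormal_text)
joint_map, image_score = joint_scores(text_map, visual_map)
print(int(joint_map.argmax()), round(image_score, 3))  # prints: 2 0.8
```

A weighted average is only one plausible fusion rule; it lets a confident visual score offset an uncertain text score and vice versa, and `alpha` is a free parameter in this sketch.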
