CV · Dec 5, 2024

CLIP-FSAC++: Few-Shot Anomaly Classification with Anomaly Descriptor Based on CLIP

Zuo Zuo, Jiahao Dong, Yao Wu, Yanyun Qu, Zongze Wu

arXiv:2412.03829v12.02 citations · h-index: 4 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the problem of data scarcity in industrial anomaly classification for manufacturing quality control, representing an incremental improvement over existing methods.

The paper tackles few-shot anomaly classification in industrial manufacturing by proposing CLIP-FSAC++, a framework that uses an anomaly descriptor module to enhance CLIP's cross-modality embeddings, achieving improved performance on VisA and MVTEC-AD datasets across 1, 2, 4, and 8-shot settings.

Industrial anomaly classification (AC) is an indispensable task in industrial manufacturing, as it guarantees the quality and safety of various products. To address the scarcity of data in industrial scenarios, many few-shot anomaly detection methods have emerged recently. In this paper, we propose an effective few-shot anomaly classification (FSAC) framework with one-stage training, dubbed CLIP-FSAC++. Specifically, we introduce a cross-modality interaction module named Anomaly Descriptor, placed after the image and text encoders, which strengthens the correlation between visual and text embeddings and adapts the representations of CLIP from the pre-training data to the target data. In the Anomaly Descriptor, an image-to-text cross-attention module is used to obtain image-specific text embeddings, and a text-to-image cross-attention module is used to obtain text-specific visual embeddings. These modality-specific embeddings are then used to enhance the original representations of CLIP for better matching ability. Comprehensive experimental results are provided to evaluate our method on few-normal-shot anomaly classification on VisA and MVTEC-AD in the 1-, 2-, 4- and 8-shot settings. The source code is available at https://github.com/Jay-zzcoder/clip-fsac-pp
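The cross-modality interaction described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes plain scaled dot-product attention with no learned projections, a residual sum for the "enhancement" step, mean pooling of image tokens, and cosine-similarity matching against a normal and an anomalous text embedding. The function names, the `alpha` weight, the `temperature` value, and the 3-dimensional vectors are all hypothetical, chosen only for illustration.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def cross_attention(queries, context):
    """Scaled dot-product attention (assumption: no learned Q/K/V projections).
    Each query is replaced by an attention-weighted average of the context vectors."""
    scale = math.sqrt(len(context[0]))
    out = []
    for q in queries:
        weights = softmax([dot(q, c) / scale for c in context])
        out.append([sum(w * c[i] for w, c in zip(weights, context))
                    for i in range(len(q))])
    return out

def enhance(original, specific, alpha=0.5):
    """Assumed enhancement: residual sum of original and modality-specific embeddings."""
    return [[o + alpha * s for o, s in zip(ov, sv)]
            for ov, sv in zip(original, specific)]

def anomaly_descriptor_scores(image_tokens, text_embeds, alpha=0.5, temperature=0.07):
    # Image-specific text embeddings: text embeddings query the image tokens.
    image_specific_text = cross_attention(text_embeds, image_tokens)
    # Text-specific visual embeddings: image tokens query the text embeddings.
    text_specific_image = cross_attention(image_tokens, text_embeds)

    text_enh = enhance(text_embeds, image_specific_text, alpha)
    image_enh = enhance(image_tokens, text_specific_image, alpha)

    # Assumed pooling: average the enhanced image tokens into one vector.
    dim = len(image_enh[0])
    pooled = [sum(t[i] for t in image_enh) / len(image_enh) for i in range(dim)]

    # Match the pooled image vector against each enhanced text embedding.
    logits = [cosine(pooled, t) / temperature for t in text_enh]
    return softmax(logits)

# Toy inputs: two image tokens, and text embeddings ordered [normal, anomalous].
image_tokens = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
text_embeds = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
probs = anomaly_descriptor_scores(image_tokens, text_embeds)
print(probs)
```

In this toy setup the image tokens lie close to the "normal" text direction, so the first probability dominates. In the actual framework the attention modules are trained on the few available shots, which this sketch does not model.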
