CV AI LG | Aug 11, 2025

ACD-CLIP: Decoupling Representation and Dynamic Fusion for Zero-Shot Anomaly Detection

Ke Ma, Jun Long, Hongxiao Fei, Liujie Hua, Zhen Dai, Yueyi Luo

arXiv:2508.07819v53.6 | h-index: 43 | Has Code

Originality: Incremental advance

AI Analysis

This work improves anomaly detection for industrial and medical applications by enhancing VLMs for dense perception tasks, though it appears incremental as it builds on existing adaptation methods.

The paper tackles Zero-Shot Anomaly Detection (ZSAD) by addressing the adaptation gap in pre-trained Vision-Language Models (VLMs) with an Architectural Co-Design framework; the authors report superior accuracy and robustness on diverse industrial and medical benchmarks.

Pre-trained Vision-Language Models (VLMs) struggle with Zero-Shot Anomaly Detection (ZSAD) due to a critical adaptation gap: they lack the local inductive biases required for dense prediction and employ inflexible feature fusion paradigms. We address these limitations through an Architectural Co-Design framework that jointly refines feature representation and cross-modal fusion. Our method proposes a parameter-efficient Convolutional Low-Rank Adaptation (Conv-LoRA) adapter to inject local inductive biases for fine-grained representation, and introduces a Dynamic Fusion Gateway (DFG) that leverages visual context to adaptively modulate text prompts, enabling a powerful bidirectional fusion. Extensive experiments on diverse industrial and medical benchmarks demonstrate superior accuracy and robustness, validating that this synergistic co-design is critical for robustly adapting foundation models to dense perception tasks. The source code is available at https://github.com/cockmake/ACD-CLIP.
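
The abstract names two components but does not give their internals, so the following is only a rough NumPy sketch of the general ideas, not the authors' implementation. It assumes a Conv-LoRA-style adapter can be modeled as a low-rank update with a 3x3 depthwise convolution in the bottleneck, and that a fusion gate can be modeled as a sigmoid gate computed from mean-pooled patch tokens that rescales the text embeddings. All function names, shapes, and the gating formula (`conv_lora`, `dynamic_fusion_gate`, `anomaly_map`, `W_g`) are hypothetical.

```python
import numpy as np

def depthwise_conv3x3(x, kernel):
    """Zero-padded 3x3 depthwise convolution. x: (H, W, r), kernel: (3, 3, r)."""
    H, W, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            # Shifted window times the per-channel kernel weight.
            out += padded[dy:dy + H, dx:dx + W, :] * kernel[dy, dx, :]
    return out

def conv_lora(tokens, grid_hw, A, B, kernel, scale=1.0):
    """Hypothetical adapter: low-rank update with a local conv in the bottleneck.

    tokens: (N, D) patch tokens with N = H * W
    A: (D, r) down-projection, B: (r, D) up-projection
    """
    H, W = grid_hw
    z = (tokens @ A).reshape(H, W, -1)   # project down, restore the 2D patch grid
    z = depthwise_conv3x3(z, kernel)     # mix neighbouring patches (local bias)
    z = z.reshape(H * W, -1)
    return tokens + scale * (z @ B)      # residual low-rank update

def dynamic_fusion_gate(text_emb, patch_tokens, W_g):
    """Hypothetical gate: visual context rescales each text embedding channel.

    text_emb: (K, D) prompt embeddings, patch_tokens: (N, D), W_g: (D, D)
    """
    context = patch_tokens.mean(axis=0)              # (D,) global visual context
    gate = 1.0 / (1.0 + np.exp(-(context @ W_g)))    # sigmoid, values in (0, 1)
    return text_emb * (2.0 * gate)                   # identity when gate == 0.5

def anomaly_map(patch_tokens, text_emb, grid_hw):
    """Per-patch softmax over cosine similarity to [normal, abnormal] prompts."""
    p = patch_tokens / np.linalg.norm(patch_tokens, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = p @ t.T                                 # (N, 2)
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = e / e.sum(axis=1, keepdims=True)
    return probs[:, 1].reshape(grid_hw)              # probability of "abnormal"

# Toy usage with random data: a 4x4 patch grid, 8-dim features, rank 2.
rng = np.random.default_rng(0)
H, W, D, r = 4, 4, 8, 2
tokens = rng.normal(size=(H * W, D))
A = rng.normal(size=(D, r))
B = rng.normal(size=(r, D)) * 0.01
kernel = rng.normal(size=(3, 3, r))
prompts = rng.normal(size=(2, D))        # stand-ins for "normal" / "abnormal" text
adapted = conv_lora(tokens, (H, W), A, B, kernel)
fused = dynamic_fusion_gate(prompts, adapted, rng.normal(size=(D, D)) * 0.1)
heatmap = anomaly_map(adapted, fused, (H, W))
```

In this sketch, setting `B` to zeros makes the adapter an exact identity, mirroring the usual LoRA initialization, and a zero `W_g` leaves the text embeddings unchanged, so both modules start as no-ops on the frozen model.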
