CV · Sep 25, 2024

VL4AD: Vision-Language Models Improve Pixel-wise Anomaly Detection

Liangyu Zhong, Joachim Sicking, Fabian Hüger, Hanno Gottschalk

arXiv:2409.17330v1 · 2.0 · 1 citation · h-index: 6

Originality: Incremental advance

AI Analysis

This work addresses the challenge of detecting unknown anomalies in semantic segmentation for applications such as autonomous driving or surveillance. It offers a data- and training-free approach that is an incremental advance over existing methods.

The paper tackles the problem of anomaly detection in semantic segmentation by incorporating vision-language models to improve outlier awareness without fine-tuning on outlier samples, achieving competitive performance on benchmark datasets.

Semantic segmentation networks have achieved significant success under the assumption of independent and identically distributed data. However, these networks often struggle to detect anomalies from unknown semantic classes due to the limited set of visual concepts they are typically trained on. To address this issue, anomaly segmentation often involves fine-tuning on outlier samples, necessitating additional efforts for data collection, labeling, and model retraining. Seeking to avoid this cumbersome work, we take a different approach and propose to incorporate Vision-Language (VL) encoders into existing anomaly detectors to leverage the semantically broad VL pre-training for improved outlier awareness. Additionally, we propose a new scoring function that enables data- and training-free outlier supervision via textual prompts. The resulting VL4AD model, which includes max-logit prompt ensembling and a class-merging strategy, achieves competitive performance on widely used benchmark datasets, thereby demonstrating the potential of vision-language models for pixel-wise anomaly detection.
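The abstract names three components without giving their formulas: prompt-based outlier scoring, max-logit prompt ensembling, and class merging. The sketch below is one plausible reading under stated assumptions, not the authors' implementation. It assumes each pixel has an embedding from a CLIP-style vision-language encoder, each class (including text-only "outlier" classes) is described by several prompt embeddings, and class logits are cosine similarities. The helpers `max_logit_prompt_ensemble`, `merge_classes`, and `outlier_score` are hypothetical. The outlier score used here (softmax mass on prompted outlier classes) is a stand-in for the paper's own scoring function, whose exact form the abstract does not state.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def max_logit_prompt_ensemble(
    pixel: Vector, prompts: Dict[str, List[Vector]]
) -> Dict[str, float]:
    """Class logit = max cosine similarity over that class's prompt embeddings."""
    return {c: max(cosine(pixel, t) for t in embs) for c, embs in prompts.items()}


def merge_classes(
    logits: Dict[str, float], groups: Dict[str, List[str]]
) -> Dict[str, float]:
    """Replace each group of fine-grained classes by one class holding their max logit.

    Classes that belong to no group are passed through unchanged.
    """
    merged: Dict[str, float] = {}
    grouped = set()
    for name, members in groups.items():
        merged[name] = max(logits[m] for m in members)
        grouped.update(members)
    for c, v in logits.items():
        if c not in grouped:
            merged[c] = v
    return merged


def outlier_score(
    logits: Dict[str, float],
    outlier_classes: Sequence[str],
    temperature: float = 0.01,  # assumed; mirrors CLIP's logit scale of 100
) -> float:
    """Softmax probability mass on text-prompted outlier classes (higher = more anomalous)."""
    scaled = {c: v / temperature for c, v in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {c: math.exp(v - peak) for c, v in scaled.items()}
    total = sum(exps.values())
    return sum(exps[c] for c in outlier_classes) / total


if __name__ == "__main__":
    # Toy 3-d "embeddings"; real VL encoders produce far larger vectors.
    prompts = {
        "road": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],  # two prompt phrasings
        "car": [[0.0, 1.0, 0.0]],
        "truck": [[0.0, 0.9, 0.1]],
        "object on the road": [[0.0, 0.0, 1.0]],  # outlier prompt, no outlier images
    }
    groups = {"vehicle": ["car", "truck"]}
    for pixel in ([1.0, 0.05, 0.0], [0.05, 0.0, 1.0]):
        logits = merge_classes(max_logit_prompt_ensemble(pixel, prompts), groups)
        print(pixel, round(outlier_score(logits, ["object on the road"]), 3))
```

Taking the max over prompts lets the best-matching phrasing decide each class logit. Merging with a max means near-duplicate classes such as car and truck contribute a single logit instead of competing with each other in the softmax.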

