CV · AI · LG · Mar 10, 2023

Adapting Contrastive Language-Image Pretrained (CLIP) Models for Out-of-Distribution Detection

arXiv:2303.05828v2 · 3 citations · h-index: 6
Originality: Incremental advance
AI Analysis

This work aims to make OOD detection with vision-language models more robust, offering incremental but practical gains for safety-critical applications such as autonomous systems.

The paper adapts CLIP models for out-of-distribution (OOD) detection by proposing pseudo-label probing (PLP), which outperforms the previous state of the art by an average of 3.4% AUROC on ImageNet-based benchmarks when using the largest CLIP model (ViT-G). It also shows that linear probing beats fine-tuning for CLIP architectures, by 7.3% AUROC on average for CLIP ViT-H.
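The AUROC figures quoted above can be read as the probability that a randomly chosen OOD sample receives a higher OOD score than a randomly chosen in-distribution sample. The following is a minimal sketch of that computation, not the paper's evaluation code; the function name `auroc` and the toy scores are hypothetical, and higher scores are assumed to mean "more likely OOD".

```python
import numpy as np

def auroc(id_scores, ood_scores):
    """Fraction of (OOD, ID) pairs in which the OOD sample scores higher.

    Ties count as half a correct ranking. Assumes higher score = more OOD.
    """
    id_scores = np.asarray(id_scores, dtype=float)
    ood_scores = np.asarray(ood_scores, dtype=float)
    # Pairwise differences: rows are OOD samples, columns are ID samples.
    diff = ood_scores[:, None] - id_scores[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

# Toy scores (hypothetical): one OOD sample is ranked below one ID sample.
id_example = [0.1, 0.2, 0.3, 0.4]
ood_example = [0.35, 0.5, 0.6]
example_auroc = auroc(id_example, ood_example)  # 11 of 12 pairs ranked correctly
```

An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. The pairwise formulation is quadratic in the number of samples, so a rank-based implementation would be preferable for large benchmarks.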

We present a comprehensive experimental study on pretrained feature extractors for visual out-of-distribution (OOD) detection, focusing on adapting contrastive language-image pretrained (CLIP) models. Without fine-tuning on the training data, we establish a positive correlation ($R^2\geq0.92$) between in-distribution classification and unsupervised OOD detection for CLIP models across $4$ benchmarks. We further propose a simple and scalable method called \textit{pseudo-label probing} (PLP) that adapts vision-language models for OOD detection. Given the set of label names of the training set, PLP trains a linear layer using pseudo-labels derived from the text encoder of CLIP. To test the OOD detection robustness of pretrained models, we develop a novel feature-based adversarial OOD data manipulation approach to create adversarial samples. Intriguingly, we show that (i) PLP outperforms the previous state-of-the-art \citep{ming2022mcm} on all $5$ large-scale benchmarks based on ImageNet, specifically by an average AUROC gain of 3.4\% using the largest CLIP model (ViT-G), (ii) linear probing outperforms fine-tuning by large margins for CLIP architectures (e.g., CLIP ViT-H achieves a mean gain of 7.3\% AUROC across all ImageNet-based benchmarks), and (iii) billion-parameter CLIP models still fail at detecting adversarially manipulated OOD images. The code and adversarially created datasets will be made publicly available.
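The PLP recipe described in the abstract (derive pseudo-labels from the text encoder, then train a linear layer on them) can be illustrated with a small NumPy sketch. This is not the authors' implementation: random vectors stand in for CLIP image and text embeddings, every function name is hypothetical, and the OOD score used here (negative maximum softmax probability) is an assumption, since the abstract does not state which score the paper uses.

```python
import numpy as np

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def pseudo_labels(image_feats, text_feats):
    # Assign each image the class whose label-name embedding is most similar (cosine).
    sims = l2_normalize(image_feats) @ l2_normalize(text_feats).T
    return sims.argmax(axis=1)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_linear_probe(feats, labels, num_classes, lr=0.5, epochs=200):
    # Full-batch gradient descent on softmax cross-entropy for a single linear layer.
    n, d = feats.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        grad = (softmax(feats @ W + b) - onehot) / n
        W -= lr * feats.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

def ood_score(feats, W, b):
    # Assumed score: negative max softmax probability (higher = more likely OOD).
    return -softmax(feats @ W + b).max(axis=1)

# Synthetic stand-ins for frozen CLIP embeddings (hypothetical data, no real model).
rng = np.random.default_rng(0)
num_classes, dim, n_train = 3, 16, 300
text_feats = rng.normal(size=(num_classes, dim))       # "label name" embeddings
true_classes = rng.integers(0, num_classes, size=n_train)
train_feats = l2_normalize(
    text_feats[true_classes] + 0.3 * rng.normal(size=(n_train, dim))
)

labels = pseudo_labels(train_feats, text_feats)         # no ground-truth labels used
W, b = train_linear_probe(train_feats, labels, num_classes)
train_scores = ood_score(train_feats, W, b)
probe_agreement = ((train_feats @ W + b).argmax(axis=1) == labels).mean()
```

In this sketch the feature extractor stays frozen and only `W` and `b` are trained, which mirrors the linear-probing setting the abstract contrasts with fine-tuning. With a real CLIP model, `train_feats` and `text_feats` would come from its image and text encoders.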

Code Implementations: 1 repo
