CV · Nov 14, 2025

Out-of-Distribution Detection with Positive and Negative Prompt Supervision Using Large Language Models

Zhixia He, Chen Zhao, Minglai Shao, Xintao Wu, Xujiang Zhao, Dong Li, Qin Tian, Linlin Yu

arXiv:2511.10923v1 · 3.6 · h-index: 5

Originality: Incremental advance

AI Analysis

This work aims to improve out-of-distribution (OOD) detection accuracy in computer vision and is an incremental advance over existing vision-language model methods.

The paper proposes Positive and Negative Prompt Supervision. Large language models initialize class-specific positive and negative prompts, which are then optimized so that negative prompts capture inter-class features, and the resulting semantic knowledge is transferred to the visual modality. The authors report that the method outperforms state-of-the-art baselines on CIFAR-100 and ImageNet-1K across eight OOD datasets.

Out-of-distribution (OOD) detection aims to delineate the classification boundary between in-distribution (ID) and OOD images. Recent vision-language models (VLMs) have demonstrated remarkable OOD detection performance by integrating visual and textual modalities. In this context, negative prompts are introduced to emphasize the dissimilarity between image features and prompt content. However, these prompts often cover a broad range of non-ID features, which can lead to suboptimal results because they capture overlapping or misleading information. To address this issue, we propose Positive and Negative Prompt Supervision, which encourages negative prompts to capture inter-class features and transfers this semantic knowledge to the visual modality to enhance OOD detection performance. Our method begins with class-specific positive and negative prompts initialized by large language models (LLMs). These prompts are then optimized: positive prompts focus on features within each class, while negative prompts highlight features around category boundaries. In addition, a graph-based architecture aggregates semantic-aware supervision from the optimized prompt representations and propagates it to the visual branch, thereby enhancing the performance of the energy-based OOD detector. Extensive experiments on two benchmarks, CIFAR-100 and ImageNet-1K, across eight OOD datasets and five different LLMs, demonstrate that our method outperforms state-of-the-art baselines.
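
The abstract mentions an energy-based OOD detector without giving its form. As a rough illustration only, the sketch below uses the energy score commonly found in the OOD literature, T * logsumexp(logits / T), and flags an input as OOD when that score falls below a threshold. This is an assumption about the general technique, not the paper's detector: the function names, example logits, and threshold are hypothetical, and the prompt supervision and graph-based components are not modeled here.

```python
import math


def energy_score(logits, temperature=1.0):
    """Return T * logsumexp(logits / T), the negative free energy.

    Larger values suggest the input looks in-distribution.
    (Hypothetical helper; the paper's exact scoring may differ.)
    """
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return temperature * log_sum_exp


def is_ood(logits, threshold, temperature=1.0):
    """Flag an input as OOD when its energy score is below the threshold."""
    return energy_score(logits, temperature) < threshold


# Illustrative logits only: one confident prediction, one flat prediction.
confident_logits = [9.0, 0.5, 0.2]
flat_logits = [1.0, 1.1, 0.9]

# The threshold is an assumed value; in practice it is usually chosen
# on held-out ID data (for example, so that 95% of ID samples are kept).
threshold = 5.0

print(is_ood(confident_logits, threshold))
print(is_ood(flat_logits, threshold))
```

A confident prediction has one large logit and therefore a high score, while a flat logit vector yields a low score, which is why thresholding this quantity can separate ID from OOD inputs.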
