CV · Mar 28, 2025

VisTa: Visual-contextual and Text-augmented Zero-shot Object-level OOD Detection

arXiv:2503.22291v1 · 1 citation · h-index: 11 · ICASSP
Originality: Incremental advance
AI Analysis

This work addresses reliability issues for object detectors deployed as black-box services, but it is an incremental advance: it adapts existing CLIP-based approaches to a specific task.

The paper tackles zero-shot object-level out-of-distribution detection for object detectors in open-world settings by introducing a method that uses visual prompts and text-augmented in-distribution space construction with CLIP, achieving competitive performance across benchmarks.

As object detectors are increasingly deployed as black-box cloud services or pre-trained models with restricted access to the original training data, the challenge of zero-shot object-level out-of-distribution (OOD) detection arises. This task becomes crucial in ensuring the reliability of detectors in open-world settings. While existing methods have demonstrated success in image-level OOD detection using pre-trained vision-language models like CLIP, directly applying such models to object-level OOD detection presents challenges due to the loss of contextual information and reliance on image-level alignment. To tackle these challenges, we introduce a new method that leverages visual prompts and text-augmented in-distribution (ID) space construction to adapt CLIP for zero-shot object-level OOD detection. Our method preserves critical contextual information and improves the ability to differentiate between ID and OOD objects, achieving competitive performance across different benchmarks.
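To make the general idea of CLIP-style zero-shot OOD scoring concrete, here is a minimal sketch. It is not the paper's method: it shows only a generic baseline that scores a detected object by the maximum softmax over its cosine similarities to the ID class text embeddings, and flags the object as OOD when that score is low. The function names, the temperature, the threshold, and the toy 3-dimensional vectors are all illustrative assumptions; in practice the embeddings would come from CLIP's image and text encoders, and the visual-prompt and text-augmentation steps described in the abstract are not modeled here.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def max_softmax_score(object_emb, id_text_embs, temperature=0.1):
    """Maximum softmax probability over similarities to ID class embeddings.

    A high score means the object matches one ID class much better than
    the others; a score near 1/K (K = number of ID classes) means it
    matches none of them distinctly.
    """
    logits = [cosine(object_emb, t) / temperature for t in id_text_embs]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    return max(exps) / sum(exps)


def is_ood(object_emb, id_text_embs, threshold=0.9, temperature=0.1):
    """Flag an object as OOD when its ID-matching score is below threshold.

    The threshold is a hypothetical value chosen for this toy example.
    """
    return max_softmax_score(object_emb, id_text_embs, temperature) < threshold


# Toy stand-ins for text embeddings of two ID classes (assumed values).
id_text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

near_class_0 = [0.9, 0.1, 0.0]   # close to the first ID class
unrelated = [0.0, 0.0, 1.0]      # orthogonal to both ID classes

print(is_ood(near_class_0, id_text_embs))
print(is_ood(unrelated, id_text_embs))
```

The design choice worth noting is that the score is relative: an object is treated as ID only if one class stands out, which is why an embedding equally dissimilar to every ID class falls to the uniform score and is flagged.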

