CV · Sep 26, 2025

PANICL: Mitigating Over-Reliance on Single Prompt in Visual In-Context Learning

Jiahao Zhang, Bowen Wang, Hong Liu, Yuta Nakashima, Hajime Nagahara

arXiv:2509.21926v1 · 2 citations · h-index: 14

Originality: Incremental advance

AI Analysis

This work addresses a key limitation of visual in-context learning (VICL) and offers a versatile solution that improves robustness and generalization. The advance is incremental, however, since it builds on existing VICL models.

The paper tackles over-reliance on a single prompt in visual in-context learning, which causes biased and unstable predictions. It introduces PANICL, a training-free framework that uses multiple in-context pairs to smooth assignment scores, and reports consistent improvements across tasks such as segmentation and detection.

Visual In-Context Learning (VICL) uses input-output image pairs, referred to as in-context pairs (or examples), as prompts alongside query images to guide models in performing diverse vision tasks. However, VICL often suffers from over-reliance on a single in-context pair, which can lead to biased and unstable predictions. We introduce PAtch-based $k$-Nearest neighbor visual In-Context Learning (PANICL), a general training-free framework that mitigates this issue by leveraging multiple in-context pairs. PANICL smooths assignment scores across pairs, reducing bias without requiring additional training. Extensive experiments on a variety of tasks, including foreground segmentation, single object detection, colorization, multi-object segmentation, and keypoint detection, demonstrate consistent improvements over strong baselines. Moreover, PANICL exhibits strong robustness to domain shifts, including dataset-level shift (e.g., from COCO to Pascal) and label-space shift (e.g., FSS-1000), and generalizes well to other VICL models such as SegGPT, Painter, and LVM, highlighting its versatility and broad applicability.
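The abstract does not specify PANICL's exact scoring rule, so the following is only a rough illustration of why pooling several in-context pairs reduces dependence on any single one. It assumes a hypothetical patch-matching predictor: each query patch gets softmax assignment scores over the patches of one pair's input image and takes the score-weighted average of that pair's output values. The multi-pair variant then averages these predictions uniformly over k pairs. The function names, dot-product similarity, temperature `tau`, and uniform weighting are all illustrative assumptions, not the paper's method.

```python
import math

def softmax(scores, tau=1.0):
    """Numerically stable softmax over a list of scores with temperature tau."""
    m = max(scores)
    exps = [math.exp((s - m) / tau) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def dot(a, b):
    """Dot-product similarity between two feature vectors (assumed metric)."""
    return sum(x * y for x, y in zip(a, b))

def predict_single_pair(query_patches, pair, tau=1.0):
    """Hypothetical single-prompt predictor: one output value per query patch.

    pair = (input_patch_features, output_patch_values) for ONE in-context pair.
    """
    feats, values = pair
    preds = []
    for q in query_patches:
        # Assignment scores: how strongly query patch q matches each prompt patch.
        weights = softmax([dot(q, f) for f in feats], tau)
        preds.append(sum(w * v for w, v in zip(weights, values)))
    return preds

def predict_knn_pairs(query_patches, pairs, tau=1.0):
    """Hypothetical multi-prompt predictor: uniform average over k pairs.

    Equivalent to assigning each query patch with a uniform mixture of the
    k per-pair assignment distributions, so no single pair dominates.
    """
    per_pair = [predict_single_pair(query_patches, p, tau) for p in pairs]
    k = len(pairs)
    return [sum(preds[i] for preds in per_pair) / k
            for i in range(len(query_patches))]

# Toy usage: two pairs with identical input patches but opposite labels.
query = [[1.0, 0.0], [0.0, 1.0]]
feats = [[1.0, 0.0], [0.0, 1.0]]
pair_a = (feats, [1.0, 0.0])
pair_b = (feats, [0.0, 1.0])
single = predict_single_pair(query, pair_a)
pooled = predict_knn_pairs(query, [pair_a, pair_b])
```

With a single prompt, the prediction simply inherits whatever that pair's labels imply. Averaging over several pairs pulls each patch's prediction toward the consensus, which is the intuition behind mitigating single-prompt bias. In this toy case, the two conflicting pairs cancel to 0.5 for every patch.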
