CV · Sep 26, 2025

PANICL: Mitigating Over-Reliance on Single Prompt in Visual In-Context Learning

arXiv:2509.21926v1 · 2 citations · h-index: 14
Originality: Incremental advance
AI Analysis

This work addresses a key limitation of visual in-context learning for vision tasks and offers a versatile, training-free way to improve robustness and generalization. It is an incremental advance, since it builds on existing VICL models.

The paper tackles over-reliance on a single prompt in visual in-context learning, which causes biased and unstable predictions. It introduces PANICL, a training-free framework that uses multiple in-context pairs to smooth assignment scores, yielding consistent improvements on tasks such as segmentation and detection.

Visual In-Context Learning (VICL) uses input-output image pairs, referred to as in-context pairs (or examples), as prompts alongside query images to guide models in performing diverse vision tasks. However, VICL often suffers from over-reliance on a single in-context pair, which can lead to biased and unstable predictions. We introduce PAtch-based $k$-Nearest neighbor visual In-Context Learning (PANICL), a general training-free framework that mitigates this issue by leveraging multiple in-context pairs. PANICL smooths assignment scores across pairs, reducing bias without requiring additional training. Extensive experiments on a variety of tasks, including foreground segmentation, single object detection, colorization, multi-object segmentation, and keypoint detection, demonstrate consistent improvements over strong baselines. Moreover, PANICL exhibits strong robustness to domain shifts, including dataset-level shift (e.g., from COCO to Pascal) and label-space shift (e.g., FSS-1000), and generalizes well to other VICL models such as SegGPT, Painter, and LVM, highlighting its versatility and broad applicability.
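The idea of smoothing assignment scores across several in-context pairs can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how scores are computed or combined, so the sketch assumes that each of the k in-context pairs yields, for every output patch, a score vector over candidate tokens or classes, and that smoothing is a (optionally weighted) average followed by an argmax. The function name `smooth_assignment_scores`, the `weights` parameter, and the toy numbers are all hypothetical.

```python
def smooth_assignment_scores(scores, weights=None):
    """Average per-patch score vectors over k in-context pairs.

    scores:  nested list shaped [k][num_patches][num_classes]
             (assumed layout, one score table per in-context pair).
    weights: optional list of k non-negative weights, e.g. similarity of
             each pair to the query (hypothetical; uniform if omitted).
    Returns (smoothed, labels): the averaged scores per patch and the
    index of the highest-scoring class for each patch.
    """
    k = len(scores)
    if k == 0:
        raise ValueError("need at least one in-context pair")
    if weights is None:
        weights = [1.0] * k
    if len(weights) != k:
        raise ValueError("weights must have one entry per in-context pair")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    norm = [w / total for w in weights]

    num_patches = len(scores[0])
    num_classes = len(scores[0][0])
    smoothed = [
        [sum(norm[i] * scores[i][p][c] for i in range(k))
         for c in range(num_classes)]
        for p in range(num_patches)
    ]
    labels = [max(range(num_classes), key=row.__getitem__) for row in smoothed]
    return smoothed, labels


# Toy example: 3 in-context pairs, 2 patches, 2 classes.
# The first pair alone would assign patch 0 to class 1; the other two disagree.
scores = [
    [[0.1, 0.9], [0.2, 0.8]],
    [[0.7, 0.3], [0.4, 0.6]],
    [[0.8, 0.2], [0.3, 0.7]],
]
single_labels = [max(range(2), key=row.__getitem__) for row in scores[0]]
smoothed, labels = smooth_assignment_scores(scores)
```

In this toy case the prediction from the first pair alone differs from the smoothed prediction on patch 0, which is the kind of single-prompt bias that combining multiple pairs is meant to reduce.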

