AI · CV · Jan 15, 2025

Exploring Task-Level Optimal Prompts for Visual In-Context Learning

arXiv:2501.08841v1 · 4 citations · h-index: 3 · AAAI
Originality: Incremental advance
AI Analysis

This work addresses the deployment bottleneck of Visual In-Context Learning for researchers and practitioners by reducing computational overhead, though it is incremental as it builds on existing prompting methods.

The paper tackles the high computational cost of finding optimal prompts for every test sample in Visual In-Context Learning by discovering that most test samples achieve optimal performance under the same prompts, and proposes task-level prompting with two search strategies to reduce inference costs. Experimental results show the method identifies near-optimal prompts and achieves the best VICL performance with minimal cost, surpassing prior work.

With the development of Vision Foundation Models (VFMs) in recent years, Visual In-Context Learning (VICL) has become a better choice than modifying models in most scenarios. Unlike retraining or fine-tuning a model, VICL does not require modifications to the model's weights or architecture, and only needs a prompt with demonstrations to teach the VFM how to solve tasks. Currently, the significant computational cost of finding optimal prompts for every test sample hinders the deployment of VICL, as determining which demonstrations to use for constructing prompts is very costly. In this paper, however, we find a counterintuitive phenomenon: most test samples actually achieve optimal performance under the same prompts, so searching for sample-level prompts costs more time yet yields identical prompts. Therefore, we propose task-level prompting to reduce the cost of searching for prompts during the inference stage and introduce two time-saving yet effective task-level prompt search strategies. Extensive experimental results show that our proposed method can identify near-optimal prompts and reach the best VICL performance at a minimal cost that prior work has never achieved.
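The contrast between sample-level and task-level prompt search can be illustrated with a minimal Python sketch. This is not the paper's method: the abstract does not describe its two search strategies, so the code below uses a plain exhaustive search over candidate prompts as a stand-in. The function names, the `score` callback, and the toy scoring function are all hypothetical and exist only for illustration.

```python
from typing import Callable, List, Sequence, TypeVar

P = TypeVar("P")  # a candidate prompt (e.g. a set of demonstrations)
S = TypeVar("S")  # a sample (e.g. a query image)


def sample_level_prompts(
    prompts: Sequence[P], samples: Sequence[S], score: Callable[[P, S], float]
) -> List[P]:
    """Search the best prompt separately for every sample (one search per sample)."""
    return [max(prompts, key=lambda p: score(p, s)) for s in samples]


def task_level_prompt(
    prompts: Sequence[P], val_samples: Sequence[S], score: Callable[[P, S], float]
) -> P:
    """Search once: pick the prompt with the best mean score on a validation set.

    Assumption: exhaustive search stands in for the paper's (unspecified) strategies.
    """
    def mean_score(p: P) -> float:
        return sum(score(p, s) for s in val_samples) / len(val_samples)

    return max(prompts, key=mean_score)


def toy_score(prompt: int, sample: int) -> float:
    """Hypothetical score in which prompt 3 is best for every sample."""
    return -abs(prompt - 3) + 0.01 * sample


candidates = [1, 2, 3, 4]
samples = [10, 20, 30]
per_sample = sample_level_prompts(candidates, samples, toy_score)
shared = task_level_prompt(candidates, samples, toy_score)
```

In this toy setup every per-sample search returns the same prompt that the single task-level search finds, which mirrors the phenomenon the abstract reports. The task-level prompt is then reused for all test samples, so no search is needed at inference time.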

