V-SYNTHESIS: Task-Agnostic Synthesis of Consistent and Diverse In-Context Demonstrations from Scratch via V-Entropy
The work addresses the high cost of labeling in-context learning demonstrations with a task-agnostic approach. It is an incremental advance over prior methods, which are task-specific or depend on existing demonstrations.
The paper tackles the problem of synthesizing in-context learning demonstrations from scratch for arbitrary tasks. This is challenging because, without labeling guidance, the synthesized demonstrations may be inconsistent with the target task. The proposed method, V-Synthesis, combines a new consistency metric (V-Score) with proportional sampling and achieves an average performance improvement of 2.0% over existing synthesis methods.
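The title says V-Score is built on V-entropy, but this summary does not define V-Score itself. For orientation only, here is a minimal sketch of the standard predictive V-entropy and V-information from the usable-information framework (Xu et al., 2020): H_V(Y|X) is the expected negative log-likelihood of the gold output under the best predictor in a model family V, and I_V(X→Y) = H_V(Y|∅) − H_V(Y|X). This is not the paper's V-Score. The inputs (per-example natural-log probabilities of the gold output, computed with and without the input by an already-chosen model) are assumptions, and the infimum over V is not computed here.

```python
import math
from typing import Sequence


def v_entropy(log_probs: Sequence[float]) -> float:
    """Empirical predictive V-entropy in bits.

    Assumption: `log_probs` holds natural-log probabilities that an
    already-selected model from family V assigns to each gold output.
    The result is their mean negative log2-likelihood.
    """
    if not log_probs:
        raise ValueError("log_probs must be non-empty")
    return -sum(log_probs) / (len(log_probs) * math.log(2))


def v_information(null_log_probs: Sequence[float],
                  cond_log_probs: Sequence[float]) -> float:
    """Estimate I_V(X -> Y) = H_V(Y | null input) - H_V(Y | X).

    `null_log_probs`: log p(y) when the model sees no input (assumed given).
    `cond_log_probs`: log p(y | x) when the model sees the input.
    """
    return v_entropy(null_log_probs) - v_entropy(cond_log_probs)


# Usage: a model that is 50% sure of each label without the input
# and 90% sure with it gains about 0.85 bits of usable information.
gain = v_information([math.log(0.5)] * 3, [math.log(0.9)] * 3)
```

Higher V-information means the input makes the output easier to predict for models in V, which is one plausible way to quantify how well a synthesized input-output pair fits a task; how the paper turns V-entropy into V-Score is not stated here.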
The high cost of labeling in-context learning (ICL) demonstrations motivates using large language models (LLMs) to synthesize them and reduce this overhead. However, existing synthesis methods are mainly task-specific or rely on pre-existing demonstrations. This paper therefore focuses on synthesizing demonstrations from scratch for arbitrary tasks. A major challenge in synthesizing from scratch is ensuring consistency with the target task, since the lack of labeling guidance can introduce synthesis bias. We first propose a consistency metric, V-Score, which achieves higher performance at lower computational cost than metrics based on n-grams or embedding vectors. We then introduce V-Synthesis, which uses V-Score for proportional sampling to ensure both high consistency and high diversity of the synthesized demonstrations. Experimental results show that V-Synthesis yields an average performance improvement of 2.0% over existing synthesis methods, confirming its effectiveness.
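The abstract says V-Synthesis uses V-Score for proportional sampling but does not spell out the procedure. A minimal sketch of one common reading follows: draw k distinct demonstrations, where each draw picks a remaining candidate with probability proportional to its score. The function name `proportional_sample`, the non-negative scores, and the uniform fallback when all scores are zero are illustrative assumptions, not the paper's specification.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def proportional_sample(candidates: Sequence[T],
                        scores: Sequence[float],
                        k: int,
                        seed: Optional[int] = None) -> List[T]:
    """Hypothetical helper: sample k distinct candidates without replacement,
    each draw weighted by the candidate's (assumed non-negative) score."""
    if len(candidates) != len(scores):
        raise ValueError("candidates and scores must have the same length")
    if not 0 <= k <= len(candidates):
        raise ValueError("k must be between 0 and len(candidates)")
    if any(s < 0 for s in scores):
        raise ValueError("scores must be non-negative")

    rng = random.Random(seed)
    pool = list(zip(candidates, scores))
    chosen: List[T] = []
    for _ in range(k):
        weights = [s for _, s in pool]
        if sum(weights) > 0:
            # Zero-weight candidates are never picked while any weight is positive.
            idx = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        else:
            # Assumed fallback: uniform choice once every remaining score is zero.
            idx = rng.randrange(len(pool))
        chosen.append(pool.pop(idx)[0])  # remove so demonstrations stay distinct
    return chosen


# Usage: higher-scoring (more consistent) candidates are favored,
# but lower-scoring ones can still be drawn, which preserves diversity.
demos = proportional_sample(["d1", "d2", "d3", "d4"], [0.9, 0.1, 0.5, 0.7], k=2, seed=0)
```

The design contrast is with plain top-k selection: taking only the highest-scoring candidates maximizes consistency but tends to return near-duplicates, whereas score-proportional draws keep a chance for less typical but still plausible demonstrations. Whether V-Synthesis samples with or without replacement, or normalizes scores first, is not stated in this summary.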