LG, AI, CL · Jul 8, 2024

Sub-SA: Strengthen In-context Learning via Submodular Selective Annotation

arXiv:2407.05693v2 · 4 citations · h-index: 5
AI Analysis

This work addresses the annotation bottleneck faced by researchers and practitioners who use in-context learning, but it is incremental, since it builds on existing selective annotation methods.

The paper tackles the high annotation cost of selecting in-context examples for large language models by proposing Sub-SA, a submodular selective annotation method that reduces annotation costs and improves example quality, achieving performance gains of up to 15% on benchmarks.

In-context learning (ICL) leverages in-context examples as prompts for the predictions of Large Language Models (LLMs). These prompts play a crucial role in achieving strong performance. However, selecting suitable prompts from a large pool of labeled examples often entails significant annotation costs. To address this challenge, we propose Sub-SA (Submodular Selective Annotation), a submodular-based selective annotation method. Sub-SA aims to reduce annotation costs while improving the quality of in-context examples and reducing the time required by the selection process. In Sub-SA, we design a submodular function that enables effective subset selection for annotation, and we show from a theoretical perspective that it is monotone and submodular. Specifically, we propose RPR (Reward and Penalty Regularization) to better balance the diversity and representativeness of the unlabeled dataset through a reward term and a penalty term, respectively. Consequently, selecting examples for annotation can be solved with a simple yet effective greedy search algorithm based on the submodular function. Finally, we apply similarity-based prompt retrieval to obtain the examples for ICL.
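The selection-then-retrieval pipeline described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the exact form of RPR is not given here, so the objective below is a generic graph-cut-style stand-in, f(S) = Σ_{i∈U} Σ_{j∈S} s(i, j) − λ Σ_{i∈S} Σ_{j∈S, j≠i} s(i, j). Its first (reward) term measures how well the selection S covers the unlabeled pool U, and its second (penalty) term discounts redundancy among selected items; how these terms map onto the paper's reward and penalty may differ. The function names (`greedy_select`, `retrieve_examples`, `marginal_gain`), the parameters `budget` and `lam`, the clipped cosine similarity, and the toy 2-D embeddings are all hypothetical. With nonnegative, symmetric similarities and λ ≤ 0.5, this stand-in is monotone and submodular, so the plain greedy loop carries the standard 1 − 1/e approximation guarantee under a cardinality budget.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def similarity_matrix(embs: List[Vector]) -> List[List[float]]:
    """Pairwise similarities, clipped at 0 so the stand-in objective stays monotone."""
    return [[max(0.0, cosine(a, b)) for b in embs] for a in embs]


def marginal_gain(k: int, selected: List[int], sim: List[List[float]], lam: float) -> float:
    """Gain of adding item k under the ASSUMED objective (not the paper's exact RPR):
    f(S) = sum_{i in U} sum_{j in S} s(i,j) - lam * sum_{i in S} sum_{j in S, j != i} s(i,j).
    """
    reward = sum(row[k] for row in sim)  # coverage of the whole pool U by item k
    # Adding k creates ordered pairs (k, j) and (j, k); symmetry gives the factor 2.
    penalty = 2.0 * lam * sum(sim[j][k] for j in selected)
    return reward - penalty


def greedy_select(embs: List[Vector], budget: int, lam: float = 0.5) -> List[int]:
    """Greedy maximization: repeatedly annotate the item with the largest marginal gain."""
    sim = similarity_matrix(embs)
    selected: List[int] = []
    remaining = set(range(len(embs)))
    while remaining and len(selected) < budget:
        # Iterate in sorted order so ties resolve deterministically to the lowest index.
        best = max(sorted(remaining), key=lambda k: marginal_gain(k, selected, sim, lam))
        selected.append(best)
        remaining.remove(best)
    return selected


def retrieve_examples(query: Vector, pool: List[Vector], annotated: List[int], k: int) -> List[int]:
    """Similarity-based prompt retrieval: the k annotated items most similar to the query."""
    ranked = sorted(annotated, key=lambda i: cosine(query, pool[i]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical toy pool: two near-duplicate clusters plus one central point.
    pool = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 1.0]]
    chosen = greedy_select(pool, budget=3)
    print("annotate:", chosen)
    print("prompt with:", retrieve_examples([1.0, 0.2], pool, chosen, k=1))
```

On this toy pool the greedy loop first picks the central point [1, 1], which has the highest coverage, and then one point from each cluster instead of a duplicate, because the penalty term halves the gain of an item identical to one already chosen.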

Code Implementations: 1 repo