CL, LG · Feb 6, 2025

Exploring Imbalanced Annotations for Effective In-Context Learning

Hongfu Gao, Feipeng Zhang, Hao Zeng, Deyu Meng, Bingyi Jing, Hongxin Wei

arXiv:2502.04037v2 · 4.91 citations · h-index: 4

Originality: Incremental advance

AI Analysis

For AI practitioners, this addresses the degradation of in-context learning performance caused by imbalanced annotated data, offering an incremental enhancement to existing demonstration-selection methods.

The paper tackles class imbalance in annotated datasets, which degrades the in-context learning performance of large language models, and proposes a reweighting method that improves the average accuracy of existing demonstration-selection methods by up to 5.42%.

Large language models (LLMs) have shown impressive performance on downstream tasks through in-context learning (ICL), which heavily relies on the demonstrations selected from annotated datasets. However, these datasets often exhibit long-tailed class distributions in real-world scenarios, leading to biased demonstration selection. In this work, we show that such class imbalances significantly degrade ICL performance across various tasks, regardless of the selection method. Moreover, classical rebalancing methods, which focus solely on class weights, yield poor performance because they neglect conditional bias (skewed feature distributions within classes). To address this, we propose Reweighting with Conditional Bias (dubbed RCB), a simple and complementary approach to enhance ICL performance under class imbalance. In particular, RCB estimates conditional bias using a balanced subset and reweights demonstration scores based on both class weight and conditional bias. In effect, RCB prevents over-selection from dominant classes while preserving the efficacy of current selection methods. Extensive experiments on common benchmarks demonstrate the effectiveness of our method, improving the average accuracy of current selection methods by up to 5.42%.
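The abstract describes RCB only at a high level: demonstration scores are reweighted by a class weight and a conditional-bias term estimated from a balanced subset. As a hedged illustration, assuming relevance scores come from any existing selection method and that per-class factors are applied multiplicatively, the sketch below shows how plain top-k selection skews toward a majority class and how a classical inverse-frequency class weight shifts it. The function names `inverse_frequency_weights` and `select_top_k`, the toy scores, and the `cond_bias` hook are hypothetical; the sketch does not implement the paper's conditional-bias estimator.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Classical rebalancing: weight class c by n / (K * n_c).

    In a perfectly balanced pool every class gets weight 1.0;
    minority classes get weights above 1, majority classes below 1.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}


def select_top_k(scores, labels, k, class_weight=None, cond_bias=None):
    """Return indices of the k highest-scoring demonstrations.

    scores: relevance scores from an existing selection method
            (e.g. similarity to the test input).
    class_weight, cond_bias: optional per-class multipliers; classes
            missing from a dict default to 1.0. Multiplying the two
            factors is an assumption of this sketch, not the paper's formula.
    """
    def adjusted(i):
        y = labels[i]
        w = class_weight.get(y, 1.0) if class_weight else 1.0
        b = cond_bias.get(y, 1.0) if cond_bias else 1.0
        return scores[i] * w * b

    order = sorted(range(len(scores)), key=adjusted, reverse=True)
    return order[:k]


# Toy pool (hypothetical): 8 majority-class ("pos") and 2 minority-class ("neg") candidates.
labels = ["pos"] * 8 + ["neg"] * 2
scores = [0.90, 0.88, 0.86, 0.84, 0.82, 0.80, 0.78, 0.76, 0.85, 0.83]

plain = select_top_k(scores, labels, k=4)
weights = inverse_frequency_weights(labels)
rebalanced = select_top_k(scores, labels, k=4, class_weight=weights)

print([labels[i] for i in plain])       # ['pos', 'pos', 'pos', 'neg']
print([labels[i] for i in rebalanced])  # ['neg', 'neg', 'pos', 'pos']
```

With weights of 0.625 and 2.5, the class weight alone flips the selection toward the minority class regardless of per-example relevance. That is consistent with the abstract's point that class-only reweighting is insufficient; RCB's within-class correction would enter through something like the `cond_bias` hook here, whose values this sketch leaves to the caller.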

