Submodular Context Partitioning and Compression for In-Context Learning
This work addresses efficiency bottlenecks in few-shot learning for LLM users, though the contribution appears incremental because it builds on existing partitioning approaches.
The paper tackles the quadratic input complexity that limits the number of exemplars usable in in-context learning for large language models. It proposes Sub-CP, a block-aware context selection framework that uses submodular objectives to control block diversity. In experiments across diverse tasks and datasets, Sub-CP consistently improved performance across model scales.
In-context learning (ICL) enables efficient few-shot learning in large language models (LLMs) without training, but it suffers from the quadratic input complexity of transformers, which limits the maximum number of exemplars. While various efficient ICL approaches partition the context into blocks for processing (e.g., ensembling, compression, cross-attention), they often ignore the information redundancy or under-representation caused by different partition strategies, leading to suboptimal performance. To tackle this problem, we propose Sub-CP, a block-aware context selection framework that leverages submodular objectives to control block diversity. Sub-CP supports a flexible spectrum of selection strategies, allowing each block to range from globally diverse to locally coherent. This design gives fine-grained control over semantic structure while enabling precomputation. Extensive experiments across diverse tasks on multiple datasets show that Sub-CP consistently improves performance across model scales.
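The abstract does not give the Sub-CP objective itself, so the following is only an illustrative sketch of what submodular block selection can look like, not the paper's method. It assumes a facility-location objective over cosine similarities (clipped at zero) and fills blocks one at a time with the standard greedy algorithm, which corresponds roughly to the "globally diverse" end of the spectrum. The names `cosine`, `greedy_facility_location`, and `partition_into_blocks` are hypothetical, and the toy 2-D embeddings stand in for real exemplar embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def greedy_facility_location(sim, candidates, k):
    """Greedily pick up to k items from `candidates`.

    Assumed objective (facility location, similarities clipped at 0):
        f(S) = sum over i in candidates of max(0, max over j in S of sim[i][j])
    This function is monotone submodular, so greedy selection is a
    standard approximation strategy for it.
    """
    candidates = list(candidates)
    selected = []
    # best[i] = how well item i is already covered by the selected set
    best = {i: 0.0 for i in candidates}
    for _ in range(min(k, len(candidates))):
        def gain(j):
            # Marginal gain of adding j to the current selection
            return sum(max(sim[i][j] - best[i], 0.0) for i in candidates)

        j_star = max((j for j in candidates if j not in selected), key=gain)
        selected.append(j_star)
        for i in candidates:
            best[i] = max(best[i], sim[i][j_star])
    return selected


def partition_into_blocks(embeddings, num_blocks, block_size):
    """Split exemplars into blocks, each chosen to cover the remaining pool."""
    n = len(embeddings)
    sim = [[cosine(embeddings[i], embeddings[j]) for j in range(n)]
           for i in range(n)]
    remaining = list(range(n))
    blocks = []
    for _ in range(num_blocks):
        if not remaining:
            break
        block = greedy_facility_location(sim, remaining, block_size)
        blocks.append(block)
        remaining = [i for i in remaining if i not in block]
    return blocks


# Toy example: exemplars 0 and 1 form one cluster, 2 and 3 another.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_blocks = partition_into_blocks(toy_embeddings, num_blocks=2, block_size=2)
```

With this toy input, each block ends up holding one exemplar from each cluster, which is the behavior a diversity-oriented objective is meant to produce. A "locally coherent" variant would instead group nearby exemplars into the same block; how Sub-CP interpolates between the two is not specified in the text above.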