CL | May 2, 2025

MateICL: Mitigating Attention Dispersion in Large-Scale In-Context Learning

arXiv:2505.01110v1 | 2 citations | h-index: 3 | Has Code
Originality: Incremental advance
AI Analysis

The paper addresses a bottleneck in scaling in-context learning for LLMs, particularly in resource-constrained settings. The advance is incremental, since it builds on existing context extension methods.

The paper tackles attention dispersion in large-scale in-context learning for large language models, where performance degrades as the number of demonstration examples increases. It introduces MateICL, which splits the context into windows and recalibrates attention weights so that attention stays effective as the context grows. The authors report better performance than retrieval-based baselines without requiring an externally trained retrieval model.

Large Language Models (LLMs) have demonstrated remarkable capabilities in In-Context Learning (ICL). However, the fixed position length constraints in pre-trained models limit the number of demonstration examples. Recent efforts to extend context suffer from attention dispersion as the number of demonstrations increases. In this paper, we introduce Mitigating Attention Dispersion in large-scale ICL (MateICL) that enables LLMs to maintain effective self-attention as the context size grows. We first split the context into multiple windows, each filled to the model's context capacity, which are processed separately. Then, we introduce an additional layer to recalibrate the attention weights, prioritizing the query tokens as the number of demonstrations increases. Our empirical results show that MateICL can effectively leverage larger contexts to improve ICL performance. Compared to retrieval-based baselines, MateICL consistently achieves better performance without requiring an externally trained retrieval model. Despite recent advances in inference strategies (e.g., 32k token contexts), our results demonstrate that MateICL remains beneficial in computationally resource-constrained settings. The code is publicly available at https://github.com/amurtadha/MateICL.
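The two steps described in the abstract (splitting the context into windows, then recalibrating attention toward the query tokens) can be illustrated with a toy sketch. This is not the authors' implementation; the repository linked above is the reference. The function names `split_into_windows` and `recalibrated_attention`, the greedy packing rule, and in particular the additive `log(num_windows)` bias on the query logits are assumptions chosen only to make the idea concrete.

```python
import math


def split_into_windows(demo_lengths, capacity):
    """Greedily pack demonstrations into windows of at most `capacity` tokens.

    `demo_lengths` holds the token length of each demonstration. The result is
    a list of windows, each a list of demonstration indices. (Hypothetical
    helper; the paper's exact packing rule is not given in the abstract.)
    """
    windows, current, used = [], [], 0
    for idx, length in enumerate(demo_lengths):
        if length > capacity:
            raise ValueError(f"demonstration {idx} exceeds the window capacity")
        if used + length > capacity:
            # Current window is full: close it and start a new one.
            windows.append(current)
            current, used = [], 0
        current.append(idx)
        used += length
    if current:
        windows.append(current)
    return windows


def recalibrated_attention(context_logits, query_logits, num_windows):
    """Softmax over context and query logits with a boost on the query logits.

    ASSUMPTION: the boost is an additive bias of log(num_windows), which
    multiplies each query token's unnormalized weight by the window count.
    The paper's actual recalibration layer may differ.
    Returns (context_weights, query_weights).
    """
    if num_windows < 1:
        raise ValueError("num_windows must be at least 1")
    bias = math.log(num_windows)
    logits = list(context_logits) + [x + bias for x in query_logits]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    n_ctx = len(context_logits)
    return weights[:n_ctx], weights[n_ctx:]


if __name__ == "__main__":
    windows = split_into_windows([3, 3, 3, 3, 3], capacity=7)
    print(windows)
    # Uniform logits: 2 context tokens per window, 2 query tokens.
    _, q = recalibrated_attention([0.0] * (2 * len(windows)), [0.0, 0.0], len(windows))
    print(sum(q))
```

With uniform logits and no bias, the share of attention on the query tokens shrinks as more windows are added, which is the dispersion effect the abstract describes. Under the assumed bias, that share stays constant as the number of windows grows.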

