Batch-ICL: Effective, Efficient, and Order-Agnostic In-Context Learning
Batch-ICL addresses a key limitation of ICL in LLMs, namely sensitivity to the order of the in-context examples. This makes ICL more robust and efficient for practical applications, though the method is an incremental improvement over existing approaches.
The paper tackles the sensitivity of large language models (LLMs) to the order of in-context learning (ICL) examples. It develops Batch-ICL, an inference algorithm that runs a separate 1-shot forward computation for each example and aggregates the resulting meta-gradients, which makes predictions order-agnostic. Batch-ICL consistently outperforms most permutations of the examples, sometimes exceeds the best order in standard ICL, and reduces the computational resources required.
In this paper, by treating in-context learning (ICL) as a meta-optimization process, we explain why LLMs are sensitive to the order of ICL examples. This understanding leads us to the development of Batch-ICL, an effective, efficient, and order-agnostic inference algorithm for ICL. Differing from the standard N-shot learning approach, Batch-ICL employs $N$ separate 1-shot forward computations and aggregates the resulting meta-gradients. These aggregated meta-gradients are then applied to the forward computation of a zero-shot query to generate the final prediction. This batch processing approach renders the LLM agnostic to the order of ICL examples. Through extensive experiments and analysis, we demonstrate that Batch-ICL consistently outperforms most permutations of ICL examples. In some cases, it even exceeds the performance of the best order for standard ICL, all while reducing the computational resources required. Furthermore, we develop a novel variant of Batch-ICL featuring multiple "epochs" of meta-optimization. This variant implicitly explores permutations of ICL examples, further enhancing ICL performance.
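The order-agnostic property can be illustrated with a toy sketch. It is not the authors' implementation: it assumes a simplified linear-attention view in which each demonstration (x, y) contributes a meta-gradient equal to the outer product of y and x, and in which the aggregated update is the mean of the N 1-shot meta-gradients added to a zero-shot weight matrix. The names `one_shot_meta_gradient`, `aggregate`, and `batch_icl_predict`, and the data, are hypothetical. Because a mean does not depend on the order of its terms, every permutation of the demonstrations gives the same prediction.

```python
from itertools import permutations


def outer(y, x):
    """Outer product of two vectors, as a list of rows."""
    return [[yi * xj for xj in x] for yi in y]


def matvec(m, v):
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def one_shot_meta_gradient(demo):
    """Assumed toy meta-gradient of one demonstration: outer(y, x)."""
    x, y = demo
    return outer(y, x)


def aggregate(grads):
    """Average the N separate 1-shot meta-gradients element-wise."""
    n = len(grads)
    rows, cols = len(grads[0]), len(grads[0][0])
    return [[sum(g[r][c] for g in grads) / n for c in range(cols)]
            for r in range(rows)]


def batch_icl_predict(w0, demos, query):
    """Apply the aggregated meta-gradient to the zero-shot computation."""
    delta = aggregate([one_shot_meta_gradient(d) for d in demos])
    w = [[a + b for a, b in zip(r0, rd)] for r0, rd in zip(w0, delta)]
    return matvec(w, query)


# Hypothetical zero-shot weights, demonstrations (x, y), and query.
W0 = [[1.0, 0.0], [0.0, 1.0]]
DEMOS = [([1.0, 0.0], [0.0, 2.0]),
         ([0.0, 1.0], [4.0, 0.0]),
         ([1.0, 1.0], [2.0, 2.0])]
QUERY = [1.0, 2.0]

# Collect the prediction for every ordering of the demonstrations.
outputs = {tuple(batch_icl_predict(W0, list(p), QUERY))
           for p in permutations(DEMOS)}
print(len(outputs))
```

In this sketch the set `outputs` collapses to a single element, since all six orderings yield one prediction. In an actual LLM the meta-gradients would come from attention outputs inside the network rather than from an explicit weight matrix, so the sketch only conveys why aggregation removes order dependence.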