Mixtures of In-Context Learners
This work addresses the memory exhaustion and inefficiency of in-context learning for LLMs with a more expressive approach that improves performance and robustness. The contribution is incremental, since it builds on existing ICL methods.
The paper tackles the inefficiency and memory issues of in-context learning (ICL) in LLMs by proposing Mixtures of In-Context Learners (MoICL), which treats subsets of demonstrations as experts and learns a weighting function to merge their outputs. MoICL improves performance on 5 out of 7 classification datasets (up to +13% compared to ICL and LENS) and is more robust to out-of-domain, imbalanced, or noisy demonstrations.
In-context learning (ICL) adapts LLMs by providing demonstrations without fine-tuning the model parameters; however, it does not differentiate between demonstrations, and because the cost of self-attention in Transformer LLMs grows quadratically with context length, adding demonstrations can exhaust memory. As a solution, we propose Mixtures of In-Context Learners (MoICL), a novel approach that treats subsets of demonstrations as experts and learns a weighting function, fitted on a training set, to merge their output distributions. In our experiments, we show performance improvements on 5 out of 7 classification datasets compared to a set of strong baselines (up to +13% compared to ICL and LENS). Moreover, we enhance the Pareto frontier of ICL by reducing the inference time needed to achieve the same performance with fewer demonstrations. Finally, MoICL is more robust to out-of-domain (up to +11%), imbalanced (up to +49%), or noisy demonstrations (up to +38%), or can filter these out from datasets. Overall, MoICL is a more expressive approach to learning from demonstrations without exhausting the context window or memory.
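The core idea of weighting experts built from demonstration subsets can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the exact merge rule or training objective, so the sketch assumes a convex combination of expert distributions with softmax-normalized scalar weights, fitted by gradient descent on the negative log-likelihood of the training labels. The function names (`softmax`, `mix_experts`, `fit_weights`) are hypothetical, and the expert distributions are plain lists standing in for what an LLM would output when prompted with each subset.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mix_experts(expert_probs, weight_logits):
    """Merge expert distributions for one input.

    expert_probs[i][c] is expert i's probability for class c.
    Assumption: the merge is a convex combination of the distributions.
    """
    w = softmax(weight_logits)
    n_classes = len(expert_probs[0])
    return [sum(w[i] * expert_probs[i][c] for i in range(len(w)))
            for c in range(n_classes)]

def fit_weights(train_probs, labels, steps=200, lr=0.5):
    """Learn one scalar weight logit per expert.

    train_probs[j][i][c] is expert i's probability for class c on
    training example j; labels[j] is the gold class index.
    Assumption: plain gradient descent on the mean negative log-likelihood.
    """
    k = len(train_probs[0])
    n = len(labels)
    logits = [0.0] * k  # start from uniform weights
    for _ in range(steps):
        w = softmax(logits)
        # Gradient of the mean NLL with respect to each mixture weight.
        grad_w = [0.0] * k
        for probs, y in zip(train_probs, labels):
            mixed_y = sum(w[i] * probs[i][y] for i in range(k))
            for i in range(k):
                grad_w[i] -= probs[i][y] / (mixed_y * n)
        # Chain rule through the softmax parameterization.
        dot = sum(w[i] * grad_w[i] for i in range(k))
        logits = [logits[i] - lr * w[i] * (grad_w[i] - dot) for i in range(k)]
    return logits
```

Under these assumptions, an expert whose demonstrations lead to poor predictions on the training set receives a small weight, which is one way to read the abstract's claim that noisy or out-of-domain demonstrations can be down-weighted or filtered. Each expert only sees its own subset, so no single prompt has to hold all demonstrations at once.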