Towards Generalizable Implicit In-Context Learning with Attention Routing
This work addresses the challenge of making implicit ICL more practical and generalizable for users of large language models, though the contribution appears incremental because it builds on existing implicit ICL paradigms.
The paper tackles the limited generalizability of implicit in-context learning (ICL) methods, which aim to achieve few-shot performance at zero-shot cost. It proposes In-Context Routing (ICR), which internalizes reusable structural patterns at the attention logits level. The results show that ICR consistently outperforms prior implicit ICL methods on 12 real-world datasets and generalizes robustly to out-of-domain tasks.
Implicit in-context learning (ICL) has recently emerged as a promising paradigm that simulates ICL behaviors in the representation space of Large Language Models (LLMs), aiming to attain few-shot performance at zero-shot cost. However, existing approaches largely rely on injecting shift vectors into residual flows, and these vectors are typically constructed from labeled demonstrations or task-specific alignment. Such designs fall short of utilizing the structural mechanisms underlying ICL and suffer from limited generalizability. To address this, we propose In-Context Routing (ICR), a novel implicit ICL method that internalizes generalizable ICL patterns at the attention logits level. It extracts reusable structural directions that emerge during ICL and employs a learnable input-conditioned router to modulate attention logits accordingly, enabling a train-once-and-reuse framework. We evaluate ICR on 12 real-world datasets spanning diverse domains and multiple LLMs. The results show that ICR consistently outperforms prior implicit ICL methods that require task-specific retrieval or training, while demonstrating robust generalization to out-of-domain tasks where existing methods struggle. These findings position ICR to push the boundary of ICL's practical value.
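As a rough illustration of the mechanism described above, the following sketch shows one way an input-conditioned router could modulate attention logits along a small set of fixed directions. It is not the paper's implementation: the low-rank bias form, the tanh gating, the pooled router input, and all names (`routed_attention`, `U`, `W_router`, `pooled`) are assumptions made for illustration, and the sketch covers a single head with no causal mask.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def routed_attention(Q, K, V, U, W_router, pooled):
    """Single-head attention with a routed, low-rank logit bias (hypothetical form).

    Q, K, V  : (n, d) query, key, and value matrices
    U        : (d, r) fixed directions, assumed to be extracted offline
    W_router : (r, d) learnable router weights (assumed linear router)
    pooled   : (d,)   summary of the input, e.g. a mean-pooled hidden state
    """
    d = Q.shape[-1]
    base_logits = Q @ K.T / np.sqrt(d)           # standard scaled dot-product logits
    gates = np.tanh(W_router @ pooled)           # (r,) input-conditioned gate per direction
    # Project queries and keys onto the directions, scale each direction by its gate.
    bias = ((Q @ U) * gates) @ (K @ U).T / np.sqrt(d)
    weights = softmax(base_logits + bias, axis=-1)
    return weights @ V, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, r = 4, 8, 2
    Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
    U = rng.normal(size=(d, r))
    W_router = rng.normal(size=(r, d))
    pooled = Q.mean(axis=0)                      # assumed pooling choice
    out, weights = routed_attention(Q, K, V, U, W_router, pooled)
    print(out.shape, weights.sum(axis=-1))
```

With a zero router the gates vanish and the function reduces to standard scaled dot-product attention, so in this sketch routing only perturbs the logits and adds no demonstration tokens to the context.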