CL, AI | Oct 27, 2025

Understanding In-Context Learning Beyond Transformers: An Investigation of State Space and Hybrid Architectures

Shenran Wang, Timothy Tin-Long Tse, Jian Zhu

arXiv:2510.23006v1 | h-index: 1

Originality: Synthesis-oriented

AI Analysis

This work provides a nuanced understanding of in-context learning mechanisms across architectures, an incremental contribution for researchers in machine learning and AI.

The study investigates in-context learning across transformer, state-space, and hybrid large language models on knowledge-based tasks. It finds that task performance is similar across architectures while internal mechanisms differ: function vectors are located primarily in self-attention and Mamba layers, and they matter more for parametric knowledge retrieval than for contextual knowledge understanding.

We perform in-depth evaluations of in-context learning (ICL) on state-of-the-art transformer, state-space, and hybrid large language models (LLMs) over two categories of knowledge-based ICL tasks. Using a combination of behavioral probing and intervention-based methods, we find that, while LLMs of different architectures can behave similarly in task performance, their internals can remain different. We find that function vectors (FVs) responsible for ICL are primarily located in the self-attention and Mamba layers, and we speculate that Mamba2 uses a different mechanism from FVs to perform ICL. FVs are more important for ICL involving parametric knowledge retrieval than for contextual knowledge understanding. Our work contributes to a more nuanced understanding of ICL across architectures and task types. Methodologically, our approach also highlights the importance of combining behavioral and mechanistic analyses to investigate LLM capabilities.
