LG AI CL Jun 20, 2025

Latent Concept Disentanglement in Transformer-based Language Models

Guan Zhe Hong, Bhavya Vasudeva, Vatsal Sharan, Cyrus Rashtchian, Prabhakar Raghavan, Rina Panigrahy

arXiv:2506.16975v23 citations h-index: 19

Originality: Incremental advance

AI Analysis

The paper provides mechanistic insight into how transformers represent latent structure. The advance is incremental but relevant to interpretability research.

The study investigates whether transformer-based language models can infer and represent latent concepts from in-context demonstrations. It finds that models identify discrete latent concepts and compose them step by step in transitive reasoning tasks, and that for numerical latent concepts the representation space contains low-dimensional subspaces whose geometry reflects the underlying parameterization.

When large language models (LLMs) use in-context learning (ICL) to solve a new task, they must infer latent concepts from demonstration examples. This raises the question of whether and how transformers represent latent structures as part of their computation. Our work experiments with several controlled tasks, studying this question using mechanistic interpretability. First, we show that in transitive reasoning tasks with a latent, discrete concept, the model successfully identifies the latent concept and does step-by-step concept composition. This builds upon prior work that analyzes single-step reasoning. Then, we consider tasks parameterized by a latent numerical concept. We discover low-dimensional subspaces in the model's representation space, where the geometry cleanly reflects the underlying parameterization. Overall, we show that small and large models can indeed disentangle and utilize latent concepts that they learn in-context from a handful of abbreviated demonstrations.
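As a rough illustration of what a low-dimensional subspace that reflects a numerical parameter can look like, the sketch below runs PCA (via SVD) on synthetic vectors. It is not the authors' procedure: the "hidden states" are simulated rather than taken from a model, and `make_synthetic_states`, `top_subspace`, and all sizes and noise levels are hypothetical choices made for this example.

```python
import numpy as np

def make_synthetic_states(n=200, d=64, noise=0.01, seed=0):
    """Simulate hidden states that encode a scalar latent along one direction.

    Assumption for illustration only: real model activations are not
    generated this way; this just gives PCA something to recover.
    """
    rng = np.random.default_rng(seed)
    theta = rng.uniform(-1.0, 1.0, size=n)      # hypothetical latent numerical concept
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)      # unit vector carrying the concept
    hidden = np.outer(theta, direction) + noise * rng.normal(size=(n, d))
    return theta, hidden

def top_subspace(hidden, k):
    """Return the top-k principal directions and their explained-variance ratios."""
    centered = hidden - hidden.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions of the centered data.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    var = s ** 2
    return vt[:k], var[:k] / var.sum()

theta, hidden = make_synthetic_states()
components, ratio = top_subspace(hidden, k=1)

# Project onto the first principal direction and compare with the latent value.
projection = (hidden - hidden.mean(axis=0)) @ components[0]
corr = abs(np.corrcoef(projection, theta)[0, 1])

print(f"variance explained by 1 direction: {ratio[0]:.3f}")
print(f"|correlation| between projection and latent: {corr:.4f}")
```

With real activations, one would replace `make_synthetic_states` with hidden states collected at a chosen layer and token position, and check whether a few principal directions track the latent parameter in the same way.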
