LG AI CL ML · Jun 12, 2025

Discovering Hierarchical Latent Capabilities of Language Models via Causal Representation Learning

Jikai Jin, Vasilis Syrgkanis, Sham Kakade, Hanlin Zhang

Stanford

arXiv:2506.10378v1 · 11.43 citations · h-index: 96 · Has Code

Originality: Incremental advance

AI Analysis

This work gives language model developers a more interpretable evaluation method, though its advance is incremental: it applies causal methods to existing benchmarks.

The authors address the problem of evaluating language model capabilities with a causal representation learning framework that models benchmark performance as a linear transformation of latent capability factors. Using over 1500 models evaluated across six benchmarks, they identify a three-node linear causal structure that reveals a causal progression from general problem-solving to instruction-following to mathematical reasoning.

Faithful evaluation of language model capabilities is crucial for deriving actionable insights that can inform model development. However, rigorous causal evaluations in this domain face significant methodological challenges, including complex confounding effects and prohibitive computational costs associated with extensive retraining. To tackle these challenges, we propose a causal representation learning framework wherein observed benchmark performance is modeled as a linear transformation of a few latent capability factors. Crucially, these latent factors are identified as causally interrelated after appropriately controlling for the base model as a common confounder. Applying this approach to a comprehensive dataset encompassing over 1500 models evaluated across six benchmarks from the Open LLM Leaderboard, we identify a concise three-node linear causal structure that reliably explains the observed performance variations. Further interpretation of this causal structure provides substantial scientific insights beyond simple numerical rankings: specifically, we reveal a clear causal direction starting from general problem-solving capabilities, advancing through instruction-following proficiency, and culminating in mathematical reasoning ability. Our results underscore the essential role of carefully controlling base model variations during evaluation, a step critical to accurately uncovering the underlying causal relationships among latent model capabilities.
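The modeling assumption in the abstract, benchmark scores as a linear transformation of a few causally related latent factors, can be illustrated on synthetic data. The sketch below is not the authors' method or code. The chain coefficients (0.8, 0.6), the mixing matrix `G`, the noise level, and the function names are all hypothetical choices made for illustration. It simulates a three-node chain z1 -> z2 -> z3, mixes it into six synthetic "benchmarks", and checks two consequences of that structure: the score matrix is close to rank three, and z1 and z3 are nearly uncorrelated once z2 is accounted for.

```python
import numpy as np


def simulate_benchmarks(n_models=1500, seed=0):
    """Simulate scores X = Z @ G + noise for a hypothetical 3-node chain."""
    rng = np.random.default_rng(seed)
    e = rng.normal(size=(n_models, 3))

    # Hypothetical linear chain z1 -> z2 -> z3 (coefficients are assumptions).
    z1 = e[:, 0]
    z2 = 0.8 * z1 + e[:, 1]
    z3 = 0.6 * z2 + e[:, 2]
    Z = np.column_stack([z1, z2, z3])

    # Hypothetical 3 x 6 mixing matrix: each row maps one latent factor
    # to the six synthetic benchmark scores.
    G = np.array([
        [1.0, 0.8, 0.6, 0.2, 0.1, 0.3],
        [0.2, 0.3, 0.1, 1.0, 0.7, 0.2],
        [0.1, 0.2, 0.3, 0.2, 0.4, 1.0],
    ])

    # Small measurement noise on top of the linear mixture.
    X = Z @ G + 0.01 * rng.normal(size=(n_models, G.shape[1]))
    return Z, G, X


def effective_rank(X, tol=0.05):
    """Count singular values of the centered data above tol * largest."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    return int(np.sum(s / s[0] > tol))


def partial_corr(a, b, given):
    """Correlation of a and b after linearly regressing out `given`."""
    A = np.column_stack([np.ones_like(given), given])
    ra = a - A @ np.linalg.lstsq(A, a, rcond=None)[0]
    rb = b - A @ np.linalg.lstsq(A, b, rcond=None)[0]
    return float(np.corrcoef(ra, rb)[0, 1])


if __name__ == "__main__":
    Z, G, X = simulate_benchmarks()
    print("effective rank of scores:", effective_rank(X))
    # The latents are available here only because the data is simulated.
    print("corr(z1, z3):", float(np.corrcoef(Z[:, 0], Z[:, 2])[0, 1]))
    print("partial corr(z1, z3 | z2):", partial_corr(Z[:, 0], Z[:, 2], Z[:, 1]))
```

In this toy setting the latent factors are known by construction. With real leaderboard data they would have to be estimated, and the abstract stresses that the base model must be controlled for as a common confounder, a step this sketch omits.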
