CL AI LG · Jun 14, 2023

Revealing the structure of language model capabilities

Ryan Burnell, Han Hao, Andrew R. A. Conway, Jose Hernandez Orallo

arXiv:2306.10062v19.639 citations · h-index: 44 · Has Code

Originality: Incremental advance

AI Analysis

This work offers a theoretical framework for understanding and predicting LLM behavior. Its advance is incremental: it refines how AI researchers think about scaling laws and benchmark design.

The study investigated the structure of large language model (LLM) capabilities by analyzing 29 LLMs across 27 cognitive tasks, finding that capabilities are not monolithic but better explained by three factors—reasoning, comprehension, and core language modeling—which explain a high proportion of variance in performance.

Building a theoretical understanding of the capabilities of large language models (LLMs) is vital for our ability to predict and explain the behavior of these systems. Here, we investigate the structure of LLM capabilities by extracting latent capabilities from patterns of individual differences across a varied population of LLMs. Using a combination of Bayesian and frequentist factor analysis, we analyzed data from 29 different LLMs across 27 cognitive tasks. We found evidence that LLM capabilities are not monolithic. Instead, they are better explained by three well-delineated factors that represent reasoning, comprehension and core language modeling. Moreover, we found that these three factors can explain a high proportion of the variance in model performance. These results reveal a consistent structure in the capabilities of different LLMs and demonstrate the multifaceted nature of these capabilities. We also found that the three abilities show different relationships to model properties such as model size and instruction tuning. These patterns help refine our understanding of scaling laws and indicate that changes to a model that improve one ability might simultaneously impair others. Based on these findings, we suggest that benchmarks could be streamlined by focusing on tasks that tap into each broad model ability.
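To make the idea of "extracting latent capabilities from patterns of individual differences" concrete, here is a minimal sketch in Python. It is not the authors' pipeline: the paper reports Bayesian and frequentist factor analysis, whereas this sketch swaps in a simpler eigendecomposition of the task correlation matrix (a principal-component-style approximation). The data are synthetic, and the function names `simulate_scores` and `explained_variance`, the noise level, and the clean one-factor-per-task loading pattern are all assumptions made for illustration.

```python
import numpy as np


def simulate_scores(n_models=29, n_tasks=27, n_factors=3, noise=0.3, seed=0):
    """Generate a synthetic (models x tasks) score matrix.

    Assumption for illustration only: each task loads on exactly one of
    three latent abilities, plus independent Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    # Latent ability of each model on each factor.
    abilities = rng.normal(size=(n_models, n_factors))
    # Simple-structure loading matrix: task t loads on factor t % n_factors.
    loadings = np.zeros((n_tasks, n_factors))
    for t in range(n_tasks):
        loadings[t, t % n_factors] = 1.0
    return abilities @ loadings.T + noise * rng.normal(size=(n_models, n_tasks))


def explained_variance(scores):
    """Proportion of variance carried by each component, largest first.

    Uses the eigenvalues of the task-by-task correlation matrix, which is
    a principal-component approximation, not a full factor-analysis fit.
    """
    corr = np.corrcoef(scores, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh returns ascending order
    return eigvals / eigvals.sum()


scores = simulate_scores()
ratios = explained_variance(scores)
top3 = ratios[:3].sum()
print(f"Share of variance in the first three components: {top3:.2f}")
```

With real benchmark data, `scores` would be replaced by the observed model-by-task performance matrix. A proper replication would fit an actual factor model with rotation and compare solutions with different numbers of factors, rather than relying on eigenvalues alone.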
