CL | AI | Jan 23

Trapped in the past? Disentangling fluid and crystallized intelligence of large language models using chess

arXiv:2601.16823v1 | 1 citation | h-index: 8
Originality: Incremental advance
AI Analysis

This paper addresses a question of interest to AI researchers: how much of an LLM's capability reflects reasoning rather than recall. It is an incremental advance focused on the limits of systematic generalization.

The study used chess as a testbed to disentangle crystallized and fluid intelligence in large language models, finding that performance degrades as reasoning demands increase and collapses to random levels in out-of-distribution tasks.

Large Language Models (LLMs) exhibit remarkable capabilities, yet it remains unclear to what extent these reflect sophisticated recall (crystallized intelligence) or reasoning ability (fluid intelligence). We introduce chess as a controlled testbed for disentangling these faculties. Leveraging the game's structure and scalable engine evaluations, we construct a taxonomy of positions varying in training corpus proximity--ranging from common states solvable by memorization to novel ones requiring first-principles reasoning. We systematically evaluate multiple GPT generations under varying reasoning intensities. Our analysis reveals a clear gradient: performance consistently degrades as fluid intelligence demands increase. Notably, in out-of-distribution tasks, performance collapses to random levels. While newer models improve, progress slows significantly for tasks outside the training distribution. Furthermore, while reasoning-augmented inference improves performance, its marginal benefit per token decreases with distributional proximity. These results suggest current architectures remain limited in systematic generalization, highlighting the need for mechanisms beyond scale to achieve robust fluid intelligence.
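
Two quantities in the abstract can be made concrete with a small sketch: the "random level" that out-of-distribution performance collapses to, and the "marginal benefit per token" of reasoning-augmented inference. The definitions below are assumptions for illustration, not the paper's stated metrics: the random level is taken as the expected accuracy of choosing uniformly among legal moves, and the marginal benefit is taken as accuracy gained per additional reasoning token. The function names and all numbers are hypothetical.

```python
from statistics import mean


def random_baseline_accuracy(positions):
    """Expected accuracy of a uniformly random legal move.

    `positions` is a list of (n_legal_moves, n_correct_moves) pairs.
    Assumption: a move counts as correct if it is among the moves the
    engine accepts as best; the paper's actual scoring may differ.
    """
    return mean(n_correct / n_legal for n_legal, n_correct in positions)


def marginal_gain_per_token(acc_low, acc_high, tokens_low, tokens_high):
    """Accuracy gained per extra reasoning token between two settings.

    Assumption: a simple finite difference between a low-effort and a
    high-effort inference setting.
    """
    extra_tokens = tokens_high - tokens_low
    if extra_tokens <= 0:
        raise ValueError("tokens_high must exceed tokens_low")
    return (acc_high - acc_low) / extra_tokens


# Hypothetical positions: (legal moves, engine-approved moves).
example_positions = [(20, 1), (40, 2)]
baseline = random_baseline_accuracy(example_positions)

# Hypothetical accuracies and token budgets, not results from the paper.
gain = marginal_gain_per_token(0.30, 0.40, 1000, 3000)
```

A model whose accuracy on a position class is indistinguishable from `baseline` is performing at chance on that class, and comparing `gain` across position classes is one way to express how the payoff of extra reasoning tokens varies with distance from the training distribution.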
