CL · AI · CD · Oct 24, 2025

Correlation Dimension of Auto-Regressive Large Language Models

arXiv:2510.21258v1 · 1 citation · h-index: 3
Originality: Highly original
AI Analysis

This work addresses a limitation of conventional LLM evaluation metrics and gives researchers and practitioners a new tool for assessing generative dynamics. It is incremental in the sense that it contributes a novel evaluation method rather than a new model.

The authors tackle the evaluation of large language models (LLMs) by introducing correlation dimension, a fractal-geometric measure that quantifies long-range structural complexity in text. Their experiments reveal three distinct phases in pretraining, context-dependent complexity, and tendencies toward hallucination and degeneration.

Large language models (LLMs) have achieved remarkable progress in natural language generation, yet they continue to display puzzling behaviors -- such as repetition and incoherence -- even when exhibiting low perplexity. This highlights a key limitation of conventional evaluation metrics, which emphasize local prediction accuracy while overlooking long-range structural complexity. We introduce correlation dimension, a fractal-geometric measure of self-similarity, to quantify the epistemological complexity of text as perceived by a language model. This measure captures the hierarchical recurrence structure of language, bridging local and global properties in a unified framework. Through extensive experiments, we show that correlation dimension (1) reveals three distinct phases during pretraining, (2) reflects context-dependent complexity, (3) indicates a model's tendency toward hallucination, and (4) reliably detects multiple forms of degeneration in generated text. The method is computationally efficient, robust to model quantization (down to 4-bit precision), broadly applicable across autoregressive architectures (e.g., Transformer and Mamba), and provides fresh insight into the generative dynamics of LLMs.
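The abstract does not spell out how the correlation dimension is estimated. As a rough illustration only, the sketch below uses the classical Grassberger-Procaccia estimator on a generic point cloud: it computes the correlation integral C(r), the fraction of point pairs closer than r, and takes the slope of log C(r) against log r. The Euclidean metric, the function names, and the idea of feeding in arbitrary vectors are assumptions made for this example; how the paper maps a model's outputs to points and which distance it uses is not stated here.

```python
import numpy as np


def correlation_integral(points, radii):
    """Fraction of distinct point pairs whose distance is below each radius.

    Hypothetical helper: `points` is an (n, d) array, `radii` a 1-D array.
    Euclidean distance is an assumption, not the paper's stated metric.
    """
    points = np.asarray(points, dtype=float)
    radii = np.asarray(radii, dtype=float)
    n = len(points)
    # All pairwise distances; O(n^2 * d) memory, fine for a few thousand points.
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Keep each unordered pair once (strict upper triangle).
    pair_dists = dists[np.triu_indices(n, k=1)]
    return np.array([(pair_dists < r).mean() for r in radii])


def correlation_dimension(points, radii):
    """Slope of log C(r) versus log r, fitted by least squares."""
    radii = np.asarray(radii, dtype=float)
    c = correlation_integral(points, radii)
    mask = c > 0  # log is undefined where no pair falls within r
    slope, _intercept = np.polyfit(np.log(radii[mask]), np.log(c[mask]), 1)
    return slope


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    radii = np.logspace(np.log10(0.02), np.log10(0.1), 10)
    # Points on a line segment embedded in 3-D: the estimate should be near 1.
    t = rng.random(800)
    line = np.stack([t, np.zeros_like(t), np.zeros_like(t)], axis=1)
    print("line segment:", correlation_dimension(line, radii))
    # Points filling a unit square: the estimate should be near 2.
    square = rng.random((800, 2))
    print("unit square:", correlation_dimension(square, radii))
```

The range of radii matters: too small and few pairs fall inside r, too large and boundary effects flatten the slope, so in practice the fit is restricted to a scaling region where log C(r) is close to linear in log r.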
