CL | AI | NE | NC | Jun 26, 2025

Can "consciousness" be observed from large language model (LLM) internal states? Dissecting LLM representations obtained from Theory of Mind test with Integrated Information Theory and Span Representation analysis

arXiv:2506.22516v1 | 2 citations | h-index: 2 | Nat Lang Process J
Originality: Synthesis-oriented
AI Analysis

This work addresses a speculative question of interest to researchers and philosophers: whether consciousness can be attributed to AI systems. It is incremental, applying existing theories to new data rather than offering a novel breakthrough.

The study investigated whether consciousness-like phenomena can be detected in large language model (LLM) internal states, applying Integrated Information Theory (IIT) and Span Representation analysis to Theory of Mind test data. It found no statistically significant indicators of consciousness, but the spatio-permutational analyses revealed interesting patterns.

Integrated Information Theory (IIT) provides a quantitative framework for explaining the phenomenon of consciousness, positing that conscious systems comprise elements integrated through causal properties. We apply IIT 3.0 and 4.0, the latest iterations of this framework, to sequences of Large Language Model (LLM) representations, analyzing data derived from existing Theory of Mind (ToM) test results. Our study systematically investigates whether differences in ToM test performance, when present in the LLM representations, can be revealed by IIT estimates, i.e., $\Phi^{\max}$ (IIT 3.0), $\Phi$ (IIT 4.0), Conceptual Information (IIT 3.0), and $\Phi$-structure (IIT 4.0). Furthermore, we compare these metrics with Span Representations, which are independent of any estimate of consciousness. This additional effort aims to differentiate between potential "consciousness" phenomena and inherent separations within the LLM representational space. We conduct comprehensive experiments examining variations across LLM transformer layers and linguistic spans from the stimuli. Our results suggest that sequences of contemporary Transformer-based LLM representations lack statistically significant indicators of observed "consciousness" phenomena but exhibit intriguing patterns under $\textit{spatio}$-permutational analyses. The Appendix and code are available as Supplementary Materials at: https://doi.org/10.1016/j.nlp.2025.100163.
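As a rough illustration of what a span representation might look like, the sketch below mean-pools the token vectors of one transformer layer over a linguistic span. This is an assumption made for illustration only: the abstract does not state how the authors build their Span Representations, and the function name `span_representation`, the `[start, end)` indexing convention, and the toy vectors are all hypothetical.

```python
from statistics import fmean


def span_representation(layer_states, start, end):
    """Mean-pool the token vectors of one layer over the span [start, end).

    layer_states: list of token vectors (each a list of floats) for one layer.
    Returns one vector with the same hidden dimension.
    Assumption: mean-pooling is used here purely as an illustrative choice.
    """
    if not 0 <= start < end <= len(layer_states):
        raise ValueError("span must satisfy 0 <= start < end <= number of tokens")
    span = layer_states[start:end]
    # zip(*span) groups the values of each hidden dimension across the span's tokens
    return [fmean(dim_values) for dim_values in zip(*span)]


# Toy layer: 4 tokens, hidden dimension 2 (made-up numbers, not real model output)
toy_layer = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
rep = span_representation(toy_layer, 1, 3)
print(rep)
```

Applying the same function to each layer's states in turn would give one vector per layer for a given span, which is the kind of layer-by-span grid the abstract describes examining.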

