AI · LG · Apr 13

Identity as Attractor: Geometric Evidence for Persistent Agent Architecture in LLM Activation Space

arXiv:2604.12016 · 8.5
Predicted impact: top 86% in AI · last 90 days
Originality: Incremental advance
AI Analysis

For researchers studying LLM internal representations and agentic AI, this provides evidence that identity documents create attractor-like structures, potentially informing persistent agent architectures.

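The paper investigates whether the identity document of a persistent agent (its cognitive_core) induces attractor-like dynamics in LLM activation space. On Llama 3.1 8B Instruct, paraphrases of the original identity document converge to a tighter cluster than structurally matched controls (Cohen's d > 1.88, p < 10^{-27}, Bonferroni-corrected), and a replication on Gemma 2 9B confirms the effect across architectures. An exploratory experiment further shows that reading a scientific description of the agent shifts the internal state toward the attractor, which the authors take as representational evidence that agent identity documents induce attractor-like geometry.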

Large language models map semantically related prompts to similar internal representations -- a phenomenon interpretable as attractor-like dynamics. We ask whether the identity document of a persistent cognitive agent (its cognitive_core) exhibits analogous attractor-like behavior. We present a controlled experiment on Llama 3.1 8B Instruct, comparing hidden states of an original cognitive_core (Condition A), seven paraphrases (Condition B), and seven structurally matched controls (Condition C). Mean-pooled states at layers 8, 16, and 24 show that paraphrases converge to a tighter cluster than controls (Cohen's d > 1.88, p < 10^{-27}, Bonferroni-corrected). Replication on Gemma 2 9B confirms cross-architecture generalizability. Ablations suggest the effect is primarily semantic rather than structural, and that structural completeness appears necessary to reach the attractor region. An exploratory experiment shows that reading a scientific description of the agent shifts internal state toward the attractor -- closer than a sham preprint -- distinguishing knowing about an identity from operating as that identity. These results provide representational evidence that agent identity documents induce attractor-like geometry in LLM activation space.
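The comparison described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: it assumes cluster tightness is measured by pairwise cosine distance within each condition and that Cohen's d uses a pooled standard deviation, neither of which the abstract specifies. The vectors are hypothetical toy values standing in for mean-pooled hidden states, and the function names are illustrative.

```python
import math
from itertools import combinations
from statistics import mean, variance


def mean_pool(token_states):
    """Average per-token hidden vectors into a single document vector."""
    n = len(token_states)
    return [sum(column) / n for column in zip(*token_states)]


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def pairwise_distances(states):
    """Cosine distance for every unordered pair within one condition."""
    return [cosine_distance(u, v) for u, v in combinations(states, 2)]


def cohens_d(x, y):
    """Cohen's d with pooled sample variance; positive when mean(x) > mean(y)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)


# Hypothetical pooled states (toy 3-d vectors, NOT data from the paper).
paraphrase_states = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [1.0, 0.0, 0.1],
    [0.95, 0.1, 0.05],
]
control_states = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.5],
]

paraphrase_dists = pairwise_distances(paraphrase_states)
control_dists = pairwise_distances(control_states)

# A positive effect size means controls are more spread out than paraphrases.
effect_size = cohens_d(control_dists, paraphrase_dists)
print(f"mean paraphrase distance: {mean(paraphrase_dists):.3f}")
print(f"mean control distance:    {mean(control_dists):.3f}")
print(f"Cohen's d (controls vs paraphrases): {effect_size:.2f}")
```

In a real replication, the toy vectors would be replaced by hidden states extracted at the chosen layers (8, 16, and 24 in the abstract) and passed through `mean_pool`. Pairwise distances within a condition are not independent samples, so any significance test built on them would need more care than this sketch shows.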

Code Implementations: 1 repo