AI · May 9

Do LLMs Experience an Internal Polylogue? Investigating Reasoning through the Lens of Personas

Nils A. Herrmann, Leander Girrbach, Kirill Bykov, Zeynep Akata

arXiv:2605.09159 · 44.1

AI Analysis

For researchers seeking interpretable tools for reasoning-time monitoring and control in LLMs, this work introduces a novel dynamic use of persona vectors, though improvements are modest and limited to specific models.

The authors propose monitoring the time series of persona vector alignments, which they call the polylogue, during LLM generation. They show that polylogue features predict correctness on MMLU-Pro competitively with low-dimensional activation baselines and enable stage-aware steering that improves accuracy on three of four models.

Recent work shows that large language models (LLMs) encode behavioural traits ("personas") as linear directions in activation space, often called "persona vectors". Prior work has used such directions as static handles for behavioural steering. Building on this, we treat them as dynamic signals instead: probes we can monitor and intervene on as reasoning unfolds. We use the term polylogue to denote the time series of alignments between persona vectors and hidden activations over the course of generation. Experiments across four open-weight models show that polylogue features predict correctness on MMLU-Pro competitively with low-dimensional activation baselines, while remaining interpretable through their associated persona directions. They also suggest concrete steering targets, namely which latent directions to modulate at different stages of a response. We instantiate this as a simple paragraph-conditioned intervention that improves accuracy on three of four models, pointing to stage-aware latent steering as a promising direction for reasoning-time control. Together, this positions the polylogue as an interpretable tool for reasoning-time monitoring and intervention.
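As a rough illustration of the definition above, the following sketch computes a polylogue as a matrix of alignments between per-token hidden activations and a set of persona directions, then nudges one activation along a chosen direction. The abstract does not say how alignment is measured or how steering is applied, so cosine similarity, additive steering, the function names `polylogue` and `steer`, and the random toy data are all assumptions made for illustration, not the authors' method.

```python
import numpy as np

EPS = 1e-8  # guards against division by zero for all-zero vectors


def polylogue(hidden_states, persona_vectors):
    """Alignment of every token's hidden state with every persona direction.

    hidden_states:   array of shape (T, d), one row per generated token
    persona_vectors: array of shape (K, d), one row per persona direction
    returns:         array of shape (T, K) of cosine similarities (assumed metric)
    """
    h = hidden_states / (np.linalg.norm(hidden_states, axis=1, keepdims=True) + EPS)
    p = persona_vectors / (np.linalg.norm(persona_vectors, axis=1, keepdims=True) + EPS)
    return h @ p.T


def steer(hidden_state, persona_vector, alpha):
    """Additive steering (assumed form): shift a hidden state along a unit persona direction."""
    unit = persona_vector / (np.linalg.norm(persona_vector) + EPS)
    return hidden_state + alpha * unit


# Toy data standing in for real model activations (hypothetical sizes).
rng = np.random.default_rng(0)
hidden = rng.normal(size=(6, 16))    # 6 tokens, hidden size 16
personas = rng.normal(size=(3, 16))  # 3 persona directions

series = polylogue(hidden, personas)  # time series of alignments, shape (6, 3)
steered = steer(hidden[0], personas[1], alpha=4.0)
```

In a stage-aware setting such as the paragraph-conditioned intervention the abstract mentions, the choice of direction and the value of `alpha` would depend on the current stage of the response; how the authors make that choice is not stated here.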
