CL, LG | Mar 1

Truth as a Trajectory: What Internal Representations Reveal About Large Language Model Reasoning

arXiv:2603.01326v1 | 7 citations | h-index: 4
Originality: Highly original
AI Analysis

For researchers and practitioners trying to understand LLM reasoning, this work offers a novel perspective on explainability that mitigates reliance on lexical confounds, though the shift from static to dynamic analysis is itself incremental.

The paper addresses explainability in Large Language Models by targeting the limitations of static activation analysis. It introduces Truth as a Trajectory (TaT), which models inference as a trajectory of layer-wise geometric displacements and outperforms conventional probing methods on benchmarks such as commonsense reasoning and question answering.

Existing explainability methods for Large Language Models (LLMs) typically treat hidden states as static points in activation space, assuming that correct and incorrect inferences can be separated using representations from an individual layer. However, these activations are saturated with polysemantic features, leading to linear probes learning surface-level lexical patterns rather than underlying reasoning structures. We introduce Truth as a Trajectory (TaT), which models the transformer inference as an unfolded trajectory of iterative refinements, shifting analysis from static activations to layer-wise geometric displacement. By analyzing displacement of representations across layers, TaT uncovers geometric invariants that distinguish valid reasoning from spurious behavior. We evaluate TaT across dense and Mixture-of-Experts (MoE) architectures on benchmarks spanning commonsense reasoning, question answering, and toxicity detection. Without access to the activations themselves and using only changes in activations across layers, we show that TaT effectively mitigates reliance on static lexical confounds, outperforming conventional probing, and establishes trajectory analysis as a complementary perspective on LLM explainability.
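
To make the shift from static activations to layer-wise displacement concrete, the following is a minimal sketch and not the authors' implementation. It assumes one hidden-state vector per layer for a single token position, stacked into an array of shape (num_layers + 1, d_model). The function names `layer_displacements` and `trajectory_features`, and the choice of step lengths and turning cosines as summary features, are hypothetical illustrations; the abstract does not specify which geometric quantities TaT uses.

```python
import numpy as np


def layer_displacements(hidden_states: np.ndarray) -> np.ndarray:
    """Return h[l+1] - h[l] for every consecutive pair of layers.

    hidden_states: array of shape (num_layers + 1, d_model), one vector
    per layer for a single token position (assumed layout).
    """
    return np.diff(hidden_states, axis=0)


def trajectory_features(hidden_states: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Summarize a trajectory using only layer-to-layer changes.

    Hypothetical feature set: the length of each step and the cosine of
    the angle between consecutive steps. The raw activations themselves
    never enter the feature vector.
    """
    deltas = layer_displacements(hidden_states)            # (L, d_model)
    norms = np.linalg.norm(deltas, axis=1)                 # (L,)
    dots = np.sum(deltas[:-1] * deltas[1:], axis=1)        # (L - 1,)
    cosines = dots / (norms[:-1] * norms[1:] + eps)        # (L - 1,)
    return np.concatenate([norms, cosines])


# Toy usage with synthetic "hidden states" (random data, not model output).
rng = np.random.default_rng(0)
states = rng.normal(size=(5, 8))        # 5 layer outputs, d_model = 8
features = trajectory_features(states)
print(features.shape)
```

One property of this sketch is that adding the same constant vector to every layer's hidden state leaves the features unchanged, since only differences between layers are used. A downstream probe would then be trained on such features rather than on the activations of any single layer.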

