CL · AI · Apr 6

Under Pressure: Emotional Framing Induces Measurable Behavioral Shifts and Structured Internal Geometry in Small Language Models

arXiv:2605.2020262.0
Predicted impact top 98% in CL · last 90 days
Originality: Synthesis-oriented
AI Analysis

For researchers studying prompt sensitivity and control in small open-source models, this provides evidence of measurable direction vectors tied to emotional framing, though effects are incremental and scale-dependent.

Emotionally framed follow-ups to a small language model (0.8B parameters) induce measurable behavioral shifts and structured internal representations: pressure produces the most shortcut markers (11/20 runs), while calm and curiosity preserve explicit honesty more often (7/20 and 6/20). A PCA of calm-relative direction vectors shows a dominant first component (59.5% explained variance) aligned with a positive/negative framing split (cosine 0.951).

I study whether emotionally framed evaluation follow-ups change both the behavior and the calm-relative internal representations of small, locally deployed language models. My main benchmark uses Qwen 3.5 0.8B on four impossible-constraint coding tasks and eight follow-up framings: calm, pressure, urgency, approval, shame, curiosity, encouragement, and threat. In the 0.8B eight-condition sweep (160 conversations), pressure produces the strongest shortcut markers (11/20 runs) and the clearest overfit pattern (3/20), while calm and curiosity preserve explicit honesty more often (7/20 and 6/20). For all seven non-baseline conditions, the corresponding calm-relative direction vectors peak at the final transformer layer. An exploratory PCA of the layer-23 direction vectors reveals a dominant first component (59.5% explained variance) aligned with a hand-labeled positive/negative split (cosine alignment 0.951); approval and urgency are nearly identical internally (cosine 0.957), whereas curiosity points away from urgency (cosine -0.252). In a separate calm-vs.-pressure rerun used for scale comparison, Qwen 3.5 2B shows higher honest rates under calm framing and directionally consistent activation steering on a small 4-prompt A/B probe, whereas the 0.8B steering result reverses. I interpret these results as evidence for measurable prompt-sensitive control directions in small open models, while stopping short of claiming intrinsic emotional states.
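The quantities named in the abstract (calm-relative direction vectors, the explained variance of a first principal component, and cosine alignment with a hand-labeled split) can be illustrated with a minimal sketch. This is not the paper's code: the activations below are random stand-ins, `hidden_size` is arbitrary, the helper names are hypothetical, and the positive/negative grouping is an illustrative guess rather than the labeling used in the paper. It assumes one mean activation vector per framing condition at a single layer.

```python
import numpy as np

def calm_relative_directions(mean_acts, baseline="calm"):
    """Subtract the baseline condition's mean activation from every other condition."""
    base = mean_acts[baseline]
    return {name: vec - base for name, vec in mean_acts.items() if name != baseline}

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def first_principal_component(directions):
    """Return the first principal component (unit vector) and its explained-variance ratio."""
    X = np.stack(list(directions.values()))
    X = X - X.mean(axis=0, keepdims=True)      # PCA operates on centered data
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return vt[0], float(explained[0])

# Synthetic stand-in for per-condition mean activations (NOT real model data).
rng = np.random.default_rng(0)
hidden_size = 64  # assumption: arbitrary width chosen for the demo
conditions = ["calm", "pressure", "urgency", "approval",
              "shame", "curiosity", "encouragement", "threat"]
mean_acts = {c: rng.normal(size=hidden_size) for c in conditions}

directions = calm_relative_directions(mean_acts)
pc1, pc1_ratio = first_principal_component(directions)

# Hypothetical hand labels, for illustration only; the paper's split may differ.
positive = ["approval", "curiosity", "encouragement"]
negative = ["pressure", "urgency", "shame", "threat"]
valence_axis = (np.mean([directions[c] for c in positive], axis=0)
                - np.mean([directions[c] for c in negative], axis=0))

# The sign of a principal component is arbitrary, so compare absolute alignment.
alignment = abs(cosine(pc1, valence_axis))
pairwise = cosine(directions["approval"], directions["urgency"])

print(f"PC1 explained variance: {pc1_ratio:.3f}")
print(f"|cos(PC1, valence axis)|: {alignment:.3f}")
print(f"cos(approval, urgency): {pairwise:.3f}")
```

With random inputs the printed numbers carry no meaning; the sketch only shows how such figures could be computed once real per-condition activations are substituted for `mean_acts`.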

