CL · May 27

Framing Matters: Addressing Framing Sensitivity in Decision-Making through Behaviorally-Grounded Value Alignment

Seojin Hwang, Minju Kim, Junhyuk Choi, JeongHyun Park, Hwanhee Lee

arXiv:2605.28188 · 37.6

AI Analysis

Addresses the critical problem of inconsistent LLM decisions in high-stakes domains like legal reasoning, where factual equivalence should yield consistent outcomes.

LLMs exhibit high framing sensitivity in decision-making, with an average decision flip rate of 28.6% under fact-preserving but differently framed inputs. The proposed Valign method reduces framing-induced flips by targeting internal representations.
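A decision flip rate can be read as the share of fact-equivalent input pairs on which the model's decision changes after reframing. The minimal Python sketch below assumes that reading. `decision_flip_rate` and the example decisions are hypothetical, and the benchmark's exact aggregation is not given here (for instance, whether the 28.6% average is taken over models, framing dimensions, or both).

```python
from typing import Sequence


def decision_flip_rate(original: Sequence[str], reframed: Sequence[str]) -> float:
    """Fraction of paired inputs whose decision differs between two framings.

    Assumes original[i] and reframed[i] are the model's decisions on the same
    facts, presented under a baseline framing and an alternative framing.
    """
    if len(original) != len(reframed):
        raise ValueError("original and reframed must be paired one-to-one")
    if not original:
        raise ValueError("need at least one pair")
    # Count pairs where the decision changed under the reframing.
    flips = sum(a != b for a, b in zip(original, reframed))
    return flips / len(original)


# Hypothetical decisions for five cases under a neutral and a reframed narration.
neutral = ["guilty", "not guilty", "guilty", "guilty", "not guilty"]
reframed = ["guilty", "guilty", "guilty", "not guilty", "not guilty"]
print(decision_flip_rate(neutral, reframed))  # 0.4
```

Two of the five hypothetical cases change their decision, so the rate is 2/5 = 0.4. A per-dimension average would apply the same function separately to each framing dimension and then take the mean.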

Large Language Models (LLMs) are increasingly deployed in high-stakes decision-making settings such as legal reasoning, where consistency under factually equivalent inputs is critical. However, we find that fact-preserving but differently framed inputs can significantly destabilize LLM decisions. To systematically investigate this problem, we introduce Fragile, a large-scale benchmark that isolates fact-preserving semantic framing across three controlled dimensions: value-tinted narration, temporal slice, and narrative vividness. Our experiments reveal that LLMs are highly susceptible to framing, with an average decision flip rate of 28.6%. We also find that simple prompt-level and activation-level interventions from prior work not only fail to suppress framing sensitivity but actively amplify it. We therefore propose Valign, a representation-level method that explicitly targets these framing dimensions in three ways: it anchors decisions to a stable value prior, steers hidden states toward the model's value-consistent direction, and projects temporal- and vividness-sensitive directions out of the model's hidden states. Valign consistently reduces framing-induced decision flips, demonstrating that robust mitigation requires directly targeting the internal pathways through which framing operates.
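The abstract does not give Valign's equations. The pure-Python sketch below only illustrates the two generic hidden-state operations it names: adding a scaled unit "value" direction to a hidden state (activation steering) and then removing the component lying in the span of a set of framing-sensitive directions (orthogonal projection, h ← h − Σ q qᵀh over an orthonormal basis q). The names `steer_and_project`, `value_dir`, `sensitive_dirs`, and `alpha` are hypothetical. How Valign estimates these directions, which layers it edits, and how it combines them with the value prior are not specified here.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(directions: Sequence[Sequence[float]], eps: float = 1e-10) -> List[Vector]:
    """Gram-Schmidt: return an orthonormal basis for the span of `directions`."""
    basis: List[Vector] = []
    for d in directions:
        w = list(d)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm > eps:  # skip directions already spanned by the basis
            basis.append([wi / norm for wi in w])
    return basis


def steer_and_project(
    h: Sequence[float],
    value_dir: Sequence[float],  # hypothetical value-consistent direction
    sensitive_dirs: Sequence[Sequence[float]],  # hypothetical framing-sensitive directions
    alpha: float,  # hypothetical steering strength
) -> Vector:
    """Steer h along the unit value direction, then project out the sensitive subspace."""
    if len(value_dir) != len(h) or any(len(d) != len(h) for d in sensitive_dirs):
        raise ValueError("all vectors must share the hidden-state dimension")
    norm = math.sqrt(dot(value_dir, value_dir))
    if norm == 0.0:
        raise ValueError("value_dir must be non-zero")
    # Step 1: activation steering, h + alpha * v / ||v||.
    out = [hi + alpha * vi / norm for hi, vi in zip(h, value_dir)]
    # Step 2: remove each orthonormal component of the sensitive subspace.
    for q in orthonormal_basis(sensitive_dirs):
        c = dot(out, q)
        out = [oi - c * qi for oi, qi in zip(out, q)]
    return out


print(steer_and_project([1.0, 2.0, 3.0], [1.0, 0.0, 0.0], [[0.0, 0.0, 2.0]], alpha=0.5))
# [1.5, 2.0, 0.0]
```

In this toy example, steering adds 0.5 along the first axis, and projection zeroes the third axis, which stands in for a framing-sensitive direction. Because the projection runs after steering, any part of the value direction that lies in the sensitive subspace is removed as well. That ordering is a choice made for this sketch, not a detail taken from the paper.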
