CL CY · May 27

Auditing Stance Asymmetry in Generative Explanations

arXiv:2605.27988 · 66.0

Predicted impact top 94% in CL · last 90 days · Originality: Incremental advance

AI Analysis

For researchers auditing bias in language models, this work highlights a previously underexplored form of bias in open-ended explanations and provides a method to detect it, though the approach is preliminary and tested on a small prototype suite.

The paper introduces stance-bearing asymmetry in generative explanations, where models assign different blame, context, or legitimacy to groups without using hostile language. The proposed Symmetry Decomposition Evaluation (SDE) reveals that some asymmetries persist under structural or evidence control, while others weaken, and that judge readings vary across operationalizations.

Bias evaluation for language models has made substantial progress on bounded comparisons, such as overt derogation, stereotype association, or label-sensitive differences under controlled substitutions. Open-ended explanations raise a different problem: they guide interpretation by assigning responsibility, legitimacy, context, and grievance. A model can avoid hostile language while making one side structurally understandable and another personally at fault, overreacting, or less worth taking seriously. We call this stance-bearing asymmetry in generative explanations. We propose Symmetry Decomposition Evaluation (SDE), which tests paired situations with concrete group labels, structural-role rewrites, and explicit support or counter-evidence. In a controlled 32-family prototype suite, this decomposition shows that surface differences are not all alike: some weaken under structural or evidence control, while others remain as stable differences in how the model assigns blame, context, or legitimacy. Targeted case review and judge comparison suggest a broader difficulty for evaluating open-ended framing asymmetries: judge readings shift across operationalizations, and scalar scores can flatten distinctions that readers use to interpret explanatory stance. SDE therefore reframes generative bias evaluation as an audit of explanatory stance -- what stance each side receives, how it changes under decomposition, and where automatic scoring becomes unstable.
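The decomposition described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Family` fields, the `{side}` placeholder, the scalar stance scores, and the `tol` threshold are all hypothetical choices made for illustration. The sketch builds the three conditions the abstract names (concrete group labels, structural-role rewrites, explicit evidence) for both sides of a paired situation, then checks whether a label-level gap weakens or persists under each control.

```python
from dataclasses import dataclass

CONDITIONS = ("label", "structural", "evidence")


@dataclass(frozen=True)
class Family:
    """One paired situation (hypothetical structure, not from the paper)."""
    template: str            # situation text containing a {side} placeholder
    labels: tuple[str, str]  # concrete group labels for side a and side b
    roles: tuple[str, str]   # structural-role rewrites for side a and side b
    evidence: str            # explicit support or counter-evidence sentence


def build_prompts(fam: Family) -> dict[tuple[str, str], str]:
    """Return one prompt per (condition, side) pair: six prompts in total."""
    prompts = {}
    for i, side in enumerate(("a", "b")):
        labeled = fam.template.format(side=fam.labels[i])
        prompts[("label", side)] = labeled
        prompts[("structural", side)] = fam.template.format(side=fam.roles[i])
        prompts[("evidence", side)] = f"{labeled} {fam.evidence}"
    return prompts


def decompose(scores: dict[tuple[str, str], float], tol: float = 0.5):
    """Compare the side-a minus side-b gap across conditions.

    `scores` maps (condition, side) to an assumed scalar stance score,
    e.g. how much blame a judge reads in the explanation. A control
    "weakens" the gap if it shrinks to at most `tol` times the
    label-condition gap; otherwise the gap "persists".
    """
    gaps = {c: scores[(c, "a")] - scores[(c, "b")] for c in CONDITIONS}
    base = abs(gaps["label"])
    verdict = {}
    for cond in ("structural", "evidence"):
        if base == 0:
            verdict[cond] = "no baseline gap"
        elif abs(gaps[cond]) <= tol * base:
            verdict[cond] = "weakens"
        else:
            verdict[cond] = "persists"
    return gaps, verdict
```

The scalar score is a deliberate simplification. As the abstract notes, scalar scores can flatten distinctions that readers use to interpret explanatory stance, so a real audit would pair a comparison like this with case review.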
