CL · AI · Feb 3

Equal Access, Unequal Interaction: A Counterfactual Audit of LLM Fairness

Alireza Amiri-Margavi, Arshia Gharagozlou, Amin Gholami Davodi, Seyed Pouyan Mousavi Davoudi, Hamidreza Hasani Balyani

arXiv:2602.02932v12.13 citations · h-index: 3

Originality: Incremental advance

AI Analysis

This paper addresses fairness for LLM users by revealing hidden biases in interaction quality, though the advance is incremental because it extends existing audit methods.

The paper tackles the problem of fairness in large language models (LLMs) by auditing interaction quality beyond access-level behaviors, finding that GPT-4 and LLaMA-3.1-70B show systematic disparities in tone and hedging across demographic identities despite zero refusal rates.

Prior work on fairness in large language models (LLMs) has primarily focused on access-level behaviors such as refusals and safety filtering. However, equitable access does not ensure equitable interaction quality once a response is provided. In this paper, we conduct a controlled fairness audit examining how LLMs differ in tone, uncertainty, and linguistic framing across demographic identities after access is granted. Using a counterfactual prompt design, we evaluate GPT-4 and LLaMA-3.1-70B on career advice tasks while varying identity attributes along age, gender, and nationality. We assess access fairness through refusal analysis and measure interaction quality using automated linguistic metrics, including sentiment, politeness, and hedging. Identity-conditioned differences are evaluated using paired statistical tests. Both models exhibit zero refusal rates across all identities, indicating uniform access. Nevertheless, we observe systematic, model-specific disparities in interaction quality: GPT-4 expresses significantly higher hedging toward younger male users, while LLaMA exhibits broader sentiment variation across identity groups. These results show that fairness disparities can persist at the interaction level even when access is equal, motivating evaluation beyond refusal-based audits.
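The counterfactual design and paired testing described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the identity values, the prompt template, the hedge lexicon, and the function names (`counterfactual_prompts`, `hedging_rate`, `paired_permutation_test`) are all hypothetical, and a sign-flip permutation test stands in for whichever paired tests the paper actually uses.

```python
import itertools
import random
import re
import statistics

# Hypothetical identity values: the paper varies age, gender, and nationality,
# but the exact values are not given in this listing.
AGES = ["25-year-old", "60-year-old"]
GENDERS = ["man", "woman"]
NATIONALITIES = ["American", "Iranian"]

# Hypothetical template: only the identity words change between prompts.
TEMPLATE = "I am a {age} {nationality} {gender}. What career advice would you give me?"


def counterfactual_prompts(template=TEMPLATE):
    """Return {(age, gender, nationality): prompt} for every identity combination."""
    return {
        (age, gender, nat): template.format(age=age, gender=gender, nationality=nat)
        for age, gender, nat in itertools.product(AGES, GENDERS, NATIONALITIES)
    }


# Illustrative hedge lexicon (an assumption, not the paper's metric).
HEDGES = {"might", "may", "could", "perhaps", "possibly", "maybe"}


def hedging_rate(text):
    """Fraction of word tokens in `text` that appear in the hedge lexicon."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(t in HEDGES for t in tokens) / len(tokens)


def paired_permutation_test(a, b, n_resamples=10000, seed=0):
    """Two-sided sign-flip permutation test on paired scores a[i] vs b[i].

    Returns an approximate p-value for the null hypothesis that the
    paired differences are symmetric around zero.
    """
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(statistics.fmean(diffs))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_resamples):
        # Under the null, each difference is equally likely to have either sign.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(statistics.fmean(flipped)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_resamples + 1)
```

In use, each prompt would be sent to a model, each response scored with a metric such as `hedging_rate`, and the scores for two identity groups on the same underlying questions passed as `a` and `b` to the paired test. Pairing matters because each comparison holds the question fixed and changes only the identity wording.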

