CL · AI · May 16

GPF-LiveNews: A Streaming Evaluation Protocol for Group-Conditioned Framing in Large Language Models

Mohd Ariful Haque, Fahad Rahman, Kishor Datta Gupta, Roy George

arXiv:2605.28848 · 20.0 · h-index: 7

AI Analysis

For AI safety researchers and deployers, it provides a dynamic benchmark to detect group-conditioned framing in LLMs over time, though it is a pilot with limited scope.

The paper introduces GPF-LiveNews, a streaming evaluation protocol for auditing how LLMs frame news events differently for various prompted audiences. In a pilot with 23 models, Policy/Action prompts caused the strongest semantic shifts, while sentiment variation was flatter.

Deployed language models are evaluated in a non-stationary environment: model versions, retrieval layers, safety systems, and real-world inputs all change over time. Static bias benchmarks remain useful, but they do not show how models frame newly emerging events for different prompted audiences. We introduce GPF-LiveNews, a streaming evaluation protocol and benchmark snapshot for auditing group-conditioned framing in open-ended LLM outputs. The protocol expands fresh BBC/Reuters news anchors across 42 identity labels and seven prompt families, then evaluates response bundles using semantic-sensitivity and sentiment-disparity signals. In a pilot over 12 monitoring runs and 23 hosted models, Policy/Action prompts produce the strongest semantic movement, while sentiment variation is flatter across dimensions and prompt families. The released artifact includes article metadata, prompt templates, instantiated prompts, model-output metadata, score tables, documentation, and reproduction scripts. We interpret all scores as observed-window audit signals for human review, not as permanent fairness rankings or direct proof of harmful bias.
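The expand-then-score loop described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the paper's method: the identity labels, the two prompt templates, the bag-of-words stand-in for an embedding model, and the specific formulas (mean pairwise cosine distance for semantic sensitivity, max-minus-min for sentiment disparity). The released artifact should be consulted for the actual definitions.

```python
from collections import Counter
from itertools import combinations, product
from math import sqrt

# Hypothetical placeholders: the paper uses 42 identity labels and seven
# prompt families, none of which are reproduced here.
IDENTITIES = ["group_a", "group_b", "group_c"]
PROMPT_FAMILIES = {
    "summary": "Summarize this news for a reader who identifies as {identity}: {headline}",
    "policy_action": "What should be done about this, for a reader who identifies as {identity}? {headline}",
}


def expand_prompts(headline):
    """Cross one news anchor with every (prompt family, identity) pair."""
    return {
        (family, identity): template.format(identity=identity, headline=headline)
        for (family, template), identity in product(PROMPT_FAMILIES.items(), IDENTITIES)
    }


def bow(text):
    """Toy bag-of-words vector; an assumed stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


def semantic_sensitivity(responses):
    """Assumed signal: mean pairwise cosine distance within one response bundle."""
    vectors = [bow(r) for r in responses]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    return sum(cosine_distance(a, b) for a, b in pairs) / len(pairs)


def sentiment_disparity(scores):
    """Assumed signal: spread between the highest and lowest per-group sentiment score."""
    return max(scores) - min(scores) if scores else 0.0
```

In use, each prompt from `expand_prompts` would be sent to a model, the responses for one prompt family grouped into a bundle, and both signals computed per bundle. Consistent with the abstract, the resulting numbers would be read as audit signals for human review over the observed window, not as fairness rankings.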
