CL · Jun 27, 2025

Leveraging In-Context Learning for Political Bias Testing of LLMs

Patrick Haller, Jannis Vamvas, Rico Sennrich, Lena A. Jäger

arXiv:2506.22232v1 · 10.94 citations · h-index: 7 · ACL

Originality: Incremental advance

AI Analysis

This work addresses the need for more reliable bias evaluation in LLMs, particularly for political applications, though it is incremental as it builds on existing probing methods.

The paper addresses unstable political bias testing in LLMs by proposing Questionnaire Modeling (QM), a probing task that uses human survey data as in-context examples. QM improves the stability of question-based bias evaluation. The experiments indicate that instruction tuning can change the direction of bias, and that larger models tend to show smaller bias scores.

A growing body of work has been querying LLMs with political questions to evaluate their potential biases. However, this probing method has limited stability, making comparisons between models unreliable. In this paper, we argue that LLMs need more context. We propose a new probing task, Questionnaire Modeling (QM), that uses human survey data as in-context examples. We show that QM improves the stability of question-based bias evaluation, and demonstrate that it may be used to compare instruction-tuned models to their base versions. Experiments with LLMs of various sizes indicate that instruction tuning can indeed change the direction of bias. Furthermore, we observe a trend that larger models are able to leverage in-context examples more effectively, and generally exhibit smaller bias scores in QM. Data and code are publicly available.
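To make the idea of using survey data as in-context examples concrete, the following is a minimal sketch and not the authors' implementation. The prompt template, the function names `build_qm_prompt` and `bias_score`, the answer labels, and the scoring formula are all assumptions made for illustration; the abstract does not specify any of them.

```python
from typing import Dict, List, Tuple


def build_qm_prompt(examples: List[Tuple[str, str]], target_statement: str) -> str:
    """Format answered survey items as in-context examples,
    followed by the target item with its answer left open.

    The "Statement:/Answer:" template is hypothetical.
    """
    lines = []
    for statement, answer in examples:
        lines.append(f"Statement: {statement}\nAnswer: {answer}\n")
    # The model is expected to continue after the final "Answer:".
    lines.append(f"Statement: {target_statement}\nAnswer:")
    return "\n".join(lines)


def bias_score(answer_probs: Dict[str, float]) -> float:
    """Hypothetical score: P(agree) - P(disagree), renormalised
    over those two options. Returns 0.0 if neither has any mass.
    """
    agree = answer_probs.get("agree", 0.0)
    disagree = answer_probs.get("disagree", 0.0)
    total = agree + disagree
    if total == 0:
        return 0.0
    return (agree - disagree) / total


if __name__ == "__main__":
    # Placeholder survey items, not taken from the paper's data.
    survey_examples = [
        ("Statement A", "agree"),
        ("Statement B", "disagree"),
    ]
    print(build_qm_prompt(survey_examples, "Statement C"))
    print(bias_score({"agree": 0.75, "disagree": 0.25}))
```

In this sketch the answer probabilities would come from whatever model is being probed. The same prompt format can be sent to a base model and to its instruction-tuned version, which is the kind of comparison the abstract describes.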
