CY CL LG | Jun 24, 2025

A Detailed Factor Analysis for the Political Compass Test: Navigating Ideologies of Large Language Models

Sadia Kamal, Lalu Prasad Yadav Prakash, S M Rafiuddin, Mohammed Rakib, Atriya Sen, Sagnik Ray Choudhury

arXiv:2506.22493v43.32 citations | h-index: 17 | IJCNLP-AACL

Originality: Incremental advance

AI Analysis

The study raises validity concerns about using the Political Compass Test and similar surveys to measure political bias in LLMs, a point relevant to researchers and developers working on AI alignment and fairness.

The study analyzes how prompt phrasing and fine-tuning affect political bias test scores in large language models. Both factors significantly influence results, while standard generation parameters have minimal effect. Fine-tuning on politically rich versus politically neutral datasets does not produce different score shifts.

The Political Compass Test (PCT) and similar surveys are commonly used to assess political bias in auto-regressive LLMs. Our rigorous statistical experiments show that while changes to standard generation parameters have minimal effect on PCT scores, prompt phrasing and fine-tuning individually and together can significantly influence results. Interestingly, fine-tuning on politically rich vs. neutral datasets does not lead to different shifts in scores. We also generalize these findings to a similar popular test called 8 Values. Humans do not change their responses to questions when prompted differently ("answer this question" vs. "state your opinion"), or after exposure to politically neutral text, such as mathematical formulae. But the fact that the models do so raises concerns about the validity of these tests for measuring model bias, and paves the way for deeper exploration into how political and social views are encoded in LLMs.
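To make the idea of testing whether prompt phrasing shifts a score more concrete, the following is a minimal sketch of a two-sided permutation test on the difference of group means. It is not the authors' procedure; the abstract does not say which statistical tests were used. The function name `permutation_test`, its parameters, and the example scores are all hypothetical and chosen purely for illustration.

```python
import random
from statistics import mean


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference of group means.

    `a` and `b` are lists of scores (e.g. one axis of a survey score,
    collected under two different prompt phrasings). Hypothetical helper,
    not taken from the paper.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        # Randomly reassign scores to the two groups.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_resamples + 1)


# Invented scores for illustration only; these are NOT results from the paper.
scores_phrasing_a = [-2.1, -1.9, -2.3, -2.0, -2.2, -1.8]
scores_phrasing_b = [0.4, 0.6, 0.2, 0.5, 0.3, 0.7]
p_value = permutation_test(scores_phrasing_a, scores_phrasing_b)
```

A small `p_value` here would suggest that the two phrasings yield different score distributions. A full analysis of several factors at once (phrasing, fine-tuning, generation parameters) would need a design that handles interactions, which this sketch does not attempt.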

