Political Bias in LLMs: Unaligned Moral Values in Agent-centric Simulations
The findings highlight a problem for social scientists who use LLMs to simulate social interactions: they reveal limitations in model alignment for out-of-domain tasks.
The study investigated how personalized language models align with human moral values using the Moral Foundation Theory Questionnaire, finding that models produce inconsistent results with high variance and weak correlation to human data, especially for conservative personas.
Contemporary research in the social sciences increasingly uses state-of-the-art generative language models to annotate or generate content. While these models achieve benchmark-leading performance on common language tasks, their application to novel out-of-domain tasks remains insufficiently explored. To address this gap, we investigate how personalized language models align with human responses on the Moral Foundation Theory Questionnaire. We adapt open-source generative language models to different political personas and repeatedly survey these models to generate synthetic data sets in which each model-persona combination defines a sub-population. Our analysis reveals that models produce inconsistent results across repetitions, yielding high response variance. Furthermore, the synthetic data correlate only weakly with the corresponding human data from psychological studies, and conservative persona-prompted models in particular fail to align with actual conservative populations. These results suggest that, because of their alignment process, language models struggle to coherently represent ideologies through in-context prompting. Thus, using language models to simulate social interactions requires measurable improvements in in-context optimization or parameter manipulation before the models can properly align with psychological and sociological stereotypes.
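The two quantities the abstract relies on, response variance across repeated surveys and correlation with human data, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (summarize_runs, pearson, alignment), the per-foundation score layout, and all numbers in the demo are hypothetical placeholders chosen for illustration, and Pearson correlation is an assumed choice of measure.

```python
from math import sqrt
from statistics import mean, pvariance


def summarize_runs(runs):
    """Aggregate repeated survey runs for one model-persona combination.

    `runs` is a list of dicts mapping a moral foundation name to the score
    obtained in one repetition (hypothetical layout).
    Returns (mean per foundation, population variance per foundation).
    """
    foundations = sorted(runs[0])
    means = {f: mean(r[f] for r in runs) for f in foundations}
    variances = {f: pvariance([r[f] for r in runs]) for f in foundations}
    return means, variances


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def alignment(runs, human_reference):
    """Correlate mean synthetic scores with human reference scores."""
    means, _ = summarize_runs(runs)
    foundations = sorted(human_reference)
    return pearson(
        [means[f] for f in foundations],
        [human_reference[f] for f in foundations],
    )


if __name__ == "__main__":
    # Made-up placeholder values, NOT results from the study.
    synthetic_runs = [
        {"care": 4.0, "fairness": 3.5, "loyalty": 2.0, "authority": 2.5, "purity": 1.5},
        {"care": 3.0, "fairness": 4.0, "loyalty": 3.0, "authority": 1.5, "purity": 2.5},
        {"care": 3.5, "fairness": 3.0, "loyalty": 2.5, "authority": 2.0, "purity": 2.0},
    ]
    human = {"care": 3.6, "fairness": 3.7, "loyalty": 2.1, "authority": 2.0, "purity": 1.4}
    means, variances = summarize_runs(synthetic_runs)
    print("variance per foundation:", variances)
    print("correlation with human reference:", alignment(synthetic_runs, human))
```

Under these assumptions, high per-foundation variance would indicate that a model-persona combination answers inconsistently across repetitions, while a low correlation would indicate weak alignment with the human sub-population it is meant to represent.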