CL, CY | Apr 19, 2025

Probing the Subtle Ideological Manipulation of Large Language Models

arXiv:2504.14287v1
h-index: 32
Originality: Incremental advance
AI Analysis

This work addresses subtle ideological bias in LLMs, a concern for AI safety and fairness. It is an incremental advance that moves beyond binary political classifications.

The study examines how susceptible large language models are to nuanced ideological manipulation across a political spectrum. It finds that fine-tuning significantly enhances ideological alignment, while explicit prompts offer only minor improvements.

Large Language Models (LLMs) have transformed natural language processing, but concerns have emerged about their susceptibility to ideological manipulation, particularly in politically sensitive areas. Prior work has focused on binary Left-Right LLM biases, using explicit prompts and fine-tuning on political QA datasets. In this work, we move beyond this binary approach to explore the extent to which LLMs can be influenced across a spectrum of political ideologies, from Progressive-Left to Conservative-Right. We introduce a novel multi-task dataset designed to reflect diverse ideological positions through tasks such as ideological QA, statement ranking, manifesto cloze completion, and Congress bill comprehension. By fine-tuning three LLMs (Phi-2, Mistral, and Llama-3) on this dataset, we evaluate their capacity to adopt and express these nuanced ideologies. Our findings indicate that fine-tuning significantly enhances nuanced ideological alignment, while explicit prompts provide only minor refinements. This highlights the models' susceptibility to subtle ideological manipulation, suggesting a need for more robust safeguards to mitigate these risks.

