CL | AI | Oct 6, 2025

Psychological Steering in LLMs: An Evaluation of Effectiveness and Trustworthiness

arXiv:2510.04484v1 | 3 citations | h-index: 28
Originality: Incremental advance
AI Analysis

This work addresses the need for trustworthy and effective LLM steering in socially interactive applications, though it is incremental as it builds on existing steering methods.

The researchers tackled the problem of controlling emotional and personality traits in LLMs for social interactions. They found that prompting is consistently effective but limited in intensity control, while vector injections offer finer control at a slight cost in output quality. They also observed that even a positive emotion like joy can degrade robustness to adversarial factuality and increase preferential bias.

The ability to control LLMs' emulated emotional states and personality traits is essential for enabling rich, human-centered interactions in socially interactive settings. We introduce PsySET, a Psychologically-informed benchmark to evaluate LLM Steering Effectiveness and Trustworthiness across the emotion and personality domains. Our study spans four models from different LLM families paired with various steering strategies, including prompting, fine-tuning, and representation engineering. Our results indicate that prompting is consistently effective but limited in intensity control, whereas vector injections achieve finer controllability while slightly reducing output quality. Moreover, we explore the trustworthiness of steered LLMs by assessing safety, truthfulness, fairness, and ethics, highlighting potential side effects and behavioral shifts. Notably, we observe idiosyncratic effects; for instance, even a positive emotion like joy can degrade robustness to adversarial factuality, lower privacy awareness, and increase preferential bias. Meanwhile, anger predictably elevates toxicity yet strengthens leakage resistance. Our framework establishes the first holistic evaluation of emotion and personality steering, offering insights into its interpretability and reliability for socially interactive applications.
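
To illustrate what "vector injection" with intensity control can look like, here is a minimal sketch on synthetic data. It assumes a difference-of-means steering direction and a scalar coefficient `alpha`. This is one common representation-engineering recipe and is not confirmed to be the method used in PsySET. The names `steering_vector`, `inject`, and `alpha` are hypothetical, and random NumPy arrays stand in for real model activations.

```python
import numpy as np

def steering_vector(pos_acts, neg_acts):
    """Unit-norm difference-of-means direction between two activation sets.

    Assumption: pos_acts / neg_acts are (n_samples, dim) hidden states collected
    with and without the target trait (e.g. "joy" vs. neutral prompts).
    """
    direction = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return direction / np.linalg.norm(direction)

def inject(hidden, direction, alpha):
    """Add alpha * direction to every hidden state (each row of `hidden`)."""
    return hidden + alpha * direction

# Synthetic stand-ins for real activations (illustrative only).
rng = np.random.default_rng(0)
dim = 16
true_dir = np.zeros(dim)
true_dir[0] = 1.0
pos = rng.normal(size=(200, dim)) + 2.0 * true_dir
neg = rng.normal(size=(200, dim)) - 2.0 * true_dir

v = steering_vector(pos, neg)
hidden = rng.normal(size=(5, dim))

# Mean projection onto the steering direction for several intensities.
projections = {
    alpha: float((inject(hidden, v, alpha) @ v).mean())
    for alpha in (0.0, 1.0, 4.0)
}
print(projections)
```

Because `v` has unit norm, the mean projection onto it shifts by exactly `alpha`. That gives a continuous intensity knob of the kind the abstract contrasts with prompting.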

