CL · Feb 17, 2025

Personality Editing for Language Models through Adjusting Self-Referential Queries

arXiv:2502.11789v3 · 1 citation · h-index: 4
Originality: Incremental advance
AI Analysis

The paper addresses the need of developers and users for cost-effective, robust personality control in LLMs, though it appears incremental because it builds on existing editing techniques.

The paper tackles personality control in large language models for applications such as conversational agents. It presents PALETTE, a method that edits personality through self-referential queries using only 12 samples, and reports substantial improvements in alignment across personality dimensions.

Large Language Models (LLMs) are integral to applications such as conversational agents and content creation, where precise control over a model's personality is essential for maintaining tone, consistency, and user engagement. However, prevailing prompt-based or fine-tuning approaches either lack robustness or demand large-scale training data, making them costly and impractical. In this paper, we present PALETTE (Personality Adjustment by LLM SElf-TargeTed quEries), a novel method for personality editing in LLMs. Our approach introduces adjustment queries, where self-referential statements grounded in psychological constructs are treated analogously to factual knowledge, enabling direct editing of personality-related responses. Unlike fine-tuning, PALETTE requires only 12 editing samples to achieve substantial improvements in personality alignment across personality dimensions. Experimental results from both automatic and human evaluations demonstrate that our method enables more stable and well-balanced personality control in LLMs.
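To make the idea of adjustment queries concrete, the sketch below shows one way self-referential statements could be packaged as edit requests, in the same shape a factual-knowledge edit might take (a prompt plus a target completion). This is an illustration only, not the authors' implementation: the `EditRequest` schema, the `build_adjustment_queries` helper, and the example items are all hypothetical, and the abstract does not specify the actual query wording or editing backend. Only the budget of 12 samples comes from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditRequest:
    """One edit request in a knowledge-editing style (hypothetical schema)."""
    prompt: str  # self-referential query posed to the model
    target: str  # completion the edit should make more likely


# Hypothetical questionnaire-style items: (question, pole A answer, pole B answer).
# These are invented for illustration and are NOT taken from the paper.
ITEMS = [
    ("Do you prefer lively gatherings or quiet evenings?",
     "lively gatherings", "quiet evenings"),
    ("Do you decide by logic or by feelings?",
     "logic", "feelings"),
    ("Do you like fixed plans or open options?",
     "fixed plans", "open options"),
]


def build_adjustment_queries(items, pole, budget=12):
    """Turn self-referential items into edit requests for one target pole.

    `budget` mirrors the 12-sample limit mentioned in the abstract.
    """
    if pole not in ("A", "B"):
        raise ValueError("pole must be 'A' or 'B'")
    if len(items) > budget:
        raise ValueError(f"expected at most {budget} items, got {len(items)}")
    answer_index = 1 if pole == "A" else 2
    return [
        EditRequest(prompt=f"Question: {item[0]} Your answer:",
                    target=item[answer_index])
        for item in items
    ]


requests = build_adjustment_queries(ITEMS, pole="A")
for request in requests:
    print(request.prompt, "->", request.target)
```

The resulting list would then be handed to whatever model-editing routine is in use; that step is deliberately left out here because the abstract does not name one.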

