CL · AI · Apr 17, 2025

Persona-judge: Personalized Alignment of Large Language Models via Token-level Self-judgment

arXiv:2504.12663v2 · 10 citations · h-index: 13 · Has Code · ACL
Originality: Incremental advance
AI Analysis

This paper addresses scalable and adaptive personalized alignment for users of large language models. It appears incremental: it builds on existing alignment paradigms, with a discriminative approach as its novel element.

The paper tackles the challenge of aligning language models with human preferences for personalization without high computational costs, introducing Persona-judge, a training-free method that uses token-level self-judgment to achieve scalable and efficient personalized alignment.

Aligning language models with human preferences presents significant challenges, particularly in achieving personalization without incurring excessive computational costs. Existing methods rely on reward signals and additional annotated data, limiting their scalability and adaptability to diverse human values. To address these challenges, we introduce Persona-judge, a novel discriminative paradigm that enables training-free personalized alignment with unseen preferences. Instead of optimizing policy parameters through external reward feedback, Persona-judge leverages the intrinsic preference judgment capabilities of the model. Specifically, a draft model generates candidate tokens conditioned on a given preference, while a judge model, embodying another preference, cross-validates the predicted tokens to decide whether they should be accepted. Experimental results demonstrate that Persona-judge, using the inherent preference evaluation mechanisms of the model, offers a scalable and computationally efficient solution to personalized alignment, paving the way for more adaptive customized alignment. Our code is available here.
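The draft-then-judge step described in the abstract can be illustrated with a toy sketch. The abstract does not state the exact acceptance criterion, so the rule below is an assumption borrowed from standard speculative sampling: accept the draft token with probability min(1, p_judge / p_draft), otherwise resample from the normalized positive part of (p_judge - p_draft). The function name `judge_token`, the two hand-written distributions, and the preference labels in the comments are all hypothetical; a real setup would obtain both distributions from one model prompted with two different preferences.

```python
import random

# Hypothetical next-token distributions over a tiny vocabulary.
# p_draft: the model conditioned on one preference (e.g. "casual").
# p_judge: the same model conditioned on another preference (e.g. "formal").
p_draft = {"hey": 0.6, "hello": 0.3, "greetings": 0.1}
p_judge = {"hey": 0.1, "hello": 0.5, "greetings": 0.4}


def judge_token(token, p_draft, p_judge, rng):
    """Return (final_token, accepted) for one drafted token.

    Assumed rule (speculative-sampling style, not confirmed by the source):
    accept with probability min(1, p_judge[token] / p_draft[token]);
    on rejection, resample from max(0, p_judge - p_draft), normalized.
    """
    ratio = p_judge.get(token, 0.0) / p_draft[token]
    if rng.random() < min(1.0, ratio):
        return token, True
    # Residual mass: tokens the judge likes more than the draft does.
    residual = {t: max(0.0, p_judge[t] - p_draft.get(t, 0.0)) for t in p_judge}
    tokens = list(residual)
    weights = [residual[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0], False


if __name__ == "__main__":
    rng = random.Random(0)
    vocab = list(p_draft)
    for _ in range(5):
        drafted = rng.choices(vocab, weights=[p_draft[t] for t in vocab], k=1)[0]
        final, accepted = judge_token(drafted, p_draft, p_judge, rng)
        print(f"drafted={drafted!r} accepted={accepted} final={final!r}")
```

In this toy setting, a token the judge rates at least as highly as the draft (here "hello") is always kept, while a token the judge disfavors ("hey") is usually replaced by one the judge prefers. No parameters are updated at any point, which is the sense in which the approach is training-free.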

