CL | Apr 29, 2025

HyPerAlign: Interpretable Personalized LLM Alignment via Hypothesis Generation

arXiv:2505.00038v2 | 3 citations | h-index: 9
Originality: Incremental advance
AI Analysis

The paper addresses the need for user-dependent preference control in LLMs and offers an interpretable, sample-efficient personalization strategy. It appears incremental, however, because it builds on existing alignment methods.

The paper tackles the problem of personalizing large language model outputs to individual users rather than aligning them to average preferences. It proposes HyPerAlign, a hypothesis-driven approach that infers user-specific attributes from few-shot examples and uses them to generate customized responses. Reported results show consistently high win-rates (commonly >90%) against preference-based fine-tuning methods in authorship attribution, and helpfulness gains of up to 70% on average in deliberative alignment.
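For readers unfamiliar with the win-rate metric mentioned above, the following is a minimal sketch of the generic definition: the fraction of pairwise comparisons in which one system's output is preferred over a baseline's. The function name `win_rate` and the boolean encoding of judgments are assumptions made for illustration; this summary does not describe the paper's actual evaluation protocol (who judges, or how ties are handled).

```python
from typing import Sequence


def win_rate(judgments: Sequence[bool]) -> float:
    """Fraction of pairwise comparisons won.

    Each entry is True if the candidate system's output was preferred
    over the baseline's output for one test item (assumed encoding).
    """
    if not judgments:
        raise ValueError("need at least one pairwise judgment")
    return sum(judgments) / len(judgments)


# Illustrative data only, not results from the paper.
example = [True, True, False, True]
print(win_rate(example))  # 0.75
```

Under this definition, a win-rate above 90% means the candidate was preferred in more than nine out of ten comparisons.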

Alignment algorithms are widely used to align large language models (LLMs) to human users based on preference annotations. Typically these (often divergent) preferences are aggregated over a diverse set of users, resulting in fine-tuned models that are aligned to the "average-user" preference. Nevertheless, current models are used by individual users in very specific contexts and situations, emphasizing the need for user-dependent preference control. In this work, we address the problem of personalizing LLM outputs to their users. We aim to generate customized responses tailored to specific individuals instead of generic outputs that emulate the collective voices of diverse populations. We propose HyPerAlign, an interpretable and sample-efficient hypothesis-driven personalization approach for LLMs. Given few-shot examples written by a particular user, we first infer hypotheses about their communication strategies, personality, and writing style, then prompt LLMs with these hypotheses and user-specific attributes to generate customized outputs. We conduct experiments on two different personalization tasks, namely authorship attribution and deliberative alignment, with datasets from diverse domains (news articles, blog posts, emails, jailbreaking benchmarks). Results demonstrate the superiority of hypothesis-driven LLM personalization compared to preference-based fine-tuning methods. For authorship attribution, HyPerAlign generations have consistently high win-rates (commonly >90%) against state-of-the-art preference fine-tuning approaches across diverse user profiles and LLMs. For deliberative alignment, the helpfulness of LLMs is improved by up to 70% on average. Overall, HyPerAlign represents an interpretable and sample-efficient strategy for the personalization of LLMs to individual users.
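The two-step procedure described in the abstract (infer hypotheses from a user's few-shot examples, then prompt the model with those hypotheses) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the prompt wording, the function names, and the `LLM` callable type are all assumptions, and any text-completion function can be plugged in for `llm`.

```python
from typing import Callable, List

# Hypothetical interface: any function mapping a prompt string to a completion.
LLM = Callable[[str], str]


def build_hypothesis_prompt(user_texts: List[str]) -> str:
    """Step 1 prompt: ask the model to describe the author of the samples.

    The wording is an assumption, not the prompt used in the paper.
    """
    samples = "\n\n".join(
        f"Sample {i + 1}:\n{text}" for i, text in enumerate(user_texts)
    )
    return (
        "Below are writing samples from one author.\n\n"
        f"{samples}\n\n"
        "Describe the author's communication strategies, personality, "
        "and writing style."
    )


def build_generation_prompt(hypotheses: str, task: str) -> str:
    """Step 2 prompt: condition generation on the inferred hypotheses."""
    return (
        "Author profile:\n"
        f"{hypotheses}\n\n"
        f"Task: {task}\n"
        "Write the response the way this author would."
    )


def personalize(llm: LLM, user_texts: List[str], task: str) -> str:
    """Run both steps: infer hypotheses, then generate a tailored output."""
    hypotheses = llm(build_hypothesis_prompt(user_texts))
    return llm(build_generation_prompt(hypotheses, task))
```

Because the intermediate `hypotheses` string is plain text, it can be inspected or edited before the second call, which is one way to read the "interpretable" claim in the abstract.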
