CL | AI | Feb 13, 2025

Tuning-Free Personalized Alignment via Trial-Error-Explain In-Context Learning

Amazon
arXiv:2502.08972v3 | 11 citations | h-index: 52 | NAACL
Originality: Highly original
AI Analysis

This paper addresses the need for personalized text generation. Its approach is novel in framing, though the method itself is incremental, and it shows strong performance gains.

The paper tackles the problem of language models producing generic outputs that do not align with individual users' styles. It introduces TICL, a tuning-free method that personalizes models for text generation tasks with fewer than 10 examples per user, achieving win rates of up to 91.5% against the previous state of the art.

Language models are aligned to the collective voice of many, resulting in generic outputs that do not align with specific users' styles. In this work, we present Trial-Error-Explain In-Context Learning (TICL), a tuning-free method that personalizes language models for text generation tasks with fewer than 10 examples per user. TICL iteratively expands an in-context learning prompt via a trial-error-explain process, adding model-generated negative samples and explanations that provide fine-grained guidance towards a specific user's style. TICL achieves favorable win rates on pairwise comparisons with LLM-as-a-judge up to 91.5% against the previous state-of-the-art and outperforms competitive tuning-free baselines for personalized alignment tasks of writing emails, essays and news articles. Both lexical and qualitative analyses show that the negative samples and explanations enable language models to learn stylistic context more effectively and overcome the bias towards structural and formal phrases observed in their zero-shot outputs. By front-loading inference compute to create a user-specific in-context learning prompt that does not require extra generation steps at test time, TICL presents a novel yet simple approach for personalized alignment.
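The trial-error-explain loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `LLM` callable, the `Example` dataclass, the prompt wording, the leave-one-out context, and the `rounds` parameter are all hypothetical choices made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical interface: any function that maps a prompt string to generated text.
LLM = Callable[[str], str]


@dataclass
class Example:
    task: str                 # e.g. "Write an email declining a meeting"
    user_text: str            # what the user actually wrote for this task
    negatives: List[str] = field(default_factory=list)     # model attempts that missed the style
    explanations: List[str] = field(default_factory=list)  # why each attempt missed


def render_prompt(examples: List[Example], task: str) -> str:
    """Render the (possibly augmented) in-context prompt for a new task."""
    parts = []
    for ex in examples:
        parts.append(f"Task: {ex.task}")
        for neg, why in zip(ex.negatives, ex.explanations):
            parts.append(f"Attempt (not in the user's style): {neg}")
            parts.append(f"Why it differs: {why}")
        parts.append(f"User's text: {ex.user_text}")
    parts.append(f"Task: {task}")
    parts.append("User's text:")
    return "\n".join(parts)


def build_ticl_examples(examples: List[Example], llm: LLM, rounds: int = 2) -> List[Example]:
    """Expand each example with model-generated negatives and explanations."""
    for _ in range(rounds):
        for i, ex in enumerate(examples):
            # Trial: attempt the task using the other examples as context
            # (leave-one-out is an assumption of this sketch).
            context = examples[:i] + examples[i + 1:]
            attempt = llm(render_prompt(context, ex.task))
            # Error + explain: ask the model how the attempt differs from the user's text.
            why = llm(
                "Explain how the attempt differs in style from the user's text.\n"
                f"Attempt: {attempt}\nUser's text: {ex.user_text}"
            )
            ex.negatives.append(attempt)
            ex.explanations.append(why)
    return examples
```

All of the extra model calls happen while the prompt is being built. At test time, a single call such as `llm(render_prompt(examples, new_task))` would suffice, which matches the abstract's point about front-loading inference compute.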

