Is Active Persona Inference Necessary for Aligning Small Models to Personal Preferences?
This paper addresses the problem of underspecification in personalized alignment for small models, offering an incremental improvement over passive methods.
The paper investigates whether actively inferring preference descriptions improves the alignment of small language models to personal preferences, using a synthetic dataset based on famous people. Results show that higher-quality active prefixes improve generalization and contextual faithfulness and reduce biases, suggesting that active alignment is a more controllable and efficient path to personalization.
A prominent issue in aligning language models (LMs) to personalized preferences is underspecification: users provide little information about their preferences. A popular way to inject such specification is to add a prefix (e.g., prior relevant conversations) to the current user's conversation to steer the preference distribution. Most methods model personal preferences passively, using prior example preference pairs. We ask whether models benefit from actively inferring preference descriptions, and we address this question by creating a synthetic personalized alignment dataset based on famous people with known public preferences. We then test how effectively finetuned models of 1B to 8B parameters infer and align to personal preferences. Results show that higher-quality active prefixes lead to better generalization, more contextually faithful models, and fewer systematic biases across different protected attributes. Together, our results suggest that active alignment can offer a more controllable and efficient path to personalized alignment.
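To make the passive/active distinction concrete, the following is a minimal Python sketch of how the two kinds of prefix might be assembled in front of a user turn. The `PreferencePair` type, the prompt templates, and the function names are hypothetical illustrations, not the paper's actual format. In the active setting, the description would in practice be produced by a model inferring it from the user's history; that inference step is assumed and not shown.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PreferencePair:
    """Hypothetical record of one prior preference judgment by the user."""
    prompt: str
    chosen: str
    rejected: str


def passive_prefix(history: list[PreferencePair]) -> str:
    """Passive conditioning: show the raw prior preference pairs as examples."""
    blocks = []
    for i, pair in enumerate(history, 1):
        blocks.append(
            f"Example {i}\n"
            f"Prompt: {pair.prompt}\n"
            f"Preferred: {pair.chosen}\n"
            f"Dispreferred: {pair.rejected}"
        )
    return "\n\n".join(blocks)


def active_prefix(description: str) -> str:
    """Active conditioning: show an inferred natural-language preference description.

    Assumption: `description` was already produced by some inference model
    from the user's history; this sketch does not perform that inference.
    """
    return f"User preferences: {description}"


def build_input(prefix: str, user_turn: str) -> str:
    """Prepend a prefix (passive or active) to the current user turn (hypothetical template)."""
    return f"{prefix}\n\nUser: {user_turn}\nAssistant:"


if __name__ == "__main__":
    pairs = [PreferencePair("Favorite sport?", "Basketball, always.", "Golf.")]
    print(build_input(passive_prefix(pairs), "Recommend a weekend activity."))
    print(build_input(active_prefix("Enjoys basketball; dislikes golf."),
                      "Recommend a weekend activity."))
```

The active prefix compresses the history into a short, human-readable description, which is one intuition for why it may be easier to inspect and control than a growing list of raw example pairs.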