AI | CL | LG | Feb 18

Learning Personalized Agents from Human Feedback

Princeton
arXiv:2602.16173v1 | 5 citations | h-index: 14
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of personalizing AI agents for individual users whose preferences change over time. The advance is incremental because it builds on prior work that uses explicit memory and feedback.

The paper tackles the problem of AI agents failing to align with individual users' evolving preferences by introducing the PAHF framework for continual personalization from live interaction, showing it reduces personalization error and enables rapid adaptation to preference shifts.

Modern AI agents are powerful but often fail to align with the idiosyncratic, evolving preferences of individual users. Prior approaches typically rely on static datasets, either training implicit preference models on interaction history or encoding user profiles in external memory. However, these approaches struggle with new users and with preferences that change over time. We introduce Personalized Agents from Human Feedback (PAHF), a framework for continual personalization in which agents learn online from live interaction using explicit per-user memory. PAHF operationalizes a three-step loop: (1) seeking pre-action clarification to resolve ambiguity, (2) grounding actions in preferences retrieved from memory, and (3) integrating post-action feedback to update memory when preferences drift. To evaluate this capability, we develop a four-phase protocol and two benchmarks in embodied manipulation and online shopping. These benchmarks quantify an agent's ability to learn initial preferences from scratch and subsequently adapt to persona shifts. Our theoretical analysis and empirical results show that integrating explicit memory with dual feedback channels is critical: PAHF learns substantially faster and consistently outperforms both no-memory and single-channel baselines, reducing initial personalization error and enabling rapid adaptation to preference shifts.
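The three-step loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `UserMemory`, `SimulatedUser`, and `pahf_step` are hypothetical, the memory is assumed to be a plain per-user dictionary, and the user is simulated by a lookup table rather than a live person. The sketch only shows how pre-action clarification, memory-grounded action, and post-action feedback might fit together, and how a stale memory entry gets corrected after a preference shift.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class UserMemory:
    """Explicit per-user memory (assumed here to be a topic -> preference map)."""
    prefs: Dict[str, str] = field(default_factory=dict)

    def retrieve(self, topic: str) -> Optional[str]:
        return self.prefs.get(topic)

    def update(self, topic: str, preference: str) -> None:
        self.prefs[topic] = preference


class SimulatedUser:
    """Hypothetical stand-in for a live user whose preferences may drift."""

    def __init__(self, true_prefs: Dict[str, str]) -> None:
        self.true_prefs = dict(true_prefs)

    def clarify(self, topic: str) -> str:
        # Answer a pre-action clarification question.
        return self.true_prefs[topic]

    def feedback(self, topic: str, action: str) -> Optional[str]:
        # Return None if the action was acceptable, else the corrected preference.
        wanted = self.true_prefs[topic]
        return None if action == wanted else wanted


def pahf_step(memory: UserMemory, user: SimulatedUser, topic: str) -> Tuple[str, bool]:
    """Run one interaction and return (action, was_correct)."""
    # (1) Pre-action clarification: ask only when memory has nothing on this topic.
    remembered = memory.retrieve(topic)
    if remembered is None:
        remembered = user.clarify(topic)
        memory.update(topic, remembered)

    # (2) Ground the action in the preference retrieved from memory.
    action = remembered

    # (3) Post-action feedback: overwrite memory if the preference has drifted.
    correction = user.feedback(topic, action)
    if correction is not None:
        memory.update(topic, correction)
    return action, correction is None


memory = UserMemory()
user = SimulatedUser({"drink": "tea"})
first = pahf_step(memory, user, "drink")    # new user: clarify, then act
user.true_prefs["drink"] = "coffee"         # simulated preference shift
second = pahf_step(memory, user, "drink")   # stale memory causes one error
third = pahf_step(memory, user, "drink")    # memory was updated by feedback
print(first, second, third)
```

In this toy setting the agent makes exactly one mistake after the shift, because the post-action feedback channel overwrites the stale entry. Dropping step (1) would instead force an error on the very first interaction with a new user, which is one way to picture why the abstract stresses using both feedback channels together with explicit memory.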

