Frictional Agent Alignment Framework: Slow Down and Don't Break Things
This work addresses the challenge of scalable, dynamic human-AI collaboration by enabling LLMs to act as adaptive thought partners, though the contribution appears incremental because it builds on existing alignment methods and targets specific dynamic settings.
The paper tackles AI misalignment in dynamic collaborative tasks by proposing the Frictional Agent Alignment Framework (FAAF), which generates context-aware friction to prompt deliberation. On three benchmarks, FAAF outperforms competing methods in producing concise, interpretable friction and in out-of-distribution (OOD) generalization.
AI support of collaborative interactions entails mediating potential misalignment between interlocutor beliefs. Common preference alignment methods like DPO excel in static settings, but struggle in dynamic collaborative tasks where the explicit signals of interlocutor beliefs are sparse and skewed. We propose the Frictional Agent Alignment Framework (FAAF) to generate precise, context-aware "friction" that prompts for deliberation and re-examination of existing evidence. FAAF's two-player objective decouples from data skew: a frictive-state policy identifies belief misalignments, while an intervention policy crafts collaborator-preferred responses. We derive an analytical solution to this objective, which allows a single policy to be trained via a simple supervised loss. Experiments on three benchmarks show FAAF outperforms competitors in producing concise, interpretable friction and in OOD generalization. By aligning LLMs to act as adaptive "thought partners" rather than passive responders, FAAF advances scalable, dynamic human-AI collaboration. Our code and data can be found at https://github.com/csu-signal/FAAF_ACL.
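
For context on the DPO baseline named in the abstract, here is a minimal sketch of the standard per-example DPO loss. It is not FAAF's own objective, whose closed form the abstract does not state. The function name, argument names, and the default beta=0.1 are illustrative assumptions; each input is the summed token log-probability of a response under either the trained policy or a frozen reference model.

```python
import math


def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair (illustrative sketch).

    logp_*     : log-probability of the response under the trained policy
    ref_logp_* : log-probability of the same response under the reference
    beta       : scale of the implicit reward (assumed default, not from the paper)
    """
    # Implicit reward margin: how much more the policy favours the chosen
    # response over the rejected one, relative to the reference model.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed stably.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# With no preference learned yet (policy == reference) the loss is log(2).
baseline = dpo_loss(-1.0, -1.0, -1.0, -1.0)
# Once the policy favours the chosen response, the loss drops below log(2).
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

The loss depends only on explicit chosen/rejected pairs, which is one way to see why sparse or skewed preference signals, as the abstract describes for dynamic collaborative tasks, can be a problem for this kind of objective.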