AI, RO · Apr 5, 2025

ADAPT: Actively Discovering and Adapting to Preferences for any Task

arXiv:2504.04040v1 · 3 citations · h-index: 45
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of under-specified tasks for assistive agents. The advance is incremental: it builds on existing LLM methods and contributes a new training approach.

The paper tackles the problem of assistive agents failing to adhere to user preferences in long-horizon household tasks by introducing ADAPT, a benchmark for evaluating preference adherence, and Reflection-DPO, a training approach that improves preference satisfaction by 6.1% over a baseline on unseen users.

Assistive agents should be able to perform under-specified long-horizon tasks while respecting user preferences. We introduce Actively Discovering and Adapting to Preferences for any Task (ADAPT) -- a benchmark designed to evaluate agents' ability to adhere to user preferences across various household tasks through active questioning. Next, we propose Reflection-DPO, a novel training approach for adapting large language models (LLMs) to the task of active questioning. Reflection-DPO finetunes a 'student' LLM to follow the actions of a privileged 'teacher' LLM, and optionally ask a question to gather necessary information to better predict the teacher action. We find that prior approaches that use state-of-the-art LLMs fail to sufficiently follow user preferences in ADAPT due to insufficient questioning and poor adherence to elicited preferences. In contrast, Reflection-DPO achieves a higher rate of satisfying user preferences, outperforming a zero-shot chain-of-thought baseline by 6.1% on unseen users.
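
The abstract names DPO as the basis of the training approach but does not spell out the objective, so the following is only a minimal sketch of the standard DPO loss on a single preference pair, not the authors' implementation. It assumes (hypothetically) that the teacher's action serves as the preferred response and the student's differing action as the rejected one; the names `build_pair`, `dpo_loss`, and the value `beta=0.1` are illustrative and do not come from the paper.

```python
import math
from typing import Optional, Tuple


def build_pair(context: str, teacher_action: str,
               student_action: str) -> Optional[Tuple[str, str, str]]:
    """Hypothetical pair construction: (prompt, chosen, rejected).

    Assumption: a training pair is only useful when the student's
    action differs from the privileged teacher's action.
    """
    if teacher_action == student_action:
        return None
    return (context, teacher_action, student_action)


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one pair, given sequence log-probabilities."""
    # Implicit rewards: how much more the policy favors each response
    # than the frozen reference model does, scaled by beta.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) computed as a numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


pair = build_pair("User asked for breakfast; cereal or eggs?",
                  "ask: 'Do you prefer cereal or eggs?'",
                  "act: pour cereal")
loss = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

When the policy and reference assign identical log-probabilities, the margin is zero and the loss equals log 2; the loss falls as the policy shifts probability toward the preferred response relative to the reference.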

