LG · AI · HC · RO · Jul 12, 2023

Diagnosis, Feedback, Adaptation: A Human-in-the-Loop Framework for Test-Time Policy Adaptation

arXiv:2307.06333v2 · 19 citations · h-index: 46
Originality: Incremental advance
AI Analysis

This paper addresses distribution shift in deployed policies by personalizing adaptation to individual user preferences, an incremental improvement over existing data augmentation methods.

The paper tackles policy failure under distribution shift by proposing an interactive framework that uses human feedback to identify personalized task-irrelevant concepts for data augmentation. This reduces the number of demonstrations needed for fine-tuning and aligns the policy with user preferences.

Policies often fail due to distribution shift -- changes in the state and reward that occur when a policy is deployed in new environments. Data augmentation can increase robustness by making the model invariant to task-irrelevant changes in the agent's observation. However, designers don't know which concepts are irrelevant a priori, especially when different end users have different preferences about how the task is performed. We propose an interactive framework to leverage feedback directly from the user to identify personalized task-irrelevant concepts. Our key idea is to generate counterfactual demonstrations that allow users to quickly identify possible task-relevant and irrelevant concepts. The knowledge of task-irrelevant concepts is then used to perform data augmentation and thus obtain a policy adapted to personalized user objectives. We present experiments validating our framework on discrete and continuous control tasks with real human users. Our method (1) enables users to better understand agent failure, (2) reduces the number of demonstrations required for fine-tuning, and (3) aligns the agent to individual user task preferences.
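The augmentation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes states are dictionaries of named concepts, that the user has already flagged which concepts are task-irrelevant, and that a set of alternative values is available for each flagged concept. The function name `augment_demonstrations` and the concept names (`mug_color`, `mug_x`) are hypothetical.

```python
import random


def augment_demonstrations(demos, irrelevant_values, n_copies=3, seed=0):
    """Return the original demos plus copies with task-irrelevant concepts resampled.

    demos: list of demonstrations; each is a list of (state, action) pairs,
        where state is a dict mapping concept name -> value (an assumption).
    irrelevant_values: dict mapping each user-flagged irrelevant concept to a
        list of alternative values it may take (an assumption).
    """
    rng = random.Random(seed)
    augmented = [list(demo) for demo in demos]  # keep the originals
    for demo in demos:
        for _ in range(n_copies):
            # Sample one value per irrelevant concept and hold it fixed
            # for the whole trajectory so the copy stays self-consistent.
            override = {c: rng.choice(vals) for c, vals in irrelevant_values.items()}
            # Actions are copied unchanged: the user has said these
            # concepts should not affect how the task is performed.
            augmented.append([({**state, **override}, action) for state, action in demo])
    return augmented


# Hypothetical example: the user says mug color does not matter.
demos = [[
    ({"mug_color": "red", "mug_x": 0.2}, "reach"),
    ({"mug_color": "red", "mug_x": 0.2}, "grasp"),
]]
irrelevant = {"mug_color": ["red", "blue", "green"]}
augmented = augment_demonstrations(demos, irrelevant, n_copies=3)
```

A policy fine-tuned on `augmented` sees the same actions paired with varied values of the flagged concept, which is the mechanism by which augmentation encourages invariance to that concept. Concepts the user marks as relevant (here, `mug_x`) are left untouched.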

