CL, AI, LG · Dec 19, 2022

Continual Learning for Instruction Following from Realtime Feedback

Berkeley
arXiv:2212.09710v2 · 23 citations · h-index: 36
AI Analysis

This paper addresses the challenge of continual learning for human-AI collaboration, though its contribution is incremental because it builds on existing contextual bandit methods.

The paper trains an instruction-following agent using realtime binary feedback from users during interactions, achieving a 15.4% absolute improvement in instruction execution accuracy over time.

We propose and deploy an approach to continually train an instruction-following agent from feedback provided by users during collaborative interactions. During interaction, human users instruct an agent using natural language, and provide realtime binary feedback as they observe the agent following their instructions. We design a contextual bandit learning approach, converting user feedback to immediate reward. We evaluate through thousands of human-agent interactions, demonstrating 15.4% absolute improvement in instruction execution accuracy over time. We also show our approach is robust to several design variations, and that the feedback signal is roughly equivalent to the learning signal of supervised demonstration data.
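The abstract's core idea, treating binary user feedback as an immediate reward in a contextual bandit, can be illustrated with a toy sketch. This is not the paper's implementation: the linear softmax policy, the simulated user, the +1/-1 reward mapping, and every name here (`SoftmaxBanditPolicy`, `feedback_to_reward`, `simulate`) are assumptions made for illustration only.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class SoftmaxBanditPolicy:
    """Hypothetical linear softmax policy over a small discrete action set."""

    def __init__(self, n_features, n_actions, lr=0.1):
        self.weights = [[0.0] * n_features for _ in range(n_actions)]
        self.lr = lr

    def action_probs(self, context):
        scores = [sum(w * x for w, x in zip(row, context)) for row in self.weights]
        return softmax(scores)

    def act(self, context, rng):
        """Sample one action from the current policy."""
        probs = self.action_probs(context)
        return rng.choices(range(len(probs)), weights=probs)[0]

    def update(self, context, action, reward):
        """One policy-gradient step: reward * grad log pi(action | context)."""
        probs = self.action_probs(context)
        for a, row in enumerate(self.weights):
            indicator = 1.0 if a == action else 0.0
            coeff = self.lr * reward * (indicator - probs[a])
            for i, x in enumerate(context):
                row[i] += coeff * x


def feedback_to_reward(positive):
    """Assumed mapping from binary feedback to an immediate reward."""
    return 1.0 if positive else -1.0


def simulate(rounds=2000, seed=0):
    """Toy loop: a simulated user approves only the action matching the context."""
    rng = random.Random(seed)
    policy = SoftmaxBanditPolicy(n_features=2, n_actions=2)
    for _ in range(rounds):
        target = rng.randrange(2)
        context = [1.0 if i == target else 0.0 for i in range(2)]
        action = policy.act(context, rng)
        reward = feedback_to_reward(action == target)
        policy.update(context, action, reward)  # no future return: bandit setting
    return policy
```

Because each update uses only the reward for the action just taken, there is no credit assignment across time steps, which is what distinguishes the bandit formulation from full reinforcement learning. Calling `simulate()` and inspecting `policy.action_probs([1.0, 0.0])` shows the policy concentrating probability on the approved action.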

Code Implementations: 1 repo
