LG HC MA · Oct 15, 2021

Collaborating with Humans without Human Data

DJ Strouse, Kevin R. McKee, Matt Botvinick, Edward Hughes, Richard Everett

arXiv:2110.08176v232.1222 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of creating adaptable AI collaborators for humans in cooperative tasks while reducing reliance on costly human data collection. The advance is incremental, as it builds on existing multi-agent reinforcement learning techniques.

The paper tackles the problem of training AI agents to collaborate effectively with human partners without requiring human data. It introduces Fictitious Co-Play (FCP), which trains an agent as the best response to a diverse population of self-play agents and their past checkpoints. FCP agents score significantly higher than the self-play, population play, and behavioral cloning play baselines when paired with novel agent and human partners, and humans report a strong subjective preference for partnering with FCP agents.

Collaborating with humans requires rapidly adapting to their individual strengths, weaknesses, and preferences. Unfortunately, most standard multi-agent reinforcement learning techniques, such as self-play (SP) or population play (PP), produce agents that overfit to their training partners and do not generalize well to humans. Alternatively, researchers can collect human data, train a human model using behavioral cloning, and then use that model to train "human-aware" agents ("behavioral cloning play", or BCP). While such an approach can improve the generalization of agents to new human co-players, it involves the onerous and expensive step of collecting large amounts of human data first. Here, we study the problem of how to train agents that collaborate well with human partners without using human data. We argue that the crux of the problem is to produce a diverse set of training partners. Drawing inspiration from successful multi-agent approaches in competitive domains, we find that a surprisingly simple approach is highly effective. We train our agent partner as the best response to a population of self-play agents and their past checkpoints taken throughout training, a method we call Fictitious Co-Play (FCP). Our experiments focus on a two-player collaborative cooking simulator that has recently been proposed as a challenge problem for coordination with humans. We find that FCP agents score significantly higher than SP, PP, and BCP when paired with novel agent and human partners. Furthermore, humans also report a strong subjective preference for partnering with FCP agents over all baselines.
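The two-stage structure described in the abstract can be illustrated with a minimal toy sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the environment is a one-shot matching game rather than the cooking simulator, the learning rule is a simple count-based reinforcement update rather than deep RL, and the names `Policy`, `self_play_run`, `build_partner_pool`, and `train_fcp_agent` are hypothetical. The sketch only shows how a pool of frozen self-play checkpoints is assembled and how a new agent is then trained against partners sampled from that pool.

```python
import random
from dataclasses import dataclass

N_ACTIONS = 3  # toy game (assumption): both players score 1 if their actions match


@dataclass(frozen=True)
class Policy:
    """A frozen, non-learning policy: one probability per action."""
    probs: tuple


def normalize(weights):
    """Turn positive action weights into a frozen Policy."""
    total = sum(weights)
    return Policy(tuple(w / total for w in weights))


def self_play_run(seed, steps=300, checkpoint_every=100):
    """Stage 1: train one self-play agent and keep checkpoints along the way.

    Both players share the same weights; a matched action is reinforced.
    Returns the periodic checkpoints plus the final policy.
    """
    rng = random.Random(seed)
    weights = [1.0 + rng.random() for _ in range(N_ACTIONS)]
    checkpoints = []
    for step in range(steps):
        if step % checkpoint_every == 0:
            checkpoints.append(normalize(weights))  # snapshot of a less-trained partner
        a, b = rng.choices(range(N_ACTIONS), weights=weights, k=2)
        if a == b:
            weights[a] += 1.0
    checkpoints.append(normalize(weights))  # fully trained partner
    return checkpoints


def build_partner_pool(n_runs=4):
    """Collect checkpoints from several independently seeded self-play runs."""
    pool = []
    for seed in range(n_runs):
        pool.extend(self_play_run(seed))
    return pool


def train_fcp_agent(pool, episodes=2000, seed=0):
    """Stage 2: train a new agent against partners sampled from the frozen pool."""
    rng = random.Random(seed)
    weights = [1.0] * N_ACTIONS
    for _ in range(episodes):
        partner = rng.choice(pool)  # partners never learn during this stage
        mine = rng.choices(range(N_ACTIONS), weights=weights)[0]
        theirs = rng.choices(range(N_ACTIONS), weights=partner.probs)[0]
        if mine == theirs:
            weights[mine] += 1.0
    return normalize(weights)


if __name__ == "__main__":
    pool = build_partner_pool()
    agent = train_fcp_agent(pool)
    print(len(pool), agent.probs)
```

Because this toy game is stateless, the trained agent cannot adapt to a partner within an episode; the sketch conveys only the training structure (diverse seeds, checkpoints of varying skill, frozen partners), not the adaptive behavior reported in the paper.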
