AI · HC · MA · Jun 26, 2025

Ad-Hoc Human-AI Coordination Challenge

Meta AI, Oxford
arXiv:2506.21490v2 · 3 citations · h-index: 10 · Has Code · ICML
Originality: Incremental advance
AI Analysis

The paper addresses the cost of human evaluation for researchers working on AI coordination tasks. The advance is incremental, since it builds on existing testbeds such as Hanabi.

The paper tackles the challenge of evaluating human-AI coordination by introducing the Ad-Hoc Human-AI Coordination Challenge (AH2AC2). Human proxy agents, developed on a large-scale human dataset, serve as cheap and reproducible evaluation partners. A deliberately limited dataset of 3,079 games is open-sourced, and baseline results are presented for two- and three-player Hanabi.

Achieving seamless coordination between AI agents and humans is crucial for real-world applications, yet it remains a significant open challenge. Hanabi is a cooperative card game featuring imperfect information, constrained communication, theory of mind requirements, and coordinated action, making it an ideal testbed for human-AI coordination. However, its use for human-AI interaction has been limited by the challenges of human evaluation. In this work, we introduce the Ad-Hoc Human-AI Coordination Challenge (AH2AC2) to overcome the constraints of costly and difficult-to-reproduce human evaluations. We develop human proxy agents on a large-scale human dataset that serve as robust, cheap, and reproducible human-like evaluation partners in AH2AC2. To encourage the development of data-efficient methods, we open-source a dataset of 3,079 games, deliberately limiting the amount of available human gameplay data. We present baseline results for both two- and three-player Hanabi scenarios. To ensure fair evaluation, we host the proxy agents through a controlled evaluation system rather than releasing them publicly. The code is available at https://github.com/FLAIROx/ah2ac2.

Code Implementations: 1 repo
