AI LG NE

Apr 28, 2020

Evaluating the Rainbow DQN Agent in Hanabi with Unseen Partners

Rodrigo Canaan, Xianbo Gao, Youjin Chung, Julian Togelius, Andy Nealen, Stefan Menzel

arXiv:2004.13291v18.44 citations

Has Code

Originality: Synthesis-oriented

AI Analysis

The paper highlights a key limitation of AI in cooperative games with unknown partners. The contribution is incremental, as it builds on existing Rainbow DQN methods.

The paper tackles the problem of ad-hoc cooperation in Hanabi. It shows that Rainbow DQN agents trained via self-play perform poorly with unseen rule-based partners and, conversely, that training with specific rule-based agents leads to low self-play scores.

Hanabi is a cooperative game that challenges existing AI techniques due to its focus on modeling the mental states of other players to interpret and predict their behavior. While there are agents that can achieve near-perfect scores in the game by agreeing on some shared strategy, comparatively little progress has been made in ad-hoc cooperation settings, where partners and strategies are not known in advance. In this paper, we show that agents trained through self-play using the popular Rainbow DQN architecture fail to cooperate well with simple rule-based agents that were not seen during training and, conversely, when these agents are trained to play with any individual rule-based agent, or even a mix of these agents, they fail to achieve good self-play scores.
