Evaluating the Rainbow DQN Agent in Hanabi with Unseen Partners
The work highlights a key limitation of current AI for cooperative games with unknown partners, though the contribution is incremental because it builds on existing Rainbow DQN methods.
The paper tackled the problem of ad-hoc cooperation in Hanabi, showing that Rainbow DQN agents trained via self-play perform poorly with unseen rule-based partners, and conversely, training with specific rule-based agents leads to low self-play scores.
Hanabi is a cooperative game that challenges existing AI techniques due to its focus on modeling the mental states of other players to interpret and predict their behavior. While there are agents that can achieve near-perfect scores in the game by agreeing on some shared strategy, comparatively little progress has been made in ad-hoc cooperation settings, where partners and strategies are not known in advance. In this paper, we show that agents trained through self-play using the popular Rainbow DQN architecture fail to cooperate well with simple rule-based agents that were not seen during training and, conversely, when these agents are trained to play with any individual rule-based agent, or even a mix of these agents, they fail to achieve good self-play scores.
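The comparison the abstract describes (scores when an agent plays with a copy of itself versus scores when it is paired with a partner it did not train with) can be sketched as a small pairing matrix. This is only an illustrative sketch, not the paper's evaluation code: the function names `cross_play_matrix` and `summarize`, the `play_episode` callback and its signature, and the default episode count are all assumptions introduced here. `play_episode` stands in for whatever routine actually plays one two-player Hanabi game and returns the team score.

```python
from itertools import product
from statistics import mean
from typing import Callable, Dict, Hashable, Sequence, Tuple

# Hypothetical callback: plays one two-player game between the two named
# agents with the given seed and returns the team score.
PlayEpisode = Callable[[Hashable, Hashable, int], float]
ScoreMatrix = Dict[Tuple[Hashable, Hashable], float]


def cross_play_matrix(
    agents: Sequence[Hashable],
    play_episode: PlayEpisode,
    episodes: int = 10,  # assumed value, not taken from the paper
) -> ScoreMatrix:
    """Return the mean team score for every ordered pairing of agents."""
    scores: ScoreMatrix = {}
    for first, second in product(agents, repeat=2):
        # Average over seeds 0..episodes-1 for this pairing.
        scores[(first, second)] = mean(
            play_episode(first, second, seed) for seed in range(episodes)
        )
    return scores


def summarize(scores: ScoreMatrix) -> Dict[str, float]:
    """Average the diagonal (self-play) and off-diagonal (cross-play) entries."""
    self_play = [s for (a, b), s in scores.items() if a == b]
    cross_play = [s for (a, b), s in scores.items() if a != b]
    if not self_play or not cross_play:
        raise ValueError("need at least two distinct agents")
    return {"self_play": mean(self_play), "cross_play": mean(cross_play)}
```

A large gap between the `self_play` and `cross_play` averages is the kind of signal the paper reports, although the paper's own protocol, partners, and episode counts may differ from this sketch.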