MA AI LG · Dec 9, 2024

Augmenting the action space with conventions to improve multi-agent cooperation in Hanabi

F. Bredell, H. A. Engelbrecht, J. C. Schoeman

arXiv:2412.06333v3 · Has Code · Autonomous Agents and Multi-Agent Systems

Originality: Incremental advance

AI Analysis

This work addresses the challenge of efficient multi-agent reinforcement learning in partially observable environments with limited communication, offering an incremental improvement over existing methods.

The paper tackles the problem of improving multi-agent cooperation in the Hanabi card game by augmenting agents' action spaces with human-like conventions, and reports significant performance gains in self-play and cross-play across various numbers of cooperators.

The card game Hanabi is considered a strong medium for the testing and development of multi-agent reinforcement learning (MARL) algorithms, due to its cooperative nature, partial observability, limited communication and remarkable complexity. Previous research efforts have explored the capabilities of MARL algorithms within Hanabi, focusing largely on advanced architecture design and algorithmic manipulations to achieve state-of-the-art performance for various numbers of cooperators. However, this often leads to complex solution strategies that have a high computational cost and require large amounts of training data. To solve the Hanabi game effectively, humans rely on conventions, which allow them to implicitly convey ideas or knowledge based on a predefined, mutually agreed upon set of "rules" or principles. Multi-agent problems containing partial observability, especially when communication is limited, can benefit greatly from the use of implicit knowledge sharing. In this paper, we propose a novel approach to augmenting an agent's action space using conventions, which act as sequences of special cooperative actions that span multiple time steps and multiple agents, and which require agents to actively opt in for the convention to reach fruition. These conventions are based on existing human conventions, and result in a significant improvement in the performance of existing techniques for self-play and cross-play for various numbers of cooperators within Hanabi.
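The idea of an action space augmented with conventions can be illustrated with a minimal Python sketch. This is not the authors' implementation. The primitive action names, the `Convention` and `ConventionTracker` classes, and the rule that choosing a primitive action abandons a convention in progress are all assumptions made for illustration. The sketch only shows the general shape of the idea: each convention adds one extra action index, and a convention completes only if every successive agent opts in by selecting it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical primitive actions for a toy game; this is NOT the action
# encoding of any real Hanabi environment.
PRIMITIVE_ACTIONS = ["play_0", "discard_0", "hint_colour", "hint_rank"]


@dataclass(frozen=True)
class Convention:
    """A fixed sequence of primitive actions, one per successive turn."""
    name: str
    steps: Tuple[str, ...]


class ConventionTracker:
    """Maps augmented action indices to primitive actions (illustrative only)."""

    def __init__(self, conventions: List[Convention]) -> None:
        self.conventions = conventions
        self.active: Optional[int] = None  # index of the convention in progress
        self.step = 0                      # next step of the active convention

    def num_actions(self) -> int:
        # Augmented action space: primitives plus one action per convention.
        return len(PRIMITIVE_ACTIONS) + len(self.conventions)

    def resolve(self, action: int) -> str:
        """Return the primitive action to execute for an augmented action."""
        n = len(PRIMITIVE_ACTIONS)
        if action < n:
            # Assumption: picking a primitive action means opting out, so any
            # convention in progress is abandoned.
            self.active, self.step = None, 0
            return PRIMITIVE_ACTIONS[action]
        idx = action - n
        if self.active != idx:
            # Opting in to a convention that is not in progress starts it.
            self.active, self.step = idx, 0
        convention = self.conventions[idx]
        primitive = convention.steps[self.step]
        self.step += 1
        if self.step == len(convention.steps):
            # All agents opted in; the convention has run to completion.
            self.active, self.step = None, 0
        return primitive


# Usage: one agent hints, and the next agent opts in by playing the hinted card.
tracker = ConventionTracker([Convention("hint_then_play", ("hint_rank", "play_0"))])
first = tracker.resolve(4)   # agent A starts the convention
second = tracker.resolve(4)  # agent B opts in and completes it
```

In a learning setup, the policy would choose among `tracker.num_actions()` actions instead of only the primitives, so following a convention becomes a single learnable choice per turn. How conventions are initiated, masked, and rewarded in the paper itself is not specified here.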
