AI LG MA · Nov 22, 2022

Decision-making with Speculative Opponent Models

Jing Sun, Shuo Chen, Cong Zhang, Yining Ma, Jie Zhang

arXiv:2211.11940v36.22 citations · h-index: 9

Originality: Incremental advance

AI Analysis

The paper addresses a key limitation in multi-agent reinforcement learning by enabling opponent modeling without direct access to opponent data. The advance is incremental but practical for real-world applications.

The paper tackles the problem of opponent modeling in multi-agent decision-making when opponent observations and actions are unavailable, introducing DOMAC, a speculative opponent modeling algorithm that uses only local information and achieves superior performance with faster convergence in eight benchmark tasks.

Opponent modelling has proven effective in enhancing the decision-making of the controlled agent by constructing models of opponent agents. However, existing methods often rely on access to the observations and actions of opponents, a requirement that is infeasible when such information is either unobservable or challenging to obtain. To address this issue, we introduce Distributional Opponent-aided Multi-agent Actor-Critic (DOMAC), the first speculative opponent modelling algorithm that relies solely on local information (i.e., the controlled agent's observations, actions, and rewards). Specifically, the actor maintains a speculated belief about the opponents using tailored speculative opponent models that predict the opponents' actions from local information alone. Moreover, DOMAC features distributional critic models that estimate the return distribution of the actor's policy, yielding a more fine-grained assessment of the actor's quality, which in turn more effectively guides the training of the speculative opponent models on which the actor depends. Furthermore, we formally derive a policy gradient theorem with the proposed opponent models. Extensive experiments on eight challenging multi-agent benchmark tasks within MPE, Pommerman, and the StarCraft Multiagent Challenge (SMAC) demonstrate that DOMAC successfully models opponents' behaviours and delivers superior performance against state-of-the-art methods with a faster convergence speed.
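The data flow described in the abstract can be illustrated with a minimal forward-pass sketch. It is not the authors' implementation: the linear "networks", the dimensions, the sampling-and-averaging scheme, the fixed number of quantiles, and every function name below are assumptions made for illustration, and no training step (including the paper's policy gradient theorem) is shown. The sketch only demonstrates that the opponent belief, the policy, and the return distribution can all be computed from the controlled agent's own observation.

```python
import math
import random

random.seed(0)

# Hypothetical sizes, chosen only for this illustration.
OBS_DIM, N_ACTIONS, N_OPP_ACTIONS, N_QUANTILES = 6, 3, 4, 8


def init_weights(n_in, n_out):
    """Random weight matrix standing in for a trained network."""
    return [[random.gauss(0.0, 0.1) for _ in range(n_out)] for _ in range(n_in)]


def linear(x, w):
    """Matrix-vector product: x (n_in) times w (n_in x n_out)."""
    return [sum(xi * row[j] for xi, row in zip(x, w)) for j in range(len(w[0]))]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


W_OPP = init_weights(OBS_DIM, N_OPP_ACTIONS)
W_ACTOR = init_weights(OBS_DIM + N_OPP_ACTIONS, N_ACTIONS)
W_CRITIC = init_weights(OBS_DIM, N_QUANTILES)


def speculate_opponent(obs):
    """Speculative opponent model: a belief over opponent actions
    predicted from the controlled agent's local observation only."""
    return softmax(linear(obs, W_OPP))


def actor_policy(obs, n_samples=16):
    """Actor conditioned on opponent actions sampled from the belief;
    the resulting action distributions are averaged (an assumption)."""
    belief = speculate_opponent(obs)
    policy = [0.0] * N_ACTIONS
    for _ in range(n_samples):
        opp_action = random.choices(range(N_OPP_ACTIONS), weights=belief)[0]
        probs = softmax(linear(obs + one_hot(opp_action, N_OPP_ACTIONS), W_ACTOR))
        policy = [p + q / n_samples for p, q in zip(policy, probs)]
    return policy


def critic_quantiles(obs):
    """Distributional critic: a set of return quantiles rather than
    a single expected value."""
    return linear(obs, W_CRITIC)


obs = [random.uniform(-1.0, 1.0) for _ in range(OBS_DIM)]
belief = speculate_opponent(obs)
policy = actor_policy(obs)
quantiles = critic_quantiles(obs)
expected_return = sum(quantiles) / len(quantiles)  # mean of the quantiles
```

Averaging the quantiles recovers a scalar value estimate, while the spread of the quantiles carries the extra information that the abstract calls a more fine-grained assessment of the actor's quality.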

