MA AI · Jan 20, 2024

Measuring Policy Distance for Multi-Agent Reinforcement Learning

Tianyi Hu, Zhiqiang Pu, Xiaolin Ai, Tenghai Qiu, Jianqiang Yi

arXiv:2401.11257v2 · 4.35 citations · Has Code · AAMAS

Originality: Incremental advance

AI Analysis

This work addresses a gap in MARL by providing a tool to evaluate and guide diversity-based algorithms. The advance is incremental because it builds on existing diversity methods.

The authors address the lack of a general metric for quantifying policy differences in multi-agent reinforcement learning (MARL) by proposing the multi-agent policy distance (MAPD), which measures differences both in overall agent policies and in specific behavioral tendencies. They demonstrate its use in a multi-agent dynamic parameter sharing (MADPS) algorithm, which outperforms other parameter-sharing methods in their experiments.

Diversity plays a crucial role in improving the performance of multi-agent reinforcement learning (MARL). Currently, many diversity-based methods have been developed to overcome the drawbacks of excessive parameter sharing in traditional MARL. However, there remains a lack of a general metric to quantify policy differences among agents. Such a metric would not only facilitate the evaluation of the diversity evolution in multi-agent systems, but also provide guidance for the design of diversity-based MARL algorithms. In this paper, we propose the multi-agent policy distance (MAPD), a general tool for measuring policy differences in MARL. By learning the conditional representations of agents' decisions, MAPD can compute the policy distance between any pair of agents. Furthermore, we extend MAPD to a customizable version, which can quantify differences among agent policies on specified aspects. Based on the online deployment of MAPD, we design a multi-agent dynamic parameter sharing (MADPS) algorithm as an example of MAPD's applications. Extensive experiments demonstrate that our method is effective in measuring differences in agent policies and specific behavioral tendencies. Moreover, in comparison to other methods of parameter sharing, MADPS exhibits superior performance.
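To make the idea of a pairwise policy distance concrete, here is a minimal sketch. It is not the paper's method: MAPD learns conditional representations of agents' decisions, whereas this stand-in compares raw action distributions using total variation distance, averaged over a shared set of observations. The function names, the threshold, and the grouping rule are all hypothetical and chosen only to illustrate how a distance matrix could drive a parameter-sharing decision.

```python
from itertools import combinations


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def policy_distance_matrix(policies):
    """policies[i][s] is agent i's action distribution at shared observation s.

    Returns a symmetric matrix of mean per-observation distances.
    ASSUMPTION: a simple stand-in metric, not MAPD's learned representation.
    """
    n = len(policies)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        per_obs = [total_variation(p, q)
                   for p, q in zip(policies[i], policies[j])]
        dist[i][j] = dist[j][i] = sum(per_obs) / len(per_obs)
    return dist


def sharing_groups(dist, threshold):
    """Group agents connected by pairwise distances below `threshold`.

    ASSUMPTION: a hypothetical grouping rule (connected components via
    union-find), not the MADPS procedure from the paper.
    """
    n = len(dist)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if dist[i][j] < threshold:
            parent[find(j)] = find(i)

    groups = {}
    for agent in range(n):
        groups.setdefault(find(agent), []).append(agent)
    return sorted(groups.values())


# Three agents, two shared observations, two actions each (made-up numbers).
policies = [
    [[0.90, 0.10], [0.8, 0.2]],
    [[0.85, 0.15], [0.8, 0.2]],
    [[0.10, 0.90], [0.2, 0.8]],
]
distances = policy_distance_matrix(policies)
groups = sharing_groups(distances, threshold=0.1)
```

With these made-up inputs, agents 0 and 1 behave almost identically and land in one group, while agent 2 stays separate. In a dynamic scheme, such groups would be recomputed as policies change during training.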
