From Explicit Communication to Tacit Cooperation: A Novel Paradigm for Cooperative MARL
This work addresses limited cooperation in decentralized multi-agent systems. It offers researchers and practitioners a novel approach that reduces reliance on communication.
The paper tackles the challenge of fostering cooperation in multi-agent reinforcement learning (MARL) under partial observability. It proposes a paradigm that transitions from explicit communication to tacit cooperation. In various scenarios, its performance without communication approaches or surpasses that of QMIX and communication-based methods.
Centralized training with decentralized execution (CTDE) is a widely used learning paradigm that has achieved significant success in complex tasks. However, partial observability and the lack of effective shared signals between agents often limit its ability to foster cooperation. Communication can address this challenge, but it also reduces the algorithm's practicality. Drawing inspiration from how human teams learn to cooperate, we propose a novel paradigm that facilitates a gradual shift from explicit communication to tacit cooperation. In the initial training stage, agents share relevant information to promote cooperation, while each agent simultaneously learns to reconstruct this information from its own local trajectory. We then combine the explicitly communicated information with the reconstructed information to obtain mixed information. Throughout training, we progressively reduce the proportion of explicitly communicated information, enabling a seamless transition to fully decentralized execution without communication. Experimental results in various scenarios show that the performance of our method without communication approaches or even surpasses that of QMIX and communication-based methods.
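The mixing step and the gradually shrinking share of communicated information can be sketched as follows. The abstract does not say how the two sources are combined or how fast the communicated share decays. This sketch assumes a convex combination (ratio * communicated + (1 - ratio) * reconstructed) and a linear decay schedule with an optional warm-up. The names `comm_ratio`, `mix_information`, `warmup_steps`, and `total_steps` are hypothetical, and the learned reconstruction from each agent's local trajectory is replaced here by a fixed vector.

```python
from typing import List, Sequence


def comm_ratio(step: int, total_steps: int, warmup_steps: int = 0) -> float:
    """Hypothetical linear schedule for the share of explicitly communicated info.

    Assumption: the share stays at 1.0 during warm-up, then decays linearly to 0.0
    at total_steps; from then on only reconstructed information is used, which
    matches fully decentralized execution without communication.
    """
    if step <= warmup_steps:
        return 1.0
    if step >= total_steps:
        return 0.0
    return 1.0 - (step - warmup_steps) / (total_steps - warmup_steps)


def mix_information(
    comm: Sequence[float], recon: Sequence[float], ratio: float
) -> List[float]:
    """Assumed mixing rule: ratio * communicated + (1 - ratio) * reconstructed."""
    if len(comm) != len(recon):
        raise ValueError("communicated and reconstructed vectors must have the same length")
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    return [ratio * c + (1.0 - ratio) * r for c, r in zip(comm, recon)]


if __name__ == "__main__":
    comm = [1.0, 0.0, 2.0]   # information received from teammates (training only)
    recon = [0.5, 0.5, 1.0]  # stand-in for the agent's own reconstruction
    for step in (0, 500, 1000):
        ratio = comm_ratio(step, total_steps=1000)
        print(step, ratio, mix_information(comm, recon, ratio))
```

Early in training the mixed vector equals the communicated one. Halfway through it is an even blend, and at the end it equals the reconstruction alone, so the agent no longer needs any messages. A stochastic alternative, such as replacing communicated entries with reconstructed ones with a growing probability, would also fit the abstract's description. The deterministic blend is used here only because it is easier to inspect.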