LG | MA | Oct 7, 2022

Multi-agent Deep Covering Skill Discovery

arXiv:2210.03269v3 | 2 citations | h-index: 43
Originality: Incremental advance
AI Analysis

This paper addresses effective collaboration in multi-agent systems, particularly for tasks that can be divided into sub-tasks requiring coordinated sub-groups of agents. The contribution appears incremental, since it extends single-agent option discovery to the multi-agent setting.

The paper aims to accelerate exploration in multi-agent reinforcement learning with sparse rewards. It discovers collaborative options that coordinate multiple agents to visit under-explored regions of their joint state space. The authors report significantly faster exploration and higher task rewards than prior methods that use single-agent options or no options.

The use of skills (a.k.a. options) can greatly accelerate exploration in reinforcement learning, especially when only sparse reward signals are available. While option discovery methods have been proposed for individual agents, in multi-agent reinforcement learning (MARL) settings, discovering collaborative options that can coordinate the behavior of multiple agents and encourage them to visit the under-explored regions of their joint state space has not been considered. To this end, we propose Multi-agent Deep Covering Option Discovery, which constructs the multi-agent options by minimizing the expected cover time of the agents' joint state space. We also propose a novel framework to adopt the multi-agent options in the MARL process. In practice, a multi-agent task can usually be divided into sub-tasks, each of which can be completed by a sub-group of the agents. Therefore, our algorithm framework first leverages an attention mechanism to find collaborative agent sub-groups that would benefit most from coordinated actions. Then, a hierarchical algorithm, namely HA-MSAC, is developed to learn the multi-agent options for each sub-group to complete their sub-tasks first, and then to integrate them through a high-level policy as the solution of the whole task. This hierarchical option construction allows our framework to strike a balance between scalability and effective collaboration among the agents. The evaluation based on multi-agent collaborative tasks shows that the proposed algorithm can effectively capture the agent interactions with the attention mechanism, successfully identify multi-agent options, and significantly outperform prior works using single-agent options or no options, in terms of both faster exploration and higher task rewards.
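
The abstract does not spell out how an option is built from the cover-time objective. As a rough illustration only, the sketch below uses the tabular covering-option idea from the single-agent literature: take the Fiedler vector of the state-transition graph's Laplacian, and connect the two states where it is smallest and largest, which raises the graph's algebraic connectivity and so tightens an upper bound on expected cover time. This is not the paper's deep, attention-based method. The two-corridor environment, the one-agent-moves-per-step joint graph, and all function names are assumptions made for the example.

```python
import numpy as np

def path_adjacency(n):
    """Adjacency matrix of a path graph with n nodes (one agent's corridor)."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return A

def fiedler(A):
    """Return (algebraic connectivity, Fiedler vector) of a connected graph."""
    L = np.diag(A.sum(axis=1)) - A          # graph Laplacian L = D - A
    eigvals, eigvecs = np.linalg.eigh(L)    # eigenvalues in ascending order
    return eigvals[1], eigvecs[:, 1]

# Hypothetical toy task: agent 1 walks a 5-cell corridor, agent 2 a 3-cell one.
n1, n2 = 5, 3
A1, A2 = path_adjacency(n1), path_adjacency(n2)

# Joint state graph, assuming exactly one agent moves per step
# (Cartesian product of the two individual graphs).
# Joint state index = pos1 * n2 + pos2.
A_joint = np.kron(A1, np.eye(n2)) + np.kron(np.eye(n1), A2)

lam_before, f = fiedler(A_joint)

# Covering-option heuristic: link the extreme states of the Fiedler vector.
u, v = int(np.argmin(f)), int(np.argmax(f))
s_init = divmod(u, n2)   # (agent 1 position, agent 2 position)
s_term = divmod(v, n2)

# Model the option as a new edge between the two joint states.
A_opt = A_joint.copy()
A_opt[u, v] = A_opt[v, u] = 1.0
lam_after, _ = fiedler(A_opt)

print("option connects joint states", s_init, "and", s_term)
print("algebraic connectivity before / after:", lam_before, lam_after)
```

In this toy graph the option ends up joining states at opposite ends of the longer corridor, the pair that a random walk takes longest to travel between. The exact second coordinate of each state depends on numerical tie-breaking, since the Fiedler vector here varies only along agent 1's position.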

