AI, MA · Dec 29, 2019

Individual specialization in multi-task environments with multiagent reinforcement learners

arXiv:1912.12671v1
Originality: Incremental advance
AI Analysis

The paper addresses coordination challenges relevant to researchers and practitioners in multi-agent systems, but the advance is incremental because it builds on existing MARL work.

The paper studies coordination in multi-task multi-agent reinforcement learning environments in which agents can specialize in different tasks. It finds that policy-based methods with independent entropy regularization converge better than value-based methods with epsilon-greedy exploration. It also reports, as a result that needs further investigation, that specialization tends to become more probable as the number of agents increases.

There is growing interest in Multi-Agent Reinforcement Learning (MARL) as a first step towards building general intelligent agents that learn to make low- and high-level decisions in non-stationary, complex environments in the presence of other agents. Previous results point us towards increased conditions for coordination, efficiency/fairness, and common-pool resource sharing. We further study coordination in multi-task environments where several rewarding tasks can be performed, so agents need not perform well in all tasks and, under certain conditions, may specialize. One observation from the study is that the epsilon-greedy exploration of value-based reinforcement learning methods is not adequate for multi-agent independent learners: the epsilon parameter, which controls the probability of selecting a random action, synchronizes the agents artificially and forces them to have deterministic policies at the same time. By using policy-based methods with independent, entropy-regularized exploration updates, we achieved better and smoother convergence. Another result, which needs further investigation, is that specialization tends to become more probable as the number of agents increases.
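The contrast drawn in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. The function names, the linear epsilon schedule, the learning rate `lr`, and the entropy weight `beta` are all hypothetical choices made for illustration. The first half shows epsilon-greedy selection driven by a schedule that depends only on the step count, so every independent learner using it stops exploring at the same moment. The second half shows one REINFORCE-style update on softmax logits with an entropy bonus, where the amount of exploration depends on each agent's own policy.

```python
import math
import random


def decayed_epsilon(step, start=1.0, end=0.05, decay_steps=1000):
    """Hypothetical linear epsilon schedule; it depends only on the step count,
    so all agents that use it become near-greedy at the same time."""
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def log_softmax(logits):
    """Numerically stable log of the softmax probabilities."""
    m = max(logits)
    log_total = math.log(sum(math.exp(x - m) for x in logits))
    return [x - m - log_total for x in logits]


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_regularized_update(logits, action, reward, lr=0.1, beta=0.01):
    """One gradient-ascent step on reward * log pi(action) + beta * H(pi),
    where pi = softmax(logits). lr and beta are assumed values."""
    log_p = log_softmax(logits)
    probs = [math.exp(lp) for lp in log_p]
    h = -sum(p * lp for p, lp in zip(probs, log_p))
    new_logits = []
    for k, (z, p, lp) in enumerate(zip(logits, probs, log_p)):
        grad_log_pi = (1.0 if k == action else 0.0) - p  # d log pi(action) / d z_k
        grad_entropy = -p * (lp + h)                     # d H(pi) / d z_k
        new_logits.append(z + lr * (reward * grad_log_pi + beta * grad_entropy))
    return new_logits
```

In this sketch each agent would hold its own `logits` and call `entropy_regularized_update` on its own experience, so no shared parameter ties the agents' exploration together. With epsilon-greedy, by contrast, any agents that read `decayed_epsilon(step)` from a common step counter explore and exploit in lockstep.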

