Exploration with Unreliable Intrinsic Reward in Multi-Agent Reinforcement Learning
The work addresses exploration inefficiency in collaborative multi-agent systems, though the contribution is incremental because it builds on existing intrinsic reward methods.
The paper tackles the problem of unreliable intrinsic rewards hindering optimal policy learning in multi-agent reinforcement learning. It proposes a framework called Independent Centrally-assisted Q-learning (ICQL), in which a centralized agent with intrinsic rewards improves exploration for decentralized agents.
This paper investigates the use of intrinsic reward to guide exploration in multi-agent reinforcement learning. We discuss the challenges in applying intrinsic reward to multiple collaborative agents and demonstrate how unreliable reward can prevent decentralized agents from learning the optimal policy. We address this problem with a novel framework, Independent Centrally-assisted Q-learning (ICQL), in which decentralized agents share control and an experience replay buffer with a centralized agent. Only the centralized agent is intrinsically rewarded, but the decentralized agents still benefit from improved exploration, without the distraction of unreliable incentives.
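The arrangement described above can be illustrated with a minimal tabular sketch. It is not the paper's implementation: the class name `SharedControlSketch`, the count-based bonus, the fully observed state, the per-call choice of which controller acts, and all hyperparameters are assumptions made for illustration. The sketch only shows the structural idea stated in the abstract: one replay buffer feeds both learners, the centralized learner's target includes an intrinsic bonus, and the decentralized learners' targets use the extrinsic reward alone.

```python
import itertools
import math
import random
from collections import defaultdict, deque


class SharedControlSketch:
    """Hypothetical tabular sketch: one centralized and several decentralized Q-learners."""

    def __init__(self, n_agents, n_actions, alpha=0.1, gamma=0.95, beta=0.5, seed=0):
        self.n_agents, self.n_actions = n_agents, n_actions
        self.alpha, self.gamma, self.beta = alpha, gamma, beta
        self.rng = random.Random(seed)
        # Centralized learner: values over (state, joint_action).
        self.q_central = defaultdict(float)
        # Decentralized learners: one table per agent over (state, own_action).
        # Assumption: every agent observes the full state.
        self.q_local = [defaultdict(float) for _ in range(n_agents)]
        self.counts = defaultdict(int)     # visit counts for the assumed bonus
        self.buffer = deque(maxlen=10000)  # replay buffer shared by all learners
        self.joint_actions = list(itertools.product(range(n_actions), repeat=n_agents))

    def intrinsic_reward(self, state):
        # Assumed count-based novelty bonus; the source does not specify the form.
        return self.beta / math.sqrt(max(1, self.counts[state]))

    def act(self, state, central_in_control, eps=0.1):
        # Shared control: either the centralized or the decentralized policy acts.
        if central_in_control:
            if self.rng.random() < eps:
                return self.rng.choice(self.joint_actions)
            return max(self.joint_actions, key=lambda a: self.q_central[(state, a)])
        return tuple(
            max(range(self.n_actions), key=lambda b, i=i: self.q_local[i][(state, b)])
            for i in range(self.n_agents)
        )

    def store(self, s, joint_action, r, s_next, done):
        self.counts[s_next] += 1
        self.buffer.append((s, joint_action, r, s_next, done))

    def update(self, batch_size=32):
        batch = self.rng.sample(list(self.buffer), min(batch_size, len(self.buffer)))
        for s, a, r, s_next, done in batch:
            # Centralized target: extrinsic reward plus intrinsic bonus.
            boot = 0.0 if done else self.gamma * max(
                self.q_central[(s_next, a2)] for a2 in self.joint_actions)
            target = r + self.intrinsic_reward(s_next) + boot
            self.q_central[(s, a)] += self.alpha * (target - self.q_central[(s, a)])
            # Decentralized targets: extrinsic reward only, same transition.
            for i in range(self.n_agents):
                boot_i = 0.0 if done else self.gamma * max(
                    self.q_local[i][(s_next, b)] for b in range(self.n_actions))
                key = (s, a[i])
                self.q_local[i][key] += self.alpha * (r + boot_i - self.q_local[i][key])
```

In this sketch, a transition with zero extrinsic reward raises the centralized value through the bonus but leaves the decentralized tables unchanged, so the decentralized agents see novel transitions in the shared buffer without being rewarded for novelty themselves.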