Learning Reward Machines in Cooperative Multi-Agent Tasks
The paper addresses the challenge of non-Markovian rewards and policy interpretability in multi-agent systems, though it appears incremental, as it builds on existing reward machine and MARL techniques.
The paper tackles cooperative multi-agent reinforcement learning in partially observable environments by combining task decomposition with learnt reward machines that encode sub-task structure, which reduces problem complexity and makes learning more effective.
This paper presents a novel approach to Multi-Agent Reinforcement Learning (MARL) that combines cooperative task decomposition with the learning of reward machines (RMs) encoding the structure of the sub-tasks. The proposed method helps deal with the non-Markovian nature of the rewards in partially observable environments and improves the interpretability of the learnt policies required to complete the cooperative task. The RMs associated with each sub-task are learnt in a decentralised manner and then used to guide the behaviour of each agent. By doing so, the complexity of a cooperative multi-agent problem is reduced, allowing for more effective learning. The results suggest that the approach is a promising direction for future research in MARL, especially in complex environments with large state spaces and multiple agents.
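To make the idea of a reward machine concrete, the following is a minimal Python sketch of a hand-written RM for a single agent's sub-task. It is an illustration only: the paper learns its RMs, whereas this one is specified by hand, and the class `RewardMachine`, the state names (`u0`, `u1`, `u_done`) and the event labels (`button`, `goal`) are all hypothetical and do not come from the paper. The sketch shows why the RM state helps with non-Markovian rewards: the same `goal` event yields a reward only if `button` was observed first.

```python
from dataclasses import dataclass, field


@dataclass
class RewardMachine:
    """Finite-state machine whose transitions fire on high-level events."""
    initial: str
    terminal: set
    # Maps (state, event) -> (next_state, reward).
    transitions: dict = field(default_factory=dict)

    def step(self, state, event):
        # Unlisted (state, event) pairs keep the RM state and give zero reward.
        return self.transitions.get((state, event), (state, 0.0))

    def run(self, events):
        """Replay an event trace; return the final RM state and total reward."""
        state, total = self.initial, 0.0
        for event in events:
            if state in self.terminal:
                break
            state, reward = self.step(state, event)
            total += reward
        return state, total


# Hypothetical sub-task for one agent: press a button, then reach a goal.
agent_rm = RewardMachine(
    initial="u0",
    terminal={"u_done"},
    transitions={
        ("u0", "button"): ("u1", 0.0),
        ("u1", "goal"): ("u_done", 1.0),
    },
)

in_order = agent_rm.run(["button", "goal"])
out_of_order = agent_rm.run(["goal", "button"])
print(in_order)      # ('u_done', 1.0)
print(out_of_order)  # ('u1', 0.0)
```

In a decentralised setting of the kind the abstract describes, each agent would hold its own machine of this form for its sub-task, and its policy could condition on the pair (observation, RM state), so that the reward becomes Markovian with respect to that augmented state.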