LG, MA · Nov 18, 2022

Credit-cognisant reinforcement learning for multi-agent cooperation

arXiv:2211.10100v1 · h-index: 14
Originality: Incremental advance
AI Analysis

This paper addresses delayed rewards and credit assignment in multi-agent cooperation, offering an incremental improvement for settings such as cooperative games.

The paper tackles multi-agent reinforcement learning in partially observable environments by introducing credit-cognisant rewards (CCRs), which improve credit assignment among agents. In a simplified Hanabi game, CCRs yield significant performance gains over both independent deep Q-learning and deep recurrent Q-learning.

Traditional multi-agent reinforcement learning (MARL) algorithms, such as independent Q-learning, struggle in scenarios that are partially observable and that require agents to develop delicate action sequences. The reward for a good action is often only available after other agents have taken their own actions, so the original action is not credited accordingly. Recurrent neural networks have proven to be a viable strategy for these types of problems, yielding a significant performance increase compared to other methods. In this paper, we explore a different approach and focus on the experiences used to update the action-value functions of each agent. We introduce the concept of credit-cognisant rewards (CCRs), which allow an agent to perceive the effect its actions had on the environment as well as on its co-agents. We show that by manipulating these experiences so that the reward they contain includes the rewards received by all the agents within the same action sequence, we are able to improve significantly on the performance of independent deep Q-learning as well as deep recurrent Q-learning. We evaluate the performance of CCRs when applied to deep reinforcement learning techniques using a simplified version of the popular card game Hanabi.
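The reward construction described in the abstract can be illustrated with a minimal sketch. This is one possible reading, not the paper's implementation: it assumes an "action sequence" is a list of consecutive experiences (one per agent turn) and that the credit-cognisant reward is simply the sum of all rewards in that sequence. The `Experience` class, its fields, and the `credit_cognisant` function are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Experience:
    """One transition stored for a single agent (hypothetical layout)."""
    agent_id: int
    observation: Tuple[int, ...]
    action: int
    reward: float
    next_observation: Tuple[int, ...]
    done: bool


def credit_cognisant(sequence: List[Experience]) -> List[Experience]:
    """Return copies of the experiences whose reward is the sequence total.

    Assumption: every agent acting in the same action sequence is credited
    with the sum of the rewards received by all agents in that sequence.
    """
    shared_reward = sum(e.reward for e in sequence)
    # dataclasses.replace builds new objects, so the inputs stay untouched.
    return [replace(e, reward=shared_reward) for e in sequence]


if __name__ == "__main__":
    # Agent 0 gives a hint (no immediate reward); agent 1 then plays a card
    # and receives the reward that the hint made possible.
    sequence = [
        Experience(0, (0,), 2, 0.0, (1,), False),
        Experience(1, (1,), 5, 1.0, (2,), False),
    ]
    for exp in credit_cognisant(sequence):
        print(exp.agent_id, exp.reward)
```

Under this assumption the hinting agent's experience now carries the reward earned by its co-agent, so an ordinary Q-learning update on the rewritten experiences can credit the hint. How the paper delimits a sequence or weights the individual rewards may differ from this sketch.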

