MA · LG · Feb 5, 2025

Optimistic ε-Greedy Exploration for Cooperative Multi-Agent Reinforcement Learning

arXiv:2502.03506v1 · h-index: 5
Originality: Incremental advance
AI Analysis

This work addresses a specific bottleneck in cooperative multi-agent reinforcement learning, but it is incremental because it builds on existing CTDE methods.

The paper tackles convergence to suboptimal solutions in cooperative multi-agent reinforcement learning, which it attributes to the underestimation of optimal actions under the CTDE paradigm. In its experiments, Optimistic ε-Greedy Exploration significantly improved performance over the compared algorithms across several environments.

The Centralized Training with Decentralized Execution (CTDE) paradigm is widely used in cooperative multi-agent reinforcement learning. However, because of the representational limitations of traditional monotonic value decomposition methods, algorithms can underestimate optimal actions, which leads policies to suboptimal solutions. To address this challenge, we propose Optimistic ε-Greedy Exploration, which focuses on enhancing exploration to correct value estimations. Our analysis indicates that the underestimation arises from insufficient sampling of optimal actions during exploration. We introduce an optimistic updating network to identify optimal actions and, during exploration, sample actions from its distribution with probability ε, increasing the selection frequency of optimal actions. Experimental results in various environments show that Optimistic ε-Greedy Exploration effectively prevents the algorithm from converging to suboptimal solutions and significantly improves its performance compared to other algorithms.
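A minimal tabular sketch of the mechanism described in the abstract, under several assumptions that are not taken from the paper. The 2-agent, 3-action payoff matrix is hypothetical. An additive, VDN-style fit of per-agent utilities stands in for monotonic value decomposition. The "optimistic updating network" is replaced by per-agent tables that move toward a reward only when it exceeds the current estimate. "Sample from its distribution" is read as a softmax over those optimistic values. All function names are illustrative. The sketch shows only the first half of the argument: the additive fit under uniform exploration ranks the optimal action below the others, and optimistic ε-sampling raises how often the optimal joint action is visited.

```python
import numpy as np

# Hypothetical cooperative matrix game (illustrative, not from the paper).
# Joint action (0, 0) is optimal with reward 8; miscoordinating on it costs -12.
PAYOFF = np.array([[8.0, -12.0, -12.0],
                   [-12.0, 0.0, 0.0],
                   [-12.0, 0.0, 0.0]])
N_ACTIONS = PAYOFF.shape[0]


def additive_fit_uniform(payoff):
    """Least-squares fit r(a, b) ~ q1[a] + q2[b] with uniform weight on every joint action."""
    grand = payoff.mean()
    q1 = payoff.mean(axis=1)          # row means
    q2 = payoff.mean(axis=0) - grand  # column means, shifted so q1[a] + q2[b] fits r
    return q1, q2


def optimistic_update(q_opt, action, reward, lr=0.5):
    """Assumed optimistic rule: move toward the reward only when it exceeds the estimate."""
    td = reward - q_opt[action]
    if td > 0:
        q_opt[action] += lr * td


def softmax(x, temperature):
    z = (x - x.max()) / temperature
    p = np.exp(z)
    return p / p.sum()


def select_action(q_main, q_opt, eps, rng, optimistic=True, temperature=1.0):
    """With probability eps explore; otherwise act greedily on the main utilities."""
    if rng.random() < eps:
        if optimistic:
            # Optimistic variant: sample from a softmax over optimistic values.
            return int(rng.choice(len(q_opt), p=softmax(q_opt, temperature)))
        # Standard eps-greedy: sample uniformly at random.
        return int(rng.integers(len(q_main)))
    return int(np.argmax(q_main))


def optimal_joint_rate(optimistic, steps=5000, eps=0.3, seed=0):
    """Fraction of steps on which both agents pick the optimal joint action (0, 0)."""
    rng = np.random.default_rng(seed)
    q1, q2 = additive_fit_uniform(PAYOFF)  # frozen main utilities that underrate action 0
    opt1 = np.zeros(N_ACTIONS)
    opt2 = np.zeros(N_ACTIONS)
    hits = 0
    for _ in range(steps):
        a = select_action(q1, opt1, eps, rng, optimistic)
        b = select_action(q2, opt2, eps, rng, optimistic)
        r = PAYOFF[a, b]
        optimistic_update(opt1, a, r)
        optimistic_update(opt2, b, r)
        hits += (a == 0 and b == 0)
    return hits / steps


if __name__ == "__main__":
    q1, _ = additive_fit_uniform(PAYOFF)
    print("additive utilities for agent 1:", q1)
    print("standard eps-greedy:  ", optimal_joint_rate(optimistic=False))
    print("optimistic eps-greedy:", optimal_joint_rate(optimistic=True))
```

Here the additive fit gives agent 1 the row means (about -5.33 for action 0 and -4 for actions 1 and 2), so greedy play never selects the optimal action. With ε = 0.3, uniform exploration reaches (0, 0) on roughly ε²/9 ≈ 1% of steps. Once the optimistic tables have learned that action 0 can yield 8, optimistic sampling approaches ε² ≈ 9%. In the full method the main network would then train on these extra samples; this sketch keeps it frozen to isolate the change in sampling frequency.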
