AI · Aug 7, 2014

Learning to Cooperate via Policy Search

Leonid Peshkin, Kee-Eung Kim, Nicolas Meuleau, Leslie Pack Kaelbling

arXiv:1408.1484v1 · 310 citations

Originality: Incremental advance

AI Analysis

This work addresses cooperative multi-agent learning in partially observable environments. It is an incremental advance because it builds on existing policy-search methods.

The paper tackles learning in cooperative games under partial observability: it proposes a gradient-based distributed policy-search method and demonstrates its effectiveness in a small simulated soccer domain.

Cooperative games are those in which both agents share the same payoff structure. Value-based reinforcement-learning algorithms, such as variants of Q-learning, have been applied to learning cooperative games, but they only apply when the game state is completely observable to both agents. Policy search methods are a reasonable alternative to value-based methods for partially observable environments. In this paper, we provide a gradient-based distributed policy-search method for cooperative games and compare the notion of local optimum to that of Nash equilibrium. We demonstrate the effectiveness of this method experimentally in a small, partially observable simulated soccer domain.
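To make the idea of distributed policy search concrete, here is a minimal sketch, not the authors' algorithm or experimental setup. It assumes a hypothetical stateless two-agent cooperative matrix game (`PAYOFF`, shared by both agents) instead of the partially observable soccer domain, a softmax policy per agent, and a REINFORCE-style likelihood-ratio gradient with a running-average baseline. All names (`PAYOFF`, `distributed_reinforce`, `is_pure_nash`) and all numbers are illustrative assumptions. Each agent updates only its own parameters from its own action and the common reward, which is what makes the search "distributed". The `is_pure_nash` helper relates to the abstract's comparison of local optima and Nash equilibria: in this game both coordinated outcomes are equilibria, although one pays less than the other.

```python
import math
import random

# Hypothetical identical-payoff game: both agents receive PAYOFF[a1][a2].
# Coordinating on action 0 pays 10, on action 1 pays 5, miscoordination pays 0.
PAYOFF = [[10.0, 0.0],
          [0.0, 5.0]]


def softmax(theta):
    """Convert a list of action preferences into action probabilities."""
    m = max(theta)  # subtract the max for numerical stability
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def sample(probs, rng):
    """Draw an action index from a categorical distribution."""
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1


def distributed_reinforce(episodes=5000, lr=0.05, seed=0):
    """Each agent runs REINFORCE on its own softmax policy, using only its own
    action and the shared reward; no agent reads the other's parameters."""
    rng = random.Random(seed)
    thetas = [[0.0, 0.0], [0.0, 0.0]]  # one preference vector per agent
    baseline = 0.0                      # running average of past rewards
    for _ in range(episodes):
        probs = [softmax(th) for th in thetas]
        actions = [sample(p, rng) for p in probs]
        reward = PAYOFF[actions[0]][actions[1]]  # common payoff
        advantage = reward - baseline
        for agent in range(2):
            for a in range(2):
                # Gradient of log softmax w.r.t. theta[a]: 1[a == chosen] - pi(a)
                grad_log = (1.0 if a == actions[agent] else 0.0) - probs[agent][a]
                thetas[agent][a] += lr * advantage * grad_log
        baseline += 0.01 * (reward - baseline)
    return [softmax(th) for th in thetas]


def is_pure_nash(a1, a2):
    """True if neither agent can raise the shared payoff by deviating alone."""
    v = PAYOFF[a1][a2]
    best_for_1 = max(PAYOFF[b][a2] for b in range(2))
    best_for_2 = max(PAYOFF[a1][b] for b in range(2))
    return v >= best_for_1 and v >= best_for_2


if __name__ == "__main__":
    print(distributed_reinforce())
    print([(a1, a2) for a1 in range(2) for a2 in range(2) if is_pure_nash(a1, a2)])
```

In this toy game, if the partner plays action 1 with probability q, action 0 is worth 10(1 - q) in expectation and action 1 is worth 5q. So when both agents start out choosing action 1 with probability above 2/3, the expected gradient pushes them toward the lower-paying equilibrium (1, 1). This illustrates why a point where gradient ascent stops need not be the best joint policy.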

