AI GT MA | Oct 17, 2019

MAPEL: Multi-Agent Pursuer-Evader Learning using Situation Report

arXiv:1910.07780v1 | 2.0 | Has Code

Originality: Incremental advance

AI Analysis

The work addresses cooperation challenges in multi-agent systems relevant to robotics and gaming, but the advance is incremental because it builds on existing decentralized learning approaches.

The paper tackles effective cooperation in a multi-agent territory guarding game with limited sensing. It introduces the MAPEL algorithms, which use spatio-temporal graphs and situation reports for decentralized learning, and it gives an empirical analysis of how the two proposed cooperation methods perform as the number of agents increases.

In this paper, we consider a territory guarding game involving pursuers, evaders and a target in an environment that contains obstacles. The goal of the evaders is to capture the target, while that of the pursuers is to capture the evaders before they reach the target. All the agents have a limited sensing range and can detect one another only when within each other's observation space. We focus on the challenge of effective cooperation among agents on the same team. Finding exact solutions for such multi-agent systems is difficult because of their inherent complexity. We present Multi-Agent Pursuer-Evader Learning (MAPEL), a class of algorithms that use a spatio-temporal graph representation to learn structured cooperation. The key concept is that learning takes place in a decentralized manner and agents use situation report updates to learn about the whole environment from each other's partial observations. We use Recurrent Neural Networks (RNNs) to parameterize the spatio-temporal graph. An agent in MAPEL sends a situation report to the other agents only when an opponent or the target is inside its observation space. We present two methods for cooperation via situation report update: a) Peer-to-Peer Situation Report (P2PSR) and b) Ring Situation Report (RSR). We present a detailed analysis of how these two cooperation methods perform when the number of agents in the game is increased. We provide empirical results to show how agents cooperate under these two methods.
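
The situation report rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Agent`, `make_report`, `p2p_update` and `ring_update` are hypothetical, the learning component (the RNN-parameterized graph) is omitted entirely, and the ring behavior is an assumption (each report is relayed from successor to successor around a fixed ring), since the abstract does not specify how RSR forwards reports.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    pos: tuple            # (x, y) position of the agent
    sensing_range: float  # radius of the observation space (assumed circular)
    known: dict = field(default_factory=dict)  # entity name -> reported position


def make_report(agent, entities):
    """Return {entity: position} for entities inside the agent's sensing range."""
    return {
        name: pos
        for name, pos in entities.items()
        if math.dist(agent.pos, pos) <= agent.sensing_range
    }


def p2p_update(team, entities):
    """Assumed P2PSR: a sensing agent sends its report directly to every teammate."""
    messages = 0
    for sender in team:
        report = make_report(sender, entities)
        if not report:
            continue  # nothing sensed, so no situation report is sent
        sender.known.update(report)
        for receiver in team:
            if receiver is not sender:
                receiver.known.update(report)
                messages += 1
    return messages


def ring_update(team, entities):
    """Assumed RSR: a report is relayed successor by successor around the ring."""
    hops = 0
    n = len(team)
    for i, sender in enumerate(team):
        report = make_report(sender, entities)
        if not report:
            continue
        sender.known.update(report)
        for step in range(1, n):
            team[(i + step) % n].known.update(report)
            hops += 1
    return hops


if __name__ == "__main__":
    entities = {"evader": (1.0, 0.0), "target": (10.0, 10.0)}
    team = [
        Agent("p0", (0.0, 0.0), 2.0),
        Agent("p1", (5.0, 5.0), 2.0),
        Agent("p2", (9.0, 9.0), 2.0),
    ]
    print("P2P messages:", p2p_update(team, entities))
    print("p1 now knows:", team[1].known)
```

Under these assumptions, both schemes use n-1 transmissions per report for a team of n agents. The difference is structural: the peer-to-peer variant needs a direct link between every pair of agents and delivers in one step, while the ring variant needs only one outgoing link per agent but may take up to n-1 hops to reach the last teammate.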
