MA LG Mar 13, 2019

Resource Abstraction for Reinforcement Learning in Multiagent Congestion Problems

Kleanthis Malialis, Sam Devlin, Daniel Kudenko

arXiv:1903.05431v15.928 citations

Originality: Incremental advance

AI Analysis

This work addresses deployability issues in real-world congestion problems, such as traffic management for autonomous systems. It is an incremental advance because it builds on existing reward shaping methods.

The paper tackles three challenges in multiagent reinforcement learning for congestion problems: learning time, scalability, and decentralized coordination. It introduces Resource Abstraction, which groups resources to create more informative reward functions. The result is significantly improved learning speed and scalability, with the highest or near-highest joint performance in large-scale scenarios of up to 1000 agents.

Real-world congestion problems (e.g. traffic congestion) are typically very complex and large-scale. Multiagent reinforcement learning (MARL) is a promising candidate for dealing with this emerging complexity by providing an autonomous and distributed solution to these problems. However, there are three limiting factors that affect the deployability of MARL approaches to congestion problems. These are learning time, scalability and decentralised coordination, i.e. no communication between the learning agents. In this paper we introduce Resource Abstraction, an approach that addresses these challenges by allocating the available resources into abstract groups. This abstraction creates new reward functions that provide a more informative signal to the learning agents and aid the coordination amongst them. Experimental work is conducted on two benchmark domains from the literature: an abstract congestion problem and a realistic traffic congestion problem. The current state of the art for solving multiagent congestion problems is a form of reward shaping called difference rewards. We show that the system using Resource Abstraction significantly improves the learning speed and scalability, and achieves the highest possible or near-highest joint performance/social welfare for both congestion problems in large-scale scenarios involving up to 1000 reinforcement learning agents.
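
To make the two reward signals named in the abstract concrete, here is a minimal Python sketch. The abstract does not give the reward formulas, so everything here is an assumption for illustration: the congestion-style local reward `x * exp(-x / capacity)`, the function names, and in particular `abstract_group_reward`, which simply pools the counts and capacities of a group of resources and should not be read as the authors' definition of Resource Abstraction. The difference reward follows its standard form, the global reward minus the global reward with the agent removed.

```python
import math


def local_reward(consumers: int, capacity: float) -> float:
    """Assumed congestion-style reward for one resource.

    It grows while the resource is under-used and decays once it is
    over-used, peaking when consumers == capacity.
    """
    return consumers * math.exp(-consumers / capacity)


def global_reward(counts, capacities):
    """System-wide reward: the sum of local rewards over all resources."""
    return sum(local_reward(x, c) for x, c in zip(counts, capacities))


def difference_reward(counts, capacities, resource):
    """Standard difference reward for an agent that chose `resource`:
    G(z) - G(z with that agent removed)."""
    without_agent = list(counts)
    without_agent[resource] -= 1
    return global_reward(counts, capacities) - global_reward(without_agent, capacities)


def abstract_group_reward(counts, capacities, groups, resource):
    """Hypothetical grouped signal (an assumption, not the paper's formula):
    treat the agent's group of resources as one pooled resource."""
    group = next(g for g in groups if resource in g)
    pooled_count = sum(counts[r] for r in group)
    pooled_capacity = sum(capacities[r] for r in group)
    return local_reward(pooled_count, pooled_capacity)
```

With `counts = [3, 1, 5]`, `capacities = [2, 2, 4]` and `groups = [[0, 1], [2]]`, an agent on resource 0 sees a crowded resource through `difference_reward`, while the pooled signal reports that its group as a whole sits exactly at capacity. This is the sense in which a grouped reward carries different information from a per-resource one.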
