Scalable Reinforcement Learning for Multi-Agent Networked Systems
This work addresses the scalability problem for large networked systems in domains such as communication and traffic. Its contribution is incremental, however, since it builds on existing RL methods and adds exploitation of the network structure.
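The paper tackles the exponential growth of the state-action space in reinforcement learning for multi-agent networked systems. It proposes a Scalable Actor Critic (SAC) framework that finds a localized policy that is an O(ρ^κ)-approximation of a stationary point of the objective, for some ρ in (0,1), with complexity that scales with the local state-action space size of the largest κ-hop neighborhood rather than with the whole network. The model and approach are illustrated with examples from wireless communication, epidemics, and traffic.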
We study reinforcement learning (RL) in a setting with a network of agents whose states and actions interact in a local manner, where the objective is to find localized policies such that the (discounted) global reward is maximized. A fundamental challenge in this setting is that the state-action space size scales exponentially in the number of agents, rendering the problem intractable for large networks. In this paper, we propose a Scalable Actor Critic (SAC) framework that exploits the network structure and finds a localized policy that is an $O(\rho^{\kappa})$-approximation of a stationary point of the objective for some $\rho\in(0,1)$, with complexity that scales with the local state-action space size of the largest $\kappa$-hop neighborhood of the network. We illustrate our model and approach using examples from wireless communication, epidemics and traffic.
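To make the scaling claim concrete, the following minimal sketch counts the joint state-action space of the largest κ-hop neighborhood and compares it with the global joint space. It is not the SAC algorithm itself, only the size bookkeeping behind the complexity statement. The ring topology, the function names `k_hop_neighborhood` and `largest_local_space`, and the assumption that every agent has the same number of local states and actions are all illustrative choices that do not come from the paper.

```python
from collections import deque


def k_hop_neighborhood(adj, source, kappa):
    """Return the set of agents within kappa hops of source (source included)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == kappa:
            continue  # do not expand beyond kappa hops
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return set(dist)


def largest_local_space(adj, kappa, n_states, n_actions):
    """Joint state-action space size of the largest kappa-hop neighborhood.

    Assumes (for illustration only) that each agent has n_states local
    states and n_actions local actions.
    """
    largest = max(len(k_hop_neighborhood(adj, i, kappa)) for i in adj)
    return (n_states * n_actions) ** largest


# Hypothetical example: 20 agents on a ring, each with 2 states and 2 actions.
n = 20
ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

local_size = largest_local_space(ring, kappa=1, n_states=2, n_actions=2)
global_size = (2 * 2) ** n  # joint space over all agents

print(local_size, global_size)
```

On this ring, a 1-hop neighborhood contains 3 agents, so the local space has 4^3 = 64 joint state-action pairs, while the global space has 4^20. Increasing κ tightens the O(ρ^κ) approximation but enlarges the neighborhood, which is the trade-off the abstract describes.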