Efficient Reinforcement Learning for Global Decision Making in the Presence of Local Agents at Scale
This addresses a long-standing scalability problem in multi-agent reinforcement learning for applications like demand response and EV charging, though it is an incremental improvement on existing methods.
The paper tackles the scalability challenge in reinforcement learning for global decision-making with many local agents by proposing the SUBSAMPLE-Q algorithm, which subsamples agents to achieve polynomial-time policy computation and converges to optimality with error bounds like O~(1/√k + ε_{k,m}), validated in simulations for demand response and queueing.
We study reinforcement learning for global decision-making in the presence of local agents, where the global decision-maker makes decisions affecting all local agents, and the objective is to learn a policy that maximizes the joint rewards of all the agents. Such problems find many applications, e.g. demand response, EV charging, queueing, etc. In this setting, scalability has been a long-standing challenge due to the size of the state space which can be exponential in the number of agents. This work proposes the \texttt{SUBSAMPLE-Q} algorithm where the global agent subsamples $k\leq n$ local agents to compute a policy in time that is polynomial in $k$. We show that this learned policy converges to the optimal policy in the order of $\tilde{O}(1/\sqrt{k}+ε_{k,m})$ as the number of sub-sampled agents $k$ increases, where $ε_{k,m}$ is the Bellman noise. Finally, we validate the theory through numerical simulations in a demand-response setting and a queueing setting.