LGMAMar 1, 2024

Efficient Reinforcement Learning for Global Decision Making in the Presence of Local Agents at Scale

arXiv:2403.00222v310 citationsh-index: 3
Originality Incremental advance
AI Analysis

This addresses a long-standing scalability problem in multi-agent reinforcement learning for applications like demand response and EV charging, though it is an incremental improvement on existing methods.

The paper tackles the scalability challenge in reinforcement learning for global decision-making with many local agents by proposing the SUBSAMPLE-Q algorithm, which subsamples agents to achieve polynomial-time policy computation and converges to optimality with error bounds like O~(1/√k + ε_{k,m}), validated in simulations for demand response and queueing.

We study reinforcement learning for global decision-making in the presence of local agents, where the global decision-maker makes decisions affecting all local agents, and the objective is to learn a policy that maximizes the joint rewards of all the agents. Such problems find many applications, e.g. demand response, EV charging, queueing, etc. In this setting, scalability has been a long-standing challenge due to the size of the state space which can be exponential in the number of agents. This work proposes the \texttt{SUBSAMPLE-Q} algorithm where the global agent subsamples $k\leq n$ local agents to compute a policy in time that is polynomial in $k$. We show that this learned policy converges to the optimal policy in the order of $\tilde{O}(1/\sqrt{k}+ε_{k,m})$ as the number of sub-sampled agents $k$ increases, where $ε_{k,m}$ is the Bellman noise. Finally, we validate the theory through numerical simulations in a demand-response setting and a queueing setting.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes