LGAIJan 23, 2025

Concurrent Learning with Aggregated States via Randomized Least Squares Value Iteration

arXiv:2501.13394v3h-index: 13
Originality Incremental advance
AI Analysis

This work addresses a theoretical gap in concurrent learning for multi-agent reinforcement learning, showing incremental improvements in space complexity.

The paper tackles the problem of whether randomization can help multiple agents explore concurrently in reinforcement learning, demonstrating polynomial worst-case regret bounds with per-agent regret decreasing at an optimal rate of Θ(1/√N) in finite- and infinite-horizon environments.

Designing learning agents that explore efficiently in a complex environment has been widely recognized as a fundamental challenge in reinforcement learning. While a number of works have demonstrated the effectiveness of techniques based on randomized value functions on a single agent, it remains unclear, from a theoretical point of view, whether injecting randomization can help a society of agents {\it concurently} explore an environment. The theoretical results %that we established in this work tender an affirmative answer to this question. We adapt the concurrent learning framework to \textit{randomized least-squares value iteration} (RLSVI) with \textit{aggregated state representation}. We demonstrate polynomial worst-case regret bounds in both finite- and infinite-horizon environments. In both setups the per-agent regret decreases at an optimal rate of $Θ\left(\frac{1}{\sqrt{N}}\right)$, highlighting the advantage of concurent learning. Our algorithm exhibits significantly lower space complexity compared to \cite{russo2019worst} and \cite{agrawal2021improved}. We reduce the space complexity by a factor of $K$ while incurring only a $\sqrt{K}$ increase in the worst-case regret bound, compared to \citep{agrawal2021improved,russo2019worst}. Additionally, we conduct numerical experiments to demonstrate our theoretical findings.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes