Scalable regret for learning to control network-coupled subsystems with unknown dynamics
This work addresses the challenge of scalable control for networked systems with unknown dynamics, offering a solution that improves regret scaling from super-linear to linear, which is incremental but practically relevant for large-scale applications.
The paper tackles the problem of controlling an unknown linear quadratic Gaussian system with multiple network-coupled subsystems, aiming to minimize regret relative to an oracle. It proposes a Thompson sampling algorithm that exploits network structure, achieving a regret bound of O~(n√T), scaling linearly with the number of subsystems, as validated by numerical experiments.
We consider the problem of controlling an unknown linear quadratic Gaussian (LQG) system consisting of multiple subsystems connected over a network. Our goal is to minimize and quantify the regret (i.e. loss in performance) of our strategy with respect to an oracle who knows the system model. Viewing the interconnected subsystems globally and directly using existing LQG learning algorithms for the global system results in a regret that increases super-linearly with the number of subsystems. Instead, we propose a new Thompson sampling based learning algorithm which exploits the structure of the underlying network. We show that the expected regret of the proposed algorithm is bounded by $\tilde{\mathcal{O}} \big( n \sqrt{T} \big)$ where $n$ is the number of subsystems, $T$ is the time horizon and the $\tilde{\mathcal{O}}(\cdot)$ notation hides logarithmic terms in $n$ and $T$. Thus, the regret scales linearly with the number of subsystems. We present numerical experiments to illustrate the salient features of the proposed algorithm.