Regret Analysis of Learning-Based Linear Quadratic Gaussian Control with Additive Exploration
This paper addresses the challenge of efficiently controlling unknown systems in applications such as robotics and autonomous systems, but its contribution is incremental, since it builds on existing LQG and exploration methods.
The paper studies the control of unknown, partially observable systems in the Linear Quadratic Gaussian (LQG) framework by analyzing the regret of a computationally efficient exploration strategy. It shows that the LQG-NAIVE algorithm achieves a regret growth rate of O(√T) up to logarithmic factors, and it proposes an extension, LQG-IF2E, that performs competitively in numerical simulations.
In this paper, we analyze the regret incurred by a computationally efficient exploration strategy, known as naive exploration, for controlling unknown partially observable systems within the Linear Quadratic Gaussian (LQG) framework. We introduce a two-phase control algorithm called LQG-NAIVE, which involves an initial phase of injecting Gaussian input signals to obtain a system model, followed by a second phase of an interplay between naive exploration and control in an episodic fashion. We show that LQG-NAIVE achieves a regret growth rate of $\tilde{\mathcal{O}}(\sqrt{T})$, i.e., $\mathcal{O}(\sqrt{T})$ up to logarithmic factors after $T$ time steps, and we validate its performance through numerical simulations. Additionally, we propose LQG-IF2E, which extends the exploration signal to a `closed-loop' setting by incorporating the Fisher Information Matrix (FIM). We provide compelling numerical evidence of the competitive performance of LQG-IF2E compared to LQG-NAIVE.
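The two-phase structure described above (a warm-up with Gaussian inputs, then episodes that mix exploration and control) can be illustrated with a minimal scheduling sketch. The function names, the doubling episode lengths, and the exploration variance decaying as sigma_u^2 / sqrt(episode start) are assumptions made for illustration; they are a common choice for naive exploration, not details taken from the abstract. The sketch only plans how much exploration noise is injected and when; it does not perform system identification or controller synthesis.

```python
import math


def naive_exploration_schedule(horizon, warmup_len=100, first_episode_len=100, sigma_u=1.0):
    """Plan exploration blocks as (phase, start, length, exploration_variance).

    Assumptions (hypothetical, not from the source): episode lengths double,
    and the additive exploration variance in an episode starting at time s
    is sigma_u**2 / sqrt(s).
    """
    if warmup_len < 1 or first_episode_len < 1:
        raise ValueError("warmup_len and first_episode_len must be at least 1")

    # Phase 1: pure Gaussian inputs with constant variance, used to fit a first model.
    schedule = [("warmup", 0, min(warmup_len, horizon), sigma_u ** 2)]

    # Phase 2: episodes in which the input is the controller output plus decaying noise.
    start, length = warmup_len, first_episode_len
    while start < horizon:
        block_len = min(length, horizon - start)  # truncate the last episode
        variance = sigma_u ** 2 / math.sqrt(start)
        schedule.append(("episode", start, block_len, variance))
        start += block_len
        length *= 2
    return schedule


def exploration_energy(schedule):
    """Expected total exploration energy: sum of variance * length over all blocks."""
    return sum(length * variance for _, _, length, variance in schedule)


if __name__ == "__main__":
    plan = naive_exploration_schedule(horizon=1000)
    for phase, start, length, variance in plan:
        print(f"{phase:8s} start={start:4d} length={length:4d} variance={variance:.4f}")
    print("total exploration energy:", round(exploration_energy(plan), 2))
```

Under these assumed choices, the energy spent on exploration after the warm-up grows roughly like the square root of the horizon, which is the intuition behind a regret rate of order √T; the actual analysis in the paper may rely on different constants and episode rules.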