LGOCJun 8, 2024

Regret Bounds for Episodic Risk-Sensitive Linear Quadratic Regulator

arXiv:2406.05366v21 citations
AI Analysis

This work addresses a fundamental problem in risk-sensitive optimal control, providing theoretical guarantees for online learning in control systems, though it is incremental as it extends regret analysis to a risk-sensitive variant.

The paper tackles the online adaptive control of risk-sensitive linear quadratic regulator in episodic settings, proposing algorithms that achieve $\widetilde{\mathcal{O}}(\log N)$ regret under an identifiability assumption and $\widetilde{\mathcal{O}}(\sqrt{N})$ regret otherwise, marking the first regret bounds for this problem.

Risk-sensitive linear quadratic regulator is one of the most fundamental problems in risk-sensitive optimal control. In this paper, we study online adaptive control of risk-sensitive linear quadratic regulator in the finite horizon episodic setting. We propose a simple least-squares greedy algorithm and show that it achieves $\widetilde{\mathcal{O}}(\log N)$ regret under a specific identifiability assumption, where $N$ is the total number of episodes. If the identifiability assumption is not satisfied, we propose incorporating exploration noise into the least-squares-based algorithm, resulting in an algorithm with $\widetilde{\mathcal{O}}(\sqrt{N})$ regret. To our best knowledge, this is the first set of regret bounds for episodic risk-sensitive linear quadratic regulator. Our proof relies on perturbation analysis of less-standard Riccati equations for risk-sensitive linear quadratic control, and a delicate analysis of the loss in the risk-sensitive performance criterion due to applying the suboptimal controller in the online learning process.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes