LG DS OC MLJun 18, 2022

Optimal Dynamic Regret in LQR Control

arXiv:2206.09257v19.618 citationsh-index: 9

Originality Incremental advance

AI Analysis

This provides an optimal solution for LQR control under nonstationary disturbances, which is incremental as it builds on prior reductions and methods.

The paper tackles the problem of nonstochastic control in LQR with quadratic losses by developing an efficient online algorithm that achieves an optimal dynamic regret of $ ilde{O}( ext{max}\{n^{1/3} \mathcal{TV}(M_{1:n})^{2/3}, 1\})$, improving the previous best rate of $ ilde{O}(\sqrt{n (\mathcal{TV}(M_{1:n})+1)} )$ and proving information-theoretic optimality.

We consider the problem of nonstochastic control with a sequence of quadratic losses, i.e., LQR control. We provide an efficient online algorithm that achieves an optimal dynamic (policy) regret of $\tilde{O}(\text{max}\{n^{1/3} \mathcal{TV}(M_{1:n})^{2/3}, 1\})$, where $\mathcal{TV}(M_{1:n})$ is the total variation of any oracle sequence of Disturbance Action policies parameterized by $M_1,...,M_n$ -- chosen in hindsight to cater to unknown nonstationarity. The rate improves the best known rate of $\tilde{O}(\sqrt{n (\mathcal{TV}(M_{1:n})+1)} )$ for general convex losses and we prove that it is information-theoretically optimal for LQR. Main technical components include the reduction of LQR to online linear regression with delayed feedback due to Foster and Simchowitz (2020), as well as a new proper learning algorithm with an optimal $\tilde{O}(n^{1/3})$ dynamic regret on a family of ``minibatched'' quadratic losses, which could be of independent interest.

View on arXiv PDF

Similar