Learning Linear-Quadratic Regulators Efficiently with only $\sqrt{T}$ Regret
This solves a long-standing open problem in control theory for researchers and practitioners, providing an efficient method with optimal regret bounds.
The authors tackled the problem of learning in Linear Quadratic Control systems with unknown dynamics, achieving the first computationally-efficient algorithm with $\widetilde O(\sqrt{T})$ regret, which resolves an open question from prior work.
We present the first computationally-efficient algorithm with $\widetilde O(\sqrt{T})$ regret for learning in Linear Quadratic Control systems with unknown dynamics. By that, we resolve an open question of Abbasi-Yadkori and Szepesvári (2011) and Dean, Mania, Matni, Recht, and Tu (2018).