LG OC MLJun 10, 2020

Making Non-Stochastic Control (Almost) as Easy as Stochastic

arXiv:2006.05910v215.041 citations

Originality Highly original

AI Analysis

This work addresses the challenge of robust control for systems with arbitrary adversarial disturbances, making it relevant for applications where noise is non-stochastic, and it is incremental by extending stochastic results to a more general setting.

The paper tackles the problem of online control in linear dynamical systems with adversarial noise, showing that optimal regret scaling of O(√T) is achievable even without stochastic assumptions, matching the known rate for stochastic settings. It achieves this with unknown dynamics and poly(log T) regret when dynamics are known, under strongly convex costs.

Recent literature has made much progress in understanding \emph{online LQR}: a modern learning-theoretic take on the classical control problem in which a learner attempts to optimally control an unknown linear dynamical system with fully observed state, perturbed by i.i.d. Gaussian noise. It is now understood that the optimal regret on time horizon $T$ against the optimal control law scales as $\widetildeΘ(\sqrt{T})$. In this paper, we show that the same regret rate (against a suitable benchmark) is attainable even in the considerably more general non-stochastic control model, where the system is driven by \emph{arbitrary adversarial} noise (Agarwal et al. 2019). In other words, \emph{stochasticity confers little benefit in online LQR}. We attain the optimal $\widetilde{\mathcal{O}}(\sqrt{T})$ regret when the dynamics are unknown to the learner, and $\mathrm{poly}(\log T)$ regret when known, provided that the cost functions are strongly convex (as in LQR). Our algorithm is based on a novel variant of online Newton step (Hazan et al. 2007), which adapts to the geometry induced by possibly adversarial disturbances, and our analysis hinges on generic "policy regret" bounds for certain structured losses in the OCO-with-memory framework (Anava et al. 2015). Moreover, our results accomodate the full generality of the non-stochastic control setting: adversarially chosen (possibly non-quadratic) costs, partial state observation, and fully adversarial process and observation noise.

View on arXiv PDF

Similar