Robust exploration in linear quadratic reinforcement learning
This paper addresses robust policy learning in reinforcement learning for linear quadratic systems. It offers improvements over existing methods but appears incremental in nature.
The paper tackles the problem of learning control policies for unknown linear dynamical systems with quadratic costs by developing a convex optimization method that minimizes the worst-case cost while accounting for system uncertainty. The approach balances exploitation and exploration, exciting the system to reduce uncertainty in the model parameters to which the worst-case cost is most sensitive. Numerical simulations and a hardware-in-the-loop servo-mechanism experiment show appreciable performance and robustness gains over alternative methods.
This paper concerns the problem of learning control policies for an unknown linear dynamical system to minimize a quadratic cost function. We present a method, based on convex optimization, that accomplishes this task robustly: i.e., we minimize the worst-case cost, accounting for system uncertainty given the observed data. The method balances exploitation and exploration, exciting the system in such a way as to reduce uncertainty in the model parameters to which the worst-case cost is most sensitive. Numerical simulations and application to a hardware-in-the-loop servo-mechanism demonstrate the approach, with appreciable performance and robustness gains over alternative methods observed in both.
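To make the problem setting concrete, the following is a minimal sketch of a certainty-equivalent baseline, not the method of the paper. It simulates a linear system with random excitation, fits the dynamics by least squares, and computes the LQR gain for the fitted model as if it were exact. The paper's approach differs in that it minimizes the worst-case cost under the uncertainty remaining after the data are observed, and shapes the excitation accordingly. The system matrices, noise levels, and function names below are assumptions chosen purely for illustration.

```python
import numpy as np

def simulate(A, B, K, T, sigma_w, sigma_u, rng):
    """Roll out x_{t+1} = A x_t + B u_t + w_t with u_t = K x_t + exploration noise."""
    n, m = B.shape
    X = np.zeros((T + 1, n))
    U = np.zeros((T, m))
    for t in range(T):
        U[t] = K @ X[t] + sigma_u * rng.standard_normal(m)
        X[t + 1] = A @ X[t] + B @ U[t] + sigma_w * rng.standard_normal(n)
    return X, U

def estimate_dynamics(X, U):
    """Least-squares estimate of (A, B) from a single trajectory."""
    n = X.shape[1]
    Z = np.hstack([X[:-1], U])                        # regressors [x_t, u_t]
    Theta, *_ = np.linalg.lstsq(Z, X[1:], rcond=None)  # X[1:] ~= Z @ Theta
    Theta = Theta.T                                   # shape (n, n + m) = [A_hat, B_hat]
    return Theta[:, :n], Theta[:, n:]

def lqr_gain(A, B, Q, R, iters=500):
    """Discrete-time LQR gain (u = K x) via fixed-point iteration of the Riccati equation."""
    P = Q.copy()
    for _ in range(iters):
        G = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ G)
    K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P

# Hypothetical example system (a discretized double integrator); not from the paper.
rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)

# Explore with white-noise inputs only (zero feedback gain), then fit and design.
X, U = simulate(A, B, K=np.zeros((1, 2)), T=200, sigma_w=0.01, sigma_u=1.0, rng=rng)
A_hat, B_hat = estimate_dynamics(X, U)
K_hat, _ = lqr_gain(A_hat, B_hat, Q, R)

# Spectral radius of the true closed loop under the certainty-equivalent gain.
rho = np.max(np.abs(np.linalg.eigvals(A + B @ K_hat)))
print("estimation error in A:", np.linalg.norm(A_hat - A))
print("closed-loop spectral radius:", rho)
```

The certainty-equivalent gain offers no protection if the fitted model is poor, for example after a short or weakly exciting experiment. This is the gap a worst-case formulation is intended to close.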