OC · LG · Dec 19, 2019

Learning Convex Optimization Control Policies

arXiv:1912.09529v1 · 86 citations
Originality: Incremental advance
AI Analysis

This work automates parameter tuning for control policies such as LQR and MPC. The advance is rated incremental because it builds on existing methods for evaluating gradients.

The paper replaces manual tuning of the parameters in convex optimization control policies (COCPs) with an automated method that adjusts them using an approximate gradient of the performance metric with respect to the parameters.

Many control policies used in various applications determine the input or action by solving a convex optimization problem that depends on the current state and some parameters. Common examples of such convex optimization control policies (COCPs) include the linear quadratic regulator (LQR), convex model predictive control (MPC), and convex control-Lyapunov or approximate dynamic programming (ADP) policies. These types of control policies are tuned by varying the parameters in the optimization problem, such as the LQR weights, to obtain good performance, judged by application-specific metrics. Tuning is often done by hand, or by simple methods such as a crude grid search. In this paper we propose a method to automate this process, by adjusting the parameters using an approximate gradient of the performance metric with respect to the parameters. Our method relies on recently developed methods that can efficiently evaluate the derivative of the solution of a convex optimization problem with respect to its parameters. We illustrate our method on several examples.
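The tuning loop described in the abstract can be sketched on a toy problem. The following is a minimal illustration, not the authors' implementation: it assumes a hypothetical scalar system, a one-parameter ADP-style policy, and a closed-form derivative of the policy's solution in place of the general-purpose differentiation methods the paper relies on. All names and constants (`A`, `B`, `Q`, `R`, `policy`, `cost_and_grad`, `tune`) are made up for the example.

```python
import random

# Hypothetical scalar system x_{t+1} = A*x_t + B*u_t + w_t (illustrative constants).
A, B = 1.0, 1.0
# Performance metric: average of Q*x^2 + R*u^2 over a simulated trajectory.
Q, R = 1.0, 1.0


def policy(x, p):
    """Toy COCP: u = argmin_u R*u^2 + p*(A*x + B*u)^2, with tunable parameter p >= 0.

    The problem is unconstrained and scalar, so the solution and its
    derivatives are available in closed form (an assumption of this sketch).
    Returns (u, du/dx, du/dp).
    """
    denom = R + p * B * B
    k = p * A * B / denom              # u = -k * x from the optimality condition
    dk_dp = A * B * R / denom ** 2     # derivative of the gain with respect to p
    return -k * x, -k, -dk_dp * x


def cost_and_grad(p, noise, x0=0.0):
    """Simulate one trajectory; return the average cost and its derivative in p."""
    x, s = x0, 0.0                     # s tracks dx/dp along the trajectory
    J, dJ = 0.0, 0.0
    for w in noise:
        u, du_dx, du_dp = policy(x, p)
        du = du_dx * s + du_dp         # total derivative of u with respect to p
        J += Q * x * x + R * u * u
        dJ += 2 * Q * x * s + 2 * R * u * du
        x, s = A * x + B * u + w, A * s + B * du
    return J / len(noise), dJ / len(noise)


def tune(p=0.2, iters=300, step=0.05, horizon=200, seed=0):
    """Projected stochastic gradient descent on p using sampled trajectories."""
    rng = random.Random(seed)
    for _ in range(iters):
        noise = [rng.gauss(0.0, 1.0) for _ in range(horizon)]
        _, grad = cost_and_grad(p, noise)
        p = max(p - step * grad, 0.0)  # keep the policy's problem convex
    return p
```

Calling `tune()` returns an adjusted value of `p`. The gradient is approximate in the sense that it comes from a finite simulated trajectory with sampled noise rather than the exact expected cost. For these particular constants the LQR Riccati solution is p = (1 + √5)/2 ≈ 1.618, which gives a reference point against which the tuned value can be compared.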

