SY LG · Dec 3, 2024

Time-Series-Informed Closed-loop Learning for Sequential Decision Making and Control

Sebastian Hirt, Lukas Theiner, Rolf Findeisen

arXiv:2412.02423v22.32 citations · h-index: 7

Originality: Incremental advance

AI Analysis

This paper addresses resource-efficient controller tuning for control systems, but it is an incremental contribution because it builds on existing Bayesian optimization methods.

The paper tackles slow convergence and inefficient resource usage when tuning controller parameters for sequential decision making. It proposes a time-series-informed multi-fidelity Bayesian optimization framework that incorporates intermediate performance evaluations and early stopping. Compared to standard methods, the framework achieves comparable performance with roughly half the experimental resources and better final performance with the same budget.
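To make "intermediate performance evaluations" concrete, the sketch below turns a single closed-loop run into several lower-fidelity observations. It is a minimal illustration under stated assumptions, not the authors' implementation: the performance measure is assumed to be the cumulative stage cost, the fidelity of an observation taken after step k is assumed to be k / horizon, and the function name `trajectory_to_observations` and the checkpoint schedule are hypothetical.

```python
from typing import List, Sequence, Tuple

# One observation: (controller parameters, fidelity in (0, 1], observed cost)
Observation = Tuple[Tuple[float, ...], float, float]


def trajectory_to_observations(
    theta: Sequence[float],
    stage_costs: Sequence[float],
    horizon: int,
    checkpoints: Sequence[int],
) -> List[Observation]:
    """Convert one (possibly truncated) closed-loop run into multi-fidelity data.

    Hypothetical convention (assumption, not from the paper): the observation
    recorded after step k has fidelity k / horizon and value equal to the
    cumulative stage cost over steps 1..k.
    """
    if len(stage_costs) > horizon:
        raise ValueError("trajectory is longer than the episode horizon")
    checkpoint_set = set(checkpoints)
    observations: List[Observation] = []
    running_cost = 0.0
    for k, cost in enumerate(stage_costs, start=1):
        running_cost += cost
        if k in checkpoint_set:
            # Closed-loop time doubles as the fidelity coordinate.
            observations.append((tuple(theta), k / horizon, running_cost))
    return observations
```

For example, `trajectory_to_observations([0.3], [1.0, 0.5, 0.25, 0.125], horizon=4, checkpoints=[2, 4])` yields one observation at fidelity 0.5 and one at full fidelity 1.0. A run that is stopped early simply contributes fewer, lower-fidelity points, which is what lets a multi-fidelity surrogate learn from partial episodes.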

The closed-loop performance of sequential decision-making algorithms, such as model predictive control, depends strongly on the choice of controller parameters. Bayesian optimization allows learning of parameters from closed-loop experiments, but standard Bayesian optimization treats this as a black-box problem and ignores the temporal structure of closed-loop trajectories, leading to slow convergence and inefficient use of experimental resources. We propose a time-series-informed multi-fidelity Bayesian optimization framework that aligns the fidelity dimension with closed-loop time, enabling intermediate performance evaluations within a closed-loop experiment to be incorporated as lower-fidelity observations. Additionally, we derive probabilistic early stopping criteria to terminate unpromising closed-loop experiments based on the surrogate model's posterior belief, avoiding full episodes for poor parameterizations and thereby reducing resource usage. Simulation results on a nonlinear control benchmark demonstrate that, compared to standard black-box Bayesian optimization approaches, the proposed method achieves comparable closed-loop performance with roughly half the experimental resources and yields better final performance when using the same resource budget, highlighting the value of exploiting temporal structure for sample-efficient closed-loop controller tuning.
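A probabilistic early stopping rule of the kind described in the abstract can be sketched as follows. This is one plausible form, not necessarily the criterion derived in the paper. It assumes the surrogate's posterior over the final (full-fidelity) cost is Gaussian with mean `mu` and standard deviation `sigma`, and it stops a run once the probability of beating the best full-episode cost so far falls below a threshold. The callable `predict_final`, the checkpoint schedule, and the 0.05 default threshold are hypothetical stand-ins for the actual surrogate model and tuning choices.

```python
import math
from typing import Callable, Iterable, Tuple


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_improvement(mu: float, sigma: float, best_cost: float) -> float:
    """P(final cost < best_cost) under an assumed Gaussian posterior N(mu, sigma^2)."""
    if sigma <= 0.0:
        return 1.0 if mu < best_cost else 0.0
    return normal_cdf((best_cost - mu) / sigma)


def should_stop(mu: float, sigma: float, best_cost: float, threshold: float = 0.05) -> bool:
    """Stop when improving on the incumbent is judged unlikely (cost minimization)."""
    return prob_improvement(mu, sigma, best_cost) < threshold


def run_with_early_stopping(
    stage_costs: Iterable[float],
    horizon: int,
    checkpoints: Iterable[int],
    predict_final: Callable[[float, float], Tuple[float, float]],
    best_cost: float,
    threshold: float = 0.05,
) -> Tuple[int, float, bool]:
    """Roll out one episode, querying the (hypothetical) surrogate at checkpoints.

    predict_final(fidelity, cumulative_cost) -> (mu, sigma) stands in for the
    surrogate's posterior over the full-episode cost given the partial run.
    Returns (steps executed, cumulative cost, whether the run was stopped early).
    """
    checks = set(checkpoints)
    running_cost = 0.0
    k = 0
    for k, cost in enumerate(stage_costs, start=1):
        running_cost += cost
        if k in checks and k < horizon:
            mu, sigma = predict_final(k / horizon, running_cost)
            if should_stop(mu, sigma, best_cost, threshold):
                return k, running_cost, True
    return k, running_cost, False
```

With a naive extrapolating predictor such as `lambda s, c: (c / s, 1.0)` and an incumbent cost of 20, a run accumulating 3 per step is predicted to end near 30 and is stopped at the first checkpoint, whereas a run accumulating 1 per step is predicted to end near 10 and is allowed to finish. The threshold trades off the risk of discarding a good parameterization against the experimental budget saved.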

