SY AI LG OC Apr 20, 2021

Model-predictive control and reinforcement learning in multi-energy system case studies

Glenn Ceusters, Román Cantú Rodríguez, Alberte Bouso García, Rüdiger Franke, Geert Deconinck, Lieve Helsen, Ann Nowé, Maarten Messagie, Luis Ramirez Camargo

arXiv:2104.09785v29.2119 citations

Originality: Incremental advance

AI Analysis

This paper addresses optimal control for multi-energy systems and offers a potentially adaptive alternative to model-based methods. The advance is incremental because it builds on existing RL and MPC techniques.

The paper tackles the problem of minimizing operation costs in multi-energy systems by comparing reinforcement learning (RL) with model-predictive control (MPC). It shows that RL can match or outperform MPC benchmarks: in the simple case study, the RL agent reaches 101.5% relative to the perfect foresight MPC benchmark, and in the more complex case study it reaches 94.6%, compared with 88.9% for the realistic MPC.

Model-predictive control (MPC) offers an optimal control technique that keeps the total operation cost of multi-energy systems at a minimum while fulfilling all system constraints. However, this method presumes an adequate model of the underlying system dynamics, which is prone to modelling errors and is not necessarily adaptive. This has an associated initial and ongoing project-specific engineering cost. In this paper, we present an on- and off-policy multi-objective reinforcement learning (RL) approach that does not assume a model a priori, and benchmark it against a linear MPC (LMPC, chosen to reflect current practice, though non-linear MPC performs better). Both are derived from the general optimal control problem, highlighting their differences and similarities. In a simple multi-energy system (MES) configuration case study, we show that a twin delayed deep deterministic policy gradient (TD3) RL agent offers the potential to match and outperform the perfect foresight LMPC benchmark (101.5%), whereas the realistic LMPC, i.e. with imperfect predictions, only achieves 98%. In a more complex MES configuration, the RL agent's performance is generally lower (94.6%), yet still better than the realistic LMPC (88.9%). In both case studies, the RL agents outperformed the realistic LMPC after a training period of 2 years using quarterly interactions with the environment. We conclude that reinforcement learning is a viable optimal control technique for multi-energy systems, given adequate constraint handling and pre-training to avoid unsafe interactions and long training periods, as is proposed in fundamental future work.
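For readers unfamiliar with TD3, the sketch below illustrates the critic target that distinguishes it from plain DDPG: target policy smoothing plus the minimum over two target critics. It is a minimal illustration of the standard TD3 update, not the authors' implementation. The function name `td3_target`, the toy `actor`, `q1`, and `q2` stand-ins, and all hyperparameter values are assumptions made for this example.

```python
import numpy as np

def td3_target(reward, next_state, done, actor_target, q1_target, q2_target,
               rng, gamma=0.99, policy_noise=0.2, noise_clip=0.5, max_action=1.0):
    """Clipped double-Q learning target with target policy smoothing (TD3)."""
    # Target policy smoothing: perturb the target action with clipped Gaussian noise.
    next_action = actor_target(next_state)
    noise = np.clip(rng.normal(0.0, policy_noise, size=next_action.shape),
                    -noise_clip, noise_clip)
    next_action = np.clip(next_action + noise, -max_action, max_action)
    # Clipped double-Q: take the smaller of the two target critic estimates.
    q_min = np.minimum(q1_target(next_state, next_action),
                       q2_target(next_state, next_action))
    # Bootstrapped target; terminal transitions (done = 1) do not bootstrap.
    return reward + gamma * (1.0 - done) * q_min

# Hypothetical toy stand-ins for the target networks (assumptions, not from the paper).
actor = lambda s: np.tanh(s.sum(axis=1, keepdims=True))
q1 = lambda s, a: s.sum(axis=1, keepdims=True) + a
q2 = lambda s, a: s.sum(axis=1, keepdims=True) - a

rng = np.random.default_rng(0)
state = np.array([[1.0, 0.0]])
reward = np.array([[-2.0]])   # assumed reward shape, e.g. a negative operation cost
done = np.array([[0.0]])
target = td3_target(reward, state, done, actor, q1, q2, rng, policy_noise=0.0)
```

With `policy_noise=0.0` the smoothing step is a no-op, so the target reduces to the reward plus the discounted minimum of the two critic values, which makes the sketch easy to check by hand.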

