LG SY Apr 6, 2021

MPC-based Reinforcement Learning for Economic Problems with Application to Battery Storage

Arash Bahari Kordabad, Wenqi Cai, Sebastien Gros

arXiv:2104.02411v15.531 citations

Originality: Incremental advance

AI Analysis

This work addresses a specific issue in reinforcement learning for economic control problems, offering an incremental improvement for domains like battery storage optimization.

The paper addresses the optimization of Model Predictive Control (MPC) policies with reinforcement learning for economic problems whose optimal policies are (nearly) bang-bang, a setting in which policy gradient methods struggle. It proposes a homotopy strategy based on the interior-point method and reports more homogeneous and faster learning than a classical policy gradient approach in a battery storage application.

In this paper, we are interested in optimal control problems with purely economic costs, which often yield optimal policies having a (nearly) bang-bang structure. We focus on policy approximations based on Model Predictive Control (MPC) and the use of the deterministic policy gradient method to optimize the MPC closed-loop performance in the presence of unmodelled stochasticity or model error. When the policy has a (nearly) bang-bang structure, we observe that the policy gradient method can struggle to produce meaningful steps in the policy parameters. To tackle this issue, we propose a homotopy strategy based on the interior-point method, which relaxes the policy during learning. We investigate a specific well-known battery storage problem, and show that the proposed method delivers more homogeneous and faster learning than a classical policy gradient approach.
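The effect described in the abstract can be illustrated on a toy scalar problem. This sketch is not the paper's MPC formulation: the linear cost theta*u, the bounds -1 <= u <= 1, and the function names are all assumptions chosen so that the barrier-relaxed solution has a closed form. The exact policy is u = -sign(theta), which is bang-bang and has zero sensitivity to theta almost everywhere, so a policy gradient step carries no information. Adding a log barrier with parameter tau > 0, as in an interior-point method, makes the policy smooth in theta, and letting tau shrink toward zero recovers the bang-bang policy.

```python
import math

def relaxed_policy(theta, tau):
    """Minimizer of theta*u - tau*(log(1-u) + log(1+u)) over -1 < u < 1.

    Hypothetical toy problem with barrier parameter tau > 0. The closed
    form follows from the stationarity condition
    theta*(1 - u**2) + 2*tau*u = 0.
    """
    s = math.hypot(theta, tau)  # sqrt(theta**2 + tau**2)
    return -theta / (tau + s)

def relaxed_policy_sensitivity(theta, tau):
    """Derivative of the relaxed policy with respect to theta."""
    s = math.hypot(theta, tau)
    return -tau / (s * (tau + s))

# Shrinking tau moves the policy toward -sign(theta), and its
# sensitivity to theta shrinks toward zero along the way.
theta = 0.5
for tau in (1.0, 0.1, 0.01, 0.001):
    u = relaxed_policy(theta, tau)
    du = relaxed_policy_sensitivity(theta, tau)
    print(f"tau={tau:g}  u={u:+.4f}  du/dtheta={du:+.5f}")
```

In this toy setting, a larger tau gives a policy whose gradient with respect to theta is informative, while a small tau gives a policy close to the bang-bang one. A homotopy would start with a large tau and reduce it as learning proceeds; how the paper schedules this is not specified in the abstract.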
