Reinforcement learning with distance-based incentive/penalty (DIP) updates for highly constrained industrial control systems
This work is relevant to industrial control engineers and researchers because it incrementally improves the applicability of RL to highly constrained industrial systems, demonstrated on microgrid operation.
This paper addresses the challenge of applying reinforcement learning (RL) to industrial control systems with highly constrained action spaces, which often involve both continuous and discrete control. The authors propose a novel RL algorithm featuring distance-based Q-value update schemes (incentive and penalty updates) and a shadow price-weighted penalty cost. When applied to microgrid system operation, the algorithm demonstrates superior performance.
Typical reinforcement learning (RL) methods show limited applicability to real-world industrial control problems because industrial systems involve various constraints and simultaneously require continuous and discrete control. To overcome these challenges, we devise a novel RL algorithm that enables an agent to handle a highly constrained action space. The algorithm has two main features. First, we introduce a distance-based incentive/penalty (DIP) update technique comprising two Q-value update schemes, the incentive update and the penalty update, which enable the agent to select discrete and continuous actions within the feasible region and to update the values of both types of actions. Second, we propose defining the penalty cost as a shadow price-weighted penalty. Compared with previous methods, this approach affords two advantages in efficiently inducing the agent not to select infeasible actions. We apply our algorithm to an industrial control problem, microgrid system operation, and the experimental results demonstrate its superiority.
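The abstract does not give the update rules, so the following is only a minimal sketch of how a distance-based penalty update with a shadow price-weighted cost might look. It is not the authors' algorithm. It assumes a tabular Q-function, box constraints on the action, and hand-picked shadow prices. All function names, parameters, and numeric values are hypothetical and chosen for illustration.

```python
import numpy as np


def constraint_violations(action, lower, upper):
    """Per-dimension violation of assumed box constraints (0 where satisfied)."""
    action = np.asarray(action, dtype=float)
    below = np.maximum(lower - action, 0.0)
    above = np.maximum(action - upper, 0.0)
    return below + above


def shadow_price_weighted_penalty(violations, shadow_prices):
    """Violations weighted by assumed per-constraint shadow prices."""
    return float(np.dot(shadow_prices, violations))


def distance_based_update(q_table, state, action_idx, actions, reward,
                          next_value, lower, upper, shadow_prices,
                          alpha=0.1, gamma=0.95):
    """One illustrative Q-value update; returns the distance to the feasible region."""
    violations = constraint_violations(actions[action_idx], lower, upper)
    distance = float(np.linalg.norm(violations))
    if distance == 0.0:
        # Feasible action: ordinary temporal-difference target
        # (an assumption standing in for the incentive update).
        target = reward + gamma * next_value
    else:
        # Infeasible action: the target is the negative weighted penalty, which
        # grows with the violation (an assumption standing in for the penalty update).
        target = -shadow_price_weighted_penalty(violations, shadow_prices)
    q_table[state, action_idx] += alpha * (target - q_table[state, action_idx])
    return distance


# Hypothetical one-state problem with three candidate one-dimensional actions.
q_table = np.zeros((1, 3))
actions = np.array([[0.5], [1.5], [-0.2]])
lower, upper = np.array([0.0]), np.array([1.0])
shadow_prices = np.array([2.0])

d_feasible = distance_based_update(q_table, 0, 0, actions, reward=1.0, next_value=0.0,
                                   lower=lower, upper=upper, shadow_prices=shadow_prices)
d_infeasible = distance_based_update(q_table, 0, 1, actions, reward=1.0, next_value=0.0,
                                     lower=lower, upper=upper, shadow_prices=shadow_prices)
```

In this sketch the penalty scales with how far the action lies outside the feasible box, so an action that barely violates a constraint is penalized less than one that violates it badly. Whether the paper's scheme behaves this way is not stated in the abstract.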