Augmented Lagrangian-Based Safe Reinforcement Learning Approach for Distribution System Volt/VAR Control
The work addresses the challenge that inaccurate distribution system models pose for grid operators by offering a scalable, data-driven solution, though the contribution is incremental in that it builds on existing RL and optimization techniques.
The paper tackles the Volt-VAR control problem in active distribution systems by formulating it as a constrained Markov decision process and proposing a safe off-policy reinforcement learning approach that combines the augmented Lagrangian method with the soft actor-critic algorithm. Numerical experiments with real-world electricity data show high solution optimality and constraint compliance.
This paper proposes a data-driven solution to the Volt-VAR control problem in active distribution systems. Because distribution system models are inaccurate and incomplete in practice, the problem is difficult to solve with model-based methods. To address this difficulty, the paper formulates the Volt-VAR control problem as a constrained Markov decision process (CMDP). By combining the augmented Lagrangian method with the soft actor-critic algorithm, a novel safe off-policy reinforcement learning (RL) approach is proposed to solve the CMDP. The actor network is updated in a policy-gradient manner using the Lagrangian value function. A double-critic network is adopted to estimate the action-value function with two critics simultaneously, which avoids overestimation bias. The proposed algorithm does not require a strong-convexity guarantee for the problem under study and is sample efficient. A two-stage strategy of offline training and online execution is adopted, so an accurate distribution system model is no longer needed. To achieve scalability, a centralized-training, distributed-execution strategy is adopted within a multi-agent framework, which enables decentralized Volt-VAR control for large-scale distribution systems. Comprehensive numerical experiments with real-world electricity data demonstrate that the proposed algorithm achieves high solution optimality and constraint compliance.
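To make the two main ingredients of the abstract concrete, the following is a minimal sketch, not the paper's implementation. It assumes a single scalar constraint of the form "expected cost <= limit" and uses the textbook Powell-Hestenes-Rockafellar form of the augmented Lagrangian for inequality constraints; the exact form used in the paper may differ. The function names, the penalty coefficient `rho`, and the use of the minimum of two critic estimates (the usual clipped double-Q device in soft actor-critic) are illustrative assumptions.

```python
# Hypothetical sketch: all names and the specific penalty form are assumptions,
# not taken from the paper.

def augmented_lagrangian(reward_value, cost_value, cost_limit, lam, rho):
    """Augmented Lagrangian objective for: maximize reward s.t. cost <= limit.

    Uses the standard inequality-constraint form
        penalty = (max(0, lam + rho * g)^2 - lam^2) / (2 * rho),
    where g = cost_value - cost_limit is the constraint violation.
    """
    g = cost_value - cost_limit  # g <= 0 means the constraint is satisfied
    penalty = (max(0.0, lam + rho * g) ** 2 - lam ** 2) / (2.0 * rho)
    return reward_value - penalty


def update_multiplier(lam, cost_value, cost_limit, rho):
    """Dual update: raise the multiplier on violation, keep it non-negative."""
    return max(0.0, lam + rho * (cost_value - cost_limit))


def clipped_double_q(q1, q2):
    """Take the smaller of two critic estimates to limit overestimation bias."""
    return min(q1, q2)


if __name__ == "__main__":
    lam, rho, limit = 0.0, 2.0, 1.0
    # A policy whose estimated cost (1.5) exceeds the limit (1.0) is penalized.
    print(augmented_lagrangian(10.0, 1.5, limit, lam, rho))
    # The multiplier grows in response to the violation.
    print(update_multiplier(lam, 1.5, limit, rho))
    # The pessimistic critic estimate is the one used as the value target.
    print(clipped_double_q(1.2, 0.9))
```

In a full training loop, the actor would ascend a gradient of an objective like `augmented_lagrangian` evaluated with critic estimates, while `update_multiplier` would be applied periodically from sampled constraint costs. With a zero multiplier and a feasible policy the penalty vanishes, so the objective reduces to the plain reward value.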