Physics Informed Reinforcement Learning with Gibbs Priors for Topology Control in Power Grids

arXiv:2604.01830 · 44.0 · h-index: 4

AI Analysis

This work addresses the problem of efficient and effective topology control for power grid operators, offering a domain-specific incremental improvement by integrating physics constraints into reinforcement learning.

The paper tackles topology control in power grids, a challenging sequential decision-making problem, by proposing a physics-informed reinforcement learning framework with Gibbs priors. The method matches oracle-level performance while running approximately 6x faster on one benchmark, and it improves over a PPO baseline by up to 255% in reward on another.

Topology control for power grid operation is a challenging sequential decision-making problem because the action space grows combinatorially with the size of the grid, and action evaluation through simulation is computationally expensive. We propose a physics-informed reinforcement learning framework that combines semi-Markov control with a Gibbs prior over the action space that encodes the system's physics. A decision is taken only when the grid enters a hazardous regime, while a graph neural network surrogate predicts the post-action overload risk of feasible topology actions. These predictions are used to construct a physics-informed Gibbs prior that both selects a small state-dependent candidate set and reweights policy logits before action selection. In this way, our method reduces exploration difficulty and online simulation cost while preserving the flexibility of a learned policy. We evaluate the approach in three realistic benchmark environments of increasing difficulty. Across all settings, the proposed method achieves a strong balance between control quality and computational efficiency: it matches oracle-level performance while being approximately $6\times$ faster on the first benchmark, reaches $94.6\%$ of oracle reward with roughly $200\times$ lower decision time on the second, and on the most challenging benchmark improves over a PPO baseline by up to $255\%$ in reward and $284\%$ in survived steps while remaining about $2.5\times$ faster than a strong specialized engineering baseline. These results show that our method provides an effective mechanism for topology control in power grids.
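To make the mechanism in the abstract concrete, here is a minimal sketch of how a Gibbs prior built from predicted overload risks could select a candidate set and reweight policy logits. It is an illustration under stated assumptions, not the authors' implementation: the function names, the inverse temperature `beta`, the candidate-set size `k`, the additive combination of logits with the log-prior, and the line-loading threshold used as the hazard trigger are all hypothetical choices. The risk values stand in for the output of the graph neural network surrogate.

```python
import math

def gibbs_log_prior(risks, beta=5.0):
    """Log of a Gibbs distribution p(a) proportional to exp(-beta * risk(a)).

    `risks` holds one predicted post-action overload risk per feasible action
    (assumed here to come from a surrogate model). `beta` is a hypothetical
    inverse-temperature parameter.
    """
    lowest = min(risks)
    # Shift by the lowest risk for numerical stability before exponentiating.
    energies = [-beta * (r - lowest) for r in risks]
    log_z = math.log(sum(math.exp(e) for e in energies))
    return [e - log_z for e in energies]

def top_k_candidates(log_prior, k):
    """Indices of the k actions with the highest prior mass, in index order."""
    ranked = sorted(range(len(log_prior)), key=lambda a: log_prior[a], reverse=True)
    return sorted(ranked[:k])

def reweighted_policy(logits, log_prior, candidates):
    """Softmax over (policy logit + log prior), restricted to the candidates.

    Adding the log-prior to the logits is one common way to reweight a policy;
    the paper may combine them differently.
    """
    scores = {a: logits[a] + log_prior[a] for a in candidates}
    highest = max(scores.values())
    unnormalized = {a: math.exp(s - highest) for a, s in scores.items()}
    z = sum(unnormalized.values())
    return {a: u / z for a, u in unnormalized.items()}

def should_act(line_loadings, threshold=0.95):
    """Assumed hazard trigger: act only if some line loading reaches the threshold."""
    return max(line_loadings) >= threshold

# Toy example with four feasible topology actions.
predicted_risks = [0.9, 0.1, 0.5, 0.2]   # stand-in for surrogate predictions
policy_logits = [2.0, 0.0, 1.0, 1.0]     # stand-in for learned policy outputs

if should_act([0.60, 0.97, 0.40]):
    log_prior = gibbs_log_prior(predicted_risks, beta=5.0)
    candidates = top_k_candidates(log_prior, k=2)
    action_probs = reweighted_policy(policy_logits, log_prior, candidates)
    print(candidates, action_probs)
```

In this toy run the policy's raw preference for action 0 is overridden because its predicted risk is high: the candidate set keeps only the two lowest-risk actions, and the learned logits then decide between them. Restricting attention to a small candidate set is also what would limit how many actions need to be scored or simulated online.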
