Learning the Optimal Power Flow: Environment Design Matters
This work addresses the inconsistent environment formulations used in research on reinforcement learning for the optimal power flow (RL-OPF) and provides an open-source benchmark for future studies in this field.
The study investigated how different environment design decisions affect reinforcement learning (RL) performance for solving the optimal power flow (OPF) problem, showing significant impacts on training outcomes and providing initial recommendations for these choices.
Reinforcement learning (RL) is emerging as a promising new approach to solving the optimal power flow (OPF) problem. However, the RL-OPF literature is strongly divided on how exactly to formulate the OPF problem as an RL environment. In this work, we collect and implement diverse environment design decisions from the literature regarding training data, observation space, episode definition, and reward function. In an experimental analysis, we show that these environment design options significantly affect RL-OPF training performance. Further, we derive first recommendations for making these design decisions. The resulting environment framework is fully open-source and can serve as a benchmark for future research in the RL-OPF field.
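To make the four design dimensions concrete, here is a minimal, hypothetical sketch of how one environment can expose each of them as a configuration option. It is not the authors' open-source framework and does not reproduce its API. The toy two-generator dispatch problem (a stand-in for a real AC power flow), the class names `EnvConfig` and `ToyDispatchEnv`, the option names `load_sampling`, `observation`, `episode_steps`, and `reward`, and all numeric values are illustrative assumptions. The "summation" and "replacement" reward variants shown here are two possible ways of combining the cost objective with a constraint penalty.

```python
import random
from dataclasses import dataclass


@dataclass
class EnvConfig:
    # Hypothetical knobs, one per design category named in the abstract.
    load_sampling: str = "uniform"  # training data: "uniform" or "normal" load sampling
    observation: str = "full"       # observation space: "full" (load + costs) or "load_only"
    episode_steps: int = 1          # episode definition: 1 = single-step, >1 = repeated refinement
    reward: str = "summation"       # reward function: "summation" or "replacement"
    penalty_factor: float = 10.0    # weight on constraint violation (per MW)


class ToyDispatchEnv:
    """Two generators serve one load. A toy stand-in, not an AC power-flow model."""

    P_MAX = (60.0, 40.0)       # generator limits in MW (illustrative)
    COST_RANGE = (10.0, 30.0)  # marginal cost range per MW (illustrative)
    BALANCE_TOL = 0.1          # mismatch (MW) still treated as feasible

    def __init__(self, config: EnvConfig, seed=None):
        self.cfg = config
        self.rng = random.Random(seed)
        # Upper bound on any dispatch cost; used so invalid actions always score worse.
        self.max_cost = self.COST_RANGE[1] * sum(self.P_MAX)

    def _sample_state(self):
        if self.cfg.load_sampling == "uniform":
            self.load = self.rng.uniform(20.0, 90.0)
        else:
            # Normal sample clipped to the range the generators can serve.
            self.load = min(max(self.rng.gauss(55.0, 15.0), 20.0), 90.0)
        self.costs = tuple(self.rng.uniform(*self.COST_RANGE) for _ in self.P_MAX)

    def _observe(self):
        if self.cfg.observation == "full":
            return [self.load, *self.costs]
        return [self.load]

    def reset(self):
        self._sample_state()
        self.t = 0
        return self._observe()

    def step(self, action):
        # Actions are setpoints in [0, 1], scaled to each generator's limit.
        p = [min(max(a, 0.0), 1.0) * pmax for a, pmax in zip(action, self.P_MAX)]
        cost = sum(c * pi for c, pi in zip(self.costs, p))
        violation = abs(sum(p) - self.load)  # power-balance mismatch in MW
        penalty = self.cfg.penalty_factor * violation
        if self.cfg.reward == "summation":
            # Objective and penalty are always added together.
            reward = -cost - penalty
        elif violation > self.BALANCE_TOL:
            # "Replacement": an invalid action gets only the penalty,
            # shifted below the worst possible valid reward.
            reward = -self.max_cost - penalty
        else:
            reward = -cost
        self.t += 1
        done = self.t >= self.cfg.episode_steps
        # The load situation stays fixed within an episode, so multi-step
        # episodes let the agent refine its setpoints for the same state.
        return self._observe(), reward, done, {"cost": cost, "violation": violation}
```

Because each option changes only one branch of the environment, an agent can be trained on, for example, `EnvConfig(reward="summation")` and `EnvConfig(reward="replacement")` while data, observations, and episodes stay identical, which isolates the effect of a single design decision.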