SYSESYMay 21

ProOPF: Benchmarking and Improving LLMs for Professional-Grade Power Systems Optimization Modeling

arXiv:2602.03070 90.71 citations h-index: 12
AI Analysis

For researchers and practitioners in power systems, this work provides a specialized benchmark to evaluate and improve LLMs for automated optimization modeling, addressing a critical bottleneck in handling renewable uncertainty.

The paper introduces ProOPF-D and ProOPF-B, a dataset and benchmark for professional-grade Optimal Power Flow (OPF) modeling, containing 12K training instances and 121 expert-annotated test cases. The benchmark enables evaluation of LLMs on translating natural-language operational requirements into executable optimization models, addressing the lack of rigorous evaluation in power-system settings.

Growing renewable penetration introduces substantial uncertainty into power system operations, necessitating frequent adaptation of dispatch objectives and constraints and challenging expertise-intensive, near-real-time modeling workflows. Large Language Models (LLMs) provide a promising avenue for automating this process by translating natural-language (NL) operational requirements into executable optimization models via semantic reasoning and code synthesis. Yet existing LLM datasets and benchmarks for optimization modeling primarily target coarse-grained cross-domain generalization, offering limited rigorous evaluation in power-system settings, particularly for Optimal Power Flow (OPF). We therefore introduce ProOPF-D and ProOPF-B, a dataset and benchmark for professional-grade OPF modeling: ProOPF-D contains 12K instances pairing NL requests with parameter adjustments and structural extensions to a canonical OPF, together with executable implementations; ProOPF-B provides 121 expert-annotated test cases with ground-truth code, enabling end-to-end evaluation under both concrete and abstract OPF modeling regimes.
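
To make the distinction between a parameter adjustment and a structural extension concrete, here is a minimal sketch on a toy problem. It is not taken from the paper or from ProOPF-D: it assumes a single-bus, linear-cost dispatch with no network, which is far simpler than a real OPF, and every name, request, and number in it is hypothetical. The parameter adjustment changes a value the base model already has (a generator's upper limit). The structural extension adds a constraint type the base model lacks (a minimum-output requirement).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Gen:
    name: str
    cost: float   # linear marginal cost, $/MWh (illustrative value)
    p_max: float  # upper output limit, MW


def dispatch(gens, demand, min_output=None):
    """Least-cost dispatch for linear costs on one bus (toy model, no network).

    min_output is an optional extra constraint set {name: MW}; the base
    model only enforces 0 <= p <= p_max and total output == demand.
    """
    floor = {g.name: 0.0 for g in gens}
    floor.update(min_output or {})
    lo = sum(floor.values())
    hi = sum(g.p_max for g in gens)
    if not lo <= demand <= hi:
        raise ValueError("demand is outside the feasible generation range")
    # Start each unit at its floor, then fill the rest cheapest-first.
    output = dict(floor)
    remaining = demand - lo
    for g in sorted(gens, key=lambda g: g.cost):
        step = min(g.p_max - floor[g.name], remaining)
        output[g.name] += step
        remaining -= step
    total_cost = sum(g.cost * output[g.name] for g in gens)
    return output, total_cost


base = [Gen("coal", 20.0, 100.0), Gen("gas", 40.0, 100.0)]
base_out, base_cost = dispatch(base, demand=120.0)

# Hypothetical request 1 (parameter adjustment): "derate the coal unit to 80 MW".
derated = [replace(g, p_max=80.0) if g.name == "coal" else g for g in base]
derated_out, derated_cost = dispatch(derated, demand=120.0)

# Hypothetical request 2 (structural extension): "keep the gas unit at 50 MW or more".
must_run_out, must_run_cost = dispatch(base, demand=120.0, min_output={"gas": 50.0})

print(base_out, base_cost)
print(derated_out, derated_cost)
print(must_run_out, must_run_cost)
```

In both variants the request is turned into a change to the model, which is then re-solved. A benchmark of the kind described above would check whether an LLM produces such a modified, executable model from the NL request alone.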

