SYMay 12, 2022
Contingency-constrained economic dispatch with safe reinforcement learningMichael Eichelbeck, Hannah Markgraf, Matthias Althoff
Future power systems will rely heavily on micro grids with a high share of decentralised renewable energy sources and energy storage systems. The high complexity and uncertainty in this context might make conventional power dispatch strategies infeasible. Reinforcement-learning based (RL) controllers can address this challenge, however, cannot themselves provide safety guarantees, preventing their deployment in practice. To overcome this limitation, we propose a formally validated RL controller for economic dispatch. We extend conventional constraints by a time-dependent constraint encoding the islanding contingency. The contingency constraint is computed using set-based backwards reachability analysis and actions of the RL agent are verified through a safety layer. Unsafe actions are projected into the safe action space while leveraging constrained zonotope set representations for computational efficiency. The developed approach is demonstrated on a residential use case using real-world measurements.
LGSep 15, 2024
Predicting building types and functions at transnational scaleJonas Fill, Michael Eichelbeck, Michael Ebner
Building-specific knowledge such as building type and function information is important for numerous energy applications. However, comprehensive datasets containing this information for individual households are missing in many regions of Europe. For the first time, we investigate whether it is feasible to predict building types and functional classes at a European scale based on only open GIS datasets available across countries. We train a graph neural network (GNN) classifier on a large-scale graph dataset consisting of OpenStreetMap (OSM) buildings across the EU, Norway, Switzerland, and the UK. To efficiently perform training using the large-scale graph, we utilize localized subgraphs. A graph transformer model achieves a high Cohen's kappa coefficient of 0.754 when classifying buildings into 9 classes, and a very high Cohen's kappa coefficient of 0.844 when classifying buildings into the residential and non-residential classes. The experimental results imply three core novel contributions to literature. Firstly, we show that building classification across multiple countries is possible using a multi-source dataset consisting of information about 2D building shape, land use, degree of urbanization, and countries as input, and OSM tags as ground truth. Secondly, our results indicate that GNN models that consider contextual information about building neighborhoods improve predictive performance compared to models that only consider individual buildings and ignore the neighborhood. Thirdly, we show that training with GNNs on localized subgraphs instead of standard GNNs improves performance for the task of building classification.
LGMay 22, 2025Code
PyTupli: A Scalable Infrastructure for Collaborative Offline Reinforcement Learning ProjectsHannah Markgraf, Michael Eichelbeck, Daria Cappey et al.
Offline reinforcement learning (RL) has gained traction as a powerful paradigm for learning control policies from pre-collected data, eliminating the need for costly or risky online interactions. While many open-source libraries offer robust implementations of offline RL algorithms, they all rely on datasets composed of experience tuples consisting of state, action, next state, and reward. Managing, curating, and distributing such datasets requires suitable infrastructure. Although static datasets exist for established benchmark problems, no standardized or scalable solution supports developing and sharing datasets for novel or user-defined benchmarks. To address this gap, we introduce PyTupli, a Python-based tool to streamline the creation, storage, and dissemination of benchmark environments and their corresponding tuple datasets. PyTupli includes a lightweight client library with defined interfaces for uploading and retrieving benchmarks and data. It supports fine-grained filtering at both the episode and tuple level, allowing researchers to curate high-quality, task-specific datasets. A containerized server component enables production-ready deployment with authentication, access control, and automated certificate provisioning for secure use. By addressing key barriers in dataset infrastructure, PyTupli facilitates more collaborative, reproducible, and scalable offline RL research.
36.3SYApr 13
To Learn or Not to Learn: A Litmus Test for Using Reinforcement Learning in ControlVictor Schulte, Michael Eichelbeck, Matthias Althoff
Reinforcement learning (RL) can be a powerful alternative to classical control methods when standard model-based control is insufficient, e.g., when deriving a suitable model is intractable or impossible. In many cases, however, the choice between model-based and RL-based control is not obvious. Due to the high computational costs of training RL agents, RL-based control should be limited to cases where it is expected to yield superior results compared to model-based control. To the best of our knowledge, there exists no approach to quantify the benefit of RL-based control that does not require RL training. In this work, we present a computationally efficient, purely simulation-based litmus test predicting whether RL-based control is superior to model-based control. Our test evaluates the suitability of the given model for model-based control by analyzing the impact of model uncertainties on the control problem. For this, we use reachset-conformant model identification combined with simulation-based analysis. This is followed by a learnability evaluation of the uncertainties based on correlation analysis. This two-part analysis enables an informed decision on the suitability of RL for a control problem without training an RL agent. We apply our test to several benchmarks, demonstrating its applicability to a wide range of control problems and highlight the potential to save computational resources.
LGJan 2
BSAT: B-Spline Adaptive Tokenizer for Long-Term Time Series ForecastingMaximilian Reinwardt, Michael Eichelbeck, Matthias Althoff
Long-term time series forecasting using transformers is hampered by the quadratic complexity of self-attention and the rigidity of uniform patching, which may be misaligned with the data's semantic structure. In this paper, we introduce the \textit{B-Spline Adaptive Tokenizer (BSAT)}, a novel, parameter-free method that adaptively segments a time series by fitting it with B-splines. BSAT algorithmically places tokens in high-curvature regions and represents each variable-length basis function as a fixed-size token, composed of its coefficient and position. Further, we propose a hybrid positional encoding that combines a additive learnable positional encoding with Rotary Positional Embedding featuring a layer-wise learnable base: L-RoPE. This allows each layer to attend to different temporal dependencies. Our experiments on several public benchmarks show that our model is competitive with strong performance at high compression rates. This makes it particularly well-suited for use cases with strong memory constraints.
LGApr 23, 2024
Formal Verification of Graph Convolutional Networks with Uncertain Node Features and Uncertain Graph StructureTobias Ladner, Michael Eichelbeck, Matthias Althoff
Graph neural networks are becoming increasingly popular in the field of machine learning due to their unique ability to process data structured in graphs. They have also been applied in safety-critical environments where perturbations inherently occur. However, these perturbations require us to formally verify neural networks before their deployment in safety-critical environments as neural networks are prone to adversarial attacks. While there exists research on the formal verification of neural networks, there is no work verifying the robustness of generic graph convolutional network architectures with uncertainty in the node features and in the graph structure over multiple message-passing steps. This work addresses this research gap by explicitly preserving the non-convex dependencies of all elements in the underlying computations through reachability analysis with (matrix) polynomial zonotopes. We demonstrate our approach on three popular benchmark datasets.
LGNov 27, 2025
Improving Stochastic Action-Constrained Reinforcement Learning via Truncated DistributionsRoland Stolz, Michael Eichelbeck, Matthias Althoff
In reinforcement learning (RL), it is often advantageous to consider additional constraints on the action space to ensure safety or action relevance. Existing work on such action-constrained RL faces challenges regarding effective policy updates, computational efficiency, and predictable runtime. Recent work proposes to use truncated normal distributions for stochastic policy gradient methods. However, the computation of key characteristics, such as the entropy, log-probability, and their gradients, becomes intractable under complex constraints. Hence, prior work approximates these using the non-truncated distributions, which severely degrades performance. We argue that accurate estimation of these characteristics is crucial in the action-constrained RL setting, and propose efficient numerical approximations for them. We also provide an efficient sampling strategy for truncated policy distributions and validate our approach on three benchmark environments, which demonstrate significant performance improvements when using accurate estimations.
LGAug 14, 2025
Zono-Conformal Prediction: Zonotope-Based Uncertainty Quantification for Regression and Classification TasksLaura Lützow, Michael Eichelbeck, Mykel J. Kochenderfer et al.
Conformal prediction is a popular uncertainty quantification method that augments a base predictor with prediction sets with statistically valid coverage guarantees. However, current methods are often computationally expensive and data-intensive, as they require constructing an uncertainty model before calibration. Moreover, existing approaches typically represent the prediction sets with intervals, which limits their ability to capture dependencies in multi-dimensional outputs. We address these limitations by introducing zono-conformal prediction, a novel approach inspired by interval predictor models and reachset-conformant identification that constructs prediction zonotopes with assured coverage. By placing zonotopic uncertainty sets directly into the model of the base predictor, zono-conformal predictors can be identified via a single, data-efficient linear program. While we can apply zono-conformal prediction to arbitrary nonlinear base predictors, we focus on feed-forward neural networks in this work. Aside from regression tasks, we also construct optimal zono-conformal predictors in classification settings where the output of an uncertain predictor is a set of possible classes. We provide probabilistic coverage guarantees and present methods for detecting outliers in the identification data. In extensive numerical experiments, we show that zono-conformal predictors are less conservative than interval predictor models and standard conformal prediction methods, while achieving a similar coverage over the test data.
LGJun 6, 2024
Excluding the Irrelevant: Focusing Reinforcement Learning through Continuous Action MaskingRoland Stolz, Hanna Krasowski, Jakob Thumm et al.
Continuous action spaces in reinforcement learning (RL) are commonly defined as multidimensional intervals. While intervals usually reflect the action boundaries for tasks well, they can be challenging for learning because the typically large global action space leads to frequent exploration of irrelevant actions. Yet, little task knowledge can be sufficient to identify significantly smaller state-specific sets of relevant actions. Focusing learning on these relevant actions can significantly improve training efficiency and effectiveness. In this paper, we propose to focus learning on the set of relevant actions and introduce three continuous action masking methods for exactly mapping the action space to the state-dependent set of relevant actions. Thus, our methods ensure that only relevant actions are executed, enhancing the predictability of the RL agent and enabling its use in safety-critical applications. We further derive the implications of the proposed methods on the policy gradient. Using proximal policy optimization (PPO), we evaluate our methods on four control tasks, where the relevant action set is computed based on the system dynamics and a relevant state set. Our experiments show that the three action masking methods achieve higher final rewards and converge faster than the baseline without action masking.
SYJun 5, 2024
CommonPower: A Framework for Safe Data-Driven Smart Grid ControlMichael Eichelbeck, Hannah Markgraf, Matthias Althoff
The growing complexity of power system management has led to an increased interest in reinforcement learning (RL). To validate their effectiveness, RL algorithms have to be evaluated across multiple case studies. Case study design is an arduous task requiring the consideration of many aspects, among them the influence of available forecasts and the level of decentralization in the control structure. Furthermore, vanilla RL controllers cannot themselves ensure the satisfaction of system constraints, which makes devising a safeguarding mechanism a necessary task for every case study before deploying the system. To address these shortcomings, we introduce the Python tool CommonPower, the first general framework for the modeling and simulation of power system management tailored towards machine learning. Its modular architecture enables users to focus on specific elements without having to implement a simulation environment. Another unique contribution of CommonPower is the automatic synthesis of model predictive controllers and safeguards. Beyond offering a unified interface for single-agent RL, multi-agent RL, and optimal control, CommonPower includes a training pipeline for machine-learning-based forecasters as well as a flexible mechanism for incorporating feedback of safeguards into the learning updates of RL controllers.