Timothy Verstraeten

LG
15 papers
722 citations
Novelty 49%
AI Score 46

15 Papers

78.0 SY May 7
Herd Behavior in Decentralized Balancing Models: A Case Study in Belgium

Max Bruninx, Seyed Soroush Karimi Madahi, Timothy Verstraeten et al.

In a decentralized balancing model, Balance Responsible Parties (BRPs) are encouraged by the Transmission System Operator (TSO) to deviate from their schedule to help the system restore balance, also referred to as implicit balancing. This could reduce balancing costs for the grid operator and lower the entry barrier for flexible assets compared to explicit balancing services. However, these implicit reactions may overshoot when their total capacity is high, potentially requiring more explicit activations. This study analyses the effect of increased participation in the decentralized balancing model in Belgium. To this end, we develop a market simulator that produces price signals at minute-level resolution and simulate the implicit reactions for battery assets with different risk profiles. Besides the current price formula, we also study two potential candidates for the near term presented by the TSO. A simulation study is conducted using Belgian market data for the year 2023. The findings indicate that, while having a significant positive effect on the balancing costs at first, the risk of overshoots can outweigh the potential benefits when the total capacity of the implicit reactions becomes too large. Furthermore, even when the balancing costs start to increase for the TSO, BRPs were still found to benefit from implicit balancing.

LG Jan 30, 2023
Evaluating COVID-19 vaccine allocation policies using Bayesian $m$-top exploration

Alexandra Cimpean, Timothy Verstraeten, Lander Willem et al.

Individual-based epidemiological models support the study of fine-grained preventive measures, such as tailored vaccine allocation policies, in silico. As individual-based models are computationally intensive, it is pivotal to identify optimal strategies within a reasonable computational budget. Moreover, due to the high societal impact associated with the implementation of preventive strategies, uncertainty regarding decisions should be communicated to policy makers, which is naturally embedded in a Bayesian approach. We present a novel technique for evaluating vaccine allocation strategies using a multi-armed bandit framework in combination with a Bayesian anytime $m$-top exploration algorithm. $m$-top exploration allows the algorithm to learn $m$ policies for which it expects the highest utility, enabling experts to inspect this small set of alternative strategies, along with their quantified uncertainty. The anytime component provides policy advisors with flexibility regarding the computation time and the desired confidence, which is important as it is difficult to make this trade-off beforehand. We consider the Belgian COVID-19 epidemic using the individual-based model STRIDE, where we learn a set of vaccination policies that minimize the number of infections and hospitalisations. Through experiments we show that our method can efficiently identify the $m$-top policies, which is validated in a scenario where the ground truth is available. Finally, we explore how vaccination policies can best be organised under different contact reduction schemes and we investigate the impact of vaccine uptake proportions (i.e., the proportion of individuals that will comply with the strategy and take the vaccine).

AI Jul 1, 2022
Multi-Objective Coordination Graphs for the Expected Scalarised Returns with Generative Flow Models

Conor F. Hayes, Timothy Verstraeten, Diederik M. Roijers et al.

Many real-world problems contain multiple objectives and agents, where a trade-off exists between objectives. Key to solving such problems is to exploit sparse dependency structures that exist between agents. For example, in wind farm control a trade-off exists between maximising power and minimising stress on the system's components. Dependencies between turbines arise due to the wake effect. We model such sparse dependencies between agents as a multi-objective coordination graph (MO-CoG). In multi-objective reinforcement learning a utility function is typically used to model a user's preferences over objectives, which may be unknown a priori. In such settings a set of optimal policies must be computed. Which policies are optimal depends on which optimality criterion applies. If the utility function of a user is derived from multiple executions of a policy, the scalarised expected returns (SER) must be optimised. If the utility of a user is derived from a single execution of a policy, the expected scalarised returns (ESR) criterion must be optimised. For example, wind farms are subjected to constraints and regulations that must be adhered to at all times, therefore the ESR criterion must be optimised. For MO-CoGs, the state-of-the-art algorithms can only compute a set of optimal policies for the SER criterion, leaving the ESR criterion understudied. To compute a set of optimal policies under the ESR criterion, also known as the ESR set, distributions over the returns must be maintained. Therefore, to compute a set of optimal policies under the ESR criterion for MO-CoGs, we present a novel distributional multi-objective variable elimination (DMOVE) algorithm. We evaluate DMOVE in realistic wind farm simulations. Given that the returns in real-world wind farm settings are continuous, we utilise a model known as real-NVP to learn the continuous return distributions to calculate the ESR set.
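
The difference between the SER and ESR criteria described in this abstract can be illustrated with a small numeric sketch. Everything below is an assumption made for illustration and does not come from the paper: a single policy with two equally likely two-objective return vectors, and a hypothetical non-linear utility equal to the product of the two objectives. SER applies the utility to the expected return vector, whereas ESR takes the expectation of the utility of each return.

```python
import numpy as np

def utility(r):
    """Hypothetical non-linear utility: product of the two objectives."""
    r = np.asarray(r, dtype=float)
    return r[..., 0] * r[..., 1]

def ser_value(returns, probs):
    """Scalarised expected returns: utility of the expected return vector."""
    return float(utility(np.asarray(probs) @ np.asarray(returns)))

def esr_value(returns, probs):
    """Expected scalarised returns: expectation of the per-outcome utility."""
    return float(np.asarray(probs) @ utility(returns))

# Illustrative return distribution of one policy (not data from the paper).
returns = np.array([[4.0, 0.0], [0.0, 4.0]])
probs = np.array([0.5, 0.5])

print("SER:", ser_value(returns, probs))  # utility of the mean vector (2, 2)
print("ESR:", esr_value(returns, probs))  # mean of the utilities of each outcome
```

Under this toy utility the policy looks attractive on average yet yields zero utility in every single execution, which is why the two criteria can rank policies differently.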

LG Feb 13
Probabilistic Wind Power Forecasting with Tree-Based Machine Learning and Weather Ensembles

Max Bruninx, Diederik van Binsbergen, Timothy Verstraeten et al.

Accurate production forecasts are essential to continue facilitating the integration of renewable energy sources into the power grid. This paper illustrates how to obtain probabilistic day-ahead forecasts of wind power generation via gradient boosting trees using an ensemble of weather forecasts. To this end, we perform a comparative analysis across three state-of-the-art probabilistic prediction methods (conformalised quantile regression, natural gradient boosting and conditional diffusion models), all of which can be combined with tree-based machine learning. The methods are validated using four years of data for all wind farms present within the Belgian offshore zone. Additionally, the point forecasts are benchmarked against deterministic engineering methods, using either the power curve or an advanced approach incorporating a calibrated analytical wake model. The experimental results show that the machine learning methods improve the mean absolute error by up to 53% and 33% compared to the power curve and the calibrated wake model. Considering the three probabilistic prediction methods, the conditional diffusion model is found to yield the best overall probabilistic and point estimate of wind power generation. Moreover, the findings suggest that the use of an ensemble of weather forecasts can improve point forecast accuracy by up to 23%.
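
The abstract names conformalised quantile regression without describing it. As a rough sketch of the standard calibration step only (not the authors' pipeline), the function below takes lower and upper quantile predictions on a held-out calibration set and returns the offset by which the predicted interval is widened or narrowed. The function name `cqr_offset` and the toy numbers are hypothetical; in practice the quantile predictions would come from a fitted quantile model, such as gradient boosted trees.

```python
import math
import numpy as np

def cqr_offset(q_lo, q_hi, y, alpha):
    """Conformal offset for target coverage 1 - alpha.

    q_lo, q_hi: predicted lower/upper quantiles on the calibration set.
    y: observed values on the calibration set.
    """
    q_lo, q_hi, y = (np.asarray(a, dtype=float) for a in (q_lo, q_hi, y))
    # Conformity score: how far each observation falls outside its interval
    # (negative when the observation lies inside).
    scores = np.maximum(q_lo - y, y - q_hi)
    n = len(scores)
    # Finite-sample corrected rank of the empirical quantile.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")  # too few calibration points for this alpha
    return float(np.sort(scores)[k - 1])

# Toy calibration set (illustrative values only).
y_cal = np.arange(1, 10)
lo_cal = np.zeros(9)
hi_cal = np.full(9, 5.0)
offset = cqr_offset(lo_cal, hi_cal, y_cal, alpha=0.25)
# A new prediction [lo, hi] would be reported as [lo - offset, hi + offset].
```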

LG Jun 2, 2021
Expected Scalarised Returns Dominance: A New Solution Concept for Multi-Objective Decision Making

Conor F. Hayes, Timothy Verstraeten, Diederik M. Roijers et al.

In many real-world scenarios, the utility of a user is derived from the single execution of a policy. In this case, to apply multi-objective reinforcement learning, the expected utility of the returns must be optimised. Various scenarios exist where a user's preferences over objectives (also known as the utility function) are unknown or difficult to specify. In such scenarios, a set of optimal policies must be learned. However, settings where the expected utility must be maximised have been largely overlooked by the multi-objective reinforcement learning community and, as a consequence, a set of optimal solutions has yet to be defined. In this paper we address this challenge by proposing first-order stochastic dominance as a criterion to build solution sets to maximise expected utility. We also propose a new dominance criterion, known as expected scalarised returns (ESR) dominance, that extends first-order stochastic dominance to allow a set of optimal policies to be learned in practice. We then define a new solution concept called the ESR set, which is a set of policies that are ESR dominant. Finally, we define a new multi-objective distributional tabular reinforcement learning (MOT-DRL) algorithm to learn the ESR set in a multi-objective multi-armed bandit setting.
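
A minimal sketch of a first-order stochastic dominance check, the criterion this abstract builds on. It assumes scalar returns with finite support; the paper's setting is multi-objective, which would require comparing multivariate CDFs instead. The function name and the example distributions are illustrative and not taken from the paper.

```python
import numpy as np

def fsd_dominates(values_x, probs_x, values_y, probs_y, tol=1e-12):
    """Return True if X first-order stochastically dominates Y.

    X dominates Y when the CDF of X is never above the CDF of Y
    and is strictly below it for at least one threshold.
    """
    grid = np.union1d(values_x, values_y)

    def cdf(values, probs):
        values = np.asarray(values, dtype=float)
        probs = np.asarray(probs, dtype=float)
        return np.array([probs[values <= v].sum() for v in grid])

    fx = cdf(values_x, probs_x)
    fy = cdf(values_y, probs_y)
    return bool(np.all(fx <= fy + tol) and np.any(fx < fy - tol))

# X shifts every outcome of Y upward by 1, so X dominates Y.
print(fsd_dominates([1, 3], [0.5, 0.5], [0, 2], [0.5, 0.5]))
# A risky and a certain return with equal mean are not comparable.
print(fsd_dominates([0, 4], [0.5, 0.5], [2], [1.0]))
```

Because dominance is only a partial order, some pairs of return distributions are incomparable, which is why a set of non-dominated policies is kept rather than a single optimum.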

AI Mar 17, 2021
A Practical Guide to Multi-Objective Reinforcement Learning and Planning

Conor F. Hayes, Roxana Rădulescu, Eugenio Bargiacchi et al.

Real-world decision-making tasks are generally complex, requiring trade-offs between multiple, often conflicting, objectives. Despite this, the majority of research in reinforcement learning and decision-theoretic planning either assumes only a single objective, or that multiple objectives can be adequately handled via a simple linear combination. Such approaches may oversimplify the underlying problem and hence produce suboptimal results. This paper serves as a guide to the application of multi-objective methods to difficult problems, and is aimed at researchers who are already familiar with single-objective reinforcement learning and planning methods who wish to adopt a multi-objective perspective on their research, as well as practitioners who encounter multi-objective decision problems in practice. It identifies the factors that may influence the nature of the desired solution, and illustrates by example how these influence the design of multi-objective decision-making systems for complex problems.

LG Jan 19, 2021
Scalable Optimization for Wind Farm Control using Coordination Graphs

Timothy Verstraeten, Pieter-Jan Daems, Eugenio Bargiacchi et al.

Wind farms are a crucial driver toward the generation of ecological and renewable energy. Due to their rapid increase in capacity, contemporary wind farms need to adhere to strict constraints on power output to ensure stability of the electricity grid. Specifically, a wind farm controller is required to match the farm's power production with a power demand imposed by the grid operator. This is a non-trivial optimization problem, as complex dependencies exist between the wind turbines. State-of-the-art wind farm control typically relies on physics-based heuristics that fail to capture the full load spectrum that defines a turbine's health status. When this is not taken into account, the long-term viability of the farm's turbines is put at risk. Given the complex dependencies that determine a turbine's lifetime, learning a flexible and optimal control strategy requires a data-driven approach. However, as wind farms are large-scale multi-agent systems, optimizing control strategies over the full joint action space is intractable. We propose a new learning method for wind farm control that leverages the sparse wind farm structure to factorize the optimization problem. Using a Bayesian approach, based on multi-agent Thompson sampling, we explore the factored joint action space for configurations that match the demand, while considering the lifetime of turbines. We apply our method to a grid-like wind farm layout, and evaluate configurations using a state-of-the-art wind flow simulator. Our results are competitive with a physics-based heuristic approach in terms of demand error, while, contrary to the heuristic, our method prolongs the lifetime of high-risk turbines.

MA Nov 14, 2020
Opponent Learning Awareness and Modelling in Multi-Objective Normal Form Games

Roxana Rădulescu, Timothy Verstraeten, Yijie Zhang et al.

Many real-world multi-agent interactions consider multiple distinct criteria, i.e. the payoffs are multi-objective in nature. However, the same multi-objective payoff vector may lead to different utilities for each participant. Therefore, it is essential for an agent to learn about the behaviour of other agents in the system. In this work, we present the first study of the effects of such opponent modelling on multi-objective multi-agent interactions with non-linear utilities. Specifically, we consider two-player multi-objective normal form games with non-linear utility functions under the scalarised expected returns optimisation criterion. We contribute novel actor-critic and policy gradient formulations to allow reinforcement learning of mixed strategies in this setting, along with extensions that incorporate opponent policy reconstruction and learning with opponent learning awareness (i.e., learning while considering the impact of one's policy when anticipating the opponent's learning step). Empirical results in five different MONFGs demonstrate that opponent learning awareness and modelling can drastically alter the learning dynamics in this setting. When equilibria are present, opponent modelling can confer significant benefits on agents that implement it. When there are no Nash equilibria, opponent learning awareness and modelling allows agents to still converge to meaningful solutions that approximate equilibria.

LG Mar 30, 2020
Deep reinforcement learning for large-scale epidemic control

Pieter Libin, Arno Moonens, Timothy Verstraeten et al.

Epidemics of infectious diseases are an important threat to public health and global economies. Yet, the development of prevention strategies remains a challenging process, as epidemics are non-linear and complex processes. For this reason, we investigate a deep reinforcement learning approach to automatically learn prevention strategies in the context of pandemic influenza. Firstly, we construct a new epidemiological meta-population model, with 379 patches (one for each administrative district in Great Britain), that adequately captures the infection process of pandemic influenza. Our model balances complexity and computational efficiency such that the use of reinforcement learning techniques becomes attainable. Secondly, we set up a ground truth such that we can evaluate the performance of the 'Proximal Policy Optimization' algorithm to learn in a single district of this epidemiological model. Finally, we consider a large-scale problem, by conducting an experiment where we aim to learn a joint policy to control the districts in a community of 11 tightly coupled districts, for which no ground truth can be established. This experiment shows that deep reinforcement learning can be used to learn mitigation policies in complex epidemiological models with a large state space. Moreover, through this experiment, we demonstrate that there can be an advantage in considering collaboration between districts when designing prevention strategies.

LG Jan 15, 2020
Model-based Multi-Agent Reinforcement Learning with Cooperative Prioritized Sweeping

Eugenio Bargiacchi, Timothy Verstraeten, Diederik M. Roijers et al.

We present a new model-based reinforcement learning algorithm, Cooperative Prioritized Sweeping, for efficient learning in multi-agent Markov decision processes. The algorithm allows for sample-efficient learning on large problems by exploiting a factorization to approximate the value function. Our approach only requires knowledge about the structure of the problem in the form of a dynamic decision network. Using this information, our method learns a model of the environment and performs temporal difference updates which affect multiple joint states and actions at once. Batch updates are additionally performed which efficiently back-propagate knowledge throughout the factored Q-function. Our method outperforms the state-of-the-art sparse cooperative Q-learning algorithm, both on the well-known SysAdmin benchmark and randomized environments.

LG Nov 22, 2019
Fleet Control using Coregionalized Gaussian Process Policy Iteration

Timothy Verstraeten, Pieter JK Libin, Ann Nowé

In many settings, such as wind farms, multiple machines are instantiated to perform the same task, which is called a fleet. The recent advances with respect to the Internet of Things allow control devices and/or machines to connect through cloud-based architectures in order to share information about their status and environment. Such an infrastructure allows seamless data sharing between fleet members, which could greatly improve the sample-efficiency of reinforcement learning techniques. However, in practice, these machines, while almost identical in design, have small discrepancies due to production errors or degradation, preventing control algorithms from simply aggregating and employing all fleet data. We propose a novel reinforcement learning method that learns to transfer knowledge between similar fleet members and creates member-specific dynamics models for control. Our algorithm uses Gaussian processes to establish cross-member covariances. This is significantly different from standard transfer learning methods, as the focus is not on sharing information over tasks, but rather over system specifications. We demonstrate our approach on two benchmarks and a realistic wind farm setting. Our method significantly outperforms two baseline approaches, namely individual learning and joint learning where all samples are aggregated, in terms of the median and variance of the results.

LG Nov 22, 2019
Multi-Agent Thompson Sampling for Bandit Applications with Sparse Neighbourhood Structures

Timothy Verstraeten, Eugenio Bargiacchi, Pieter JK Libin et al.

Multi-agent coordination is prevalent in many real-world applications. However, such coordination is challenging due to its combinatorial nature. An important observation in this regard is that agents in the real world often only directly affect a limited set of neighbouring agents. Leveraging such loose couplings among agents is key to making coordination in multi-agent systems feasible. In this work, we focus on learning to coordinate. Specifically, we consider the multi-agent multi-armed bandit framework, in which fully cooperative loosely-coupled agents must learn to coordinate their decisions to optimize a common objective. We propose multi-agent Thompson sampling (MATS), a new Bayesian exploration-exploitation algorithm that leverages loose couplings. We provide a regret bound that is sublinear in time and low-order polynomial in the highest number of actions of a single agent for sparse coordination graphs. Additionally, we empirically show that MATS outperforms the state-of-the-art algorithm, MAUCE, on two synthetic benchmarks, and a novel benchmark with Poisson distributions. An example of a loosely-coupled multi-agent system is a wind farm. Coordination within the wind farm is necessary to maximize power production. As upstream wind turbines only affect nearby downstream turbines, we can use MATS to efficiently learn the optimal control mechanism for the farm. To demonstrate the benefits of our method toward applications we apply MATS to a realistic wind farm control task. In this task, wind turbines must coordinate their alignments with respect to the incoming wind vector in order to optimize power production. Our results show that MATS improves significantly upon state-of-the-art coordination methods in terms of performance, demonstrating the value of using MATS in practical applications with sparse neighbourhood structures.

CV Sep 30, 2019
IPC-Net: 3D point-cloud segmentation using deep inter-point convolutional layers

Felipe Gomez Marulanda, Pieter Libin, Timothy Verstraeten et al.

Over the last decade, the demand for better segmentation and classification algorithms in 3D spaces has significantly grown due to the popularity of new 3D sensor technologies and advancements in the field of robotics. Point-clouds are one of the most popular representations to store a digital description of 3D shapes. However, point-clouds are stored in irregular and unordered structures, which limits the direct use of segmentation algorithms such as Convolutional Neural Networks. The objective of our work is twofold. First, we aim to provide a full analysis of the PointNet architecture to illustrate which features are being extracted from the point-clouds. Second, we propose a new network architecture called IPC-Net to improve on the state-of-the-art point cloud architectures. We show that IPC-Net extracts a larger set of unique features allowing the model to produce more accurate segmentations compared to the PointNet architecture. In general, our approach outperforms PointNet on every family of 3D geometries on which the models were tested. A high generalisation improvement was observed on every 3D shape, especially on the rockets dataset. Our experiments demonstrate that our main contribution, inter-point activation on the network's layers, is essential to accurately segment 3D point-clouds.

SY Apr 3, 2019
Fleetwide data-enabled reliability improvement of wind turbines

Timothy Verstraeten, Ann Nowe, Jonathan Keller et al.

Wind farms are an indispensable driver toward renewable and nonpolluting energy resources. However, as ideal sites are limited, placement in remote and challenging locations results in higher logistics costs and lower average wind speeds. Therefore, it is critical to increase the reliability of the turbines to reduce maintenance costs. Robust implementation requires a thorough understanding of the loads subject to the turbine's control. Yet, such dynamically changing multidimensional loads are uncommon with other machinery, and generally underresearched. Therefore, a multitiered approach is proposed to investigate the load spectrum occurring in wind farms. Our approach relies on both fundamental research using controllable test rigs, as well as analyses of real-world loading conditions in high-frequency supervisory control and data acquisition data. A method is introduced to detect operational zones in wind farm data and link them with load distributions. Additionally, while focused research further investigates the load spectrum, a method is proposed that continuously optimizes the farm's control protocols without the need to fully understand the loads that occur. A case of gearbox failure is investigated based on a vast body of past experiments and suspect loads are identified. Starting from this evidence on the cause and effects of dynamic loads, the potential of our methods is shown by analyzing real-world farm loading conditions on a steady-state case of wake and developing a preventive row-based control protocol for a case of cascading emergency brakes induced by a storm.

LG Nov 16, 2017
Bayesian Best-Arm Identification for Selecting Influenza Mitigation Strategies

Pieter Libin, Timothy Verstraeten, Diederik M. Roijers et al.

Pandemic influenza has the epidemic potential to kill millions of people. While various preventive measures exist (i.a., vaccination and school closures), deciding on strategies that lead to their most effective and efficient use remains challenging. To this end, individual-based epidemiological models are essential to assist decision makers in determining the best strategy to curb epidemic spread. However, individual-based models are computationally intensive and it is therefore pivotal to identify the optimal strategy using a minimal number of model evaluations. Additionally, as epidemiological modeling experiments need to be planned, a computational budget needs to be specified a priori. Consequently, we present a new sampling technique to optimize the evaluation of preventive strategies using fixed budget best-arm identification algorithms. We use epidemiological modeling theory to derive knowledge about the reward distribution which we exploit using Bayesian best-arm identification algorithms (i.e., Top-two Thompson sampling and BayesGap). We evaluate these algorithms in a realistic experimental setting and demonstrate that it is possible to identify the optimal strategy using only a limited number of model evaluations, i.e., 2-to-3 times faster compared to the uniform sampling method, the predominant technique used for epidemiological decision making in the literature. Finally, we contribute and evaluate a statistic for Top-two Thompson sampling to inform the decision makers about the confidence of an arm recommendation.
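
To give a feel for the fixed-budget best-arm identification loop mentioned in this abstract, here is a minimal sketch of Top-two Thompson sampling. It assumes Bernoulli rewards with Beta(1, 1) priors purely for illustration; the paper instead derives the reward distribution from epidemiological modeling theory, which is not reproduced here. The function name, the `beta` and `max_resample` parameters, and the arm means are hypothetical.

```python
import numpy as np

def top_two_thompson(true_means, budget, beta=0.5, seed=0, max_resample=100):
    """Fixed-budget Top-two Thompson sampling on simulated Bernoulli arms.

    Returns the recommended arm and the number of pulls per arm.
    """
    rng = np.random.default_rng(seed)
    k = len(true_means)
    successes = np.ones(k)  # Beta(1, 1) prior (assumption)
    failures = np.ones(k)
    for _ in range(budget):
        # Leader: the arm that is best under one posterior sample.
        first = int(np.argmax(rng.beta(successes, failures)))
        arm = first
        if rng.random() >= beta:
            # Challenger: resample until a different arm comes out on top.
            for _ in range(max_resample):
                challenger = int(np.argmax(rng.beta(successes, failures)))
                if challenger != first:
                    arm = challenger
                    break
        # Stand-in for one expensive model evaluation.
        reward = float(rng.random() < true_means[arm])
        successes[arm] += reward
        failures[arm] += 1.0 - reward
    pulls = successes + failures - 2.0
    best = int(np.argmax(successes / (successes + failures)))
    return best, pulls

best, pulls = top_two_thompson([0.1, 0.3, 0.9], budget=500)
print("recommended arm:", best, "pulls per arm:", pulls)
```

In this sketch each pull stands in for one run of the simulator, so the budget directly bounds the number of model evaluations.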