Bert Claessens

h-index: 14

20 papers

703 citations

Novelty: 44%

AI Score: 47

Ranked #56,223 of 201,326 authors (top 28%) · #251 in SY (top 25%)

20 Papers

41.2 · AI · Jun 1

S3TS: Stochastic Scenario-Structured Tree Search for Advanced Planning Under Uncertainty

Fabio Pavirani, Bert Claessens, Pierre Pinson et al.

Effective scheduling in the energy sector is essential to ensure the reliable operation of electrical grids and their connected assets by, for instance, optimizing the dispatch of generation units and storage systems. An effective planning strategy must (a) accommodate advanced and potentially non-linear system models, exploiting the increasing data availability of modern grids, and (b) explicitly handle uncertainties arising, for instance, from the integration of renewable energy sources. While existing approaches can address either non-linearity (e.g., Monte Carlo Tree Search) or uncertainty (e.g., stochastic mathematical optimization), there is a lack of planning techniques capable of addressing both challenges simultaneously. To bridge this gap, we propose a Stochastic Scenario-Structured Tree Search (S3TS) algorithm that explicitly represents uncertainty through scenario trees while enabling the integration of advanced non-linear models. We evaluate S3TS on a simulated demand response signal publication problem, largely mimicking the imbalance settlement mechanism in Belgium. The results demonstrate near-optimal performance in linear, analytically tractable settings, with costs within 14% of the mathematically optimal solution conditioned on the scenario trees. In highly non-linear scenarios, S3TS significantly outperforms baseline methods, achieving cost reductions of up to 51% and 5.4% compared to a myopic algorithm and deterministic MCTS, respectively.
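To make the scenario-tree idea concrete, the sketch below (an illustration, not the S3TS algorithm) assumes a toy tree in which one decision is shared by all scenarios passing through a node (non-anticipativity) and the stage cost may be non-linear. It uses exhaustive search instead of Monte Carlo sampling; `ScenarioNode`, `best_expected_cost` and `step_cost` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple


@dataclass
class ScenarioNode:
    """Hypothetical scenario-tree node holding one realisation of the uncertainty."""
    prob: float                      # probability conditional on the parent node
    outcome: float                   # exogenous value revealed at this node
    children: List["ScenarioNode"] = field(default_factory=list)


def best_expected_cost(node: ScenarioNode, state, actions: Sequence,
                       step_cost: Callable[[object, object, float], Tuple[float, object]]) -> float:
    """Minimal expected cost below `node` when one action is chosen per tree node.

    Exhaustive-search stand-in for a tree search: the action is fixed before the
    children's outcomes are revealed, and each child is weighted by its probability.
    """
    if not node.children:
        return 0.0
    best = float("inf")
    for action in actions:
        expected = 0.0
        for child in node.children:
            cost, next_state = step_cost(state, action, child.outcome)  # may be non-linear
            expected += child.prob * (cost + best_expected_cost(child, next_state, actions, step_cost))
        best = min(best, expected)
    return best


# Two equiprobable scenarios (+1 / -1) and a quadratic, i.e. non-linear, tracking cost
root = ScenarioNode(1.0, 0.0, [ScenarioNode(0.5, 1.0), ScenarioNode(0.5, -1.0)])
print(best_expected_cost(root, None, [-1, 0, 1], lambda s, a, w: ((a - w) ** 2, s)))  # → 1.0
```

The hedging action 0 wins here because it performs acceptably in both scenarios; a deterministic planner that only sees one forecast would commit to +1 or -1.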

OC · Oct 23, 2017

Combined Stochastic Optimization of Frequency Control and Self-Consumption with a Battery

Jonas Engels, Bert Claessens, Geert Deconinck

Optimally combining frequency control with self-consumption can increase revenues from battery storage systems installed behind-the-meter. This work presents an optimized control strategy that allows a battery to be used simultaneously for self-consumption and primary frequency control. In doing so, it addresses two stochastic problems: the delivery of primary frequency control with a battery and the use of the battery for self-consumption. We propose a linear recharging policy to regulate the state of charge of the battery while providing primary frequency control. Formulating this as a chance-constrained problem, we can ensure that the risk of battery constraint violations stays below a predefined probability. We use robust optimization as a safe approximation to the chance constraints, which allows the risk of constraint violation to be made arbitrarily low while keeping the problem tractable and offering maximum reserve capacity. Simulations with real frequency measurements demonstrate the effectiveness of the designed recharging strategy. We adopt a rule-based policy for self-consumption, which is optimized using stochastic programming. This policy allows more of the battery's energy and power to be reserved at moments when expected consumption or production is higher, while using other moments for recharging from primary frequency control. We show that optimally combining the two services significantly increases the value of batteries.
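The interplay between frequency response, recharging and state-of-charge limits can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: full FCR activation at ±200 mHz (the Continental European convention), no losses or deadband, and a hypothetical proportional SoC feedback in place of the paper's linear recharging policy. The paper handles the chance constraint with robust optimization; here it is only checked empirically via `violation_rate`.

```python
import numpy as np


def simulate_soc(freq_dev_hz, p_max_mw, e_max_mwh, k_recharge, dt_h=1 / 3600, soc0=0.5):
    """State-of-charge trajectory (fraction of e_max) of a battery delivering FCR.

    Sign convention: positive power charges the battery. Losses and deadbands are
    ignored, and the recharging rule is a hypothetical proportional SoC feedback.
    """
    soc = np.empty(len(freq_dev_hz) + 1)
    soc[0] = soc0
    for t, df in enumerate(freq_dev_hz):
        # Droop response: full activation at |df| = 200 mHz, charging on over-frequency
        p_fcr = p_max_mw * np.clip(df / 0.2, -1.0, 1.0)
        # Hypothetical linear recharging power pulling the SoC back to 50 %
        p_rech = k_recharge * (0.5 - soc[t]) * e_max_mwh
        soc[t + 1] = soc[t] + (p_fcr + p_rech) * dt_h / e_max_mwh
    return soc


def violation_rate(soc, soc_min=0.0, soc_max=1.0):
    """Empirical probability that the SoC leaves its bounds (sample-based chance-constraint check)."""
    return float(np.mean((soc < soc_min) | (soc > soc_max)))
```

Without recharging (`k_recharge=0`) a sustained over-frequency drives the SoC past its upper bound; with a sufficiently strong recharging gain the SoC settles inside its limits, at the cost of reserving part of the power for recharging.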

SY · Mar 11, 2019

Techno-Economic Analysis and Optimal Control of Battery Storage for Frequency Control Services, Applied to the German Market

Jonas Engels, Bert Claessens, Geert Deconinck

Optimal investment in battery energy storage systems, taking into account degradation, sizing and control, is crucial for the deployment of battery storage, of which providing frequency control is one of the major applications. In this paper, we present a holistic, data-driven framework to determine the optimal investment, size and controller of a battery storage system providing frequency control. We optimised the controller towards minimum degradation and electricity costs over its lifetime, while ensuring the delivery of frequency control services compliant with regulatory requirements. We adopted a detailed battery model, considering the dynamics and degradation when exposed to actual frequency data. Further, we used a stochastic optimisation objective while constraining the probability of being unable to deliver the frequency control service. Through a thorough analysis, we were able to decrease the amount of data needed and thereby decrease the execution time while keeping the approximation error within limits. Using the proposed framework, we performed a techno-economic analysis of a battery providing 1 MW capacity in the German primary frequency control market. Results showed that a battery rated at 1.6 MW, 1.6 MWh has the highest net present value, yet this configuration is only profitable if costs are low enough or if future frequency control prices do not decline too much. It transpires that calendar ageing drives battery degradation, whereas cycle ageing has less impact.
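The net present value used to rank battery configurations follows the standard discounting formula. A minimal sketch, assuming cash flows arrive at the end of each year and a single constant discount rate (the paper's exact cash-flow model is not reproduced here):

```python
def net_present_value(capex, annual_cash_flows, discount_rate):
    """NPV = -capex + sum_{t=1..T} CF_t / (1 + r)**t, with cash flows at the end of each year."""
    return -capex + sum(
        cf / (1.0 + discount_rate) ** year
        for year, cf in enumerate(annual_cash_flows, start=1)
    )
```

Declining frequency control prices can be represented by a decreasing `annual_cash_flows` sequence, and earlier replacement due to degradation by a larger or repeated `capex` term.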

SY · Mar 11, 2019

Grid-Constrained Distributed Optimization for Frequency Control with Low-Voltage Flexibility

Jonas Engels, Bert Claessens, Geert Deconinck

Providing frequency control services with flexible assets connected to the low-voltage distribution grid, such as residential battery storage or electric hot water boilers, can lead to congestion problems and voltage issues in the distribution grid. In order to mitigate these problems, a new regulation has been put in place in Belgium, imposing a specific constraint: in any circle with a radius of 100 m, there can be at most 10 connection points providing frequency control at any time. This paper presents an impact analysis and a coordination strategy of a Flexibility Service Provider (FSP) that operates a pool of assets and is exposed to this new regulatory constraint. Results show that at 5% participation, only 90% of total control capacity can be used, with a large difference between neighbourhoods with different population densities. A distributed optimization framework to coordinate the assets arises naturally, in which the assets are able to keep their local cost functions private and only have to communicate with neighbouring assets that are geographically close, and with the FSP. Analysis of the proposed distributed optimization algorithm shows a clear trade-off between optimality gap, owing to the mixed-integer nature of the problem, and iterations to convergence.
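Checking the "at most 10 points in any 100 m circle" rule is a maximum-coverage-by-disk problem. The brute-force sketch below assumes coordinates in metres in a projected system; it is not the paper's coordination algorithm. It relies on a standard geometric argument: any disk covering two or more points can be shifted until two covered points lie on its boundary, so it suffices to test disks centred on each point and the (at most two) radius-r disks through each pair of points.

```python
import math
from itertools import combinations


def max_points_in_circle(points, radius=100.0, eps=1e-9):
    """Largest number of points covered by any disk of the given radius (O(n^3) brute force)."""
    def count(cx, cy):
        return sum(math.hypot(x - cx, y - cy) <= radius + eps for x, y in points)

    best = max((count(x, y) for x, y in points), default=0)
    for (x1, y1), (x2, y2) in combinations(points, 2):
        d = math.hypot(x2 - x1, y2 - y1)
        if d == 0 or d > 2 * radius:
            continue  # coincident points or too far apart to share a disk boundary
        # The two disk centres lie on the perpendicular bisector at distance h from the midpoint
        mx, my = (x1 + x2) / 2, (y1 + y2) / 2
        h = math.sqrt(radius ** 2 - (d / 2) ** 2)
        ux, uy = -(y2 - y1) / d, (x2 - x1) / d  # unit normal to the segment
        for s in (1, -1):
            best = max(best, count(mx + s * h * ux, my + s * h * uy))
    return best


def complies(active_points, radius=100.0, limit=10):
    """True if no circle of `radius` contains more than `limit` active connection points."""
    return max_points_in_circle(active_points, radius) <= limit
```

The cubic cost is fine for a neighbourhood-sized check; for a whole pool, a spatial index restricting pairs to points within 2r of each other would be the natural next step.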

SY · Feb 6, 2017

The use of distributed thermal storage in district heating grids for demand side management

Dirk Vanhoudt, Bert Claessens, Robbe Salenbien et al.

The work presented in this paper relates to a small-scale district heating network heated by a gas-fired CHP. In most common situations, such a CHP is operated in heat-driven mode, meaning that the CHP switches on whenever heat is needed, without taking into account the electricity demand at that time. In this paper, however, an active control strategy is developed that aims to maximize the profit of the CHP by selling its electricity on the spot market. The CHP will therefore switch on at moments of high electricity prices. Nevertheless, since there is never a perfect match between the demand for heat and the demand for electricity, thermal energy storage is included in the network to bridge the difference between supply and demand of heat. In this study, three different storage concepts are compared: (1) a central buffer tank next to the CHP; (2) small storage vessels distributed over the connected buildings; and (3) the use of the thermal mass of the buildings as storage capacity. Besides the development of the control algorithms based on model predictive control, a simulation model of the network is described to evaluate the performance of the different storage concepts during a representative winter week. The results show that the presented control algorithm can significantly influence the heat demand profile of the connected buildings. As a result, active control of the CHP can drastically increase its profit. The concept with the distributed buffers gives the best results; however, the profit for the thermal mass concept is only marginally smaller. Since in this latter case no significant investment costs are needed, the conclusion for this case study is that the use of the thermal mass of buildings for demand side management in district heating systems is very promising.

LG · Oct 29, 2023

Transfer Learning in Transformer-Based Demand Forecasting For Home Energy Management System

Gargya Gokhale, Jonas Van Gompel, Bert Claessens et al.

Increasingly, homeowners opt for photovoltaic (PV) systems and/or battery storage to minimize their energy bills and maximize renewable energy usage. This has spurred the development of advanced control algorithms that maximally achieve those goals. However, a common challenge faced while developing such controllers is the unavailability of accurate forecasts of household power consumption, especially at shorter time resolutions (15 minutes) and in a data-efficient manner. In this paper, we analyze how transfer learning can help by exploiting data from multiple households to improve a single house's load forecasting. Specifically, we train an advanced forecasting model (a temporal fusion transformer) using data from multiple different households, and then finetune this global model on a new household with limited data (i.e., only a few days). The obtained models are used for forecasting the power consumption of the household for the next 24 hours (day-ahead) at a time resolution of 15 minutes, with the intention of using these forecasts in advanced controllers such as Model Predictive Control. We show the benefit of this transfer learning setup versus solely using the individual new household's data, both in terms of (i) forecasting accuracy (~15% MAE reduction) and (ii) control performance (~2% energy cost reduction), using real-world household data.
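The pretrain-then-finetune pattern can be shown without a transformer. In this minimal sketch, the paper's temporal fusion transformer is swapped for a linear model on hypothetical lagged-load features, and "fine-tuning" is implemented as least squares regularised towards the global weights, minimising $\|Xw - y\|^2 + \lambda\|w - w_g\|^2$, whose closed form is $w = (X^\top X + \lambda I)^{-1}(X^\top y + \lambda w_g)$.

```python
import numpy as np


def fit_global(X_pooled, y_pooled):
    """Least-squares 'global' model trained on pooled data from many households."""
    w, *_ = np.linalg.lstsq(X_pooled, y_pooled, rcond=None)
    return w


def finetune(X_new, y_new, w_global, lam=10.0):
    """Adapt the global weights to one household with little data.

    Solves min_w ||X w - y||^2 + lam * ||w - w_global||^2 in closed form, so a large
    `lam` keeps the global model and lam = 0 ignores it.
    """
    d = X_new.shape[1]
    A = X_new.T @ X_new + lam * np.eye(d)
    b = X_new.T @ y_new + lam * w_global
    return np.linalg.solve(A, b)


def mae(y_true, y_pred):
    """Mean absolute error, the accuracy metric quoted in the abstract."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))
```

`lam` plays the role of the fine-tuning budget: with only a few days of data, a larger value keeps the model close to the knowledge learned from other households.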

SY · Oct 29, 2023

Real-World Implementation of Reinforcement Learning Based Energy Coordination for a Cluster of Households

Gargya Gokhale, Niels Tiben, Marie-Sophie Verwee et al.

Given its substantial contribution of 40% to global power consumption, the built environment has received increasing attention as a source of flexibility to assist the modern power grid. In that respect, previous research mainly focused on energy management of individual buildings. In contrast, in this paper, we focus on aggregated control of a set of residential buildings to provide grid-supporting services, which should eventually include ancillary services. In particular, we present a real-life pilot study of the effectiveness of reinforcement learning (RL) in coordinating the power consumption of 8 residential buildings to jointly track a target power signal. Our RL approach relies solely on observed data from individual households and does not require any explicit building models or simulators, making it practical to implement and easy to scale. We show the feasibility of our proposed RL-based coordination strategy in a real-world setting. In a 4-week case study, we demonstrate a hierarchical control system, relying on an RL-based ranking system to select which households to activate flex assets from, and a real-time PI control-based power dispatch mechanism to control the selected assets. Our results demonstrate satisfactory power tracking, and the effectiveness of the RL-based ranks, which are learnt in a purely data-driven manner.
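The real-time dispatch layer is described as PI control. A minimal sketch of a discrete PI controller, assuming a hypothetical plant in which the measured cluster power equals an uncontrolled baseline plus the dispatched flexible power with no delay; gains, limits and the anti-windup scheme are illustrative choices, not values from the pilot.

```python
class PIController:
    """Discrete-time PI controller with output clamping and simple anti-windup."""

    def __init__(self, kp, ki, dt, u_min, u_max):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.u_min, self.u_max = u_min, u_max
        self.integral = 0.0

    def step(self, target, measured):
        """Return the flexible power setpoint for the selected assets."""
        error = target - measured
        self.integral += error * self.dt
        u = self.kp * error + self.ki * self.integral
        if u > self.u_max or u < self.u_min:
            # Anti-windup: undo this step's integration while the output is saturated
            self.integral -= error * self.dt
            u = min(max(u, self.u_min), self.u_max)
        return u


# Hypothetical plant: cluster power = 3 kW baseline + dispatched flexible power u
pi = PIController(kp=0.2, ki=0.3, dt=1.0, u_min=-10.0, u_max=10.0)
u = 0.0
for _ in range(200):
    u = pi.step(target=5.0, measured=3.0 + u)
```

The integral term removes the steady-state offset caused by the baseline, while the `u_min`/`u_max` limits reflect the finite flexibility of the assets selected by the ranking layer.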

82.0 · SY · Apr 2

Neural Network-Assisted Model Predictive Control for Implicit Balancing

Seyed Soroush Karimi Madahi, Kenneth Bruninx, Bert Claessens et al.

In Europe, balance responsible parties can deliberately take out-of-balance positions to support transmission system operators (TSOs) in maintaining grid stability and earn profit, a practice called implicit balancing. Model predictive control (MPC) is widely adopted as an effective approach for implicit balancing. The balancing market model accuracy in MPC is critical to decision quality. Previous studies modeled this market using either (i) a convex market clearing approximation, ignoring proactive manual actions by TSOs and the market sub-quarter-hour dynamics, or (ii) machine learning methods, which cannot be directly integrated into MPC. To address these shortcomings, we propose a data-driven balancing market model integrated into MPC using an input convex neural network to ensure convexity while capturing uncertainties. To keep the core network computationally efficient, we incorporate attention-based input gating mechanisms to remove irrelevant data. Evaluating on Belgian data shows that the proposed model both improves MPC decisions and reduces computational time.
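The key property of an input convex neural network is that its output is convex in its input, so it can be embedded in a convex MPC problem. A minimal, untrained sketch, assuming ReLU activations and non-negativity of the hidden-to-hidden weights enforced via `abs` (one of several possible parametrisations); the attention-based input gating described in the abstract is not included.

```python
import numpy as np


class ICNN:
    """Fully input convex network: z_{k+1} = relu(|Wz_k| z_k + Wx_k x + b_k), linear output layer.

    Convexity in x holds because ReLU is convex and non-decreasing and the weights
    acting on previous hidden layers are kept non-negative.
    """

    def __init__(self, n_in, hidden=(16, 16), seed=0):
        rng = np.random.default_rng(seed)
        sizes = list(hidden) + [1]
        self.Wx = [rng.normal(size=(m, n_in)) for m in sizes]  # input "skip" weights (unconstrained)
        self.b = [rng.normal(size=m) for m in sizes]
        self.Wz = [rng.normal(size=(sizes[k + 1], sizes[k])) for k in range(len(sizes) - 1)]

    def __call__(self, x):
        z = np.maximum(0.0, self.Wx[0] @ x + self.b[0])
        for k, Wz in enumerate(self.Wz, start=1):
            pre = np.abs(Wz) @ z + self.Wx[k] @ x + self.b[k]
            z = pre if k == len(self.Wz) else np.maximum(0.0, pre)  # last layer stays linear
        return float(z[0])
```

Because convexity is structural rather than learned, it holds for any weights reached during training, which is what allows the market model to remain compatible with convex MPC solvers.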

SY · Dec 6, 2023

Demand response for residential building heating: Effective Monte Carlo Tree Search control based on physics-informed neural networks

Fabio Pavirani, Gargya Gokhale, Bert Claessens et al.

To reduce global carbon emissions and limit climate change, controlling energy consumption in buildings is an important piece of the puzzle. Here, we specifically focus on using a demand response (DR) algorithm to limit the energy consumption of a residential building's heating system while respecting users' thermal comfort. In that domain, reinforcement learning (RL) methods have been shown to be quite effective. One such RL method is Monte Carlo Tree Search (MCTS), which has achieved impressive success in playing board games (Go, chess). A particular advantage of MCTS is that its decision tree structure naturally allows exogenous constraints to be integrated (e.g., by trimming branches that violate them), while conventional RL solutions need more elaborate techniques (e.g., indirectly adding penalties to the cost/reward function, or using a backup controller that corrects constraint-violating actions). The main aim of this paper is to study the adoption of MCTS for building control, since this has (to the best of our knowledge) remained largely unexplored. A specific property of MCTS is that it needs a simulator component that can predict subsequent system states based on the actions taken. A straightforward data-driven solution is to use black-box neural networks (NNs). We will, however, extend a physics-informed neural network (PiNN) model to deliver multi-timestep predictions, and show the benefit it offers in terms of lower prediction errors (-32% MAE) as well as better MCTS performance (-4% energy cost, +7% thermal comfort) compared to a black-box NN. A second contribution will be to extend a vanilla MCTS version to adopt the ideas applied in AlphaZero (i.e., using learned prior and value functions and an action selection heuristic) to obtain lower computational costs while maintaining control performance.
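Two ingredients mentioned above fit in a few lines: trimming constraint-violating branches at expansion time, and the PUCT selection rule popularised by AlphaZero, which combines a learned prior with visit counts. This is a sketch under assumptions: the paper's exact selection variant is not reproduced, the action names are hypothetical, and values are rewards (negative costs) so that selection maximises.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    prior: float                 # learned prior probability of the action leading here
    visits: int = 0
    value_sum: float = 0.0       # sum of backed-up rewards (negative costs)
    children: dict = field(default_factory=dict)

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0


def expand(node, priors, is_feasible):
    """Add children only for feasible actions (branch trimming) and renormalise their priors."""
    feasible = {a: p for a, p in priors.items() if is_feasible(a)}
    total = sum(feasible.values())
    for a, p in feasible.items():
        node.children[a] = Node(prior=p / total if total > 0 else 1 / len(feasible))


def puct_select(node, c_puct=1.5):
    """Pick the child maximising Q + c_puct * P * sqrt(N_parent) / (1 + N_child)."""
    n_parent = sum(ch.visits for ch in node.children.values())

    def score(child):
        return child.q() + c_puct * child.prior * math.sqrt(n_parent) / (1 + child.visits)

    return max(node.children.items(), key=lambda kv: score(kv[1]))[0]
```

Since infeasible actions never become children, no penalty term or backup controller is needed for them, which is the structural advantage the abstract attributes to MCTS.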

SY · Nov 6, 2024

Predicting and Publishing Accurate Imbalance Prices Using Monte Carlo Tree Search

Fabio Pavirani, Jonas Van Gompel, Seyed Soroush Karimi Madahi et al.

The growing reliance on renewable energy sources, particularly solar and wind, has introduced challenges due to their uncontrollable production. This complicates maintaining the electrical grid balance, prompting some transmission system operators in Western Europe to implement imbalance tariffs that penalize unsustainable power deviations. These tariffs create an implicit demand response framework to mitigate grid instability. Yet, several challenges limit active participation. In Belgium, for example, imbalance prices are only calculated at the end of each 15-minute settlement period, creating high risk due to price uncertainty. This risk is further amplified by the inherent volatility of imbalance prices, discouraging participation. Although transmission system operators provide minute-based price predictions, the system imbalance volatility makes accurate price predictions challenging to obtain and requires sophisticated techniques. Moreover, publishing price estimates can prompt participants to adjust their schedules, potentially affecting the system balance and the final price, adding further complexity. To address these challenges, we propose a Monte Carlo Tree Search method that publishes accurate imbalance prices while accounting for potential response actions. Our approach models the system dynamics using a neural network forecaster and a cluster of virtual batteries controlled by reinforcement learning agents. Compared to Belgium's current publication method, our technique improves price accuracy by 20.4% under ideal conditions and by 12.8% in more realistic scenarios. This research addresses an unexplored, yet crucial problem, positioning this paper as a pioneering work in analyzing the potential of more advanced imbalance price publishing techniques.

SY · Apr 29, 2024

Control Policy Correction Framework for Reinforcement Learning-based Energy Arbitrage Strategies

Seyed Soroush Karimi Madahi, Gargya Gokhale, Marie-Sophie Verwee et al.

A continuous rise in the penetration of renewable energy sources, along with single imbalance pricing, provides a new opportunity for balance responsible parties to reduce their cost through energy arbitrage in the imbalance settlement mechanism. Model-free reinforcement learning (RL) methods are an appropriate choice for solving the energy arbitrage problem due to their outstanding performance in solving complex stochastic sequential problems. However, RL is rarely deployed in real-world applications since its learned policy does not necessarily guarantee safety during the execution phase. In this paper, we propose a new RL-based control framework for batteries to obtain a safe energy arbitrage strategy in the imbalance settlement mechanism. In our proposed control framework, the agent initially aims to optimize the arbitrage revenue. Subsequently, in the post-processing step, we correct (constrain) the learned policy following a knowledge distillation process based on properties that follow human intuition. Our post-processing step is a generic method and is not restricted to the energy arbitrage domain. We use the Belgian imbalance prices of 2023 to evaluate the performance of our proposed framework. Furthermore, we deploy our proposed control framework on a real battery to show its capability in the real world.

SY · Apr 23, 2024

Probabilistic forecasting of power system imbalance using neural network-based ensembles

Jonas Van Gompel, Bert Claessens, Chris Develder

Keeping the balance between electricity generation and consumption is becoming increasingly challenging and costly, mainly due to the rising share of renewables, electric vehicles and heat pumps, and the electrification of industrial processes. Accurate imbalance forecasts, along with reliable uncertainty estimations, enable transmission system operators (TSOs) to dispatch appropriate reserve volumes, reducing balancing costs. Further, market parties can use these probabilistic forecasts to design strategies that exploit asset flexibility to help balance the grid, generating revenue with known risks. Despite its importance, literature regarding system imbalance (SI) forecasting is limited. Further, existing methods do not focus on situations with high imbalance magnitude, which are crucial to forecast accurately for both TSOs and market parties. Hence, we propose an ensemble of C-VSNs, which are our adaptation of variable selection networks (VSNs). Each minute, our model predicts the imbalance of the current and upcoming two quarter-hours, along with uncertainty estimations on these forecasts. We evaluate our approach by forecasting the imbalance of Belgium, where high imbalance magnitude is defined as |SI| > 500 MW (which occurs 1.3% of the time in Belgium). For high imbalance magnitude situations, our model outperforms the state-of-the-art by 23.4% (in terms of continuous ranked probability score (CRPS), which evaluates probabilistic forecasts), while also attaining a 6.5% improvement in overall CRPS. Similar improvements are achieved in terms of root-mean-squared error. Additionally, we developed a fine-tuning methodology to effectively include new inputs with limited history in our model. This work was performed in collaboration with Elia (the Belgian TSO) to further improve their imbalance forecasts, demonstrating the relevance of our work.
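The CRPS quoted above can be estimated directly from ensemble members with the identity CRPS(F, y) = E|X - y| - ½ E|X - X'|, which is exact for the empirical distribution of the members. A minimal sketch, assuming equally weighted members that are treated as point samples (the C-VSN ensemble may output richer distributions), including the evaluation restricted to high-magnitude situations:

```python
import numpy as np


def crps_ensemble(members, obs):
    """CRPS of the empirical distribution of `members` for a scalar observation (lower is better)."""
    x = np.asarray(members, dtype=float)
    spread_to_obs = np.mean(np.abs(x - obs))                    # E|X - y|
    internal_spread = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))  # 0.5 * E|X - X'|
    return float(spread_to_obs - internal_spread)


def mean_crps(ensembles, observations, mask=None):
    """Average CRPS, optionally restricted to a boolean mask (e.g. |SI| > 500 MW)."""
    scores = np.array([crps_ensemble(m, y) for m, y in zip(ensembles, observations)])
    return float(scores.mean() if mask is None else scores[np.asarray(mask)].mean())
```

For a single member, the CRPS reduces to the absolute error, which is why CRPS and MAE figures can be compared on the same scale (here MW).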

SY · Mar 18, 2024

Distill2Explain: Differentiable decision trees for explainable reinforcement learning in energy application controllers

Gargya Gokhale, Seyed Soroush Karimi Madahi, Bert Claessens et al.

Demand-side flexibility is gaining importance as a crucial element in the energy transition process. Accounting for about 25% of final energy consumption globally, the residential sector is an important (potential) source of energy flexibility. However, unlocking this flexibility requires developing a control framework that (1) easily scales across different houses, (2) is easy to maintain, and (3) is simple to understand for end-users. A potential control framework for such a task is data-driven control, specifically model-free reinforcement learning (RL). Such RL-based controllers learn a good control policy by interacting with their environment, learning purely based on data and with minimal human intervention. Yet, they lack explainability, which hampers user acceptance. Moreover, the limited hardware capabilities of residential assets form a hurdle (e.g., for running deep neural networks). To overcome both those challenges, we propose a novel method to obtain explainable RL policies by using differentiable decision trees. Using a policy distillation approach, we train these differentiable decision trees to mimic standard RL-based controllers, leading to a decision tree-based control policy that is data-driven and easy to explain. As a proof-of-concept, we examine the performance and explainability of our proposed approach in a battery-based home energy management system to reduce energy costs. For this use case, we show that our proposed approach can outperform baseline rule-based policies by about 20-25%, while providing simple, explainable control policies. We further compare these explainable policies with standard RL policies and examine the performance trade-offs associated with this increased explainability.
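A differentiable (soft) decision tree routes an input to every leaf with a probability given by products of sigmoid gates, so it can be trained by gradient descent, for instance on a distillation loss against a teacher RL policy; replacing each gate by a hard threshold then yields an ordinary, explainable tree. A minimal forward-pass sketch, assuming a complete binary tree in heap order and scalar leaf actions (the training loop and the paper's exact architecture are not shown):

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def soft_tree_predict(x, W, b, leaf_values, depth):
    """Probability-weighted leaf average of a soft tree.

    W: (2**depth - 1, n_features) and b: (2**depth - 1,) parametrise the inner-node gates
    in heap order (children of node i are 2i+1 and 2i+2); leaf_values: (2**depth,).
    """
    out = 0.0
    for leaf in range(2 ** depth):
        prob, node = 1.0, 0
        for level in reversed(range(depth)):  # most significant bit = first decision
            go_right = (leaf >> level) & 1
            p_right = sigmoid(W[node] @ x + b[node])
            prob *= p_right if go_right else 1.0 - p_right
            node = 2 * node + 1 + go_right
        out += prob * leaf_values[leaf]
    return out


def hard_tree_predict(x, W, b, leaf_values, depth):
    """Explainable version: follow the sign of each gate, i.e. one if-then-else rule per level."""
    node, leaf = 0, 0
    for _ in range(depth):
        go_right = int(W[node] @ x + b[node] > 0)
        leaf = 2 * leaf + go_right
        node = 2 * node + 1 + go_right
    return leaf_values[leaf]
```

A shallow tree like this also sidesteps the hardware hurdle mentioned above, since inference is a handful of dot products and comparisons.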

SY · Oct 6, 2025

Model Predictive Control-Guided Reinforcement Learning for Implicit Balancing

Seyed Soroush Karimi Madahi, Kenneth Bruninx, Bert Claessens et al.

In Europe, profit-seeking balance responsible parties can deviate in real time from their day-ahead nominations to assist transmission system operators in maintaining the supply-demand balance. Model predictive control (MPC) strategies for implicit balancing capture arbitrage opportunities, but fail to accurately capture the price-formation process in the European imbalance markets and face high computational costs. Model-free reinforcement learning (RL) methods are fast to execute, but require data-intensive training and usually rely on real-time and historical data for decision-making. This paper proposes an MPC-guided RL method that combines the complementary strengths of both MPC and RL. The proposed method can effectively incorporate forecasts into the decision-making process (as in MPC), while maintaining the fast inference capability of RL. The performance of the proposed method is evaluated on the implicit balancing battery control problem using Belgian balancing data from 2023. First, we analyze the performance of the standalone state-of-the-art RL and MPC methods from various angles, to highlight their individual strengths and limitations. Next, we show that the proposed MPC-guided RL method increases arbitrage profit by 16.15% and 54.36% compared to standalone RL and MPC, respectively.

SY · Mar 18, 2024

Explainable Reinforcement Learning-based Home Energy Management Systems using Differentiable Decision Trees

Gargya Gokhale, Bert Claessens, Chris Develder

With the ongoing energy transition, demand-side flexibility has become an important aspect of the modern power grid for providing grid support and allowing further integration of sustainable energy sources. Besides traditional sources, the residential sector is another major and largely untapped source of flexibility, driven by the increased adoption of solar PV, home batteries, and EVs. However, unlocking this residential flexibility is challenging as it requires a control framework that can effectively manage household energy consumption, and maintain user comfort while being readily scalable across different, diverse houses. We aim to address this challenging problem and introduce a reinforcement learning-based approach using differentiable decision trees. This approach integrates the scalability of data-driven reinforcement learning with the explainability of (differentiable) decision trees. This leads to a controller that can be easily adapted across different houses and provides a simple control policy that can be explained to end-users, further improving user acceptance. As a proof-of-concept, we analyze our method using a home energy management problem, comparing its performance with commercially available rule-based baseline and standard neural network-based RL controllers. Through this preliminary study, we show that the performance of our proposed method is comparable to standard RL-based controllers, outperforming baseline controllers by ~20% in terms of daily cost savings while being straightforward to explain.

LG · Dec 23, 2023

Distributional Reinforcement Learning-based Energy Arbitrage Strategies in Imbalance Settlement Mechanism

Seyed Soroush Karimi Madahi, Bert Claessens, Chris Develder

Growth in the penetration of renewable energy sources makes supply more uncertain and leads to an increase in the system imbalance. This trend, together with single imbalance pricing, opens an opportunity for balance responsible parties (BRPs) to perform energy arbitrage in the imbalance settlement mechanism. To this end, we propose a battery control framework based on distributional reinforcement learning (DRL). Our proposed control framework takes a risk-sensitive perspective, allowing BRPs to adjust their risk preferences: we aim to optimize a weighted sum of the arbitrage profit and a risk measure while constraining the daily number of cycles for the battery. We assess the performance of our proposed control framework using the Belgian imbalance prices of 2022 and compare two state-of-the-art RL methods, deep Q-learning and soft actor-critic. Results reveal that the distributional soft actor-critic method can outperform other methods. Moreover, we note that our fully risk-averse agent appropriately learns to hedge against the risk related to the unknown imbalance price by (dis)charging the battery only when the agent is more certain about the price.
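A distributional critic returns a set of return quantiles per action rather than a single expected value, which makes a weighted "profit plus risk measure" objective easy to compute. In this minimal sketch, CVaR of the lower tail is an assumed choice of risk measure, `beta` is the hypothetical risk-aversion weight, and the daily cycle constraint is omitted.

```python
import numpy as np


def risk_sensitive_value(quantiles, alpha=0.1, beta=0.5):
    """(1 - beta) * mean return + beta * CVaR_alpha, computed from equally weighted quantiles."""
    q = np.sort(np.asarray(quantiles, dtype=float))
    k = max(1, int(np.ceil(alpha * len(q))))   # number of quantiles in the worst alpha-tail
    cvar = q[:k].mean()
    return float((1 - beta) * q.mean() + beta * cvar)


def select_action(quantiles_per_action, alpha=0.1, beta=0.5):
    """Greedy action under the risk-sensitive objective."""
    scores = {a: risk_sensitive_value(q, alpha, beta) for a, q in quantiles_per_action.items()}
    return max(scores, key=scores.get)
```

With `beta=0` the agent is risk-neutral and may charge whenever the expected profit is positive; with `beta=1` it only (dis)charges when even the worst-case tail is acceptable, mirroring the hedging behaviour described above.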

SP · Nov 23, 2021

Physics Informed Neural Networks for Control Oriented Thermal Modeling of Buildings

Gargya Gokhale, Bert Claessens, Chris Develder

This paper presents a data-driven modeling approach for developing control-oriented thermal models of buildings. These models are developed with the objective of reducing energy consumption costs while controlling the indoor temperature of the building within required comfort limits. To combine the interpretability of white/gray box physics models and the expressive power of neural networks, we propose a physics informed neural network approach for this modeling task. Along with measured data and building parameters, we encode the underlying physics that governs the thermal behavior of these buildings into the neural networks. The resulting physics-guided model helps to predict the temporal evolution of room temperature and power consumption, as well as the hidden state (i.e., the temperature of the building thermal mass), for subsequent time steps. The main research contributions of this work are: (1) we propose two variants of physics informed neural network architectures for the task of control-oriented thermal modeling of buildings, (2) we show that training these architectures is data-efficient, requiring less training data compared to conventional, non-physics informed neural networks, and (3) we show that these architectures achieve more accurate predictions than conventional neural networks for longer prediction horizons. We test the prediction performance of the proposed architectures using simulated and real-world data to demonstrate (2) and (3) and show that the proposed physics informed neural network architectures can be used for this control-oriented modeling problem.
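A physics-informed loss adds, to the usual data-fit term, a penalty on how badly the predicted trajectory violates a physical model. The sketch below assumes a hypothetical two-state RC model (room air temperature `T_r`, unmeasured thermal-mass temperature `T_m`, ambient `T_a`, heating power `Q_h`) discretised with forward Euler; the paper's architectures and parameters are not reproduced, and gradients (normally obtained by automatic differentiation) are omitted.

```python
import numpy as np


def physics_residual(T_r, T_m, T_a, Q_h, dt, R_rm, R_ra, C_r, C_m):
    """Forward-Euler residuals of a hypothetical two-state (room air + thermal mass) RC model:

    C_r dT_r/dt = (T_m - T_r)/R_rm + (T_a - T_r)/R_ra + Q_h
    C_m dT_m/dt = (T_r - T_m)/R_rm
    """
    dTr = (T_r[1:] - T_r[:-1]) / dt
    dTm = (T_m[1:] - T_m[:-1]) / dt
    r_room = C_r * dTr - ((T_m[:-1] - T_r[:-1]) / R_rm + (T_a[:-1] - T_r[:-1]) / R_ra + Q_h[:-1])
    r_mass = C_m * dTm - (T_r[:-1] - T_m[:-1]) / R_rm
    return r_room, r_mass


def pinn_loss(T_r_pred, T_r_meas, T_m_pred, T_a, Q_h, params, lam=1.0):
    """Data loss on the measured room temperature plus lam times the physics residual loss."""
    data = np.mean((T_r_pred - T_r_meas) ** 2)
    r_room, r_mass = physics_residual(T_r_pred, T_m_pred, T_a, Q_h, **params)
    return float(data + lam * (np.mean(r_room ** 2) + np.mean(r_mass ** 2)))
```

Because `T_m` never appears in the data term, the physics residual is what keeps the predicted hidden state plausible, which is also why such models tend to need less training data.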

LG · Nov 29, 2015

Reinforcement Learning Applied to an Electric Water Heater: From Theory to Practice

Frederik Ruelens, Bert Claessens, Salman Quaiyum et al.

Electric water heaters have the ability to store energy in their water buffer without impacting the comfort of the end user. This feature makes them a prime candidate for residential demand response. However, the stochastic and nonlinear dynamics of electric water heaters make it challenging to harness their flexibility. Driven by this challenge, this paper formulates the underlying sequential decision-making problem as a Markov decision process and uses techniques from reinforcement learning. Specifically, we apply an auto-encoder network to find a compact feature representation of the sensor measurements, which helps to mitigate the curse of dimensionality. A well-known batch reinforcement learning technique, fitted Q-iteration, is used to find a control policy, given this feature representation. In a simulation-based experiment using an electric water heater with 50 temperature sensors, the proposed method was able to achieve good policies much faster than when using the full state information. In a lab experiment, we apply fitted Q-iteration to an electric water heater with eight temperature sensors. Further reducing the state vector did not improve the results of fitted Q-iteration. The results of the lab experiment, spanning 40 days, indicate that compared to a thermostat controller, the presented approach was able to reduce the total cost of energy consumption of the electric water heater by 15%.
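Fitted Q-iteration repeatedly regresses Q-values onto bootstrapped targets computed from a fixed batch of (state, action, cost, next state) tuples. In this minimal sketch, states are discrete and the regression step is a per-(state, action) average, a stand-in for the function approximator used on the auto-encoder features; the toy "cold"/"warm" water heater, its costs and the discount factor are hypothetical.

```python
def fitted_q_iteration(batch, actions, n_iter=200, gamma=0.95):
    """Batch FQI minimising discounted cost; batch holds (state, action, cost, next_state) tuples."""
    q = {}  # Q_0 = 0 for all (state, action) pairs
    for _ in range(n_iter):
        sums, counts = {}, {}
        for s, a, cost, s_next in batch:
            # Bootstrapped target: immediate cost plus discounted best cost-to-go
            target = cost + gamma * min(q.get((s_next, a2), 0.0) for a2 in actions)
            sums[(s, a)] = sums.get((s, a), 0.0) + target
            counts[(s, a)] = counts.get((s, a), 0) + 1
        # "Regression" step: average the targets per (state, action) pair
        q = {key: sums[key] / counts[key] for key in sums}
    return q


def greedy_policy(q, state, actions):
    """Action with the lowest estimated cost-to-go (unseen pairs are treated as infinitely costly)."""
    return min(actions, key=lambda a: q.get((state, a), float("inf")))


# Toy water heater: heating costs 1, letting the buffer go cold costs 10 in discomfort
batch = [("cold", "on", 1.0, "warm"), ("cold", "off", 10.0, "cold"),
         ("warm", "on", 1.0, "warm"), ("warm", "off", 0.0, "cold")]
q = fitted_q_iteration(batch, ["on", "off"])
```

The learned policy heats when the buffer is cold and idles when it is warm, learned purely from the batch without a model of the heater.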

AI · Jul 13, 2015

Experimental analysis of data-driven control for a building heating system

Giuseppe Tommaso Costanzo, Sandro Iacovella, Frederik Ruelens et al.

Driven by the opportunity to harvest the flexibility related to building climate control for demand response applications, this work presents a data-driven control approach building upon recent advancements in reinforcement learning. More specifically, model-assisted batch reinforcement learning is applied to building climate control subject to dynamic pricing. The underlying sequential decision-making problem is cast as a Markov decision process, after which the control algorithm is detailed. In this work, fitted Q-iteration is used to construct a policy from a batch of experimental tuples. In those regions of the state space where the experimental sample density is low, virtual support samples are added using an artificial neural network. Finally, the resulting policy is shaped using domain knowledge. The control approach has been evaluated quantitatively using a simulation and qualitatively in a living lab. From the quantitative analysis, it has been found that the control approach converges in approximately 20 days to a control policy with a performance within 90% of the mathematical optimum. The experimental analysis confirms that within 10 to 20 days sensible policies are obtained that can be used for different outside temperature regimes.

SY · Apr 8, 2015

Residential Demand Response Applications Using Batch Reinforcement Learning

Frederik Ruelens, Bert Claessens, Stijn Vandael et al.

Driven by recent advances in batch Reinforcement Learning (RL), this paper contributes to the application of batch RL to demand response. In contrast to conventional model-based approaches, batch RL techniques do not require a system identification step, which makes them more suitable for a large-scale implementation. This paper extends fitted Q-iteration, a standard batch RL technique, to the situation where a forecast of the exogenous data is provided. In general, batch RL techniques do not rely on expert knowledge of the system dynamics or the solution. However, if some expert knowledge is provided, it can be incorporated by using our novel policy adjustment method. Finally, we tackle the challenge of finding an open-loop schedule required to participate in the day-ahead market. We propose a model-free Monte Carlo estimator method that uses a metric to construct artificial trajectories, and we illustrate this method by finding the day-ahead schedule of a heat-pump thermostat. Our experiments show that batch RL techniques provide a valuable alternative to model-based controllers and that they can be used to construct both closed-loop and open-loop policies.