AI, Mar 3, 2022

Optimized cost function for demand response coordination of multiple EV charging stations using reinforcement learning

Manu Lahariya, Nasrin Sadeghianpourhamami, Chris Develder

arXiv:2203.01654v14.510 citations, h-index: 41

Originality: Incremental advance

AI Analysis

This incremental improvement addresses the practicality of model-free demand response for power grid management, specifically for EV charging station operators.

The paper tackles the computational inefficiency of a reinforcement learning (RL) algorithm for demand response (DR) coordination of multiple EV charging stations by proposing an improved cost function, reducing training time by up to 50% while maintaining competitive load-flattening performance.

Electric vehicle (EV) charging stations represent a substantial load with significant flexibility. The exploitation of that flexibility in demand response (DR) algorithms becomes increasingly important to manage and balance demand and supply in power grids. Model-free DR based on reinforcement learning (RL) is an attractive approach to balance such EV charging load. We build on previous research on RL, based on a Markov decision process (MDP) to simultaneously coordinate multiple charging stations. However, we note that the computationally expensive cost function adopted in the previous research leads to large training times, which limits the feasibility and practicality of the approach. We, therefore, propose an improved cost function that essentially forces the learned control policy to always fulfill any charging demand that does not offer any flexibility. We rigorously compare the newly proposed batch RL fitted Q-iteration implementation with the original (costly) one, using real-world data. Specifically, for the case of load flattening, we compare the two approaches in terms of (i) the processing time to learn the RL-based charging policy, as well as (ii) the overall performance of the policy decisions in terms of meeting the target load for unseen test data. The performance is analyzed for different training periods and varying training sample sizes. In addition to the performance results of both RL policies, we provide performance bounds in terms of both (i) an optimal all-knowing strategy, and (ii) a simple heuristic spreading individual EV charging uniformly over time.
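To make two ideas from the abstract concrete, here is a minimal Python sketch of (a) the simple heuristic that spreads each EV's charging uniformly over its stay and (b) a check for charging demand that offers no flexibility. It is an illustration under stated assumptions, not the authors' implementation: the `EV` fields, the discrete time slots, the function names, and the sum-of-squares flattening score are all hypothetical choices, and the abstract does not specify the paper's actual cost function.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EV:
    # All fields are illustrative assumptions, not taken from the paper.
    arrival: int    # first time slot in which the car is connected
    departure: int  # first time slot in which the car is gone (exclusive)
    energy: float   # total energy requested, in arbitrary units


def uniform_load(evs: List[EV], horizon: int) -> List[float]:
    """Aggregate load when every EV spreads its demand evenly over its stay."""
    load = [0.0] * horizon
    for ev in evs:
        stay = ev.departure - ev.arrival
        for t in range(ev.arrival, ev.departure):
            load[t] += ev.energy / stay
    return load


def has_no_flexibility(slots_left: int, slots_needed: int) -> bool:
    """True when charging cannot be postponed without missing the demand."""
    return slots_needed >= slots_left


def flattening_score(load: List[float]) -> float:
    """Assumed flatness measure: sum of squared loads (lower is flatter
    for a fixed total energy)."""
    return sum(x * x for x in load)


evs = [EV(arrival=0, departure=4, energy=2.0),
       EV(arrival=2, departure=4, energy=1.0)]
load = uniform_load(evs, horizon=4)
score = flattening_score(load)
```

In this toy case the second car overlaps with the first during the last two slots, so the uniform heuristic produces a step in the aggregate load. A coordinated policy could shift the first car's charging toward the earlier slots to flatten the profile, provided it still serves any car for which `has_no_flexibility` is true.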
