LG AI MLSep 27, 2018

Definition and evaluation of model-free coordination of electrical vehicle charging with reinforcement learning

Nasrin Sadeghianpourhamami, Johannes Deleu, Chris Develder

arXiv:1809.10679v28.3121 citations

Originality Incremental advance

AI Analysis

This addresses the challenge of demand response for EV charging infrastructure, offering a scalable model-free solution, though it builds incrementally on existing RL techniques.

The paper tackles the problem of coordinating electric vehicle charging without requiring accurate models, proposing a reinforcement learning approach that jointly controls multiple charging stations at once. The method achieves performance within 10-15% of an oracle benchmark and generalizes to larger station sets.

Initial DR studies mainly adopt model predictive control and thus require accurate models of the control problem (e.g., a customer behavior model), which are to a large extent uncertain for the EV scenario. Hence, model-free approaches, especially based on reinforcement learning (RL) are an attractive alternative. In this paper, we propose a new Markov decision process (MDP) formulation in the RL framework, to jointly coordinate a set of EV charging stations. State-of-the-art algorithms either focus on a single EV, or perform the control of an aggregate of EVs in multiple steps (e.g., aggregate load decisions in one step, then a step translating the aggregate decision to individual connected EVs). On the contrary, we propose an RL approach to jointly control the whole set of EVs at once. We contribute a new MDP formulation, with a scalable state representation that is independent of the number of EV charging stations. Further, we use a batch reinforcement learning algorithm, i.e., an instance of fitted Q-iteration, to learn the optimal charging policy. We analyze its performance using simulation experiments based on a real-world EV charging data. More specifically, we (i) explore the various settings in training the RL policy (e.g., duration of the period with training data), (ii) compare its performance to an oracle all-knowing benchmark (which provides an upper bound for performance, relying on information that is not available or at least imperfect in practice), (iii) analyze performance over time, over the course of a full year to evaluate possible performance fluctuations (e.g, across different seasons), and (iv) demonstrate the generalization capacity of a learned control policy to larger sets of charging stations.

View on arXiv PDF

Similar