SY LG MA

May 30, 2023

Centralised rehearsal of decentralised cooperation: Multi-agent reinforcement learning for the scalable coordination of residential energy flexibility

Flora Charbonnier, Bei Peng, Thomas Morstyn, Malcolm McCulloch

arXiv:2305.18875v25.919 citations

Originality: Incremental advance

AI Analysis

This paper addresses the scalable, privacy-preserving coordination of distributed energy resources, which supports climate change mitigation. The advance is incremental, since it builds on existing multi-agent reinforcement learning methods.

The paper tackles the coordination of residential energy resources, such as electric vehicles and heating, to integrate renewable energy. It reports significant savings for energy users, the distribution network and greenhouse gas emissions, with training times for 30 homes nearly 40 times shorter than with a previous state-of-the-art reinforcement learning approach that lacks the factored critic.

This paper investigates how deep multi-agent reinforcement learning can enable the scalable and privacy-preserving coordination of residential energy flexibility. The coordination of distributed resources such as electric vehicles and heating will be critical to the successful integration of large shares of renewable energy in our electricity grid and, thus, to help mitigate climate change. The pre-learning of individual reinforcement learning policies can enable distributed control with no sharing of personal data required during execution. However, previous approaches for multi-agent reinforcement learning-based distributed energy resources coordination impose an ever greater training computational burden as the size of the system increases. We therefore adopt a deep multi-agent actor-critic method which uses a centralised but factored critic to rehearse coordination ahead of execution. Results show that coordination is achieved at scale, with minimal information and communication infrastructure requirements, no interference with daily activities, and privacy protection. Significant savings are obtained for energy users, the distribution network and greenhouse gas emissions. Moreover, training times are nearly 40 times shorter than with a previous state-of-the-art reinforcement learning approach without the factored critic for 30 homes.
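To make the idea of a "centralised but factored critic" concrete, here is a minimal sketch in Python with NumPy. The abstract does not say how the critic is factored, so the additive decomposition used here (the joint value is the sum of per-home utilities) is an assumption chosen for simplicity, not the authors' architecture. All function names, layer sizes and dimensions are hypothetical.

```python
import numpy as np

def init_factored_critic(n_agents, obs_dim, act_dim, hidden=16, seed=0):
    """Create one small utility network per agent (hypothetical sizes)."""
    rng = np.random.default_rng(seed)
    in_dim = obs_dim + act_dim
    return [
        {
            "W1": rng.normal(0.0, 0.1, (in_dim, hidden)),
            "b1": np.zeros(hidden),
            "w2": rng.normal(0.0, 0.1, hidden),
            "b2": 0.0,
        }
        for _ in range(n_agents)
    ]

def agent_utility(params, obs, act):
    """Per-agent utility: depends only on that agent's observation and action."""
    x = np.concatenate([obs, act])
    h = np.tanh(x @ params["W1"] + params["b1"])
    return float(h @ params["w2"] + params["b2"])

def factored_q(critic, observations, actions):
    """Joint value as the sum of per-agent utilities (assumed additive factoring)."""
    utilities = [
        agent_utility(p, o, a) for p, o, a in zip(critic, observations, actions)
    ]
    return sum(utilities), utilities

def n_params(critic):
    """Total parameter count; grows linearly with the number of agents."""
    return sum(p["W1"].size + p["b1"].size + p["w2"].size + 1 for p in critic)

# Usage: 30 homes, each with a 4-dimensional observation and 3-dimensional action
# (illustrative dimensions only).
critic = init_factored_critic(n_agents=30, obs_dim=4, act_dim=3)
rng = np.random.default_rng(1)
obs = rng.normal(size=(30, 4))
acts = rng.uniform(-1.0, 1.0, size=(30, 3))
q_tot, utilities = factored_q(critic, obs, acts)
```

In this sketch the critic is centralised in the sense that training would use the joint value `q_tot`, while each term depends on a single home's observation and action. The parameter count therefore grows linearly with the number of homes, which is one plausible reason a factored critic is cheaper to train than a monolithic one whose input concatenates every home's data.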
