Sustainability of Data Center Digital Twins with Reinforcement Learning
This work addresses sustainable data center management for the ML community by offering a tool for developing RL controllers, though the contribution is incremental because it builds on existing RL and CFD methods.
The paper tackles the challenge of reducing energy consumption and carbon emissions in data centers by introducing DCRL-Green, a multi-agent reinforcement learning environment for the holistic design and optimization of components such as IT servers and HVAC cooling. The result is a flexible and scalable platform for community-driven research.
The rapid growth of machine learning (ML) has increased the demand for computational power, resulting in larger data centers (DCs) and higher energy consumption. To address this issue and reduce carbon emissions, intelligent design and control of DC components such as IT servers, cabinets, HVAC cooling, flexible load shifting, and battery energy storage are essential. However, designing and controlling these components in tandem is a significant challenge. While individual aspects such as CFD-based design and Reinforcement Learning (RL)-based HVAC control have been studied, there is a gap in holistic design and optimization that covers all elements simultaneously. To close this gap, we have developed DCRL-Green, a multi-agent RL environment that enables the ML community to design data centers and to research, develop, and refine RL controllers for carbon footprint reduction in DCs. It is a flexible, modular, scalable, and configurable platform that can handle large High Performance Computing (HPC) clusters. In its default setup, DCRL-Green also provides a benchmark for evaluating single-agent as well as multi-agent RL algorithms. Users can easily subclass the default implementations and design their own control approaches, which encourages community development for sustainable data centers. Open Source Link: https://github.com/HewlettPackard/dc-rl
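The subclassing workflow described above can be illustrated with a toy example. The sketch below is not DCRL-Green's API: every name in it (Observation, BaseController, CarbonAwareController, run_episode) is hypothetical, the carbon-intensity values are made up, and the controller is a simple rule rather than a trained RL policy. It models only the flexible load shifting component, assuming one-hour time steps and that all deferred work must finish by the end of the episode.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical observation passed to a load-shifting controller."""
    hour: int
    carbon_intensity: float  # gCO2 per kWh (illustrative values only)
    it_load_kw: float        # non-deferrable IT load
    deferrable_kw: float     # flexible work currently waiting in the queue


class BaseController:
    """Default policy: run all deferrable work immediately."""

    def act(self, obs: Observation) -> float:
        # Fraction of the deferrable queue to run in this step.
        return 1.0


class CarbonAwareController(BaseController):
    """Subclass that defers flexible work while the grid is carbon-intensive."""

    def __init__(self, threshold: float):
        self.threshold = threshold

    def act(self, obs: Observation) -> float:
        return 0.0 if obs.carbon_intensity > self.threshold else 1.0


def run_episode(controller, carbon_trace, it_load_kw=100.0, flexible_kw_per_hour=20.0):
    """Return the episode's carbon footprint in kg CO2 (toy model, 1-hour steps)."""
    queue = 0.0
    total_grams = 0.0
    last_hour = len(carbon_trace) - 1
    for hour, intensity in enumerate(carbon_trace):
        queue += flexible_kw_per_hour  # new flexible work arrives each hour
        obs = Observation(hour, intensity, it_load_kw, queue)
        fraction = min(max(controller.act(obs), 0.0), 1.0)
        if hour == last_hour:
            fraction = 1.0  # assumption: deferred work must finish by episode end
        run_now = fraction * queue
        queue -= run_now
        energy_kwh = it_load_kw + run_now  # kW over one hour equals kWh
        total_grams += energy_kwh * intensity
    return total_grams / 1000.0


if __name__ == "__main__":
    trace = [500.0, 500.0, 200.0, 200.0]  # made-up grid carbon intensity
    print("baseline kg CO2:", run_episode(BaseController(), trace))
    print("carbon-aware kg CO2:", run_episode(CarbonAwareController(300.0), trace))
```

On this made-up trace the subclass shifts the flexible work into the two cleaner hours, so its footprint is lower than the baseline's even though both controllers complete the same total work. An RL agent would replace the fixed threshold with a learned policy, and a full multi-agent setup would add separate agents for HVAC cooling and battery storage.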