Deep Reinforcement Learning with Spatio-temporal Traffic Forecasting for Data-Driven Base Station Sleep Control
This work addresses energy efficiency in cellular networks, which is a critical issue for network operators, but it appears incremental as it builds on existing methods like DDPG with enhancements for specific bottlenecks.
The paper tackles the problem of high energy consumption in densely deployed 5G base stations by developing DeepBSC, a traffic-aware dynamic sleep control framework that uses spatio-temporal traffic forecasting and reinforcement learning to deactivate idle base stations, resulting in significant energy savings while maintaining quality of service.
To meet the ever increasing mobile traffic demand in 5G era, base stations (BSs) have been densely deployed in radio access networks (RANs) to increase the network coverage and capacity. However, as the high density of BSs is designed to accommodate peak traffic, it would consume an unnecessarily large amount of energy if BSs are on during off-peak time. To save the energy consumption of cellular networks, an effective way is to deactivate some idle base stations that do not serve any traffic demand. In this paper, we develop a traffic-aware dynamic BS sleep control framework, named DeepBSC, which presents a novel data-driven learning approach to determine the BS active/sleep modes while meeting lower energy consumption and satisfactory Quality of Service (QoS) requirements. Specifically, the traffic demands are predicted by the proposed GS-STN model, which leverages the geographical and semantic spatial-temporal correlations of mobile traffic. With accurate mobile traffic forecasting, the BS sleep control problem is cast as a Markov Decision Process that is solved by Actor-Critic reinforcement learning methods. To reduce the variance of cost estimation in the dynamic environment, we propose a benchmark transformation method that provides robust performance indicator for policy update. To expedite the training process, we adopt a Deep Deterministic Policy Gradient (DDPG) approach, together with an explorer network, which can strengthen the exploration further. Extensive experiments with a real-world dataset corroborate that our proposed framework significantly outperforms the existing methods.