AI LG · Oct 10, 2020

Reinforcement Learning on Computational Resource Allocation of Cloud-based Wireless Networks

Beiran Chen, Yi Zhang, George Iosifidis, Mingming Liu

arXiv:2010.05024v1 · 2.37 citations

Originality: Incremental advance

AI Analysis

This paper addresses energy efficiency in cloud-based IoT networks, but the contribution is incremental: it applies existing RL methods to a specific domain problem.

The paper tackles dynamic computational resource allocation in cloud-based wireless networks for IoT by modeling it as a Markov Decision Process and solving it with a model-based reinforcement learning agent that uses value iteration. The results show that the agent converges rapidly to the optimal policy, performs stably across parameter settings, and matches or outperforms a baseline in energy savings for Software-Defined Radio (SDR) and Software-Defined Networking (SDN) scenarios.

Wireless networks used for the Internet of Things (IoT) are expected to rely largely on cloud-based computing and processing. Softwarised and centralised signal processing and network switching in the cloud enable flexible network control and management. In a cloud environment, dynamic computational resource allocation is essential to save energy while maintaining the performance of the processes. The stochastic variation of the Central Processing Unit (CPU) load, together with the potentially complex parallelisation behaviour of cloud processes, makes dynamic resource allocation an interesting research challenge. This paper models the dynamic computational resource allocation problem as a Markov Decision Process (MDP) and designs a model-based reinforcement-learning agent to optimise the allocation of CPU usage. The agent uses value iteration to find the optimal policy for the MDP. To evaluate the agent, we analyse two types of processes that can be used in cloud-based IoT networks and that have different levels of parallelisation capability: Software-Defined Radio (SDR) and Software-Defined Networking (SDN). The results show that our agent converges rapidly to the optimal policy, performs stably under different parameter settings, and matches or outperforms a baseline algorithm in energy savings across scenarios.
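
To make the value iteration step concrete, here is a minimal sketch on a toy allocation problem. Everything specific in it is an assumption for illustration and is not taken from the paper: the three load levels, the load transition matrix, the one-core-per-load-level demand, and the energy and under-provisioning costs are all hypothetical.

```python
import numpy as np


def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Solve a finite MDP by value iteration.

    P[a, s, s2] is the probability of moving from state s to s2 under action a.
    R[s, a] is the expected immediate reward for taking action a in state s.
    Returns the value function V[s] and the greedy action for each state.
    """
    V = np.zeros(P.shape[1])
    for _ in range(max_iter):
        # Bellman optimality backup: Q[s, a] = R[s, a] + gamma * E[V(next state)]
        Q = R + gamma * (P @ V).T
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = R + gamma * (P @ V).T
    return V, Q.argmax(axis=1)


# Hypothetical toy model (not the paper's): states are CPU load levels
# (0 = low, 1 = medium, 2 = high); action a allocates a + 1 cores.
load_transitions = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.1, 0.3, 0.6],
])
n_states = n_actions = 3
# Assumption: the load evolves independently of the allocation decision.
P = np.stack([load_transitions] * n_actions)

ENERGY_PER_CORE = 1.0          # assumed energy cost per allocated core
UNDERPROVISION_PENALTY = 10.0  # assumed cost when cores fall short of demand
R = np.zeros((n_states, n_actions))
for s in range(n_states):
    for a in range(n_actions):
        cores, demand = a + 1, s + 1
        R[s, a] = -ENERGY_PER_CORE * cores
        if cores < demand:
            R[s, a] -= UNDERPROVISION_PENALTY

V, policy = value_iteration(P, R)
print("cores allocated per load level:", policy + 1)
```

Because the load in this toy model does not depend on the allocation, the resulting policy simply matches cores to demand. A richer model in which the action influences the next state (for example, through a task backlog) would make the look-ahead in value iteration matter.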
