LG AI IT ML · Apr 1, 2019

Distributed Power Control for Large Energy Harvesting Networks: A Multi-Agent Deep Reinforcement Learning Approach

Mohit K. Sharma, Alessio Zappone, Mohamad Assaad, Merouane Debbah, Spyridon Vassilaras

arXiv:1904.00601v27.151 citations

Originality: Incremental advance

AI Analysis

This paper addresses power management in wireless networks with energy harvesting and offers a solution that scales to large systems. The advance is incremental, since it builds on existing MARL and mean-field game techniques.

The paper tackles online power control in large energy harvesting networks by developing a multi-agent deep reinforcement learning framework that learns distributed policies. Their throughput is close to that of centralized policies, in networks large enough that designing optimal policies with conventional methods such as Markov decision processes is intractable.

In this paper, we develop a multi-agent reinforcement learning (MARL) framework to obtain online power control policies for a large energy harvesting (EH) multiple access channel when only causal information about the EH process and the wireless channel is available. In the proposed framework, we model the online power control problem as a discrete-time mean-field game (MFG) and analytically show that the MFG has a unique stationary solution. Next, we leverage the fictitious play property of mean-field games and deep reinforcement learning to learn the stationary solution of the game in a completely distributed fashion. We analytically show that the proposed procedure converges to the unique stationary solution of the MFG. This, in turn, ensures that the optimal policies can be learned in a completely distributed fashion. To benchmark the performance of the distributed policies, we also develop deep neural network (DNN) based centralized and distributed online power control schemes. Our simulation results show the efficacy of the proposed power control policies. In particular, the DNN-based centralized power control policies perform very well for large EH networks, for which designing optimal policies is intractable with conventional methods such as Markov decision processes. Further, the throughput of both distributed policies is close to that achieved by the centralized policies.
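The fictitious-play idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's model or algorithm. It assumes a toy static game in which every node picks a transmit power to maximize a hypothetical utility, log(1 + gain·p / (1 + mean_power)) − price·p, and it uses a closed-form best response instead of a deep reinforcement learner. There is no battery or energy-arrival process here, and the names `best_response`, `fictitious_play`, `gain`, `price` and `p_max` are illustrative assumptions. The sketch only shows how averaging successive best responses drives the mean field toward a fixed point.

```python
def best_response(mean_power, gain=2.0, price=0.25, p_max=5.0):
    """Power maximizing log(1 + gain*p/(1 + mean_power)) - price*p on [0, p_max].

    The utility is a hypothetical stand-in, not the paper's objective.
    Setting its derivative gain / (1 + mean_power + gain*p) equal to price
    gives the unconstrained maximizer; the utility is concave in p, so
    clipping to [0, p_max] yields the constrained maximizer.
    """
    p = 1.0 / price - (1.0 + mean_power) / gain
    return min(max(p, 0.0), p_max)


def fictitious_play(n_iters=2000, initial_mean=0.0):
    """Iterate the mean field as the running average of best responses."""
    mean_power = initial_mean
    for k in range(n_iters):
        response = best_response(mean_power)
        # Running-average update: after this step, mean_power is the
        # average of the k + 1 best responses computed so far.
        mean_power += (response - mean_power) / (k + 1)
    return mean_power


if __name__ == "__main__":
    m = fictitious_play()
    print(f"mean field after fictitious play: {m:.4f}")
    print(f"best response to that mean field: {best_response(m):.4f}")
```

With the default parameters the best response is 3.5 − 0.5·mean_power, so the fixed point of this toy game is mean_power = 7/3, and the two printed values should agree closely.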
