Decentralized Computation Offloading for Multi-User Mobile Edge Computing: A Deep Reinforcement Learning Approach
This work addresses computation offloading for resource-limited mobile devices in edge computing, but it is incremental as it applies an existing DRL method (DDPG) to a known bottleneck in MEC systems.
The paper tackles the problem of minimizing long-term average computation cost (power consumption and buffering delay) in multi-user mobile edge computing systems with stochastic task arrivals and wireless channels, using a deep reinforcement learning approach, and demonstrates that the proposed DDPG-based decentralized strategy outperforms conventional DQN-based and greedy strategies with reduced computation cost.
Mobile edge computing (MEC) emerges recently as a promising solution to relieve resource-limited mobile devices from computation-intensive tasks, which enables devices to offload workloads to nearby MEC servers and improve the quality of computation experience. Nevertheless, by considering a MEC system consisting of multiple mobile users with stochastic task arrivals and wireless channels in this paper, the design of computation offloading policies is challenging to minimize the long-term average computation cost in terms of power consumption and buffering delay. A deep reinforcement learning (DRL) based decentralized dynamic computation offloading strategy is investigated to build a scalable MEC system with limited feedback. Specifically, a continuous action space-based DRL approach named deep deterministic policy gradient (DDPG) is adopted to learn efficient computation offloading policies independently at each mobile user. Thus, powers of both local execution and task offloading can be adaptively allocated by the learned policies from each user's local observation of the MEC system. Numerical results are illustrated to demonstrate that efficient policies can be learned at each user, and performance of the proposed DDPG based decentralized strategy outperforms the conventional deep Q-network (DQN) based discrete power control strategy and some other greedy strategies with reduced computation cost. Besides, the power-delay tradeoff is also analyzed for both the DDPG based and DQN based strategies.