NI · AI · Dec 14, 2020

A Reinforcement Learning Formulation of the Lyapunov Optimization: Application to Edge Computing Systems with Queue Stability

arXiv:2012.07279v2 · 11 citations
AI Analysis

This work offers an alternative to the conventional drift-plus-penalty (DPP) algorithm for Lyapunov optimization, which is beneficial for researchers and practitioners dealing with queue stability and resource allocation in edge computing.

This paper proposes a deep reinforcement learning (DRL) approach to Lyapunov optimization that minimizes the time-average penalty while maintaining queue stability. It applies the method to resource allocation in edge computing systems, where numerical results show that it operates successfully.

In this paper, a deep reinforcement learning (DRL)-based approach to Lyapunov optimization is considered to minimize the time-average penalty while maintaining queue stability. A construction of the state and action spaces is provided that forms a proper Markov decision process (MDP) for Lyapunov optimization. A condition on the reward function of reinforcement learning (RL) for queue stability is derived. Based on this analysis and on practical RL with reward discounting, a class of reward functions is proposed for the DRL-based approach. The proposed approach does not require complicated optimization at each time step and operates with general non-convex and discontinuous penalty functions. Hence, it provides an alternative to the conventional drift-plus-penalty (DPP) algorithm for Lyapunov optimization. The proposed DRL-based approach is applied to resource allocation in edge computing systems with queue stability, and numerical results demonstrate its successful operation.
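To make the ingredients named in the abstract concrete, the following is a minimal Python sketch of a queue update, a quadratic Lyapunov function, and a per-step reward shaped like the negative of a drift-plus-penalty term. It is an illustration only: the abstract does not state the paper's reward class, so `dpp_style_reward`, the weight `v`, and the example numbers are assumptions, not the authors' formulation.

```python
import numpy as np


def queue_step(q, arrivals, service):
    """Standard queue update: Q(t+1) = max(Q(t) - b(t), 0) + a(t)."""
    return np.maximum(q - service, 0.0) + arrivals


def lyapunov(q):
    """Quadratic Lyapunov function L(Q) = 0.5 * sum_i Q_i^2."""
    return 0.5 * float(np.sum(np.square(q)))


def dpp_style_reward(q, q_next, penalty, v=1.0):
    """Hypothetical reward: negative of (one-step Lyapunov drift + v * penalty).

    This mirrors the drift-plus-penalty objective; it is an assumed shape,
    not the reward class derived in the paper.
    """
    drift = lyapunov(q_next) - lyapunov(q)
    return -(drift + v * penalty)


# Two queues; the action is assumed to have fixed the service amounts.
q = np.array([2.0, 0.0])
q_next = queue_step(q, arrivals=np.array([1.0, 1.0]), service=np.array([3.0, 0.5]))

# The penalty is just a number here, so it may come from any function,
# including a non-convex or discontinuous one.
reward = dpp_style_reward(q, q_next, penalty=1.0, v=2.0)
```

Because the reward only evaluates the penalty at the chosen action, an agent trained on it never solves a per-step optimization problem, which is the contrast with DPP that the abstract draws.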

