LG · ML · Jun 10, 2019

Deep Reinforcement Learning with Discrete Normalized Advantage Functions for Resource Management in Network Slicing

arXiv:1906.04594v1 · 75 citations
Originality: Incremental advance
AI Analysis

This work addresses a specific bottleneck in resource allocation for network slicing, offering an incremental improvement to deep reinforcement learning methods in this domain.

The paper tackles the slow convergence of deep Q-learning when resource management in network slicing involves a large discrete action space. It introduces discrete normalized advantage functions (DNAF) together with a deterministic policy gradient descent (DPGD) algorithm, and reports faster convergence in simulations.

Network slicing promises to provision diversified services with distinct requirements over a single infrastructure. Deep reinforcement learning (e.g., deep $\mathcal{Q}$-learning, DQL) is regarded as an appropriate approach to the demand-aware inter-slice resource management problem in network slicing, treating the varying demands as the environment state and the allocated bandwidth as the action. However, allocating bandwidth at a finer resolution usually implies a larger action space, and DQL fails to converge quickly in this case. In this paper, we introduce discrete normalized advantage functions (DNAF) into DQL by separating the $\mathcal{Q}$-value function into a state-value function term and an advantage term, and by exploiting a deterministic policy gradient descent (DPGD) algorithm to avoid the unnecessary calculation of the $\mathcal{Q}$-value for every state-action pair. Furthermore, as DPGD only works in a continuous action space, we embed a k-nearest neighbor algorithm into DQL to quickly find a valid action in the discrete space nearest to the DPGD output. Finally, we verify the faster convergence of the DNAF-based DQL through extensive simulations.
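
The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The quadratic advantage term follows the standard normalized advantage function form, $\mathcal{Q}(s,a) = V(s) - \tfrac{1}{2}(a-\mu(s))^\top P(s)(a-\mu(s))$ with $P = LL^\top$, which may differ in detail from the paper's DNAF. The function names, the integer bandwidth grid, and the Euclidean distance used for the nearest-neighbor lookup are all assumptions made here for illustration.

```python
import itertools
import numpy as np


def enumerate_allocations(num_slices, total_units):
    """Hypothetical helper: every way to split `total_units` bandwidth units
    across `num_slices` slices (non-negative integers that sum to the total)."""
    actions = []
    slots = total_units + num_slices - 1
    # Stars and bars: pick the positions of the (num_slices - 1) dividers.
    for dividers in itertools.combinations(range(slots), num_slices - 1):
        bounds = (-1,) + dividers + (slots,)
        actions.append([bounds[i + 1] - bounds[i] - 1 for i in range(num_slices)])
    return np.array(actions, dtype=float)


def naf_q_value(value, mu, L, action):
    """Q(s, a) = V(s) + A(s, a), with a quadratic advantage that peaks at mu.

    `value`, `mu`, and `L` stand in for network outputs at a given state
    (assumed here; in practice they would come from a neural network)."""
    diff = action - mu
    P = L @ L.T  # positive semi-definite by construction
    return value - 0.5 * diff @ P @ diff


def nearest_valid_actions(proto_action, valid_actions, k=1):
    """Return the k valid discrete actions closest to the continuous output."""
    dists = np.linalg.norm(valid_actions - proto_action, axis=1)
    idx = np.argsort(dists, kind="stable")[:k]
    return valid_actions[idx]


# Example: 3 slices sharing 10 bandwidth units.
valid = enumerate_allocations(num_slices=3, total_units=10)
proto = np.array([3.4, 5.1, 1.5])  # continuous action, e.g. a policy output
chosen = nearest_valid_actions(proto, valid, k=1)[0]
```

Because the advantage term is maximized at $a = \mu(s)$, the greedy continuous action is available directly from the network output without evaluating the $\mathcal{Q}$-value of every discrete action; the nearest-neighbor step then maps that output back onto the valid allocation grid.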

