NI LG MA | Dec 4, 2025

Hierarchical Reinforcement Learning for the Dynamic VNE with Alternatives Problem

Ali Al Housseini, Cristina Rottondi, Omran Ayoub

arXiv:2512.05207v1 | 1.2 | h-index: 4

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of embedding malleable virtual network requests in network slicing. The advance is incremental: it extends prior VNE formulations with alternative topologies.

This paper tackles the Virtual Network Embedding with Alternatives Problem (VNEAP) under dynamic arrivals by proposing HRL-VNEAP, a hierarchical reinforcement learning approach that selects alternative topologies and embeds them onto the substrate network. The method improves acceptance ratio by up to 20.7%, total revenue by up to 36.2%, and revenue-over-cost by up to 22.1% compared to baselines.

Virtual Network Embedding (VNE) is a key enabler of network slicing, yet most formulations assume that each Virtual Network Request (VNR) has a fixed topology. Recently, VNE with Alternative topologies (VNEAP) was introduced to capture malleable VNRs, where each request can be instantiated using one of several functionally equivalent topologies that trade resources differently. While this flexibility enlarges the feasible space, it also introduces an additional decision layer, making dynamic embedding more challenging. This paper proposes HRL-VNEAP, a hierarchical reinforcement learning approach for VNEAP under dynamic arrivals. A high-level policy selects the most suitable alternative topology (or rejects the request), and a low-level policy embeds the chosen topology onto the substrate network. Experiments on realistic substrate topologies under multiple traffic loads show that naive exploitation strategies provide only modest gains, whereas HRL-VNEAP consistently achieves the best performance across all metrics. Compared to the strongest tested baselines, HRL-VNEAP improves acceptance ratio by up to 20.7%, total revenue by up to 36.2%, and revenue-over-cost by up to 22.1%. Finally, we benchmark against an MILP formulation on tractable instances to quantify the remaining gap to optimality and motivate future work on learning- and optimization-based VNEAP solutions.
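The two-level decision described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' HRL-VNEAP implementation: both policies are replaced by simple greedy heuristics, only node CPU is modeled (virtual links, request departures, and learning are omitted), and every name here (`Alternative`, `high_level_select`, `low_level_embed`, `handle_request`, the revenue values) is a hypothetical chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Alternative:
    """One candidate topology of a request (node CPU demands only, an assumption)."""
    cpu_demands: List[int]  # CPU units required by each virtual node
    revenue: float          # hypothetical revenue if this alternative is embedded


def high_level_select(alts: List[Alternative], free_cpu: Dict[str, int]) -> Optional[int]:
    """Greedy stand-in for the high-level policy.

    Returns the index of the highest-revenue alternative whose total demand
    fits the total free CPU, or None to reject the request.
    """
    total_free = sum(free_cpu.values())
    feasible = [i for i, a in enumerate(alts) if sum(a.cpu_demands) <= total_free]
    if not feasible:
        return None
    return max(feasible, key=lambda i: alts[i].revenue)


def low_level_embed(alt: Alternative, free_cpu: Dict[str, int]) -> Optional[Dict[int, str]]:
    """Greedy stand-in for the low-level policy.

    Maps each virtual node to a distinct substrate node with the most free CPU.
    Returns the mapping, or None if some virtual node cannot be placed.
    """
    mapping: Dict[int, str] = {}
    used = set()
    for v, demand in enumerate(alt.cpu_demands):
        candidates = [s for s, c in free_cpu.items() if s not in used and c >= demand]
        if not candidates:
            return None
        best = max(candidates, key=lambda s: free_cpu[s])
        mapping[v] = best
        used.add(best)
    return mapping


def handle_request(
    alts: List[Alternative], free_cpu: Dict[str, int]
) -> Optional[Tuple[int, Dict[int, str]]]:
    """Select an alternative, try to embed it, and commit resources only on success."""
    choice = high_level_select(alts, free_cpu)
    if choice is None:
        return None
    mapping = low_level_embed(alts[choice], free_cpu)
    if mapping is None:
        return None
    for v, s in mapping.items():
        free_cpu[s] -= alts[choice].cpu_demands[v]
    return choice, mapping


# Toy substrate and two arriving requests, each with two alternatives.
free_cpu = {"a": 10, "b": 6, "c": 4}
requests = [
    [Alternative([8, 5], 13.0), Alternative([4, 4, 4], 12.0)],
    [Alternative([9, 9], 18.0), Alternative([3, 3], 6.0)],
]
results = [handle_request(alts, free_cpu) for alts in requests]
acceptance_ratio = sum(r is not None for r in results) / len(results)
```

In this toy run the second request passes the coarse high-level check but fails at the embedding stage, which shows why the two decisions interact: a selection rule that ignores how the embedder will place nodes can pick an alternative that cannot actually be mapped.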
