Deep Reinforcement Learning Approach to QoSAware Load Balancing in 5G Cellular Networks under User Mobility and Observation Uncertainty

Mehrshad Eskandarpour, Hossein Soleimani

arXiv:2510.248691.1h-index: 2

AI Analysis

For network operators managing dense 5G RANs, this work provides a robust, deployable load balancing solution that outperforms existing methods, though it is an incremental application of PPO to a known problem.

The paper proposes a deep reinforcement learning (PPO) framework for QoS-aware load balancing in 5G networks, demonstrating consistent improvements across all KPIs (throughput, latency, jitter, packet loss, fairness, handovers) compared to rule-based and learning-based baselines, with rapid convergence and strong generalization under increasing user density.

Efficient mobility management and load balancing are critical to sustaining Quality of Service (QoS) in dense, highly dynamic 5G radio access networks. We present a deep reinforcement learning framework based on Proximal Policy Optimization (PPO) for autonomous, QoS-aware load balancing implemented end-to-end in a lightweight, pure-Python simulation environment. The control problem is formulated as a Markov Decision Process in which the agent periodically adjusts Cell Individual Offset (CIO) values to steer user-cell associations. A multi-objective reward captures key performance indicators (aggregate throughput, latency, jitter, packet loss rate, Jain's fairness index, and handover count), so the learned policy explicitly balances efficiency and stability under user mobility and noisy observations. The PPO agent uses an actor-critic neural network trained from trajectories generated by the Python simulator with configurable mobility (e.g., Gauss-Markov) and stochastic measurement noise. Across 500+ training episodes and stress tests with increasing user density, the PPO policy consistently improves KPI trends (higher throughput and fairness, lower delay, jitter, packet loss, and handovers) and exhibits rapid, stable convergence. Comparative evaluations show that PPO outperforms rule-based ReBuHa and A3 as well as the learning-based CDQL baseline across all KPIs while maintaining smoother learning dynamics and stronger generalization as load increases. These results indicate that PPO's clipped policy updates and advantage-based training yield robust, deployable control for next-generation RAN load balancing using an entirely Python-based toolchain.

View on arXiv PDF

Similar