LG | Jul 9, 2025

Generalization in Reinforcement Learning for Radio Access Networks

Burak Demirel, Yu Wang, Cristian Tatino, Pablo Soldati

arXiv:2507.06602v2 | 9.42 citations | h-index: 11 | IEEE Trans Mach Learn Commun Netw

Originality: Incremental advance

AI Analysis

The paper addresses poor generalization in RL-based radio network control, a problem relevant to telecom operators. It represents a domain-specific, incremental improvement.

The paper tackles the generalization problem of reinforcement learning in radio access networks, where data-driven policies often overfit to training conditions. The proposed framework improves average throughput and spectral efficiency by ~10% over baselines in 5G benchmarks, with gains of up to 30% in specific deployments.

Modern radio access networks (RANs) operate in highly dynamic and heterogeneous environments, where hand-tuned, rule-based radio resource management (RRM) algorithms often underperform. While reinforcement learning (RL) can surpass such heuristics in constrained settings, the diversity of deployments and unpredictable radio conditions introduce major generalization challenges. Data-driven policies frequently overfit to training conditions, degrading performance in unseen scenarios. To address this, we propose a generalization-centered RL framework for RAN control that: (i) robustly reconstructs dynamically varying states from partial and noisy observations, while encoding static and semi-static information, such as radio nodes, cell attributes, and their topology, through graph representations; (ii) applies domain randomization to broaden the training distribution; and (iii) distributes data generation across multiple actors while centralizing training in a cloud-compatible architecture aligned with O-RAN principles. Although generalization increases computational and data-management complexity, our distributed design mitigates this by scaling data collection and training across diverse network conditions. Applied to downlink link adaptation in five 5G benchmarks, our policy improves average throughput and spectral efficiency by ~10% over an outer-loop link adaptation (OLLA) baseline (10% block error rate (BLER) target) in full-buffer MIMO/mMIMO and by >20% under high mobility. It matches specialized RL in full-buffer traffic and achieves up to 4- and 2-fold gains in eMBB and mixed-traffic benchmarks, respectively. In nine-cell deployments, graph attention network (GAT) models offer 30% higher throughput than multilayer perceptron (MLP) baselines. These results, combined with our scalable architecture, offer a path toward AI-native 6G RAN using a single, generalizable RL agent.
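
For readers unfamiliar with the OLLA baseline (10% BLER target) mentioned in the abstract, the following is a minimal textbook sketch of an outer-loop link adaptation update, not the authors' implementation. The function names, the 1 dB down-step, the SINR values, and the logistic toy channel are all assumptions made for illustration; only the 10% BLER target comes from the abstract. The key property is that the up-step and down-step are in the ratio target / (1 - target), so the long-run error rate settles at the target.

```python
import math
import random


def olla_step(offset_db, ack, bler_target=0.10, step_down_db=1.0):
    """Apply one OLLA update to the SINR offset (in dB).

    An ACK raises the offset by a small step; a NACK lowers it by a
    larger step. The step ratio is chosen so that the expected drift is
    zero exactly when the error rate equals bler_target.
    """
    step_up_db = step_down_db * bler_target / (1.0 - bler_target)
    return offset_db + step_up_db if ack else offset_db - step_down_db


def simulate(num_tx=50_000, true_sinr_db=12.0, reported_sinr_db=15.0, seed=0):
    """Run OLLA against a hypothetical toy channel.

    The reported SINR is deliberately 3 dB too optimistic. The error
    probability is modeled as a logistic function of how far the
    corrected SINR overshoots the true SINR (an assumption, not a
    standardized link model).
    """
    rng = random.Random(seed)
    offset_db = 0.0
    errors = 0
    for _ in range(num_tx):
        corrected_db = reported_sinr_db + offset_db  # value used to pick the MCS
        p_err = 1.0 / (1.0 + math.exp(-(corrected_db - true_sinr_db)))
        ack = rng.random() >= p_err
        errors += 0 if ack else 1
        offset_db = olla_step(offset_db, ack)
    return errors / num_tx, offset_db


bler, final_offset_db = simulate()
print(f"empirical BLER: {bler:.4f}, final offset: {final_offset_db:.2f} dB")
```

In this toy setting the offset drifts negative to compensate for the optimistic SINR report, and the empirical error rate approaches the 10% target. A learned policy of the kind described in the abstract would replace this fixed-step rule with a state-dependent decision.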
