Deep Reinforcement Learning for Uplink Multi-Carrier Non-Orthogonal Multiple Access Resource Allocation Using Buffer State Information
The paper addresses scheduling efficiency for user equipments (UEs) with diverse data rate and latency requirements in wireless networks; it is an incremental improvement in domain-specific optimization.
It tackles resource allocation in uplink multi-carrier NOMA systems by proposing a novel actor-critic reinforcement learning scheduler that incorporates buffer state information (BSI). The scheduler is trained and evaluated with Nokia's "wireless suite" and outperforms benchmark schedulers.
For orthogonal multiple access (OMA) systems, the number of served user equipments (UEs) is limited to the number of available orthogonal resources. Non-orthogonal multiple access (NOMA) schemes, on the other hand, allow multiple UEs to use the same orthogonal resource. This extra degree of freedom introduces new challenges for resource allocation. Buffer state information (BSI), such as the size and age of packets waiting for transmission, can be used to improve scheduling in OMA systems. In this paper, we investigate the impact of BSI on the performance of a centralized scheduler in an uplink multi-carrier NOMA scenario with UEs having various data rate and latency requirements. To handle the large combinatorial space of allocating UEs to the resources, we propose a novel scheduler based on actor-critic reinforcement learning incorporating BSI. Training and evaluation are carried out using Nokia's "wireless suite". We propose various novel techniques to both stabilize and speed up training. The proposed scheduler outperforms benchmark schedulers.
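To make the "large combinatorial space" and the notion of BSI concrete, the sketch below counts joint allocations for OMA versus NOMA and derives simple per-UE buffer features. It is a minimal illustration under stated assumptions, not the paper's method: each subcarrier is assigned independently, any UE may use any subcarrier, and NOMA allows at most two UEs per subcarrier. The helper names (`allocations_per_subcarrier`, `joint_action_space`, `UEBuffer.bsi`) and the chosen BSI features (queued bits, head-of-line delay, packet count) are hypothetical.

```python
from dataclasses import dataclass, field
from math import comb
from typing import List, Tuple


def allocations_per_subcarrier(num_ues: int, max_ues_per_subcarrier: int) -> int:
    """Ways to choose which UEs share one subcarrier (the empty set included)."""
    return sum(comb(num_ues, m) for m in range(max_ues_per_subcarrier + 1))


def joint_action_space(num_ues: int, num_subcarriers: int,
                       max_ues_per_subcarrier: int) -> int:
    """Joint allocations, assuming every subcarrier is assigned independently."""
    return allocations_per_subcarrier(num_ues, max_ues_per_subcarrier) ** num_subcarriers


@dataclass
class Packet:
    size_bits: int
    arrival_time: float  # seconds


@dataclass
class UEBuffer:
    packets: List[Packet] = field(default_factory=list)

    def bsi(self, now: float) -> Tuple[int, float, int]:
        """Hypothetical BSI features: (queued bits, head-of-line delay, packet count)."""
        if not self.packets:
            return (0, 0.0, 0)
        total_bits = sum(p.size_bits for p in self.packets)
        # Age of the oldest waiting packet, a common latency indicator.
        hol_delay = now - min(p.arrival_time for p in self.packets)
        return (total_bits, hol_delay, len(self.packets))


if __name__ == "__main__":
    # OMA: at most one UE per subcarrier; NOMA (assumed): at most two.
    print("OMA :", joint_action_space(10, 4, 1))   # 14641
    print("NOMA:", joint_action_space(10, 4, 2))   # 9834496
    buf = UEBuffer([Packet(1200, 0.010), Packet(800, 0.014)])
    print("BSI :", buf.bsi(now=0.020))
```

With 10 UEs and 4 subcarriers, allowing a second UE per subcarrier grows the joint space from 11^4 to 56^4 allocations, which illustrates why an exhaustive search over allocations becomes impractical and a learned policy is attractive.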