SY LG Aug 23, 2021

Model-Free Learning of Optimal Deterministic Resource Allocations in Wireless Systems via Action-Space Exploration

arXiv:2108.10352v2 1.2 Has Code

Originality: Incremental advance

AI Analysis

This paper addresses resource allocation in modern wireless communications for users with heterogeneous objectives. It appears incremental, as it builds on existing policy gradient methods.

The paper tackles the problem of nonconvex constrained resource allocation in wireless systems with unknown models by proposing a model-free primal-dual deterministic policy gradient method, achieving near-optimal performance and scalability as confirmed by theory and simulations.

Wireless systems resource allocation refers to perpetual and challenging nonconvex constrained optimization tasks, which are especially timely in modern communications and networking setups involving multiple users with heterogeneous objectives and imprecise or even unknown models and/or channel statistics. In this paper, we propose a technically grounded and scalable primal-dual deterministic policy gradient method for efficiently learning optimal parameterized resource allocation policies. Our method not only efficiently exploits gradient availability of popular universal policy representations, such as deep neural networks, but is also truly model-free, as it relies on consistent zeroth-order gradient approximations of the associated random network services constructed via low-dimensional perturbations in action space, thus fully bypassing any dependence on critics. Both theory and numerical simulations confirm the efficacy and applicability of the proposed approach, as well as its superiority over the current state of the art in terms of both achieving near-optimal performance and scalability.
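
To make the idea in the abstract concrete, the following is a minimal sketch of a primal-dual update that uses a zeroth-order gradient estimate built from a random perturbation in action space, followed by the chain rule through a differentiable deterministic policy. It is not the authors' implementation. Everything specific here is an assumption chosen for illustration: the toy problem (sum-rate maximization under an average power budget), the two-parameter softplus policy, the one-sided Gaussian-perturbation estimator, and all names (`policy`, `service`, `zeroth_order_action_grad`, `train`) and step sizes.

```python
import numpy as np


def policy(theta, h):
    """Hypothetical deterministic policy: per-user power = softplus(theta0 + theta1 * h)."""
    z = theta[0] + theta[1] * h
    return np.logaddexp(0.0, z)  # numerically stable softplus, always >= 0


def policy_jacobian(theta, h):
    """Closed-form Jacobian of the policy output w.r.t. theta, shape (n_users, 2)."""
    z = theta[0] + theta[1] * h
    s = 1.0 / (1.0 + np.exp(-z))  # derivative of softplus is the sigmoid
    return np.stack([s, s * h], axis=1)


def service(p, h):
    """Toy 'unknown' service: per-user rate. The learner only queries its values."""
    return np.log1p(h * np.maximum(p, 0.0))


def zeroth_order_action_grad(fn, a, mu, rng):
    """One-sample estimate of the gradient of fn at action a.

    Uses a Gaussian perturbation u in action space and the finite difference
    (fn(a + mu * u) - fn(a)) / mu * u, so no model or critic of fn is needed.
    """
    u = rng.standard_normal(a.shape)
    return (fn(a + mu * u) - fn(a)) / mu * u


def train(steps=2000, n_users=2, p_max=1.0, mu=0.05,
          lr_theta=0.01, lr_lam=0.01, seed=0):
    """Stochastic primal-dual iteration on the toy problem (assumed setup)."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)  # primal variable: policy parameters
    lam = 1.0            # dual variable for the average power constraint
    for _ in range(steps):
        h = rng.exponential(1.0, size=n_users)  # sampled channel gains
        a = policy(theta, h)

        # Sampled Lagrangian, treated as a black-box function of the action.
        def lagrangian(act):
            return service(act, h).sum() + lam * (p_max - act.sum())

        g_a = zeroth_order_action_grad(lagrangian, a, mu, rng)
        # Chain rule: action-space estimate times the policy Jacobian.
        theta = theta + lr_theta * policy_jacobian(theta, h).T @ g_a
        # Projected dual descent keeps the multiplier nonnegative.
        lam = max(0.0, lam - lr_lam * (p_max - a.sum()))
    return theta, lam


if __name__ == "__main__":
    theta, lam = train()
    print("theta:", theta, "lambda:", lam)
```

The perturbation lives in the action space (here, one power level per user) rather than in the parameter space, so its dimension does not grow with the size of the policy representation. The policy gradient itself is taken exactly through the Jacobian, which in a neural-network setting would come from automatic differentiation.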
