Learning Non-myopic Power Allocation in Constrained Scenarios
The paper addresses a more realistic scenario for wireless network optimization, though the contribution is incremental: it extends existing learnable algorithms to episodic constraints.
The paper tackles the problem of optimal power allocation in ad hoc interference networks under time-coupled constraints, proposing a learning-based framework that uses an actor-critic algorithm to achieve superior episodic network-utility performance with improved efficiency in time and computational complexity.
We propose a learning-based framework for efficient power allocation in ad hoc interference networks under episodic constraints. The problem of optimal power allocation (maximizing a given network utility metric) under instantaneous constraints has recently attracted significant attention, and several learnable algorithms have been proposed that deliver fast, near-optimal solutions. However, a more realistic scenario arises when the utility metric has to be optimized over an entire episode under time-coupled constraints. In this case, the instantaneous power must be regulated so that the utility is optimized over an entire sequence of wireless network realizations while the constraint is satisfied at all times. Solving each instance independently is myopic, because such a solution cannot account for the long-term constraint. Instead, we frame the task as a constrained sequential decision-making problem and employ an actor-critic algorithm to obtain a constraint-aware power allocation at each step. We present experimental analyses that illustrate the effectiveness of our method in terms of superior episodic network-utility performance, and its efficiency in terms of time and computational complexity.
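To make the notion of myopia concrete, the following is a minimal Python sketch of a toy problem, not the actor-critic method described above. It assumes a single link with no interference, a Shannon-rate utility, a per-step power cap `p_max`, and a total energy `budget` as the time-coupled constraint; none of these specifics come from the paper. The function names and the numbers in the example are hypothetical. The budget-aware baseline is a capped water-filling oracle that sees all channel gains in advance, which a sequential policy would not.

```python
import math


def rate(gain, power):
    """Shannon rate of a single interference-free link (toy assumption)."""
    return math.log2(1.0 + gain * power)


def episodic_utility(gains, powers):
    """Sum of per-step rates over the whole episode."""
    return sum(rate(g, p) for g, p in zip(gains, powers))


def myopic_allocation(gains, p_max, budget):
    """Maximize each step in isolation: transmit at the cap until the
    episodic budget runs out, ignoring future channel quality."""
    remaining = budget
    powers = []
    for _ in gains:
        p = min(p_max, remaining)
        powers.append(p)
        remaining -= p
    return powers


def budget_aware_allocation(gains, p_max, budget, iters=100):
    """Capped water-filling over time, found by bisection on the water
    level mu. This is an oracle: it assumes all gains are known upfront."""
    if p_max * len(gains) <= budget:
        return [p_max] * len(gains)  # budget never binds

    def powers_at(mu):
        return [min(p_max, max(0.0, mu - 1.0 / g)) for g in gains]

    lo, hi = 0.0, p_max + 1.0 / min(gains)  # total power: 0 at lo, > budget at hi
    for _ in range(iters):
        mu = 0.5 * (lo + hi)
        if sum(powers_at(mu)) > budget:
            hi = mu
        else:
            lo = mu
    return powers_at(lo)  # lo always stays on the feasible side


# Hypothetical episode: poor channels early, good channels late.
gains = [0.2, 0.5, 4.0, 3.0]
p_max, budget = 2.0, 4.0

myopic = myopic_allocation(gains, p_max, budget)
aware = budget_aware_allocation(gains, p_max, budget)
print("myopic utility:      ", episodic_utility(gains, myopic))
print("budget-aware utility:", episodic_utility(gains, aware))
```

In this toy episode the myopic rule spends the whole budget on the two weak early channels, whereas the budget-aware allocation saves most of it for the stronger later ones and achieves a higher episodic utility. A learned sequential policy has to approximate this kind of saving behavior without knowing the future channel realizations.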