LG NI MLJun 17, 2020

Stochastic Network Utility Maximization with Unknown Utilities: Multi-Armed Bandits Approach

arXiv:2006.09997v14.211 citations

Originality Incremental advance

AI Analysis

This addresses resource allocation in networks with hard real-time constraints, but it is incremental as it adapts existing bandit methods to a specific utility model.

The paper tackles a Stochastic Network Utility Maximization problem with unknown agent utilities, proposing algorithms based on Multi-Armed Bandits to minimize regret compared to an oracle policy, and shows optimality for homogeneous agents with performance validated numerically.

In this paper, we study a novel Stochastic Network Utility Maximization (NUM) problem where the utilities of agents are unknown. The utility of each agent depends on the amount of resource it receives from a network operator/controller. The operator desires to do a resource allocation that maximizes the expected total utility of the network. We consider threshold type utility functions where each agent gets non-zero utility if the amount of resource it receives is higher than a certain threshold. Otherwise, its utility is zero (hard real-time). We pose this NUM setup with unknown utilities as a regret minimization problem. Our goal is to identify a policy that performs as `good' as an oracle policy that knows the utilities of agents. We model this problem setting as a bandit setting where feedback obtained in each round depends on the resource allocated to the agents. We propose algorithms for this novel setting using ideas from Multiple-Play Multi-Armed Bandits and Combinatorial Semi-Bandits. We show that the proposed algorithm is optimal when all agents have the same utility. We validate the performance guarantees of our proposed algorithms through numerical experiments.

View on arXiv PDF

Similar