Stochastic Network Utility Maximization with Unknown Utilities: Multi-Armed Bandits Approach
This addresses resource allocation in networks with hard real-time constraints, but it is incremental as it adapts existing bandit methods to a specific utility model.
The paper tackles a Stochastic Network Utility Maximization problem with unknown agent utilities, proposing algorithms based on Multi-Armed Bandits to minimize regret compared to an oracle policy, and shows optimality for homogeneous agents with performance validated numerically.
In this paper, we study a novel Stochastic Network Utility Maximization (NUM) problem where the utilities of agents are unknown. The utility of each agent depends on the amount of resource it receives from a network operator/controller. The operator desires to do a resource allocation that maximizes the expected total utility of the network. We consider threshold type utility functions where each agent gets non-zero utility if the amount of resource it receives is higher than a certain threshold. Otherwise, its utility is zero (hard real-time). We pose this NUM setup with unknown utilities as a regret minimization problem. Our goal is to identify a policy that performs as `good' as an oracle policy that knows the utilities of agents. We model this problem setting as a bandit setting where feedback obtained in each round depends on the resource allocated to the agents. We propose algorithms for this novel setting using ideas from Multiple-Play Multi-Armed Bandits and Combinatorial Semi-Bandits. We show that the proposed algorithm is optimal when all agents have the same utility. We validate the performance guarantees of our proposed algorithms through numerical experiments.