IT | ML | Jan 12, 2021

Dynamic Spectrum Access using Stochastic Multi-User Bandits

arXiv:2101.04388v1 | 12 citations
Originality: Incremental advance
AI Analysis

This paper addresses spectrum efficiency in wireless networks and offers a novel way to handle collisions, but the advance is incremental because it builds on existing bandit frameworks.

The paper tackles the problem of uncoordinated spectrum access by developing a stochastic multi-user multi-armed bandit algorithm that allows rewards under collisions, enabling more users than channels. It achieves order-optimal system-wide regret of O(log T) and extends to dynamic user numbers with sub-linear regret.

A stochastic multi-user multi-armed bandit framework is used to develop algorithms for uncoordinated spectrum access. In contrast to prior work, rewards are allowed to be non-zero even under collisions, which lets the number of users exceed the number of channels. The proposed algorithm consists of an estimation phase and an allocation phase. It is shown that if every user adopts the algorithm, the system-wide regret is order-optimal, $O(\log T)$, over a time horizon of duration $T$. The regret guarantees hold both when the number of users is greater than the number of channels and when it is smaller. The algorithm is extended to the dynamic case, where the number of users in the system evolves over time, and is shown to achieve sub-linear regret.
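The two-phase structure and the notion of system-wide regret can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the paper's algorithm: it is centralized rather than uncoordinated, it assumes Bernoulli rewards, it assumes each user can observe how many users share its channel, and it uses a fixed-length exploration phase. The names `optimal_allocation`, `simulate`, `mu`, and `explore_rounds` are hypothetical. Here `mu[k][n - 1]` is the assumed mean per-user reward on channel `k` when `n` users occupy it, which is non-zero under collisions.

```python
import itertools
import random


def optimal_allocation(mu, n_users):
    """Brute-force the occupancy vector that maximizes total expected reward.

    mu[k][n - 1] is the mean per-user reward on channel k when n users
    share it; every row of mu must have at least n_users entries.
    Returns (best_value, counts), where counts[k] is the number of users
    placed on channel k.
    """
    n_channels = len(mu)
    best_value, best_counts = float("-inf"), None
    for counts in itertools.product(range(n_users + 1), repeat=n_channels):
        if sum(counts) != n_users:
            continue
        value = sum(n * mu[k][n - 1] for k, n in enumerate(counts) if n > 0)
        if value > best_value:
            best_value, best_counts = value, counts
    return best_value, best_counts


def simulate(mu, n_users, horizon, explore_rounds, seed=0):
    """Explore uniformly, then commit; return cumulative expected regret."""
    rng = random.Random(seed)
    n_channels = len(mu)
    sums = [[0.0] * n_users for _ in range(n_channels)]
    pulls = [[0] * n_users for _ in range(n_channels)]
    opt_value, _ = optimal_allocation(mu, n_users)
    committed = None
    regret = 0.0
    for t in range(horizon):
        if t < explore_rounds:
            # Estimation phase (assumed form): uniform random channel choice.
            choices = [rng.randrange(n_channels) for _ in range(n_users)]
        else:
            if committed is None:
                # Allocation phase (assumed form): plug in empirical means.
                # Unvisited (channel, occupancy) pairs are estimated as 0.0.
                est = [
                    [sums[k][j] / pulls[k][j] if pulls[k][j] else 0.0
                     for j in range(n_users)]
                    for k in range(n_channels)
                ]
                _, counts = optimal_allocation(est, n_users)
                committed = [k for k, n in enumerate(counts) for _ in range(n)]
            choices = committed
        occupancy = [choices.count(k) for k in range(n_channels)]
        for k in choices:
            n = occupancy[k]
            # Assumption: Bernoulli reward with mean mu[k][n - 1].
            reward = 1.0 if rng.random() < mu[k][n - 1] else 0.0
            sums[k][n - 1] += reward
            pulls[k][n - 1] += 1
        expected = sum(n * mu[k][n - 1] for k, n in enumerate(occupancy) if n > 0)
        regret += opt_value - expected
    return regret


if __name__ == "__main__":
    # Three users, two channels: more users than channels.
    mu = [[0.9, 0.5, 0.2], [0.6, 0.4, 0.1]]
    print(optimal_allocation(mu, 3))
    print(simulate(mu, 3, horizon=2000, explore_rounds=200))
```

With a fixed exploration length, the regret of this sketch depends on whether the plug-in allocation is correct; it makes no claim to reproduce the $O(\log T)$ guarantee stated above.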

