LGSep 15, 2025

On the Regularity and Fairness of Combinatorial Multi-Armed Bandit

arXiv:2509.12457v14.1h-index: 3IEEE Transactions on Networking

Originality Incremental advance

AI Analysis

This addresses fairness and regularity issues in applications like wireless networks, but it is incremental as it builds on existing bandit methods with new constraints.

The paper tackles the problem of maximizing cumulative rewards in combinatorial multi-armed bandits while ensuring fairness among arms and reward regularity, proposing a learning algorithm that achieves zero cumulative fairness violation and characterizes regret performance, with results verified on real-world datasets.

The combinatorial multi-armed bandit model is designed to maximize cumulative rewards in the presence of uncertainty by activating a subset of arms in each round. This paper is inspired by two critical applications in wireless networks, where it's not only essential to maximize cumulative rewards but also to guarantee fairness among arms (i.e., the minimum average reward required by each arm) and ensure reward regularity (i.e., how often each arm receives the reward). In this paper, we propose a parameterized regular and fair learning algorithm to achieve these three objectives. In particular, the proposed algorithm linearly combines virtual queue-lengths (tracking the fairness violations), Time-Since-Last-Reward (TSLR) metrics, and Upper Confidence Bound (UCB) estimates in its weight measure. Here, TSLR is similar to age-of-information and measures the elapsed number of rounds since the last time an arm received a reward, capturing the reward regularity performance, and UCB estimates are utilized to balance the tradeoff between exploration and exploitation in online learning. By exploring a key relationship between virtual queue-lengths and TSLR metrics and utilizing several non-trivial Lyapunov functions, we analytically characterize zero cumulative fairness violation, reward regularity, and cumulative regret performance under our proposed algorithm. These theoretical outcomes are verified by simulations based on two real-world datasets.

View on arXiv PDF

Similar