LG · OC · Oct 14, 2025

Multi-Armed Bandits with Minimum Aggregated Revenue Constraints

arXiv:2510.12523v1 · h-index: 23
Originality: Incremental advance
AI Analysis

The paper addresses fair revenue allocation in real-world applications with contextual variation and represents an incremental extension of standard bandit frameworks.

The paper studies a multi-armed bandit problem with contextual information in which each arm must receive a minimum aggregated reward across contexts while the total cumulative reward is maximized. It derives algorithms with upper bounds on both regret and constraint violations, together with a lower bound showing that their dependence on the time horizon is optimal.
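
One natural way to write this objective down, assumed here for illustration (the paper's own notation and exact constraint form may differ): contexts c in [C] arrive with known probabilities p_c, mu_{c,a} is the mean reward of arm a in context c, x_{c,a} is the probability of pulling arm a when context c is observed, and lambda_a is arm a's minimum expected revenue per round, aggregated over all contexts.

```latex
\max_{x}\; \sum_{c=1}^{C} p_c \sum_{a=1}^{K} x_{c,a}\,\mu_{c,a}
\quad \text{s.t.} \quad
\sum_{c=1}^{C} p_c\, x_{c,a}\,\mu_{c,a} \;\ge\; \lambda_a \;\; \forall a \in [K],
\qquad
\sum_{a=1}^{K} x_{c,a} = 1,\;\; x_{c,a} \ge 0 \;\; \forall c \in [C].
```

Because each floor lambda_a sums over contexts, the constraints couple the per-context choices, so under this formalization the optimal allocation is the solution of a linear program rather than a closed-form rule. Over a horizon T, regret and cumulative constraint violation would then be measured against T times this optimum and T lambda_a, respectively.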

We examine a multi-armed bandit problem with contextual information, where the objective is to ensure that each arm receives a minimum aggregated reward across contexts while simultaneously maximizing the total cumulative reward. This framework captures a broad class of real-world applications where fair revenue allocation is critical and contextual variation is inherent. The cross-context aggregation of minimum reward constraints, while enabling better performance and easier feasibility, introduces significant technical challenges, particularly the absence of the closed-form optimal allocations typically available in standard MAB settings. We design and analyze algorithms that either optimistically prioritize performance or pessimistically enforce constraint satisfaction. For each algorithm, we derive problem-dependent upper bounds on both regret and constraint violations. Furthermore, we establish a lower bound demonstrating that the dependence on the time horizon in our results is optimal in general and revealing fundamental limitations of the free exploration principle leveraged in prior work.
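
The abstract contrasts an optimistic algorithm with a pessimistic one but gives no pseudocode, so the following is only an illustrative sketch under our own assumptions, not the paper's algorithms. It assumes a known context distribution p, rewards in [0, 1], UCB-style confidence widths sqrt(2 ln t / n), and a per-round allocation chosen by a linear program that maximizes expected reward subject to a per-arm floor lambda on expected revenue aggregated over contexts. The names `solve_linear`, `best_allocation` and `plug_in_allocation` are hypothetical. The LP is solved by exact vertex enumeration purely to stay dependency-free; that is only workable for a handful of contexts and arms, and a real implementation would call an LP solver instead.

```python
from fractions import Fraction
from itertools import combinations
import math


def solve_linear(A, b):
    """Gauss-Jordan elimination over exact Fractions; returns None if A is singular."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def best_allocation(p, mu_obj, mu_con, lam):
    """Hypothetical LP oracle, solved by enumerating vertices (tiny C and K only).

    maximize   sum_c p[c] sum_a x[c][a] * mu_obj[c][a]
    subject to sum_c p[c] x[c][a] * mu_con[c][a] >= lam[a]  for every arm a
               sum_a x[c][a] == 1 and x >= 0                 for every context c
    Returns (value, x) with exact Fractions, or None if infeasible.
    """
    F = Fraction
    C, K = len(p), len(lam)
    n = C * K  # variable j encodes (context j // K, arm j % K)
    # Equality rows: one probability simplex per context.
    eq = [[1 if j // K == c else 0 for j in range(n)] for c in range(C)]
    # Inequality rows (g, h) meaning g . x >= h: arm revenue floors, then x >= 0.
    ineq = [([F(p[j // K]) * F(mu_con[j // K][a]) if j % K == a else 0
              for j in range(n)], F(lam[a])) for a in range(K)]
    ineq += [([1 if j == i else 0 for j in range(n)], 0) for i in range(n)]
    best = None
    # A vertex has n independent active constraints: all C equalities
    # plus n - C tight inequalities.
    for active in combinations(ineq, n - C):
        x = solve_linear(eq + [g for g, _ in active], [1] * C + [h for _, h in active])
        if x is None or any(sum(gi * xi for gi, xi in zip(g, x)) < h for g, h in ineq):
            continue
        value = sum(F(p[j // K]) * F(mu_obj[j // K][j % K]) * x[j] for j in range(n))
        if best is None or value > best[0]:
            best = (value, [x[c * K:(c + 1) * K] for c in range(C)])
    return best


def plug_in_allocation(p, means, counts, t, lam, mode="optimistic"):
    """Hypothetical plug-in rule on top of the LP oracle.

    'optimistic' uses upper confidence bounds in objective and constraints;
    'pessimistic' keeps upper bounds in the objective but uses lower bounds in
    the constraints, so it only returns allocations it can certify as feasible.
    """
    def bound(c, a, sign):
        if counts[c][a] == 0:
            width = 1.0  # unexplored pair: widest interval for rewards in [0, 1]
        else:
            width = math.sqrt(2 * math.log(t) / counts[c][a])
        return min(1.0, max(0.0, means[c][a] + sign * width))

    C, K = len(p), len(lam)
    ucb = [[bound(c, a, +1) for a in range(K)] for c in range(C)]
    lcb = [[bound(c, a, -1) for a in range(K)] for c in range(C)]
    return best_allocation(p, ucb, ucb if mode == "optimistic" else lcb, lam)


if __name__ == "__main__":
    half = Fraction(1, 2)
    mu = [[1, half], [Fraction(3, 4), half]]  # rows: contexts, columns: arms
    value, x = best_allocation([half, half], mu, mu, [Fraction(1, 4), Fraction(1, 8)])
    print(value)  # 13/16
    print(x)
```

In the example, arm 1's floor is met entirely in context 1, where diverting pulls to arm 1 gives up less expected reward than in context 0; this is the kind of cross-context trade-off that a per-context closed-form rule would miss. With no observations the pessimistic variant returns None, because its lower bounds cannot yet certify any floor; an actual algorithm would need a fallback such as forced exploration, which this sketch leaves out.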
