LG · Dec 26, 2025

Hybrid Combinatorial Multi-armed Bandits with Probabilistically Triggered Arms

Kongchang Zhou, Tingyu Zhang, Wei Chen, Fang Kong

arXiv:2512.21925v1 · 7.11 citations · h-index: 8

Originality: Incremental advance

AI Analysis

This work addresses the complementary weaknesses of online and offline learning in bandit problems, offering a solution for scenarios where interaction costs or data quality are concerns, though it appears incremental as it builds on existing CMAB-T frameworks.

The paper addresses combinatorial multi-armed bandits with probabilistically triggered arms by proposing a hybrid framework that integrates offline data with online interaction to overcome the limitations of purely online or purely offline methods. It reports theoretical regret guarantees and empirical advantages, including accelerated convergence and bias correction.

The problem of combinatorial multi-armed bandits with probabilistically triggered arms (CMAB-T) has been extensively studied. Prior work primarily focuses on either the online setting where an agent learns about the unknown environment through iterative interactions, or the offline setting where a policy is learned solely from logged data. However, each of these paradigms has inherent limitations: online algorithms suffer from high interaction costs and slow adaptation, while offline methods are constrained by dataset quality and lack of exploration capabilities. To address these complementary weaknesses, we propose hybrid CMAB-T, a new framework that integrates offline data with online interaction in a principled manner. Our proposed hybrid CUCB algorithm leverages offline data to guide exploration and accelerate convergence, while strategically incorporating online interactions to mitigate the insufficient coverage or distributional bias of the offline dataset. We provide theoretical guarantees on the algorithm's regret, demonstrating that hybrid CUCB significantly outperforms purely online approaches when high-quality offline data is available, and effectively corrects the bias inherent in offline-only methods when the data is limited or misaligned. Empirical results further demonstrate the consistent advantage of our algorithm.
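To make the warm-start idea in the abstract concrete, the following is a minimal sketch, not the paper's hybrid CUCB. It assumes a simplified top-k semi-bandit with Bernoulli arms and no probabilistic triggering, and it assumes the logged rewards come from the same distribution as the online environment. The function name `hybrid_cucb_topk`, its parameters, and the confidence-bonus constant are all hypothetical choices made for illustration.

```python
import math
import random


def hybrid_cucb_topk(true_means, k, horizon, offline_samples, seed=0):
    """Toy warm-started CUCB for a top-k semi-bandit with Bernoulli arms.

    Hypothetical illustration only; this is NOT the paper's algorithm.
    offline_samples[i] is a list of logged 0/1 rewards for arm i, assumed
    here to be drawn from the same distribution as the online rewards.
    Returns (cumulative pseudo-regret, per-arm observation counts).
    """
    rng = random.Random(seed)
    m = len(true_means)

    # Warm start: offline observations seed the counts and reward sums.
    counts = [len(s) for s in offline_samples]
    sums = [float(sum(s)) for s in offline_samples]

    # Expected reward of the best size-k action, used to measure regret.
    best = sum(sorted(true_means, reverse=True)[:k])
    regret = 0.0

    for t in range(1, horizon + 1):
        ucb = []
        for i in range(m):
            if counts[i] == 0:
                # Arms with no data at all are explored first.
                ucb.append(float("inf"))
            else:
                # Assumed bonus form; the constant 1.5 is illustrative.
                bonus = math.sqrt(1.5 * math.log(t + 1) / counts[i])
                ucb.append(sums[i] / counts[i] + bonus)

        # Oracle for top-k: choose the k arms with the largest indices.
        action = sorted(range(m), key=lambda i: ucb[i], reverse=True)[:k]

        # Semi-bandit feedback: every chosen arm reveals its own reward.
        for i in action:
            reward = 1.0 if rng.random() < true_means[i] else 0.0
            counts[i] += 1
            sums[i] += reward

        regret += best - sum(true_means[i] for i in action)

    return regret, counts
```

Calling the function with empty lists in `offline_samples` reduces it to a purely online run, which gives a baseline for comparison. The sketch deliberately omits the harder case the abstract highlights: when the offline data is misaligned with the online environment, a naive warm start like this one inherits that bias, and handling it requires the kind of correction the paper analyzes.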
