LG · OC · ML · Aug 19, 2025

Multi-User Contextual Cascading Bandits for Personalized Recommendation

arXiv:2508.13981v2 · 1 citation · h-index: 2
Originality: Highly original
AI Analysis

This paper addresses efficient multi-user interaction in recommendation systems. It is a novel extension of bandit frameworks rather than an incremental improvement.

The paper tackles personalized recommendation in online advertising by introducing a Multi-User Contextual Cascading Bandit model that integrates cascading feedback, parallel context sessions, and heterogeneous rewards. It proposes two algorithms with proven regret bounds of $\widetilde{O}(\sqrt{THN})$ and $\widetilde{O}(\sqrt{T+HN})$, respectively.

We introduce a Multi-User Contextual Cascading Bandit (MCCB) model, a new combinatorial bandit framework that captures realistic online advertising scenarios where multiple users interact with sequentially displayed items simultaneously. Unlike classical contextual bandits, MCCB integrates three key structural elements: (i) cascading feedback based on sequential arm exposure, (ii) parallel context sessions enabling selective exploration, and (iii) heterogeneous arm-level rewards. We first propose Upper Confidence Bound with Backward Planning (UCBBP), a UCB-style algorithm tailored to this setting, and prove that it achieves a regret bound of $\widetilde{O}(\sqrt{THN})$ over $T$ episodes, $H$ session steps, and $N$ contexts per episode. Motivated by the fact that many users interact with the system simultaneously, we introduce a second algorithm, Active Upper Confidence Bound with Backward Planning (AUCBBP), which achieves a strict efficiency improvement in context scaling, i.e., user scaling, with a regret bound of $\widetilde{O}(\sqrt{T+HN})$. We validate our theoretical findings via numerical experiments, demonstrating the empirical effectiveness of both algorithms under various settings.
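To get a feel for the difference in context scaling between the two stated bounds, the following minimal sketch evaluates only the leading terms $\sqrt{THN}$ and $\sqrt{T+HN}$. It assumes all constants and logarithmic factors hidden by $\widetilde{O}$ are dropped, so the numbers indicate growth rates rather than actual regret. The function names and the example values of $T$, $H$, and $N$ are illustrative and do not come from the paper.

```python
import math


def ucbbp_rate(T: int, H: int, N: int) -> float:
    """Leading term sqrt(T*H*N) of the UCBBP bound (constants and logs dropped)."""
    return math.sqrt(T * H * N)


def aucbbp_rate(T: int, H: int, N: int) -> float:
    """Leading term sqrt(T + H*N) of the AUCBBP bound (constants and logs dropped)."""
    return math.sqrt(T + H * N)


def compare(T: int, H: int, N: int) -> tuple[float, float, float]:
    """Return both leading terms and their ratio for one (T, H, N) setting."""
    u = ucbbp_rate(T, H, N)
    a = aucbbp_rate(T, H, N)
    return u, a, u / a


# Illustrative values only: fix T and H, grow the number of contexts N.
T, H = 10_000, 5
for N in (1, 10, 100, 1000):
    u, a, ratio = compare(T, H, N)
    print(f"N={N:5d}  sqrt(THN)={u:10.1f}  sqrt(T+HN)={a:8.1f}  ratio={ratio:6.1f}")
```

With $T$ and $H$ fixed, the first term grows like $\sqrt{N}$, while the second stays close to $\sqrt{T}$ until $HN$ becomes comparable to $T$. This is the sense in which the second bound scales better in the number of users.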

