LG Aug 29, 2023

Stochastic Graph Bandit Learning with Side-Observations

arXiv:2308.15107v25.31 citations, h-index: 18

Originality: Incremental advance

AI Analysis

This work advances the field of stochastic contextual bandits with graph feedback and could benefit applications such as recommendation systems, though the advance appears incremental because it builds directly on prior research.

The paper tackles the problem of stochastic contextual bandit learning with graph feedback by proposing an algorithm that adapts to graph structures and reward gaps, achieving improved regret upper bounds without requiring prior knowledge of graphical quantities.

In this paper, we investigate the stochastic contextual bandit with general function space and graph feedback. We propose an algorithm that addresses this problem by adapting to both the underlying graph structures and reward gaps. To the best of our knowledge, our algorithm is the first to provide a gap-dependent upper bound in this stochastic setting, bridging the research gap left by the work in [35]. In comparison to [31,33,35], our method offers improved regret upper bounds and does not require knowledge of graphical quantities. We conduct numerical experiments to demonstrate the computational efficiency and effectiveness of our approach in terms of regret upper bounds. These findings highlight the significance of our algorithm in advancing the field of stochastic contextual bandits with graph feedback, opening up avenues for practical applications in various domains.
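
For readers unfamiliar with the setting, the sketch below illustrates only the graph-feedback observation model: playing one arm also reveals the rewards of its neighbors in a feedback graph. It is not the paper's algorithm and it omits contexts entirely. The graph, the Bernoulli reward means, the uniform-random policy, and the helper name `play_with_graph_feedback` are all hypothetical choices made for illustration.

```python
import random

def play_with_graph_feedback(arm, graph, mean_rewards, rng):
    """Pull `arm`; return {observed_arm: reward} for the arm and its out-neighbors.

    Rewards are Bernoulli draws (an assumption made for this illustration).
    """
    observed = {arm} | set(graph.get(arm, ()))
    return {a: 1.0 if rng.random() < mean_rewards[a] else 0.0
            for a in sorted(observed)}

# Hypothetical feedback graph: playing arm 0 also reveals arms 1 and 2, etc.
graph = {0: [1, 2], 1: [0], 2: [3], 3: []}
mean_rewards = [0.2, 0.5, 0.7, 0.4]  # assumed values, not from the paper
rng = random.Random(0)

counts = [0] * 4
sums = [0.0] * 4
for _ in range(1000):
    arm = rng.randrange(4)  # placeholder policy: uniform random play
    feedback = play_with_graph_feedback(arm, graph, mean_rewards, rng)
    for a, r in feedback.items():
        counts[a] += 1
        sums[a] += r

# Empirical means built from both direct pulls and side observations.
estimates = [s / max(c, 1) for s, c in zip(sums, counts)]
print(counts, estimates)
```

Because each round can yield several observations, the observation counts grow faster than the number of rounds played. That extra information is what graph-feedback algorithms try to exploit.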
