Linear Contextual Bandits with Interference
This work addresses interference in online decision-making for multi-unit systems, bridging causal inference and bandit theory, but it is incremental as it builds on prior multi-agent and adversarial bandit research.
The paper tackles the problem of interference in linear contextual bandits, where actions affect rewards across units, by introducing a systematic framework and algorithms that quantify interference effects, achieving sublinear regret bounds and demonstrating effectiveness through simulations and synthetic data.
Interference, a key concept in causal inference, extends the reward modeling process by accounting for the impact of one unit's actions on the rewards of others. In contextual bandit (CB) settings, where multiple units are present in the same round, potential interference can significantly affect the estimation of expected rewards for different arms, thereby influencing the decision-making process. Although some prior work has explored multi-agent and adversarial bandits in interference-aware settings, the effect of interference in CB, as well as the underlying theory, remains significantly underexplored. In this paper, we introduce a systematic framework to address interference in Linear CB (LinCB), bridging the gap between causal inference and online decision-making. We propose a series of algorithms that explicitly quantify the interference effect in the reward modeling process and provide comprehensive theoretical guarantees, including sublinear regret bounds, finite sample upper bounds, and asymptotic properties. The effectiveness of our approach is demonstrated through simulations and a synthetic data generated based on MovieLens data.