Contextual Linear Bandits under Noisy Features: Towards Bayesian Oracles
This addresses the problem of feature uncertainty in bandit algorithms for decision-making systems, representing an incremental advance by adapting to noisy features.
The paper tackles contextual linear bandits with noisy and missing features, showing that optimal hypotheses can deviate non-intuitively from the true function due to noise, which prevents classical methods from achieving non-trivial regret bounds. They propose an algorithm approximating a Bayesian oracle, achieving a regret bound of $ ilde{O}(d\sqrt{T})$ for many arms, validated on synthetic and real-world datasets.
We study contextual linear bandit problems under feature uncertainty, where the features are noisy and have missing entries. To address the challenges posed by this noise, we analyze Bayesian oracles given the observed noisy features. Our Bayesian analysis reveals that the optimal hypothesis can significantly deviate from the underlying realizability function, depending on the noise characteristics. These deviations are highly non-intuitive and do not occur in classical noiseless setups. This implies that classical approaches cannot guarantee a non-trivial regret bound. Therefore, we propose an algorithm that aims to approximate the Bayesian oracle based on the observed information under this model, achieving $\tilde{O}(d\sqrt{T})$ regret bound when there is a large number of arms. We demonstrate the proposed algorithm using synthetic and real-world datasets.