Diffusion Models Meet Contextual Bandits
This work addresses the computational and statistical inefficiencies of contextual bandit methods in applications that require efficient online decision-making. It is an incremental improvement that integrates diffusion models into existing frameworks.
The paper tackles efficient online decision-making in contextual bandits by using pre-trained diffusion models as expressive priors that capture complex action dependencies. The result is a practical algorithm that supports fast posterior updates and sampling, and empirical results show that it is effective across diverse contextual bandit settings.
Efficient online decision-making in contextual bandits is challenging, as methods without informative priors often suffer from computational or statistical inefficiencies. In this work, we leverage pre-trained diffusion models as expressive priors to capture complex action dependencies and develop a practical algorithm that efficiently approximates posteriors under such priors, enabling both fast updates and sampling. Empirical results demonstrate the effectiveness and versatility of our approach across diverse contextual bandit settings.
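To make the role of an informative prior concrete, the following is a minimal sketch and not the paper's algorithm. It assumes posterior sampling (Thompson sampling) as the decision rule, a scalar context, and independent Gaussian priors per action standing in for the diffusion prior, so that the posterior update is an exact conjugate formula rather than an approximation. The names `gaussian_update` and `run_thompson`, and all parameter values, are hypothetical and chosen only for illustration.

```python
import random


def gaussian_update(mean, var, x, reward, noise_var):
    """Conjugate update of a N(mean, var) belief over theta,
    after observing reward = theta * x + Gaussian noise."""
    precision = 1.0 / var + x * x / noise_var
    new_mean = (mean / var + x * reward / noise_var) / precision
    return new_mean, 1.0 / precision


def run_thompson(true_thetas, prior_means, prior_var,
                 rounds=500, noise_var=1.0, seed=0):
    """Run posterior sampling on a toy bandit and return cumulative regret.

    Assumption: each action a has a scalar parameter theta_a and the
    expected reward for context x is theta_a * x. A Gaussian prior per
    action stands in for the diffusion prior described in the text.
    """
    rng = random.Random(seed)
    means = list(prior_means)
    variances = [prior_var] * len(true_thetas)
    best = max(true_thetas)  # optimal action for any positive context
    regret = 0.0
    for _ in range(rounds):
        x = rng.uniform(0.5, 1.5)  # positive scalar context
        # Sample one parameter per action from its current posterior.
        samples = [rng.gauss(m, v ** 0.5) for m, v in zip(means, variances)]
        # Act greedily with respect to the sampled parameters.
        a = max(range(len(samples)), key=lambda i: samples[i] * x)
        reward = true_thetas[a] * x + rng.gauss(0.0, noise_var ** 0.5)
        # Fast closed-form posterior update for the chosen action only.
        means[a], variances[a] = gaussian_update(
            means[a], variances[a], x, reward, noise_var)
        regret += (best - true_thetas[a]) * x
    return regret


if __name__ == "__main__":
    true_thetas = [0.2, 0.5, 0.9]
    # Prior centred near the truth versus a vague, zero-centred prior.
    informative = run_thompson(true_thetas, [0.25, 0.45, 0.85], prior_var=0.05)
    vague = run_thompson(true_thetas, [0.0, 0.0, 0.0], prior_var=10.0)
    print("regret with informative prior:", informative)
    print("regret with vague prior:", vague)
```

Comparing the two printed regrets gives a rough sense of why a prior that already encodes structure across actions can reduce exploration cost. The sketch leaves out the part the paper is about: a diffusion prior is not conjugate, so the posterior must be approximated rather than computed in closed form.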