ML, LG, EM | Dec 25, 2023

Zero-Inflated Bandits

arXiv:2312.15595v3 | 1 citation | h-index: 9 | ICML
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of learning efficiency in bandits with sparse rewards. It is an incremental advance that applies a known statistical distribution to an under-explored context.

The paper tackles the problem of sparse rewards in bandit applications by modeling rewards with a zero-inflated distribution. It develops algorithms based on the Upper Confidence Bound and Thompson Sampling frameworks and reports superior empirical performance for them in numerical studies.

Many real-world bandit applications are characterized by sparse rewards, which can significantly hinder learning efficiency. Leveraging problem-specific structures for careful distribution modeling is recognized as essential for improving estimation efficiency in statistics. However, this approach remains under-explored in the context of bandits. To address this gap, we initiate the study of zero-inflated bandits, where the reward is modeled using a classic semi-parametric distribution known as the zero-inflated distribution. We develop algorithms based on the Upper Confidence Bound and Thompson Sampling frameworks for this specific structure. The superior empirical performance of these methods is demonstrated through extensive numerical studies.
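To make the zero-inflated structure concrete, here is a minimal sketch of one way a Thompson Sampling learner could exploit it. This is not the paper's algorithm. It assumes each reward factors as R = B * X, with B ~ Bernoulli(p) marking whether the reward is nonzero and X ~ Normal(mu, 1) giving its size, so the arm mean is p * mu. The Gaussian non-zero part, the Beta(1, 1) and Normal(0, 1) priors, and every name in the code (`ZeroInflatedArm`, `ZeroInflatedTS`, `run_bandit`) are illustrative assumptions.

```python
import random


class ZeroInflatedArm:
    """Hypothetical arm: reward = B * X, B ~ Bernoulli(p), X ~ Normal(mu, 1)."""

    def __init__(self, p, mu):
        self.p = p
        self.mu = mu

    def pull(self, rng):
        # With probability 1 - p the reward is exactly zero (the sparse case).
        if rng.random() < self.p:
            return rng.gauss(self.mu, 1.0)
        return 0.0


class ZeroInflatedTS:
    """Thompson Sampling with separate posteriors for p and mu (assumed model)."""

    def __init__(self, n_arms, rng):
        self.rng = rng
        self.nonzero = [0] * n_arms   # count of nonzero rewards per arm
        self.zero = [0] * n_arms      # count of zero rewards per arm
        self.sum_x = [0.0] * n_arms   # sum of nonzero rewards per arm

    def select(self):
        best_arm, best_value = 0, float("-inf")
        for k in range(len(self.zero)):
            # Beta(1, 1) prior on p, updated with zero / nonzero counts.
            p_sample = self.rng.betavariate(1 + self.nonzero[k], 1 + self.zero[k])
            # Normal(0, 1) prior on mu with unit noise variance gives the
            # posterior Normal(sum_x / (n + 1), 1 / (n + 1)).
            n = self.nonzero[k]
            mu_sample = self.rng.gauss(self.sum_x[k] / (n + 1), (1.0 / (n + 1)) ** 0.5)
            value = p_sample * mu_sample  # sampled mean reward of arm k
            if value > best_value:
                best_arm, best_value = k, value
        return best_arm

    def update(self, k, reward):
        # A Gaussian draw is zero with probability 0, so reward == 0.0
        # is treated as evidence that B = 0.
        if reward == 0.0:
            self.zero[k] += 1
        else:
            self.nonzero[k] += 1
            self.sum_x[k] += reward


def run_bandit(arm_params, horizon, seed=0):
    """Run the sketch and return how often each arm was pulled."""
    rng = random.Random(seed)
    arms = [ZeroInflatedArm(p, mu) for p, mu in arm_params]
    agent = ZeroInflatedTS(len(arms), rng)
    counts = [0] * len(arms)
    for _ in range(horizon):
        k = agent.select()
        agent.update(k, arms[k].pull(rng))
        counts[k] += 1
    return counts


if __name__ == "__main__":
    # Arm means are p * mu: 0.1 * 1.0 = 0.1 versus 0.3 * 3.0 = 0.9.
    print(run_bandit([(0.1, 1.0), (0.3, 3.0)], horizon=2000))
```

Tracking the zero indicator and the non-zero magnitude separately lets every pull inform the estimate of p, while only the non-zero pulls inform mu. A learner that treats the reward as a single Gaussian would instead have to absorb the spike at zero into its variance estimate.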

