DS · LG · Feb 16, 2012

Finding a most biased coin with fewest flips

arXiv:1202.3639v3 · 7 citations
Originality: Highly original
AI Analysis

This work provides an optimal solution for adaptive decision-making in a Bayesian setting closely related to multi-armed bandits, though it is incremental in that it builds on existing bandit frameworks.

The paper tackles the problem of identifying a most biased coin among a set using as few adaptive flips as possible. Under a particular probabilistic model, it gives an algorithm that minimizes the expected number of tosses needed to reach a specified confidence level.

We study the problem of learning a most biased coin among a set of coins by tossing the coins adaptively. The goal is to minimize the number of tosses until we identify a coin i* whose posterior probability of being most biased is at least 1-delta for a given delta. Under a particular probabilistic model, we give an optimal algorithm, i.e., an algorithm that minimizes the expected number of future tosses. The problem is closely related to finding the best arm in the multi-armed bandit problem using adaptive strategies. Our algorithm employs an optimal adaptive strategy -- a strategy that performs the best possible action at each step after observing the outcomes of all previous coin tosses. Consequently, our algorithm is also optimal for any starting history of outcomes. To our knowledge, this is the first algorithm that employs an optimal adaptive strategy under a Bayesian setting for this problem. Our proof of optimality employs tools from the field of Markov games.
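The stopping rule described above (stop once some coin's posterior probability reaches 1-delta) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the two-point model in which each coin is independently "heavy" (heads probability p+eps) or "light" (p-eps) with a known prior, the use of the posterior of being heavy as a stand-in for the posterior of being most biased, the function names, and the greedy rule that always tosses the currently most promising coin. The greedy rule is a simple placeholder, not the optimal adaptive strategy the abstract refers to.

```python
import math
import random


def posterior_heavy(heads, tails, prior, p, eps):
    """Posterior probability that a coin is heavy (bias p+eps) rather than
    light (bias p-eps), given its own toss counts. Assumed two-point model."""
    log_odds = (math.log(prior / (1 - prior))
                + heads * math.log((p + eps) / (p - eps))
                + tails * math.log((1 - p - eps) / (1 - p + eps)))
    # Numerically stable logistic function of the log-odds.
    if log_odds >= 0:
        return 1 / (1 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1 + z)


def find_heavy_coin(biases, delta, prior=0.5, p=0.5, eps=0.1,
                    seed=0, max_tosses=100_000):
    """Toss coins until one has posterior >= 1 - delta of being heavy.

    `biases` holds the true (hidden) heads probabilities used to simulate
    tosses. Returns (coin index, number of tosses, posterior), or
    (None, max_tosses, None) if the toss budget runs out.
    """
    rng = random.Random(seed)
    counts = [[0, 0] for _ in biases]  # [heads, tails] per coin
    for tosses in range(max_tosses):
        posts = [posterior_heavy(h, t, prior, p, eps) for h, t in counts]
        best = max(range(len(biases)), key=lambda i: posts[i])
        if posts[best] >= 1 - delta:
            return best, tosses, posts[best]
        # Greedy placeholder rule: toss the coin most likely to be heavy.
        if rng.random() < biases[best]:
            counts[best][0] += 1
        else:
            counts[best][1] += 1
    return None, max_tosses, None
```

A call such as `find_heavy_coin([0.4, 0.6, 0.4], delta=0.05)` simulates three coins and returns the selected index, the number of tosses used, and the posterior at stopping. Because each coin's posterior depends only on its own tosses in this assumed model, the stopping check is cheap; choosing which coin to toss next is where an optimal strategy would differ from this greedy sketch.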
