EM LG ME

Sep 16, 2021

Policy Choice and Best Arm Identification: Asymptotic Analysis of Exploration Sampling

Kaito Ariu, Masahiro Kato, Junpei Komiyama, Kenichiro McAlinn, Chao Qin

arXiv:2109.08229v5

8.625 citations

Originality Synthesis-oriented

AI Analysis

This work addresses theoretical issues in adaptive experimental design for policy choice, providing corrected proofs and a new objective, but it is incremental as it builds directly on prior work.

The paper identifies errors in Theorem 1 of Kasy and Sautmann (2021), which gives asymptotic guarantees for exploration sampling in best arm identification: the proof of part (1) has technical issues, the statement and proof of part (2) are incorrect, and part (3) is false. It corrects the first two parts and proposes an alternative objective function under which exploration sampling is asymptotically optimal.

We consider the "policy choice" problem -- otherwise known as best arm identification in the bandit literature -- proposed by Kasy and Sautmann (2021) for adaptive experimental design. Theorem 1 of Kasy and Sautmann (2021) provides three asymptotic results that give theoretical guarantees for exploration sampling developed for this setting. We first show that the proof of Theorem 1 (1) has technical issues, and the proof and statement of Theorem 1 (2) are incorrect. We then show, through a counterexample, that Theorem 1 (3) is false. For the former two, we correct the statements and provide rigorous proofs. For Theorem 1 (3), we propose an alternative objective function, which we call posterior weighted policy regret, and derive the asymptotic optimality of exploration sampling.
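The abstract refers to exploration sampling without stating the rule. As a rough illustration only, the sketch below assumes the form commonly attributed to Kasy and Sautmann (2021): each arm d receives a sampling share proportional to p_d (1 - p_d), where p_d is the posterior probability that arm d is the best. The Beta-Bernoulli model, the uniform Beta(1, 1) prior, the Monte Carlo estimate of p_d, the function names, and the fallback for a degenerate posterior are all assumptions made for this example and are not taken from the paper.

```python
import random


def thompson_probabilities(successes, failures, draws=20000, seed=0):
    """Monte Carlo estimate of P(arm d is best).

    Assumes independent Bernoulli arms with Beta(1, 1) priors, so the
    posterior of arm d is Beta(1 + successes[d], 1 + failures[d]).
    """
    rng = random.Random(seed)
    k = len(successes)
    wins = [0] * k
    for _ in range(draws):
        samples = [
            rng.betavariate(1 + s, 1 + f)
            for s, f in zip(successes, failures)
        ]
        best = max(range(k), key=samples.__getitem__)
        wins[best] += 1
    return [w / draws for w in wins]


def exploration_sampling_shares(p):
    """Shares q_d proportional to p_d * (1 - p_d), normalized to sum to 1."""
    weights = [pd * (1.0 - pd) for pd in p]
    total = sum(weights)
    if total == 0.0:
        # Hypothetical fallback (not from the paper): if one arm has
        # posterior probability 1 of being best, reuse p as the shares.
        return list(p)
    return [w / total for w in weights]


if __name__ == "__main__":
    p = thompson_probabilities(successes=[8, 5, 2], failures=[2, 5, 8])
    q = exploration_sampling_shares(p)
    print("P(best):", [round(x, 3) for x in p])
    print("shares: ", [round(x, 3) for x in q])
```

Under this assumed rule, the leading arm receives a smaller share than it would under plain Thompson sampling whenever p_d exceeds 1/2, and with exactly two arms the shares are always equal, since p_1 (1 - p_1) = p_2 (1 - p_2) when p_1 + p_2 = 1.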
