LG · AI · Feb 17, 2025

Identifying the Best Transition Law

arXiv:2502.12227v1 · h-index: 7
Originality: Incremental advance
AI Analysis

This work is an incremental advance on best-arm identification for bandits whose reward distributions have a known support, with potential relevance to reinforcement learning and decision-making applications.

The paper tackles best-arm identification in bandit settings with multinomial reward distributions, compares strategies such as LUCB with and without use of the known support, and evaluates them in simulations of varying structural complexity.

Motivated by recursive learning in Markov Decision Processes, this paper studies best-arm identification in bandit problems where each arm's reward is drawn from a multinomial distribution with a known support. We compare the performance reached by several strategies, notably LUCB, without and with use of this knowledge. In the first case, we use classical non-parametric approaches for the confidence intervals. In the second case, where a probability distribution is to be estimated, we first use classical deviation bounds (Hoeffding and Bernstein) on each dimension independently, and then the Empirical Likelihood method (EL-LUCB) on the joint probability vector. The effectiveness of these methods is demonstrated through simulations on scenarios with varying levels of structural complexity.
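To make the "Hoeffding on each dimension independently" idea concrete, here is a minimal Python sketch of one way such bounds could be turned into a confidence interval on an arm's mean reward. It is an illustration under stated assumptions, not the paper's exact construction: the union bound over coordinates, the step that optimizes the mean over the resulting box intersected with the probability simplex, and the names `hoeffding_radius` and `mean_bounds` are all assumptions introduced here.

```python
import math


def hoeffding_radius(n, num_outcomes, delta):
    """Hoeffding radius holding for all coordinates at once (union bound).

    Assumption: a two-sided bound per coordinate, with delta split evenly.
    """
    return math.sqrt(math.log(2 * num_outcomes / delta) / (2 * n))


def mean_bounds(counts, support, delta):
    """Lower/upper confidence bounds on the mean of a multinomial arm.

    counts[k] is how often outcome support[k] was observed on this arm.
    """
    n = sum(counts)
    num_outcomes = len(counts)
    r = hoeffding_radius(n, num_outcomes, delta)
    p_hat = [c / n for c in counts]
    lo = [max(0.0, p - r) for p in p_hat]
    hi = [min(1.0, p + r) for p in p_hat]

    def extreme(order):
        # Start at the lower corner of the box, then pour the remaining
        # probability mass onto outcomes in the given priority order.
        p = lo[:]
        remaining = 1.0 - sum(lo)
        for k in order:
            add = min(hi[k] - lo[k], remaining)
            p[k] += add
            remaining -= add
        return sum(pk * v for pk, v in zip(p, support))

    ascending = sorted(range(num_outcomes), key=lambda k: support[k])
    lower = extreme(ascending)        # mass goes to the smallest rewards first
    upper = extreme(ascending[::-1])  # mass goes to the largest rewards first
    return lower, upper


if __name__ == "__main__":
    # Hypothetical arm: rewards in {0, 0.5, 1}, observed 5, 3 and 2 times.
    print(mean_bounds([5, 3, 2], [0.0, 0.5, 1.0], delta=0.05))
```

In an LUCB-style loop, bounds of this kind would be recomputed for every arm after each round, and sampling would stop once the lower bound of the empirically best arm exceeds the upper bounds of all the others.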

