ML · AI · LG · EM · PR · Dec 3, 2024

Selective Reviews of Bandit Problems in AI via a Statistical View

arXiv:2412.02251v3 · 3 citations · h-index: 3
Originality: Synthesis-oriented
AI Analysis

This is an incremental review paper that synthesizes existing knowledge for researchers in reinforcement learning and bandit problems.

This paper provides a selective review of bandit problems in AI, covering foundational models, theoretical tools, and algorithms for managing exploration-exploitation trade-offs, with a focus on stochastic multi-armed bandits, continuum-armed bandits, and contextual bandits.

Reinforcement Learning (RL) is a widely researched area in artificial intelligence that focuses on teaching agents to make decisions through interaction with their environment. A key subset comprises stochastic multi-armed bandit (MAB) and stochastic continuum-armed bandit (SCAB) problems, which model sequential decision-making under uncertainty. This review outlines the foundational models and assumptions of bandit problems, surveys non-asymptotic theoretical tools such as concentration inequalities and minimax regret bounds, and compares frequentist and Bayesian algorithms for managing the exploration-exploitation trade-off. We also cover K-armed contextual bandits and SCAB, focusing on their methodologies and regret analyses, and examine the connections between SCAB problems and functional data analysis. Finally, we highlight recent advances and ongoing challenges in the field.
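To make the exploration-exploitation trade-off mentioned in the abstract concrete, here is a minimal sketch of UCB1, a standard frequentist index policy for the stochastic K-armed bandit. It is an illustration only and is not taken from the paper: the function name `ucb1`, the Bernoulli reward model, the arm means, and the horizon are all assumptions chosen for the example.

```python
import math
import random


def ucb1(means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms with hypothetical success probabilities `means`.

    Returns the number of pulls per arm and the pseudo-regret, i.e. the sum
    over arms of (best mean - arm mean) * number of pulls of that arm.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # how many times each arm has been pulled
    sums = [0.0] * k   # cumulative reward collected from each arm

    for t in range(1, horizon + 1):
        if t <= k:
            # Initialisation: pull every arm once so each index is defined.
            arm = t - 1
        else:
            # Index = empirical mean (exploitation) + confidence bonus (exploration).
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        # Bernoulli reward drawn from the (assumed) true mean of the chosen arm.
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward

    best = max(means)
    pseudo_regret = sum((best - means[a]) * counts[a] for a in range(k))
    return counts, pseudo_regret


if __name__ == "__main__":
    pulls, regret = ucb1([0.2, 0.5, 0.8], horizon=2000)
    print("pulls per arm:", pulls)
    print("pseudo-regret:", regret)
```

The confidence bonus shrinks as an arm is pulled more often, so rarely tried arms are revisited occasionally while the empirically best arm is pulled most of the time. A Bayesian counterpart such as Thompson sampling would replace the index with a draw from a posterior over each arm's mean.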
