On the Minimax Regret for Linear Bandits in a wide variety of Action Spaces
This work provides a theoretical foundation for linear bandit algorithms, which is incremental but crucial for understanding performance limits in reinforcement learning and optimization.
The paper addresses the open problem of characterizing the minimax regret for linear bandits across diverse action spaces, presenting an optimal regret lower bound specifically for a wide class of convex action spaces.
As noted in the works of \cite{lattimore2020bandit}, it has been mentioned that it is an open problem to characterize the minimax regret of linear bandits in a wide variety of action spaces. In this article we present an optimal regret lower bound for a wide class of convex action spaces.