Online Bandit Linear Optimization: A Study
This is an incremental study of an existing method for optimization in machine learning.
The paper tackles the problem of online bandit linear optimization by studying the SCRiBLe algorithm, which achieves an O(√T) regret bound and polynomial runtime complexity.
This article introduces the concepts behind online bandit linear optimization and explores an efficient algorithm called SCRiBLe (Self-Concordant Regularization in Bandit Learning), introduced by Abernethy et al.~\cite{abernethy}. SCRiBLe achieves an $O(\sqrt{T})$ regret bound with running time polynomial in the dimension of the input space. In this article we build up to the bandit linear optimization case and then study SCRiBLe.