Jointly Efficient and Optimal Algorithms for Logistic Bandits
This work addresses a key bottleneck for practitioners in online learning and recommendation systems by providing what are, to the authors' knowledge, the first logistic bandit algorithms that are both statistically and computationally efficient.
The paper tackles the trade-off between statistical efficiency and computational cost in logistic bandits. It introduces a new learning procedure whose regret matches the problem-dependent lower bound while keeping each round computationally cheap.
Logistic Bandits have recently undergone careful scrutiny by virtue of their combined theoretical and practical relevance. This research effort delivered statistically efficient algorithms, improving the regret of previous strategies by exponentially large factors. Such algorithms are, however, strikingly costly, as they require $\Omega(t)$ operations at round $t$. On the other hand, a different line of research focused on computational efficiency ($\mathcal{O}(1)$ per-round cost), but at the price of giving up the aforementioned exponential improvements. Unfortunately, obtaining the best of both worlds is not simply a matter of combining the two approaches. Instead, we introduce a new learning procedure for Logistic Bandits. It yields confidence sets whose sufficient statistics can be easily maintained online without sacrificing statistical tightness. Combined with efficient planning mechanisms, it lets us design fast algorithms whose regret still matches the problem-dependent lower bound of Abeille et al. (2021). To the best of our knowledge, these are the first Logistic Bandit algorithms that simultaneously enjoy statistical and computational efficiency.
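To make the contrast between $\Omega(t)$ and constant-in-$t$ per-round cost concrete, the sketch below shows what "sufficient statistics maintained online" can look like in a generic logistic bandit. It is an illustration under stated assumptions, not the paper's procedure, which this abstract does not specify. It assumes a hypothetical class `OnlineLogisticStats` that keeps a parameter estimate and the inverse of a weighted design matrix $W = \lambda I + \sum_s w_s x_s x_s^\top$ with $w_s = \dot\mu(x_s^\top \theta)$ evaluated at the estimate current at round $s$. It uses a Sherman-Morrison rank-one update and a single online-Newton-style step on the newest sample, both swapped-in standard techniques.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class OnlineLogisticStats:
    """Hypothetical illustration of online sufficient statistics for a logistic model.

    Keeps theta and the inverse of W = lam * I + sum_s w_s x_s x_s^T, where
    w_s = sigmoid'(x_s^T theta) uses the estimate available at round s.
    Each update touches only the newest sample, so its cost is O(d^2),
    independent of the number of rounds t already played.
    """

    def __init__(self, d, lam=1.0):
        self.theta = np.zeros(d)
        self.W_inv = np.eye(d) / lam  # inverse of the regularized design matrix

    def update(self, x, r):
        """Absorb one observation (arm x, binary reward r); return the weight used."""
        mu = sigmoid(x @ self.theta)
        w = mu * (1.0 - mu)  # derivative of the sigmoid at x^T theta
        # Sherman-Morrison: (W + w x x^T)^{-1} from W^{-1}, no refit over past data
        Wx = self.W_inv @ x
        self.W_inv -= np.outer(Wx, Wx) * (w / (1.0 + w * (x @ Wx)))
        # One Newton-style step on the logistic loss of the new sample only;
        # the gradient of -r log mu - (1 - r) log(1 - mu) is (mu - r) x
        self.theta -= self.W_inv @ ((mu - r) * x)
        return w

    def bonus(self, x):
        """Exploration width ||x||_{W^{-1}}, as used by generic UCB-style selection."""
        return float(np.sqrt(x @ self.W_inv @ x))
```

Calling `update(x, r)` once per round keeps the statistics current at a cost that does not depend on $t$. By contrast, refitting a maximum-likelihood estimate over all past rounds must revisit every stored sample, which is where an $\Omega(t)$ per-round cost comes from. The hard part the abstract points to, keeping such cheap statistics while retaining tight confidence sets, is not captured by this simple sketch.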