Covariance-adapting algorithm for semi-bandits with application to sparse rewards
This work provides a more practical theoretical framework for combinatorial semi-bandits by removing the requirement for prior distribution knowledge, benefiting algorithm designers in recommendation and other domains.
The paper addresses stochastic combinatorial semi-bandits by introducing a new family of sub-exponential distributions that relaxes the need for prior knowledge of distribution parameters. The authors prove a lower bound parameterized by the unknown covariance matrix and propose an algorithm achieving tight asymptotic regret, with applications to sparse rewards in recommender systems.
We investigate stochastic combinatorial semi-bandits, where the entire joint distribution of outcomes impacts the complexity of the problem instance (unlike in the standard bandits). Typical distributions considered depend on specific parameter values, whose prior knowledge is required in theory but quite difficult to estimate in practice; an example is the commonly assumed sub-Gaussian family. We alleviate this issue by instead considering a new general family of sub-exponential distributions, which contains bounded and Gaussian ones. We prove a new lower bound on the expected regret on this family, that is parameterized by the unknown covariance matrix of outcomes, a tighter quantity than the sub-Gaussian matrix. We then construct an algorithm that uses covariance estimates, and provide a tight asymptotic analysis of the regret. Finally, we apply and extend our results to the family of sparse outcomes, which has applications in many recommender systems.