LG MLOct 8, 2020

Reward-Biased Maximum Likelihood Estimation for Linear Stochastic Bandits

Yu-Heng Hung, Ping-Chun Hsieh, Xi Liu, P. R. Kumar

arXiv:2010.04091v17.916 citations

Originality Incremental advance

AI Analysis

This work addresses the problem of efficient decision-making in bandit settings for applications like recommendation systems, offering incremental improvements in computational efficiency.

The authors tackled the explore-exploit trade-off in linear and generalized linear bandits by proposing novel learning algorithms based on reward-biased maximum likelihood estimation, achieving order-optimality and competitive empirical performance with state-of-the-art methods, including low computation time per pull.

Modifying the reward-biased maximum likelihood method originally proposed in the adaptive control literature, we propose novel learning algorithms to handle the explore-exploit trade-off in linear bandits problems as well as generalized linear bandits problems. We develop novel index policies that we prove achieve order-optimality, and show that they achieve empirical performance competitive with the state-of-the-art benchmark methods in extensive experiments. The new policies achieve this with low computation time per pull for linear bandits, and thereby resulting in both favorable regret as well as computational efficiency.

View on arXiv PDF

Similar