LG MLMar 7, 2022

PAC-Bayesian Lifelong Learning For Multi-Armed Bandits

Hamish Flynn, David Reeb, Melih Kandemir, Jan Peters

arXiv:2203.03303v16.98 citationsh-index: 20

Originality Incremental advance

AI Analysis

This work addresses the challenge of transferring knowledge across sequential tasks in bandit problems, but it appears incremental as it builds on existing PAC-Bayesian and lifelong learning frameworks.

The paper tackles the problem of lifelong learning in multi-armed bandits by deriving PAC-Bayesian lower bounds on expected average reward for new tasks and proposes algorithms using these bounds as objectives, which outperform a baseline method in evaluations.

We present a PAC-Bayesian analysis of lifelong learning. In the lifelong learning problem, a sequence of learning tasks is observed one-at-a-time, and the goal is to transfer information acquired from previous tasks to new learning tasks. We consider the case when each learning task is a multi-armed bandit problem. We derive lower bounds on the expected average reward that would be obtained if a given multi-armed bandit algorithm was run in a new task with a particular prior and for a set number of steps. We propose lifelong learning algorithms that use our new bounds as learning objectives. Our proposed algorithms are evaluated in several lifelong multi-armed bandit problems and are found to perform better than a baseline method that does not use generalisation bounds.

View on arXiv PDF

Similar