Thompson Sampling with a Mixture Prior
This work addresses multi-task learning scenarios where agents face varied problem classes, offering a method to incorporate prior structure, though it appears incremental as it builds on existing Thompson sampling frameworks.
The paper tackles online decision making in environments sampled from mixture distributions, a setting relevant to multi-task learning. It proposes MixTS, a Thompson sampling algorithm initialized with a mixture prior, proves Bayes regret bounds for it in linear bandits and finite-horizon reinforcement learning, and demonstrates its empirical effectiveness in experiments.
We study Thompson sampling (TS) in online decision making, where the uncertain environment is sampled from a mixture distribution. This is relevant in multi-task learning, where a learning agent faces different classes of problems. We incorporate this structure in a natural way by initializing TS with a mixture prior, and call the resulting algorithm MixTS. To analyze MixTS, we develop a novel and general proof technique for analyzing the concentration of mixture distributions. We use it to prove Bayes regret bounds for MixTS in both linear bandits and finite-horizon reinforcement learning. Our bounds capture the structure of the prior and depend on the number of mixture components and their widths. We also demonstrate the empirical effectiveness of MixTS in synthetic and real-world experiments.
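To make the idea of initializing TS with a mixture prior concrete, the following is a minimal sketch and not the paper's implementation. It assumes a K-armed Gaussian bandit (a special case of a linear bandit with one-hot features), Gaussian mixture components with a shared diagonal prior variance, and a known noise variance. The class name `MixTS`, the helper `simulate`, and all parameter names and values are illustrative assumptions. Each round samples a component from the posterior mixture weights, samples arm means from that component's posterior, and pulls the best arm under the sample.

```python
import numpy as np


class MixTS:
    """Sketch of Thompson sampling with a Gaussian mixture prior (assumed setup)."""

    def __init__(self, weights, means, prior_var, noise_var, rng=None):
        # Log mixture weights, shape (M,); kept in log space for stability.
        self.log_w = np.log(np.asarray(weights, dtype=float))
        # Per-component posterior means and variances of arm rewards, shape (M, K).
        self.mean = np.array(means, dtype=float)
        self.var = np.full_like(self.mean, float(prior_var))
        self.noise_var = float(noise_var)
        self.rng = np.random.default_rng() if rng is None else rng

    def weights(self):
        # Normalized posterior mixture weights.
        w = np.exp(self.log_w - self.log_w.max())
        return w / w.sum()

    def act(self):
        # Sample a component, then arm means from its posterior; act greedily.
        m = self.rng.choice(len(self.log_w), p=self.weights())
        theta = self.rng.normal(self.mean[m], np.sqrt(self.var[m]))
        return int(np.argmax(theta))

    def update(self, arm, reward):
        mu = self.mean[:, arm].copy()
        var = self.var[:, arm].copy()
        # Reweight each component by its Gaussian predictive density of the reward.
        pred_var = var + self.noise_var
        self.log_w += -0.5 * (np.log(2 * np.pi * pred_var) + (reward - mu) ** 2 / pred_var)
        # Conjugate Gaussian update of the pulled arm within every component.
        post_var = 1.0 / (1.0 / var + 1.0 / self.noise_var)
        self.mean[:, arm] = post_var * (mu / var + reward / self.noise_var)
        self.var[:, arm] = post_var


def simulate(true_theta, horizon=200, seed=0):
    # Hypothetical two-component prior: each component favors a different arm.
    rng = np.random.default_rng(seed)
    agent = MixTS(weights=[0.5, 0.5], means=[[2.0, 0.0], [0.0, 2.0]],
                  prior_var=0.1, noise_var=0.25, rng=rng)
    true_theta = np.asarray(true_theta, dtype=float)
    pulls = np.zeros(len(true_theta), dtype=int)
    for _ in range(horizon):
        arm = agent.act()
        reward = true_theta[arm] + rng.normal(0.0, np.sqrt(agent.noise_var))
        agent.update(arm, reward)
        pulls[arm] += 1
    return agent, pulls
```

Calling `simulate(true_theta=[0.0, 2.0])` runs the sketch on an environment drawn near the second component. Because the components here are narrow and well separated, the posterior weight should move quickly to that component. This is the kind of prior structure the abstract says the regret bounds depend on, through the number of components and their widths.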