LGMLMay 14, 2020

Thompson Sampling for Combinatorial Semi-bandits with Sleeping Arms and Long-Term Fairness Constraints

arXiv:2005.06725v112 citations
Originality Incremental advance
AI Analysis

This work addresses fairness and efficiency in sequential decision-making for applications like recommendations, though it is incremental as it extends existing methods to incorporate constraints.

The paper tackles the combinatorial sleeping multi-armed semi-bandit problem with long-term fairness constraints by proposing a Thompson Sampling algorithm with virtual queues, achieving a proven regret bound and demonstrating effectiveness in a movie recommendation application.

We study the combinatorial sleeping multi-armed semi-bandit problem with long-term fairness constraints~(CSMAB-F). To address the problem, we adopt Thompson Sampling~(TS) to maximize the total rewards and use virtual queue techniques to handle the fairness constraints, and design an algorithm called \emph{TS with beta priors and Bernoulli likelihoods for CSMAB-F~(TSCSF-B)}. Further, we prove TSCSF-B can satisfy the fairness constraints, and the time-averaged regret is upper bounded by $\frac{N}{2η} + O\left(\frac{\sqrt{mNT\ln T}}{T}\right)$, where $N$ is the total number of arms, $m$ is the maximum number of arms that can be pulled simultaneously in each round~(the cardinality constraint) and $η$ is the parameter trading off fairness for rewards. By relaxing the fairness constraints (i.e., let $η\rightarrow \infty$), the bound boils down to the first problem-independent bound of TS algorithms for combinatorial sleeping multi-armed semi-bandit problems. Finally, we perform numerical experiments and use a high-rating movie recommendation application to show the effectiveness and efficiency of the proposed algorithm.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes