LGJul 6, 2017

Calibrated Fairness in Bandits

arXiv:1707.01875v198 citations
Originality Incremental advance
AI Analysis

This addresses fairness in decision-making for bandit algorithms, but it is incremental as it builds on prior work by adapting fairness frameworks to this setting.

The paper tackles fairness in multi-armed bandits by introducing a smoothness constraint and fairness regret based on calibration, showing that a variation of Thompson sampling achieves this fairness with an O~((kT)^{2/3}) bound on regret.

We study fairness within the stochastic, \emph{multi-armed bandit} (MAB) decision making framework. We adapt the fairness framework of "treating similar individuals similarly" to this setting. Here, an `individual' corresponds to an arm and two arms are `similar' if they have a similar quality distribution. First, we adopt a {\em smoothness constraint} that if two arms have a similar quality distribution then the probability of selecting each arm should be similar. In addition, we define the {\em fairness regret}, which corresponds to the degree to which an algorithm is not calibrated, where perfect calibration requires that the probability of selecting an arm is equal to the probability with which the arm has the best quality realization. We show that a variation on Thompson sampling satisfies smooth fairness for total variation distance, and give an $\tilde{O}((kT)^{2/3})$ bound on fairness regret. This complements prior work, which protects an on-average better arm from being less favored. We also explain how to extend our algorithm to the dueling bandit setting.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes